Data from Instagram posts with the hashtag #100daysofpractice.
The file posts.zip (40 MB) contains data from 450,000 Instagram posts with the hashtag #100daysofpractice.
posts.zip contains two files: posts.csv, which contains the posts data, and metadata.txt, which contains the details about its generation.
posts.csv is in the CSV format (every field quoted with ", fields separated by ,).
The fields therein, each with a short explanation, are:
- post-id: the post's unique ID
- shortcode: a short string that can be used to access the post in a web browser (see below for instructions)
- taken_at_timestamp: the date when it was posted
- owner-id: a unique ID representing the user who posted it; this data was anonymized for privacy reasons, so it is not the real user ID
- is_video: 1 if it is a video, 0 otherwise
- edge_liked_by-count: number of likes
- edge_media_to_comment-count: number of comments
- video_view_count: number of views
- comments_disabled: 1 if comments were disabled, 0 otherwise
- __typename: GraphImage if it is an image post, GraphVideo for a video post, or GraphSidecar for a post with more than one media item
- hashtags: hashtags from the comments
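As a minimal sketch of how the data could be loaded, the Python below reads posts.csv straight out of posts.zip with the standard library and counts posts per __typename. It assumes that posts.csv starts with a header row carrying the field names listed above and that it is UTF-8 encoded; neither is stated here, so check metadata.txt or the first line of the file. The function names read_posts and count_by_type are illustrative, not part of the dataset's tooling.

```python
import csv
import io
import zipfile


def read_posts(zip_path, member="posts.csv"):
    """Yield each post in the archive as a dict keyed by column name.

    Assumption: the first row of the CSV is a header with the field names.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # ZipFile.open returns a binary stream; the csv module needs text.
            # Assumption: the file is UTF-8 encoded.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # The csv defaults (comma delimiter, double-quote quoting)
            # match the format described above.
            yield from csv.DictReader(text)


def count_by_type(posts):
    """Count posts per __typename (GraphImage, GraphVideo, GraphSidecar)."""
    counts = {}
    for post in posts:
        kind = post["__typename"]
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Calling count_by_type(read_posts("posts.zip")) streams the rows one at a time, so the whole file never has to be held in memory. Every value arrives as a string, so flags such as is_video need an explicit comparison with "1".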
To access a post in a web browser using its shortcode, just paste the shortcode after https://www.instagram.com/p/.
For instance, the first post with the hashtag #100daysofpractice has the shortcode BTrwiUuh8vV.
Hence you may access it with the link https://www.instagram.com/p/BTrwiUuh8vV.
It was posted by the creator of the hashtag, @violincase, the violin virtuosa Hilary Hahn.
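The shortcode rule above can be sketched in Python as a one-line helper. The name post_url is illustrative; the base URL is the one given in the text.

```python
def post_url(shortcode):
    """Build the browser URL for a post from its shortcode."""
    return "https://www.instagram.com/p/" + shortcode


print(post_url("BTrwiUuh8vV"))
# prints https://www.instagram.com/p/BTrwiUuh8vV
```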
You may replicate the process by which the data was obtained through the following steps:
    git clone git@github.com:bits4waves/100daysofpractice-dataset.git
    cd 100daysofpractice-dataset/
    python3 -m venv venv
    source venv/bin/activate
    python -m pip install requests
    python -m pip install -r requirements.txt
    cd data/posts
    make
To the extent possible under law, Bits4Waves has waived all copyright and related or neighboring rights to 100daysofpractice-dataset.