rafaelbeirigo/100daysofpractice-dataset

[Image whose most prominent words are 100daysofpractice, music, practice, and violin.]

Data from Instagram posts with the hashtag #100daysofpractice.

The file posts.zip (40 MB) contains data from 450,000 Instagram posts with the hashtag #100daysofpractice.

posts.zip contains two files:

  • posts.csv, which contains the posts data, and
  • metadata.txt, which contains the details about its generation.
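
A minimal Python sketch for inspecting the archive without unpacking it. The function name read_metadata is illustrative, and two things are assumptions rather than facts from this repository: that both files sit at the top level of posts.zip, and that metadata.txt is UTF-8 text.

```python
import zipfile


def read_metadata(zip_path="posts.zip", member="metadata.txt"):
    """Return (list of member names, text of the metadata file)."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        # Assumption: the metadata file is stored as UTF-8 text.
        text = archive.read(member).decode("utf-8")
    return names, text
```

Calling read_metadata() from the directory that holds posts.zip should return the two file names together with the generation details.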

posts.csv is in the CSV format: every field is enclosed in double quotes (") and fields are separated by commas (,). The fields are:

  • post-id: post’s unique ID
  • shortcode: a short string that can be used to access the post in a web browser (see below for instructions)
  • taken_at_timestamp: the date when it was posted
  • owner-id: a unique ID representing the user who posted it; this data was anonymized for privacy reasons, therefore this is not the real user ID
  • is_video: 1 if it is a video, 0 otherwise
  • edge_liked_by-count: number of likes
  • edge_media_to_comment-count: number of comments
  • video_view_count: number of views
  • comments_disabled: 1 if comments were disabled, 0 otherwise
  • __typename: GraphImage for an image post, GraphVideo for a video post, or GraphSidecar for a post with more than one media item
  • hashtags: hashtags from the comments
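
The fields above can be read with Python's csv module, sketched here under stated assumptions. The names FIELDS and parse_posts are illustrative. It is assumed that the columns appear in the order listed above, that a header row may or may not be present (one is skipped if found), and that count fields may be empty, for example video_view_count on a non-video post. taken_at_timestamp and hashtags are left as raw strings because their exact format is not described here.

```python
import csv

# Assumption: the columns in posts.csv follow the order of the field list.
FIELDS = [
    "post-id", "shortcode", "taken_at_timestamp", "owner-id", "is_video",
    "edge_liked_by-count", "edge_media_to_comment-count", "video_view_count",
    "comments_disabled", "__typename", "hashtags",
]
COUNT_FIELDS = ("edge_liked_by-count", "edge_media_to_comment-count",
                "video_view_count")
FLAG_FIELDS = ("is_video", "comments_disabled")


def _to_int(value):
    # Assumption: an empty cell means "no value"; map it to None.
    return int(value) if value else None


def parse_posts(lines):
    """Yield one dict per post from an iterable of CSV text lines."""
    # The default dialect handles double-quoted, comma-separated fields.
    for cells in csv.reader(lines):
        if not cells or cells[0] == FIELDS[0]:
            continue  # skip blank lines and a header row, if there is one
        row = dict(zip(FIELDS, cells))
        for name in COUNT_FIELDS:
            row[name] = _to_int(row[name])
        for name in FLAG_FIELDS:
            row[name] = row[name] == "1"  # 1 -> True, 0 -> False
        yield row
```

parse_posts accepts any iterable of text lines, so it can be fed a file opened with newline="" (as the csv module recommends) or a text stream wrapped around the archive member.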

To open a post in a web browser, append its shortcode to https://www.instagram.com/p/. For instance, the first post with the hashtag #100daysofpractice has the shortcode BTrwiUuh8vV, so its link is https://www.instagram.com/p/BTrwiUuh8vV. It was posted by the creator of the hashtag, @violincase, the violin virtuosa Hilary Hahn.
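
The same rule as a small Python helper. The name post_url is illustrative; the base URL and the example shortcode are the ones given above.

```python
BASE_URL = "https://www.instagram.com/p/"


def post_url(shortcode):
    """Build the browser link for a post from its shortcode."""
    # Strip stray whitespace that may surround the value.
    return BASE_URL + shortcode.strip()


print(post_url("BTrwiUuh8vV"))  # https://www.instagram.com/p/BTrwiUuh8vV
```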

To replicate the process by which the data was obtained, run the following steps:

git clone git@github.com:bits4waves/100daysofpractice-dataset.git
cd 100daysofpractice-dataset/
python3 -m venv venv
source venv/bin/activate
python -m pip install requests
python -m pip install -r requirements.txt
cd data/posts
make

CC0
To the extent possible under law, Bits4Waves has waived all copyright and related or neighboring rights to 100daysofpractice-dataset.
