Code for the analysis of SHS-YT, a dataset of videos crawled from YouTube based on seed songs in SHS100K-Test.
We recommend using our conda environment. Install and activate it with:
conda env create -f env.yml;
conda activate shs-yty
- The data directory `data` contains our annotated dataset SHS-YT and the benchmark sets (SHS-YT combined with the corresponding songs for cliques from SHS100K-Test). Other datasets are Da-Tacos and SHS100K.
  - the subdir `data/annotations` contains expert and worker comments
  - the subdir `data/preds` contains the square similarity matrix per model
  - the subdir `features` contains a sample of the audio features
- `figs` contains the GUI of the MTurk experiment
- `documentation` contains descriptions for our classes
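A square similarity matrix such as those in data/preds is usually scored per query by ranking all other items. The following is a minimal sketch of mean average precision, a common metric for cover song identification. It assumes that higher values mean higher similarity and that a clique ID is known for every row; the on-disk format of the matrices is not described here, so the sketch works on plain nested lists, and the function name `mean_average_precision` is our own choice, not part of this repository.

```python
from statistics import mean


def mean_average_precision(sim, cliques):
    """Mean average precision over all queries of a square similarity matrix.

    sim[i][j] is the similarity of items i and j (assumed: higher = more similar);
    cliques[i] is the clique (work) ID of item i.
    """
    n = len(cliques)
    if len(sim) != n or any(len(row) != n for row in sim):
        raise ValueError("expected an n x n matrix and n clique labels")

    average_precisions = []
    for q in range(n):
        # Rank all other items by descending similarity to the query.
        candidates = [j for j in range(n) if j != q]
        candidates.sort(key=lambda j: sim[q][j], reverse=True)

        hits = 0
        precisions = []
        for rank, j in enumerate(candidates, start=1):
            if cliques[j] == cliques[q]:
                hits += 1
                precisions.append(hits / rank)
        if precisions:  # skip queries that have no other version
            average_precisions.append(mean(precisions))
    return mean(average_precisions) if average_precisions else 0.0


# Toy example: items 0 and 1 share a clique, as do items 2 and 3.
toy_sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
toy_cliques = [0, 0, 1, 1]
toy_map = mean_average_precision(toy_sim, toy_cliques)
```

Queries without any other version in the set are skipped rather than counted as zero; whether that matches the evaluation protocol in the paper is an assumption to check against benchmark.ipynb.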
To download and extract relevant features for the CSI task, you can use this repository: https://github.com/progsi/YTFeatureExtractor
For example, to download the large benchmark dataset SHS-SEED+YT saved to BENCHMARK_CSV_PATH, run
python extract_list.py --listfile BENCHMARK_CSV_PATH -i YOUR_DATA_DIR
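If several list files need to be processed, the same call can be scripted. The sketch below only reuses the two flags shown above (`--listfile` and `-i`); the helper names, the example CSV paths and the `dry_run` switch are hypothetical, and it assumes the script is started from inside a checkout of YTFeatureExtractor.

```python
import subprocess
import sys


def build_extract_command(listfile, data_dir, script="extract_list.py"):
    """Return the argument list for one run of the extraction script."""
    return [
        sys.executable,  # interpreter of the currently active environment
        script,
        "--listfile", str(listfile),
        "-i", str(data_dir),
    ]


def run_extraction(list_files, data_dir, dry_run=True):
    """Print each command; execute it as well only when dry_run is False."""
    commands = [build_extract_command(f, data_dir) for f in list_files]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
    return commands


# Hypothetical list files; replace them with your own CSV paths.
planned = run_extraction(["lists/a.csv", "lists/b.csv"], "YOUR_DATA_DIR")
```

The default dry run only prints the commands, so the paths can be checked before any download starts.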
This directory contains different notebooks for analysis of data.
- `benchmark.ipynb`: benchmarking the datasets (Table 5 in the paper)
- `statistics.ipynb`: basic stats, KDEs, etc.
- `curation_analysis.ipynb`: more in-depth analysis of the ambiguity annotations
- `pairs_analysis.ipynb`: contains the analyses from Table 6, Table 7 and Figure 5 of the paper, plus some additional analyses
Uncertainty | Applies to | Description |
---|---|---|
Song: Difficult Cover | Version | Strong changes in melody, harmony, timbre and rhythm, which are expected in cover song identification. During annotation, stronger changes in these characteristics make classification difficult for a human annotator, especially if the annotator does not know the song. |
Song: Drum-Only | Version & Non-Version | Only the drum track. Typically either isolated by automatic sound source separation, covered by a drummer or programmed in a drum engine. |
Song: Instrumental | Version & Non-Version | A version without the vocal track. Typically a karaoke version or a backing track. Might be generated by automatic sound source separation. |
Song: Mashup/Remix | Version & Non-Version | A song which contains samples from the query song. The samples might be whole sections (typically the chorus) or just very short melodic lines. |
Song: Medley | Version & Non-Version | A song which contains (typically sections of) multiple songs. One of the songs is (a section of) the query song. |
Song: Same Artist | Non-Version | A different song but it is from the same artist. |
Song: Same Genre | Non-Version | A different song but it is from the same genre. |
Song: Similar | Non-Version | A different song but it is musically similar in terms of melody, harmony, timbre, rhythm etc. |
Song: Single Instrument | Version & Non-Version | A song which includes only the stem of a single harmonic instrument. The instruments that apparently occur most often are the piano and the guitar. Typically, either someone covers the query song by playing it themselves or the stem performance is programmed (e.g. as a piano roll representation). |
Song: Slowed/Spedup | Version & Non-Version | The query song but sped-up or slowed down. |
Song: Vocal-Only | Version & Non-Version | Only the vocal stem of the query song. Either isolated automatically by sound source separation or an a cappella cover. |
Video: In-Background | Version & Non-Version | The query song appears in the background with foreground noise such as crowd noise, speech or mixed noise (e.g. in a movie or show scene). |
Video: Low Fidelity | Version & Non-Version | The query song is presented with low fidelity. |
Video: Multiple Songs | Version & Non-Version | Multiple songs besides the query song are contained in the video. Typical examples are concert performances or tributes. |
Video: Similar Metadata | Non-Version | A rather obvious non-cover-song of the query with rather similar metadata (especially song title and artist name), which might confuse the annotator. |
Video: With Non-Music | Version & Non-Version | The query song is contained in the video but is interrupted, preceded and/or followed by non-music noise. |
Placeholder: No Music | No Music | Placeholder class for videos which do not contain any music. |
Placeholder: Non-Ambiguous | Version & Non-Version | Placeholder class for songs which were not perceived as ambiguous. |
Placeholder: Unavailable | All | Placeholder class for videos that were unavailable on YouTube at the time of curation. |
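
When working with the annotations programmatically, the second column of the table can be kept as a small lookup structure. The following is a sketch only: the names `APPLIES_TO` and `classes_for` are hypothetical, the label strings are assumed to be spelled as in the table, and "All" is interpreted here as Version, Non-Version and No Music.

```python
BOTH = frozenset({"Version", "Non-Version"})
ALL = frozenset({"Version", "Non-Version", "No Music"})  # assumed meaning of "All"

# Applicability per ambiguity class, transcribed from the table above.
APPLIES_TO = {
    "Song: Difficult Cover": frozenset({"Version"}),
    "Song: Drum-Only": BOTH,
    "Song: Instrumental": BOTH,
    "Song: Mashup/Remix": BOTH,
    "Song: Medley": BOTH,
    "Song: Same Artist": frozenset({"Non-Version"}),
    "Song: Same Genre": frozenset({"Non-Version"}),
    "Song: Similar": frozenset({"Non-Version"}),
    "Song: Single Instrument": BOTH,
    "Song: Slowed/Spedup": BOTH,
    "Song: Vocal-Only": BOTH,
    "Video: In-Background": BOTH,
    "Video: Low Fidelity": BOTH,
    "Video: Multiple Songs": BOTH,
    "Video: Similar Metadata": frozenset({"Non-Version"}),
    "Video: With Non-Music": BOTH,
    "Placeholder: No Music": frozenset({"No Music"}),
    "Placeholder: Non-Ambiguous": BOTH,
    "Placeholder: Unavailable": ALL,
}


def classes_for(label):
    """Return the sorted ambiguity classes that may accompany a label."""
    if label not in ALL:
        raise ValueError(f"unknown label: {label!r}")
    return sorted(name for name, labels in APPLIES_TO.items() if label in labels)


def is_consistent(label, ambiguity_class):
    """Check whether an annotated (label, class) pair agrees with the table."""
    return label in APPLIES_TO.get(ambiguity_class, frozenset())
```

For example, `is_consistent("Version", "Song: Same Artist")` is false, since that class only applies to non-versions; a check like this can flag annotation rows that contradict the table.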