This directory hosts annotated corpus files in the Serbian language, transliterated into Serbian Cyrillic script. Basic information about each source is given below.
List of source files.
URL: https://github.com/reldi-data/SETimes.SRPlus/blob/master/set.sr.plus.conllu
Transliterated file:
Note: This resource is an extended and updated version of the original SETimes.SR annotated corpus. The original (base) SETimes.SR annotated corpus is here.
License: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) (see the bottom of the page at the URL above)
URL: https://github.com/UniversalDependencies/UD_Serbian-SET
Transliterated files:
- sr_set_cyr-ud-dev.conllu (development)
- sr_set_cyr-ud-test.conllu (test)
- sr_set_cyr-ud-train.conllu (training)
License: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
URL: https://www.clarin.si/repository/xmlui/handle/11356/1372
Transliterated files:
License: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Transliteration was performed with a simple Python script, connlutrans.py,
located in this directory. Only sentences, word forms and lemmas were transliterated.
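To illustrate what transliterating only sentences, word forms and lemmas means for a CoNLL-U file, here is a minimal sketch. It is not the code of connlutrans.py: the names `to_cyrillic` and `transliterate_line` are hypothetical, and the sketch assumes the sentence text sits in `# text = ` comment lines and that the word form and lemma are the second and third tab-separated columns, as in standard CoNLL-U. The digraph handling is naive (every `lj`, `nj`, `dž` is treated as one letter), so words such as "injekcija" would need special treatment.

```python
# Illustrative sketch only; not the actual connlutrans.py.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
SINGLE = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}


def to_cyrillic(text):
    """Map Serbian Latin text to Cyrillic, leaving other characters untouched."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            # Capitalize when the first Latin letter of the digraph is uppercase.
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = text[i]
        cyr = SINGLE.get(ch.lower())
        if cyr is None:
            out.append(ch)  # digits, punctuation, non-Serbian letters
        else:
            out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)


def transliterate_line(line):
    """Transliterate the sentence text, FORM and LEMMA of one CoNLL-U line."""
    line = line.rstrip("\n")
    prefix = "# text = "
    if line.startswith(prefix):
        return prefix + to_cyrillic(line[len(prefix):])
    if not line or line.startswith("#"):
        return line  # blank lines and other comments stay as they are
    cols = line.split("\t")
    if len(cols) == 10:
        cols[1] = to_cyrillic(cols[1])  # FORM
        cols[2] = to_cyrillic(cols[2])  # LEMMA
    return "\t".join(cols)
```

All other columns (tags, features, dependency relations) pass through unchanged, which matches the statement that only sentences, word forms and lemmas are affected.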
connlutrans.py -i <input_file> -o <output_file>
connlutrans.py -i <input_directory>/* -o <output_directory>
Output files are placed in the output directory, with the infix -cyr inserted
right before the extension.
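The naming rule can be sketched as follows. This is an assumption about how the rule might be implemented, not code taken from connlutrans.py; the helper name `cyr_output_path` is hypothetical, and "extension" is taken to mean the last dot-separated suffix of the file name.

```python
from pathlib import Path


def cyr_output_path(input_file, output_dir):
    """Build the output path: same file name with -cyr before the extension."""
    src = Path(input_file)
    # src.stem is the name without its last suffix, src.suffix is e.g. ".conllu"
    return Path(output_dir) / f"{src.stem}-cyr{src.suffix}"
```

For example, under these assumptions an input named `set.sr.plus.conllu` would be written to the output directory as `set.sr.plus-cyr.conllu`.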