INEL Corpora General Transcription and Annotation Principles
Abstract
INEL (“Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages”) is a long-term research project (2016–2033) whose primary goal is to create digital annotated corpora of several languages of Northern Eurasia, enabling typologically aware, corpus-based grammatical research. As of December 2020, full versions of two corpora, Kamas and Dolgan, have been released. Two intermediate versions of the Selkup corpus have also been published, and the complete Selkup corpus is scheduled to appear by the end of 2021. Evenki is another ongoing subproject, and its corpus is scheduled to appear at the same time. In this paper, we outline the basic principles of transcription and some aspects of other annotations (such as translations and annotations of code-switching) that are common to all the INEL corpora.