Project Log

From Kurdî Wikibase

Data re-modeling

One of the major ongoing issues in creating Kurdî Wikibase is that Wikibase lacks the language codes sdh and hac, for Southern Kurdish and Gorani respectively. As a workaround, we use ku as the language code for all the varieties and point to an item describing the variety using dct:language. This can be refined once the language codes are added to Wikibase.
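The fallback described above could be sketched roughly as follows. This is only an illustration, not the upload script itself: the item identifiers in VARIETY_ITEMS and the function name language_fields are hypothetical placeholders, and the real items describing each variety would have to be looked up on the Wikibase.

```python
# Hypothetical placeholders for the items that describe each variety;
# the real item IDs on the Wikibase are not given in this log.
VARIETY_ITEMS = {
    "sdh": "Q_SOUTHERN_KURDISH",  # Southern Kurdish (code missing on Wikibase)
    "hac": "Q_GORANI",            # Gorani (code missing on Wikibase)
}

# Language code used for all varieties until the specific codes exist.
FALLBACK_CODE = "ku"


def language_fields(variety_code):
    """Return the language code to write and the variety item to link.

    The variety item is meant to be attached with dct:language, so the
    specific variety stays recoverable even though the code is "ku".
    """
    if variety_code not in VARIETY_ITEMS:
        raise ValueError(f"unknown variety code: {variety_code!r}")
    return {
        "language_code": FALLBACK_CODE,
        "dct:language": VARIETY_ITEMS[variety_code],
    }
```

Keeping the variety in a separate statement means that, once sdh and hac become available, the entries can be re-coded by reading the dct:language link rather than by re-importing the sources.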

In addition to language codes, we face challenges specific to the lexicographical data of the sources, particularly orthographic normalization across Latin and Perso-Arabic scripts and spelling variation. This matters for LOD technologies because duplicated entries, i.e. several entries that describe the same lexeme, should be avoided. Therefore, we verify and unify scripts across the resources to conform to the orthographies that are widely used; e.g. ë, used for a glottal stop, is replaced with ´. Moreover, some of the headwords in the selected resources contained punctuation marks, which we removed.
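A minimal sketch of this kind of headword normalization is given below. It assumes that every character in a Unicode punctuation category should be dropped and that the only replacement is the ë to ´ rule mentioned above; the actual scripts may apply more rules or treat hyphens and apostrophes differently. The function names are ours, chosen for illustration.

```python
import unicodedata

# Replacement rules; only the rule mentioned in this log is listed.
REPLACEMENTS = {"ë": "´"}


def normalize_headword(headword):
    """Unify spelling and strip punctuation from a single headword."""
    # Compose characters first, so that "e" + combining diaeresis
    # is treated the same as the precomposed "ë".
    text = unicodedata.normalize("NFC", headword)
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    # Drop characters in any Unicode punctuation category (P*).
    # The acute accent "´" is a modifier symbol (Sk), so it is kept.
    text = "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )
    # Collapse runs of whitespace left behind by removed characters.
    return " ".join(text.split())


def group_duplicates(headwords):
    """Group raw headwords that share the same normalized form."""
    groups = {}
    for headword in headwords:
        groups.setdefault(normalize_headword(headword), []).append(headword)
    return groups
```

Any group returned by group_duplicates that holds more than one raw headword is a candidate for merging into a single lexeme rather than being uploaded as separate entries.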

On Kurdî Wikibase, usage examples are attached to a sense and described with their English translation, whereas on Wikidata, usage examples are attached to a lexeme and qualified with their subject sense. On the one hand, our modeling corresponds to the OntoLex source, where senses point to usage examples via ontolex:usage. On the other, it makes the upload process more convenient: a usage example attached to a lexeme cannot be qualified with the URI of its subject sense until that sense has an identifier, which is only assigned once the item data is written to the Wikibase. When transferring to Wikidata, we attach usage examples to lexemes (and not to senses) using wd:P5831, indicating the subject sense as a qualifier, which is the strategy favoured by the Wikidata community.
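The re-attachment needed for the transfer to Wikidata might look roughly like the sketch below. The dictionary layout is a simplified stand-in for the real lexeme data, not the Wikibase JSON format, and the qualifier property P6072 (subject sense) is our assumption and should be checked against Wikidata before use; only P5831 is named in this log.

```python
USAGE_EXAMPLE = "P5831"  # usage example property, as named in this log
SUBJECT_SENSE = "P6072"  # assumption: Wikidata's "subject sense" qualifier


def lift_examples_to_lexeme(lexeme):
    """Turn sense-level usage examples into lexeme-level claims.

    Each claim is qualified with the ID of the sense it came from,
    so a sense that carries examples must already have an identifier.
    """
    claims = []
    for sense in lexeme.get("senses", []):
        examples = sense.get("examples", [])
        sense_id = sense.get("id")
        if examples and not sense_id:
            # Mirrors the ordering constraint: senses only get IDs
            # once they have been written to the Wikibase.
            raise ValueError("sense has usage examples but no identifier yet")
        for example in examples:
            claims.append({
                "property": USAGE_EXAMPLE,
                "value": {
                    "text": example["text"],
                    "language": example["language"],
                },
                "qualifiers": {SUBJECT_SENSE: sense_id},
            })
    return claims
```

The explicit error for a missing sense identifier reflects why examples are kept on senses locally: the lexeme-level form can only be produced after the senses have been saved.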

Upload process

Kurmanji Dictionary

The upload was done on March 20, 2023, using this script.

See a sample Kurmanji lexeme:

Southern Kurdish Dictionary

The upload was done on March 21, 2023, using this script.

See a sample Southern Kurdish lexeme:

Sorani and Hawrami Dictionaries

The upload was done on March 21, 2023, using this script.

See a sample Sorani lexeme:

See a sample Hawrami lexeme: