Project Log: Difference between revisions

(Created page with "==Data re-modeling== (tbd) ==Upload process== ===Kurmanji Dictionary=== The upload was done on March 20, 2023, using [https://github.com/dlindem/wikibase/blob/6c4697e77fa56e03d3a3ad7eb1f15871fae779b8/kurdish/ttl-to-wikibase.py this script]. See a sample Kurmanji lexeme: https://kurdi.wikibase.cloud/wiki/Lexeme:L433 ===Southern Kurdish Dictionary=== The upload was done on March 21, 2023, using [https://github.com/dlindem/wikibase/blob/b6901b572b9ff1897d16a70f3dca84...")
 
 
Line 1: Line 1:
==Data re-modeling==


(tbd)
One of the major ongoing issues in creating Kurdî Wikibase is that Wikibase lacks the language codes '''sdh''' (Southern Kurdish) and '''hac''' (Gorani). As a workaround, we use '''ku''' as the language code for all varieties and point to an item describing the specific variety using '''dct:language'''. This modeling can be refined once the missing language codes are added to Wikibase.
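The workaround can be illustrated with a minimal sketch. The item and property identifiers below (<code>Q_SOUTHERN_KURDISH</code>, <code>Q_GORANI</code>, <code>P_DCT_LANGUAGE</code>) are placeholders invented for illustration and are not the actual IDs on Kurdî Wikibase; the claim structure is also simplified compared to real Wikibase JSON.

```python
# Hypothetical identifiers: the real item and property IDs on the
# Wikibase instance are not given in this log and will differ.
VARIETY_ITEMS = {
    "sdh": "Q_SOUTHERN_KURDISH",  # placeholder item describing Southern Kurdish
    "hac": "Q_GORANI",            # placeholder item describing Gorani
}
DCT_LANGUAGE_PROP = "P_DCT_LANGUAGE"  # placeholder property mapped to dct:language
FALLBACK_CODE = "ku"                  # language code used for all varieties


def build_lexeme_stub(lemma, variety_code):
    """Return a simplified lexeme record for a lemma of a given variety.

    The lemma is stored under the generic code 'ku'; the actual variety
    is recorded as a statement pointing to the item describing it.
    """
    variety_item = VARIETY_ITEMS[variety_code]
    return {
        "lemmas": {FALLBACK_CODE: {"language": FALLBACK_CODE, "value": lemma}},
        # Simplified claim structure, not full Wikibase JSON.
        "claims": {DCT_LANGUAGE_PROP: [variety_item]},
    }
```

Once '''sdh''' and '''hac''' become available, only <code>FALLBACK_CODE</code> would need to be replaced by a per-variety lookup in a sketch like this one.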
 
In addition to language codes, we face challenges specific to the lexicographical data of the sources, particularly orthographic normalization (Latin vs. Perso-Arabic scripts) and spelling variation. This matters for LOD technologies, where duplicate entries, i.e. several entries that describe the same lexeme, should be avoided. We therefore verify and unify the scripts across the resources so that they conform to widely used orthographies; e.g. '''ë''', used for a glottal stop, is replaced with '''´'''. Moreover, some of the headwords in the selected resources contained punctuation marks, which are removed.
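A headword normalization step of this kind could look roughly like the following sketch. Only the '''ë''' to '''´''' replacement comes from the description above; the function name, the NFC normalization step, and the rule of dropping every character in a Unicode punctuation category are assumptions made for illustration, not the project's actual rules.

```python
import unicodedata

# Replacement taken from the text above: U+00EB (e with diaeresis) is
# replaced with U+00B4 (acute accent). Further mappings would be added here.
CHAR_REPLACEMENTS = {"\u00eb": "\u00b4"}


def normalize_headword(headword):
    """Unify characters and strip punctuation from a headword (sketch)."""
    # Assumption: compose characters first so that decomposed sequences
    # (e.g. 'e' + combining diaeresis) match the replacement table.
    text = unicodedata.normalize("NFC", headword)
    for old, new in CHAR_REPLACEMENTS.items():
        text = text.replace(old, new)
    # Assumption: "punctuation" means any Unicode category starting with 'P'.
    # U+00B4 is a modifier symbol (category Sk), so it is kept.
    kept = [ch for ch in text if not unicodedata.category(ch).startswith("P")]
    return "".join(kept).strip()
```

Normalized headwords can then be compared across sources to spot candidate duplicates before upload.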
 
On Kurdî Wikibase, usage examples are attached to a sense and described with their English translation, whereas on Wikidata, usage examples are attached to a lexeme and qualified with their subject sense. On the one hand, our modeling corresponds to the OntoLex source, where senses point to usage examples via '''ontolex:usage'''. On the other hand, it makes the upload process more convenient: a usage example attached to a lexeme cannot be qualified with the URI of its subject sense until that sense has an identifier, which is assigned only once the item data is written to the Wikibase. When transferring to Wikidata, we attach usage examples to lexemes (and not to senses) using '''wd:P5831''', indicating the subject sense as a qualifier, which is the strategy favoured by the Wikidata community.
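The conversion between the two models can be sketched as follows. The input dictionary layout and the function name are hypothetical. '''P5831''' is the usage example property named above; '''P6072''' is assumed here to be Wikidata's "subject sense" qualifier and should be verified before use.

```python
USAGE_EXAMPLE = "P5831"  # usage example property, as named in the text above
SUBJECT_SENSE = "P6072"  # assumed ID of the "subject sense" qualifier


def to_wikidata_usage_examples(lexeme):
    """Lift sense-level usage examples to lexeme-level statements.

    Input (hypothetical layout): a lexeme whose senses each carry a list
    of usage examples. Output: one statement per example, attached to the
    lexeme and qualified with the ID of the sense it illustrates.
    """
    statements = []
    for sense in lexeme["senses"]:
        for example in sense.get("usage_examples", []):
            statements.append({
                "property": USAGE_EXAMPLE,
                "value": example["text"],
                "qualifiers": {SUBJECT_SENSE: sense["id"]},
            })
    return statements


# Placeholder data; sense IDs exist only after the lexeme has been written.
sample = {
    "senses": [
        {"id": "L1-S1", "usage_examples": [{"text": "example sentence A"}]},
        {"id": "L1-S2", "usage_examples": []},
    ]
}
converted = to_wikidata_usage_examples(sample)
```

Because the qualifier needs the sense ID, this conversion can only run after the senses have been created, which is why the sense-attached model is simpler during the initial upload.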


==Upload process==
Line 22: Line 26:


See a sample Sorani lexeme: https://kurdi.wikibase.cloud/wiki/Lexeme:L14443
See a sample Hawrami lexeme: https://kurdi.wikibase.cloud/wiki/Lexeme:L20002