Main Page

Welcome to Kurdî Wikibase, a Wikibase Cloud instance.

Project Goals

The goal of this project is to transfer Kurdish lexical data from OntoLex TTL sources to this Wikibase and, subsequently, to Wikidata, either as an enrichment of existing Wikidata lexemes or as newly created Wikidata lexemes. Open curation tasks (see below) can be done on this Wikibase. If you want to contribute, please register.

Publications

Lindemann, David, Sina Ahmadi, Anas Fahad Khan, Francesco Mambrini, Federica Iurescia, Marco Carlo Passarotti (2023): ‘When OntoLex Meets Wikibase: Remodeling Use Cases’, in CEUR Workshop Proceedings. Wikidata’23: Wikidata workshop at ISWC 2023. Available at: https://ceur-ws.org/Vol-3640/paper14.pdf.

Datasets integrated into Kurdî Wikibase

Lexical data from four different sources have been uploaded to this Wikibase. New lexemes can be created using the link in the side pane.

Northern Kurdish (also known as Kurmanji, kmr)

Over 4,000 headwords are provided in Northern Kurdish (Kurmanji), written in the Latin-based script. Headwords are described with part-of-speech tags, grammatical gender, and glosses for each distinct sense in Northern Kurdish and English. Usage examples are also provided in some cases. This dictionary is described at Q18.

Central Kurdish (also referred to as Sorani, ckb)

Over 5,000 headwords are provided in Central Kurdish (Sorani), written in the Latin-based script. Unlike in Northern Kurdish, this script is not widely used by Central Kurdish speakers, who mostly write in the Perso-Arabic-based script. Entries are described with part-of-speech tags, glosses in English and, sometimes, usage examples. Grammatical gender is not present in Central Kurdish. This dictionary is described at Q34.

Southern Kurdish (sdh)

The Southern Kurdish resource contains over 11,000 headwords, the highest number among the selected resources. The headwords are written in both the Perso-Arabic-based and the Latin-based scripts and are described with glosses in Persian and in other varieties of Kurdish. These glosses include words from other Kurdish varieties as well as from the Laki and Luri languages. This dictionary is described at Q32.

Gorani (also known as Hawrami, hac)

The Gorani resource is the smallest of the four, containing around 1,000 headwords written in the Latin-based script and described with part-of-speech tags, grammatical gender, glosses in English and a few usage examples. Like Central Kurdish, this language is mostly written in the Perso-Arabic-based script of Kurdish. This dictionary is described at Q38.

Project Log

See the Project Log page.

SPARQL Queries

See the SPARQL Queries page for example queries that explore what is on Kurdî Wikibase.
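As a rough illustration of the kind of query one might run, the Python sketch below builds a SPARQL query that counts lexemes per language, along with a request URL for a query service. It assumes the instance exposes lexemes through the standard Wikibase RDF mapping (`ontolex:LexicalEntry`, `dct:language`). The endpoint URL is a hypothetical placeholder, not the address of this Wikibase, and the function names are illustrative only.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: replace with the real query-service URL of the instance.
ENDPOINT = "https://example.wikibase.cloud/query/sparql"


def build_lexeme_count_query(limit: int = 10) -> str:
    """Return a SPARQL query counting lexemes per language (assumed standard Wikibase RDF)."""
    if limit < 1:
        raise ValueError("limit must be positive")
    return f"""
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?language (COUNT(?lexeme) AS ?lexemes) WHERE {{
  ?lexeme a ontolex:LexicalEntry ;
          dct:language ?language .
}}
GROUP BY ?language
ORDER BY DESC(?lexemes)
LIMIT {limit}
""".strip()


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode the query as a GET request URL asking for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})
```

The query text returned by `build_lexeme_count_query` can also be pasted directly into a query-service form; `build_request_url` is only needed when sending the request from a script.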

Community tasks

See the Curation Tasks page.

Issue with language codes

Not all language codes used in our OntoLex sources are currently available on Wikibase/Wikidata. See https://phabricator.wikimedia.org/T325688.