
Project Goals

Elmurod Talipov edited this page Jul 20, 2018 · 2 revisions

Short term goals

  • Implement a database abstraction library for managing the word database.
  • Create a word database with the following part-of-speech (POS) tags:
#   Uzbek        English               Tag
1   От           Noun                  NOUN
2   Феъл         Verb                  VERB
3   Сифат        Adjective             ADJ
4   Равиш        Adverb                ADV
5   Олмош        Pronoun               PRON
6   Сон          Numeral               NUM
8   Боғловчи     Conjunction           CONJ
9   Юклама       Particle              PART
10  Ундов сўз    Interjection          INTJ
11  Тақлид сўз   Onomatopoeic words    X
12  Модал сўз    Modal words           AUX
13  Кўмакчи      Postposition          ADP

These tags may not be fully compatible with the Universal Dependencies POS tag set; for example, UD v2 uses CCONJ rather than CONJ for coordinating conjunctions.
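One way to hold the tag set and the word list is a small relational schema. The sketch below assumes SQLite through Python's standard sqlite3 module; the table and function names (pos_tag, word, create_word_db, add_word, tags_for) are hypothetical and not part of the project, and the schema lets one word form carry several tags.

```python
import sqlite3

# Tag set copied from the table above (the source table has no entry 7).
POS_TAGS = {
    "NOUN": "Noun", "VERB": "Verb", "ADJ": "Adjective", "ADV": "Adverb",
    "PRON": "Pronoun", "NUM": "Numeral", "CONJ": "Conjunction",
    "PART": "Particle", "INTJ": "Interjection", "X": "Onomatopoeic words",
    "AUX": "Modal words", "ADP": "Postposition",
}


def create_word_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical schema: a tag table plus one row per (form, tag) pair."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.execute("CREATE TABLE IF NOT EXISTS pos_tag (tag TEXT PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word ("
        " form TEXT NOT NULL,"
        " tag TEXT NOT NULL REFERENCES pos_tag(tag),"
        " PRIMARY KEY (form, tag))"
    )
    conn.executemany("INSERT OR IGNORE INTO pos_tag VALUES (?, ?)", POS_TAGS.items())
    conn.commit()


def add_word(conn: sqlite3.Connection, form: str, tag: str) -> None:
    """Store a word form with one of the project's POS tags."""
    if tag not in POS_TAGS:
        raise ValueError(f"unknown POS tag: {tag}")
    conn.execute("INSERT OR IGNORE INTO word VALUES (?, ?)", (form, tag))


def tags_for(conn: sqlite3.Connection, form: str) -> list[str]:
    """Return every tag recorded for a form, in alphabetical order."""
    rows = conn.execute("SELECT tag FROM word WHERE form = ? ORDER BY tag", (form,))
    return [tag for (tag,) in rows]


conn = sqlite3.connect(":memory:")
create_word_db(conn)
add_word(conn, "китоб", "NOUN")   # "book"
add_word(conn, "ўқимоқ", "VERB")  # "to read"
print(tags_for(conn, "китоб"))    # ['NOUN']
```

Keeping the tags in their own table means a later switch to UD v2 names is a data change rather than a schema change.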

  • Perform statistical analysis of word usage and generate a usage table from PDF books and news feeds.
  • Implement stemming rules along with a database table for fast reference, plus a table of entity names.
  • Implement a basic natural language analysis tool that provides functionality such as tokenization, parsing, stemming, POS tagging, and named entity recognition.
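Word-usage statistics can start from a plain frequency table once text has been extracted from the PDF books and news feeds (the extraction step is not shown and is assumed to be done). The sketch below assumes a simple regular-expression tokenizer that lowercases words and keeps apostrophe-joined Latin-script forms such as o‘zbek together; the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters/digits, optionally joined by the
# apostrophe variants used in Latin-script Uzbek (o‘, g‘). Punctuation is dropped.
TOKEN_RE = re.compile(r"\w+(?:['‘’ʻ]\w+)*")


def tokenize(text: str) -> list[str]:
    return [token.lower() for token in TOKEN_RE.findall(text)]


def word_frequencies(texts) -> Counter:
    """Count lowercased word forms across an iterable of documents."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def frequency_table(counts: Counter, top: int = 10) -> list[tuple[str, int, float]]:
    """Return (word, count, share of all tokens) rows, most frequent first."""
    total = sum(counts.values())
    return [(word, n, n / total) for word, n in counts.most_common(top)]


corpus = ["Китоб ўқидим. Яхши китоб!", "Бу китоб янги."]
for word, n, share in frequency_table(word_frequencies(corpus), top=3):
    print(f"{word}\t{n}\t{share:.2f}")
```

Counting surface forms like this overstates vocabulary size for an agglutinative language; running the same count over stems instead would give a table closer to lemma frequencies.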

Expected deliverable: the tool tahlih takes Uzbek text as input and produces parsed (tokenized, stemmed, and POS-tagged) output.
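The expected input/output behavior can be sketched end to end: tokenize, reduce each token to a stem, then look the stem up to get its tag. This is a minimal sketch under stated assumptions: a tiny in-memory lexicon stands in for the word database and stem table, and a handful of common Uzbek suffixes (plural -лар, genitive -нинг, accusative -ни, dative -га, locative -да, ablative -дан, past-tense -ди/-дим, possessive -им) are stripped longest first. The suffix list, lexicon, and function names are illustrative, not the project's actual stemming rules.

```python
import re
from typing import Optional

# Hypothetical lexicon standing in for the word database and stem table.
LEXICON = {
    "китоб": "NOUN", "мактаб": "NOUN", "ўқи": "VERB", "яхши": "ADJ",
    "янги": "ADJ", "мен": "PRON", "бу": "PRON",
}

# Illustrative suffix rules, tried longest first so "-дим" wins over "-им".
SUFFIXES = sorted(["нинг", "лар", "дан", "дим", "ни", "га", "да", "ди", "им"],
                  key=len, reverse=True)
MIN_STEM = 2  # never strip a word below this many characters

TOKEN_RE = re.compile(r"\w+(?:['‘’ʻ]\w+)*")


def stem(word: str, lexicon: dict[str, str]) -> str:
    """Strip suffixes until the remainder is a known stem; otherwise keep the word."""
    current = word
    while current not in lexicon:
        for suffix in SUFFIXES:
            if current.endswith(suffix) and len(current) - len(suffix) >= MIN_STEM:
                current = current[: -len(suffix)]
                break
        else:  # no rule applies: give up and return the surface form
            return word
    return current


def analyze(text: str, lexicon: dict[str, str]) -> list[tuple[str, str, Optional[str]]]:
    """Return (token, stem, tag) triples; tag is None when the stem is unknown."""
    triples = []
    for token in TOKEN_RE.findall(text):
        token = token.lower()
        s = stem(token, lexicon)
        triples.append((token, s, lexicon.get(s)))
    return triples


for token, s, tag in analyze("Мен мактабда янги китобни ўқидим.", LEXICON):
    print(token, s, tag or "_", sep="\t")
```

Checking the lexicon before each stripping step is what keeps known words such as яхши from being cut down by a suffix rule; a rule-only stemmer without that lookup would over-strip much more often.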

Mid term goals

  • Manually (or semi-automatically) build an Uzbek treebank in CoNLL-U format that can be contributed to the Universal Dependencies (UD) framework.
  • Feed the Uzbek treebank to SyntaxNet to train models, then analyze and improve them.
  • Build initial applications using the Uzbek NLU project: a Telegram Q&A bot, a Twitter bot, and a news summarizer.
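For the treebank goal, each sentence in a CoNLL-U file is a block of comment lines followed by one line per word with ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), where _ marks an unspecified value and a blank line ends the sentence. The sketch below serializes one hand-annotated sentence. The Token class and to_conllu function are hypothetical, and the example analysis (lemmas, heads, relations) is illustrative rather than a vetted annotation; it also uses UD's PUNCT tag, which the project's tag table does not list.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str
    lemma: str
    upos: str
    head: int            # 1-based index of the syntactic head; 0 for the root
    deprel: str
    space_after: bool = True


def to_conllu(sent_id: str, text: str, tokens: list[Token]) -> str:
    """Serialize one sentence as a CoNLL-U block (comments, word lines, blank line)."""
    lines = [f"# sent_id = {sent_id}", f"# text = {text}"]
    for index, tok in enumerate(tokens, start=1):
        if not 0 <= tok.head <= len(tokens) or tok.head == index:
            raise ValueError(f"bad head {tok.head} for token {index}")
        misc = "_" if tok.space_after else "SpaceAfter=No"
        # XPOS, FEATS and DEPS are left unspecified ("_") in this sketch.
        fields = [str(index), tok.form, tok.lemma, tok.upos, "_", "_",
                  str(tok.head), tok.deprel, "_", misc]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n\n"


# Illustrative annotation of "Мен китоб ўқидим." ("I read a book").
SENTENCE = [
    Token("Мен", "мен", "PRON", 3, "nsubj"),
    Token("китоб", "китоб", "NOUN", 3, "obj"),
    Token("ўқидим", "ўқимоқ", "VERB", 0, "root", space_after=False),
    Token(".", ".", "PUNCT", 3, "punct"),
]

print(to_conllu("uz-example-1", "Мен китоб ўқидим.", SENTENCE), end="")
```

The SpaceAfter=No value in MISC is what lets the original "# text" line be reconstructed from the tokens, which UD validation checks.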

Long term goals

  • TBD
