Machine Translation Infrastructure for Turkic Languages (MT-Turk)


Alkim E., ÇEBİ Y.

INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, vol.16, no.3, pp.380-388, 2019 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 16 Issue: 3
  • Publication Date: 2019
  • Journal Name: INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.380-388
  • Keywords: Rule-based machine translation, Turkic languages, semi-language specific interlingua, disambiguation by suggestions
  • Dokuz Eylül University Affiliated: Yes

Abstract

In this study, "MT-Turk", a multilingual, extensible machine translation infrastructure for grammatically similar Turkic languages, is presented. The MT-Turk infrastructure has multi-word support and is designed using a combined rule-based translation approach that unites the strengths of the interlingual and transfer approaches. This design makes the infrastructure easy to extend with new Turkic languages. A newly added language can be used as both a source and a destination language, which achieves two-way extensibility. In addition, the infrastructure is strengthened by the ability to learn from previous translations and to use the suggestions of previous users for disambiguation. Finally, the success of MT-Turk for three Turkic languages (Turkish, Kirghiz and Kazan) is evaluated using the BiLingual Evaluation Understudy (BLEU) metric, and the suggestion system is seen to improve the success by 43.66% on average. Although the lack of linguistic resources affected the success of the system negatively, this study led to the introduction of an extensible infrastructure that can learn from previous translations.