An in-depth look at machine translation for South Slavic languages

Machine translation has been a bit of a buzzword lately. When Google announced that it was launching its Neural Machine Translation system, it made headline news, which, as those of you who know our industry will appreciate, almost never happens! When machine translation first came out, translators were worried they would be out of a job, but MT has since been widely accepted by the translation community. Instead of losing out to AI, many translators have found themselves doing a lot more post-editing work. Machine translation has proven more successful for some languages than for others, so we've done a bit of digging and found some statistics to give you an insight into how machine translation performs for South Slavic languages.

Eastern European languages are famous for being difficult. Their grammar makes them tough for human translators, let alone a machine. Machine translation has a terrible reputation in this part of the world, but is that reputation really justified?

To examine this in more detail, we looked at the work of Maja Popović, Mihael Arčan and Filip Klubička, who teamed up to research machine translation for South Slavic languages.

Machine translation (MT) between closely related languages has been researched, but it has received far less attention than MT between distant language pairs such as English and Chinese. Although all South Slavic languages are still rather under-resourced and under-investigated, several MT systems have been built between these languages and English over the last decade.

The team therefore researched machine translation between the South Slavic languages themselves. They found that scores were rather high for translation between Serbian and Croatian and lower for translations involving Slovenian. However, it should be noted that the scores were not especially high, given just how similar these languages are. They also found that translation into Serbian performed worse than translation into the other two languages, while translation into Slovenian performed better than translation into the other two.

For the Serbian-Croatian language pair, most errors were lexical, with a rather low number of inflectional and word-ordering errors. Since the main differences between these two languages are lexical, this is understandable. When translating from and into Slovenian, lexical errors also predominated and were much more frequent, and the number of ordering and inflectional errors was not negligible either. This is consistent with the greater differences between Slovenian and the other two languages, although those differences alone are not enough to explain why there were so many errors.

So, in conclusion, machine translation is a viable option for translating between the South Slavic languages, as long as the output is post-edited. If you'd like help with that, feel free to get in touch.

You can find the original paper here: http://www.aclweb.org/anthology/W16-4806