Food for translation engines: parallel corpora
To convert text from one language to another, most machine translation engines need a body of data called a parallel corpus. A parallel corpus contains a large set of texts paired with their translations in one or more other languages. Machine translation engines use this data to work out which words and phrases are equivalent across languages.
For this process to be accurate, huge amounts of parallel translations are needed, preferably from multiple genres such as novels, films, official proceedings, news reports, and even social media.
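As a rough illustration of how equivalences can be drawn from a parallel corpus, the sketch below pairs a few invented English and Spanish sentences and scores word pairs by how often they appear in the same sentence pair (a Dice coefficient). The sentences and the function names `dice_scores` and `best_match` are made up for this example. Real engines rely on far larger corpora and on statistical or neural models, so this is only a toy under those assumptions.

```python
from collections import Counter
from itertools import product

# Toy parallel corpus: each entry pairs a sentence with its translation.
# The sentences are invented for illustration only.
PARALLEL_CORPUS = [
    ("the house is small", "la casa es pequeña"),
    ("the house is big", "la casa es grande"),
    ("the book is small", "el libro es pequeño"),
    ("a big book", "un libro grande"),
    ("the table is big", "la mesa es grande"),
]


def dice_scores(corpus):
    """Score every (source word, target word) pair by sentence co-occurrence."""
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        # Count each word pair once per aligned sentence pair.
        pair_counts.update(product(src_words, tgt_words))
    # Dice coefficient: 2 * joint count / (source count + target count).
    return {
        (s, t): 2 * n / (src_counts[s] + tgt_counts[t])
        for (s, t), n in pair_counts.items()
    }


def best_match(word, scores):
    """Return the target word that scores highest for a source word, if any."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


if __name__ == "__main__":
    scores = dice_scores(PARALLEL_CORPUS)
    for word in ("house", "book", "big"):
        print(word, "->", best_match(word, scores))
```

With only five sentence pairs the scores are fragile: one extra or missing sentence can change the best match. This is the small-scale version of why languages with thin corpora tend to get poorer translations.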
Benefits of machine translation
The benefits of machine translation are, of course, its speed and convenience. Yet inaccurate translations are not uncommon, especially for languages that lack a good corpus. In such cases, one often runs into translations that are awkward or, sometimes, do not make any sense.
Limitations of machine translation
Machine translation’s benefits do not extend equally to all of the world’s languages. Whether a language becomes available on a machine translation engine depends on a few factors, such as the number of speakers, how much the language is used in official proceedings (which increases the availability of its translations), and the affluence of its speakers.
The number of speakers of a language no doubt plays a role in shaping its online presence on machine translation engines and other sites. Yet although the top translated languages have hundreds of millions of speakers, the online presence of those ranking in the middle does not necessarily correlate with the number of speakers.
This is because other factors matter too, such as the affluence and scope of influence of the countries that use the language. For example, official EU languages like Greek and Swedish, which each have fewer than 20 million speakers, have a much better presence in machine translation and on translated Wikipedia pages than many languages with the same or larger speaker populations, like Bhojpuri (51 million speakers). Bhojpuri was among 24 languages added to Google’s translation service only recently, in 2022. Furthermore, the translation model it uses is not backed by the usual parallel corpora; instead, it uses Google’s new “Zero-Shot” monolingual model, which Google itself describes as “impressive but imperfect”.
Conclusion – machine translation for rough translations
In conclusion, machine translation engines can be useful for personal use and rough translations. They are often used when traveling, planning a trip, or browsing and shopping on foreign websites. In these cases, a basic translation is often all you need, and not much is at stake.
However, machine translations can be inaccurate or even unavailable, especially if the target language has a poorer online presence. Hence, if you need an accurate or certified translation, it is best to approach a trusted translation company, where human language experts can ensure the translation is accurate.