Machine translation has been around for many years. However, it wasn’t until Google, Microsoft and others began developing machine translation that it grew into a serious, competitive alternative to human translation. As a result, machine translation has made more progress in the last 10 years than in the previous 50. Today, machine translation is used to produce billions of words daily and is fast closing in on human translation quality.
At the heart of the improvement in machine translation quality is artificial intelligence. The first significant factor was the move from rule-based machine translation to statistical machine translation and, most recently, to neural machine translation (NMT). In short, NMT uses deep learning to decipher the meaning of text rather than producing a word-for-word translation based on rules or statistical analysis alone. Just as important, these translation systems are not static: as algorithms and machine learning improve, machine translation systems improve, and so do the translations users receive. You can almost see the improvements in real time.
Right now, AI’s role in improving machine translation is still in its infancy. Companies like Amazon, Meta and IBM are investing significant resources and budgets in machine translation, and I expect we will see rapid innovation as a result.
Where Are We Today?
Today, most users of machine translation employ one of several implementation strategies. Many simply pick the system that best meets their needs and security requirements and accept the translation results as is. This works well for content that is extremely time sensitive and does not require a perfect translation: no human translator, or team of translators, can match the speed of machine translation, although in most cases machine translation still requires human editing.
Some machine translation systems invite users to post-edit their translations and store the corrected segments in a central repository called a “translation memory”. These translation memories are then used in conjunction with the machine translation to produce future translations. For large volumes of content, machine translation + translation memory + human post-editing will generate the best quality and outperform a team of human translators working without machine translation.
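For illustration, here is a minimal Python sketch of that TM-first workflow. The `machine_translate()` function, the sample segments and the similarity threshold are hypothetical placeholders, not any particular vendor’s API; the point is the order of operations.

```python
# A minimal sketch of a TM-first workflow. machine_translate() is a
# hypothetical stand-in for whatever MT engine or API is in use.
from difflib import SequenceMatcher

translation_memory = {  # source segment -> human-approved translation
    "Press the power button.": "Appuyez sur le bouton d'alimentation.",
}

def machine_translate(source):
    """Placeholder for a call to an MT engine (hypothetical)."""
    raise NotImplementedError("Plug in your MT provider here.")

def fuzzy_lookup(source, threshold=0.85):
    """Return the best TM match above the similarity threshold, if any."""
    best_score, best_target = 0.0, None
    for tm_source, tm_target in translation_memory.items():
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    return best_target if best_score >= threshold else None

def translate(source):
    # Prefer the human-approved TM segment; fall back to raw MT output.
    return fuzzy_lookup(source) or machine_translate(source)

def store_post_edit(source, edited_target):
    # Post-edited segments flow back into the TM for future reuse.
    translation_memory[source] = edited_target
```

In practice the memory lives in a database and the matching is far more sophisticated, but the sequence is the point: reuse human-approved segments first, call the engine only for what’s left, and feed the post-edits back in.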
Meanwhile, enterprises like eBay employ more of a software strategy to reduce and even eliminate human editing. Their goal is to produce publishable translations without the need to post-edit the machine translation output. Using AI, these systems draw on large volumes of previous translations to customize or “train” the machine translation software on the company’s own content. Training is the accepted practice for improving machine translation quality on a systematic level. It requires the user to gather previously translated content in very large quantities, organize it in an appropriate file format (usually a TMX or XLIFF file) and upload it to the machine translation system. The system then uses that content to customize the translation experience for that user or group of users.
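To make the file-preparation step concrete, here is a short sketch of packaging legacy segment pairs into a TMX 1.4 file, the bilingual exchange format most training pipelines accept. The segment pairs, tool name and output file name are illustrative.

```python
# Sketch: export previously translated segment pairs as a TMX 1.4 file.
import xml.etree.ElementTree as ET

segments = [  # (English source, French target) -- illustrative data
    ("Power on the device.", "Mettez l'appareil sous tension."),
    ("Remove the battery cover.", "Retirez le couvercle de la batterie."),
]

tmx = ET.Element("tmx", {"version": "1.4"})
ET.SubElement(tmx, "header", {
    "creationtool": "example-exporter", "creationtoolversion": "1.0",
    "segtype": "sentence", "o-tmf": "none", "adminlang": "en",
    "srclang": "en", "datatype": "plaintext",
})
body = ET.SubElement(tmx, "body")
for source, target in segments:
    tu = ET.SubElement(body, "tu")  # one translation unit per segment pair
    for lang, text in (("en", source), ("fr", target)):
        tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
        ET.SubElement(tuv, "seg").text = text

ET.ElementTree(tmx).write("training-data.tmx", encoding="utf-8",
                          xml_declaration=True)
```

The resulting file can then be uploaded to whichever engine is being customized; the real work is collecting enough clean, aligned segment pairs to make the training worthwhile.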
While you can purchase data sets to train machine translation software, doing so is expensive, and only the largest producers of content train their engines today. Because customization requires such a large amount of content, the data requirement remains a significant barrier to producing customized translations.
What’s Coming Next
Currently, you typically train a machine translation engine at the beginning of the adaptation process. Training may be a one-time event, or repeated occasionally, and it requires a large amount of content. This process is cumbersome and simply not feasible for most translation buyers. The next wave of AI-led development will include the ability to use terminology glossaries to interactively improve machine translations and create custom translations. This will deliver tangible quality gains for users who know their translation needs and can plan ahead.
A study conducted by General Motors found that 49% of translation errors were caused by incorrect terminology. Machine translation systems will substitute approved translations from a terminology glossary into their output. So instead of editing a translation after it has been generated, users will pre-edit the machine translation system with the terms they want it to use, eliminating the need to post-edit the machine translation.
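As a rough sketch of how glossary-driven pre-editing could work, the example below attaches approved terms to the translation request up front and flags any output where an approved term is missing. The glossary entries, request shape and function names are hypothetical, not a specific vendor’s API.

```python
# Sketch: apply a terminology glossary before translation, then verify.
glossary = {  # source term -> approved target term (illustrative entries)
    "torque wrench": "clé dynamométrique",
    "dashboard": "tableau de bord",
}

def build_request(source_text, target_lang="fr"):
    """Attach the approved terms to the translation request up front."""
    return {
        "text": source_text,
        "target_lang": target_lang,
        "glossary": [{"source": s, "target": t} for s, t in glossary.items()],
    }

def check_terminology(source_text, translated_text):
    """Flag any approved term that did not make it into the output."""
    misses = []
    for src_term, approved in glossary.items():
        if src_term in source_text.lower() and approved not in translated_text.lower():
            misses.append((src_term, approved))
    return misses
```

The shift is where the terminology work happens: the approved terms ride along with the request, so corrections happen before the translation is generated rather than after.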
In addition to using glossaries to produce custom translations, more and more systems will offer on-the-fly machine translation training. This means the system will apply machine learning to produce custom translations during the actual translation request rather than beforehand.
What Will the Future Bring?
AI will lead the way in the next generation of machine translation tools, with plenty of opportunities and problems to solve. I expect that future machine translation systems will require less legacy content to train and to produce custom translations. AI will be able to train systems in real time and produce translations based on more than just linguistic considerations. Future algorithms will take white-space availability, cell size, screen size and many other factors into consideration when generating translations. Software developers will continue to solve challenges at a scale that human translators simply can’t match.
In the near future, AI will drastically change the jobs of translators and interpreters alike while creating new jobs that require different skills. Meanwhile, machine translation software users will benefit greatly from the proliferation of AI in machine translation.
##
Rick Woyde
rickw@pairaphrase.com