Note: This blog post was originally written in Japanese for our Japanese website. We use our machine translation platforms to translate and make automatic corrections, and then partially edit to fit the content in English. The original Japanese post can be found here.
As mentioned in Part 1 and Part 2 of this series, "The Surprising Relationship Between ChatGPT and Machine Translation," the key technologies that have made today's high-performance generative AI possible emerged during the development of machine translation. Technically speaking, while a neural machine translation (NMT) model is broadly divided into an encoder and a decoder, a generative pre-trained transformer (GPT) is essentially the decoder portion extracted on its own.
Generative AI is, in a sense, a component of machine translation, yet it is still capable of translating on its own. This is because today's generative AI models have undergone pre-training on vast amounts of text (which is why they are called large language models, or LLMs), and that training data includes text in a wide variety of languages.
As NMT is specifically trained for the purpose of translating, it is not surprising that it can translate. On the other hand, we could say it is merely a coincidence that LLMs, which are not trained specifically for translation, are able to do so. However, we cannot immediately assume that an ability is inferior just because it was acquired incidentally. So, how should we evaluate the translation capabilities of LLMs compared to NMT?
In 2023, when ChatGPT first drew significant attention, several studies compared the translation performance of LLMs and NMT. These studies show that while LLMs excel in fluency, NMT maintains an advantage in accuracy. In other words, translations produced by generative AI sound plausible when read in isolation, but they contain more errors than those produced by machine translation.
NMT learns from pairs of source and target sentences, but such parallel data often contains a fair amount of noise. This can include mistranslations, typos, differences in sentence segmentation between the source and target, or intentional additions or omissions. This noise is what causes unnatural translations to be generated by NMT.
In contrast, because LLMs learn from monolingual sentences, their training data (while it may contain typos) is free from noise caused by the translation process itself. Therefore, they are less likely to generate unnatural translations than NMT, and the overall fluency of the output tends to be higher.
On the other hand, LLMs are not trained specifically for translation tasks. In other words, they are not trained to faithfully reflect the content of the source text in the target text. Consequently, mistranslations (where the meaning changes), omissions (where source content is missing), and hallucinations (where content not present in the source is added) occur more frequently than with NMT. As a result, accuracy decreases.
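To make the omission and hallucination categories a little more concrete, here is a minimal Python sketch of one crude check: comparing the numbers that appear in the source with those in the translation. The function names (`number_tokens`, `check_numbers`) are hypothetical and exist only for this illustration. The check assumes numbers are written as digits in both languages and formatted the same way, and it says nothing about mistranslations in general.

```python
import re
from collections import Counter


def number_tokens(text: str) -> list[str]:
    """Extract digit sequences such as "2023" or "1,000" from the text."""
    return re.findall(r"\d+(?:[.,]\d+)*", text)


def check_numbers(source: str, target: str) -> dict[str, list[str]]:
    """Flag numbers that appear on only one side of a translation pair.

    A number present only in the source hints at an omission; a number
    present only in the target hints at added (hallucinated) content.
    This is a heuristic for illustration, not a full accuracy check.
    """
    src = Counter(number_tokens(source))
    tgt = Counter(number_tokens(target))
    return {
        "possibly_omitted": sorted((src - tgt).elements()),
        "possibly_added": sorted((tgt - src).elements()),
    }


if __name__ == "__main__":
    result = check_numbers(
        "2023年に3件の研究が行われた。",
        "In 2023, several studies were conducted.",
    )
    print(result)  # {'possibly_omitted': ['3'], 'possibly_added': []}
```

In this example the translation reads naturally on its own, but the count "3" from the source has been replaced by "several", which is exactly the kind of loss that is hard to notice when reading only the output.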
The "weakness" of LLMs can also be a strength
Current LLMs do perform better overall than earlier ones. Still, this does not necessarily mean that translation fidelity has improved; on the contrary, it may actually be declining. Studies on summarization, rather than translation, have also pointed out that LLMs lack accuracy, and have reported that this problem tends to worsen with newer models.
However, the low fidelity of LLMs can be an advantage in some cases. When the source text contains metaphors or colloquial expressions, has typos or grammatical errors, or is cut off or missing content, NMT often fails by translating too literally. In contrast, we have found that LLMs are good at interpreting the writer's intent and translating accordingly.
Therefore, we believe LLMs are suitable for translating content in entertainment fields such as games, manga, and anime, as well as user-generated content and text transcribed from audio. On the other hand, in situations where accuracy is more important than fluency, traditional machine translation is more reliable than generative AI.
Furthermore, NMT has the advantage when translation speed is a priority. LLMs with translation capabilities comparable to NMT typically lag far behind NMT in terms of speed. Additionally, translation using LLMs can have issues stemming from linguistic bias in the training data. We will discuss this further in a separate post.
If NMT excels in accuracy and LLMs excel in fluency, combining the two can deliver a high-performance translation system. To achieve this, we’ve equipped our AI translation platform, XMAT, with an AI-powered post-editing feature. For organizations not using XMAT, we also offer customized translation services that combine machine translation with AI post-editing on a per-project basis.
Learn more about our AI post-editing approach here, where generative AI (LLMs) enhances and refines machine translation (NMT) output.
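As a rough illustration of the general idea (not a description of how XMAT is implemented), the two-stage flow can be sketched in Python as follows. Everything here is an assumption made for the example: `nmt_translate` and `llm_complete` are hypothetical callables standing in for whatever NMT engine and LLM client are used, and the prompt wording is only one possible phrasing.

```python
from typing import Callable

# Hypothetical interfaces: any NMT engine or LLM client could sit behind these.
NmtTranslate = Callable[[str], str]  # source text -> draft translation
LlmComplete = Callable[[str], str]   # prompt -> model output


def build_post_edit_prompt(source: str, draft: str) -> str:
    """Ask the LLM to polish fluency while staying faithful to the source."""
    return (
        "You are a translation post-editor.\n"
        "Improve the fluency of the draft translation without adding, "
        "removing, or changing any meaning found in the source text.\n"
        f"Source text:\n{source}\n"
        f"Draft translation:\n{draft}\n"
        "Return only the revised translation."
    )


def translate_with_post_editing(
    source: str,
    nmt_translate: NmtTranslate,
    llm_complete: LlmComplete,
) -> str:
    """Translate with NMT first, then let an LLM refine the draft."""
    draft = nmt_translate(source)
    revised = llm_complete(build_post_edit_prompt(source, draft)).strip()
    # Fall back to the NMT draft if the LLM returns nothing usable.
    return revised or draft
```

Passing the source text to the LLM alongside the draft is a deliberate choice in this sketch: the draft supplies the accuracy of NMT, while the source gives the LLM something to check against so that its edits for fluency are less likely to drift from the original meaning.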
With Kawamura International’s AI post-editing services, you benefit from the strengths of both machine translation and generative AI, along with a final quality check by our experienced professional translators. We go beyond simply offering machine translation or generative AI tools. Our team provides end-to-end solutions designed to streamline your translation workflow and identify the most effective approach for your needs.
If you’re facing translation or localization challenges, or just want to explore how we can help, feel free to reach out to us.