Monday, June 24, 2013

Translation Hierarchy - The idea

The most accurate way to get translations is to hire someone who is fluent in both the source and target languages to do the translating. Unfortunately, this is very expensive and doesn’t scale well; after all, there aren’t many people who are fluent in many languages. The other extreme is machine translation, such as Google Translate. This option is extremely cheap (free, in fact), but the quality is not very high. What is really needed is something in between.

The best way to solve this problem is to use a hierarchy. Every page on a domain would first be translated by machine translation, and each machine-translated page would come with an estimate of how accurate its translation is. One way to produce that estimate is to train several translation models on different subsets of the training data. Each model translates the page independently, and the confidence of the translation is the degree to which their outputs agree.
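To make the agreement idea concrete, here is a minimal sketch, assuming each model's output for a page is available as a plain string. The post doesn't say how agreement should be measured; this example assumes the mean pairwise similarity of the candidates' word tokens, using Python's standard `difflib.SequenceMatcher`. The function name `agreement_confidence` is hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def agreement_confidence(translations: list[str]) -> float:
    """Estimate confidence as the mean pairwise agreement of several models.

    `translations` holds one candidate translation of the same page per model.
    Agreement between two candidates is difflib's similarity ratio over their
    whitespace-separated tokens (an assumed metric, not one from the post).
    Returns a value in [0, 1]; 1.0 means every model produced identical tokens.
    """
    if len(translations) < 2:
        raise ValueError("need translations from at least two models")
    tokenized = [t.split() for t in translations]
    scores = [
        SequenceMatcher(None, a, b).ratio()
        for a, b in combinations(tokenized, 2)
    ]
    return sum(scores) / len(scores)


# Hypothetical outputs from three models trained on different data subsets.
candidates = ["the cat sat", "the cat sat", "a cat sat"]
print(round(agreement_confidence(candidates), 2))  # 0.78
```

Two of the three models agree exactly and the third differs by one word, so the score lands well below 1.0. Any other similarity measure could be substituted without changing the overall scheme.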

Pages with low-confidence translations would then be sent to native speakers of the target language. It’s usually fairly easy to tell that a page is poorly translated without knowing the source text. One benefit of using target-language speakers is that they don’t need special training to perform this task, so they can be fairly cheap. If a native speaker finds any problems, they escalate the page to human translators (intermediate translators).
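The routing step can be sketched as a simple filter. This is an illustration under assumptions, not a prescribed design: the `Page` record, the `threshold` value, and the `native_speaker_flags` callable (a stand-in for the human reviewer) are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Page:
    url: str
    machine_translation: str
    confidence: float  # e.g. the model-agreement score, in [0, 1]


def triage(
    pages: list[Page],
    threshold: float,
    native_speaker_flags: Callable[[Page], bool],
) -> list[Page]:
    """Return the pages that must be escalated to human translators.

    High-confidence pages keep their machine translation. Low-confidence pages
    go to a native speaker of the target language, modeled here as a callable
    that returns True when the reviewer spots a problem in the translated text.
    """
    escalated = []
    for page in pages:
        if page.confidence >= threshold:
            continue  # machine translation is accepted without review
        if native_speaker_flags(page):
            escalated.append(page)
    return escalated


pages = [
    Page("/about", "(translated text)", 0.92),
    Page("/pricing", "(translated text)", 0.41),
    Page("/faq", "(translated text)", 0.35),
]
# Stand-in reviewer: pretend only the pricing page reads badly.
flagged = triage(pages, threshold=0.6, native_speaker_flags=lambda p: p.url == "/pricing")
print([p.url for p in flagged])  # ['/pricing']
```

Note that the reviewer only sees the translated text, matching the point that a native speaker can judge quality without the source.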

The escalated page is handed to two translators who have some (but not expert-level) experience. If both translators independently come up with the same translation, that text is used. If they disagree, the page is escalated again, this time to an expert translator. Only a single expert is needed to translate a page. By using experts only in the rare cases when they are needed, costs can be kept low while quality stays high.
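A minimal sketch of the two-tier human step follows, with each translator modeled as a function from source text to translation. The post doesn't define what counts as "the same translation"; this sketch assumes an exact match after collapsing whitespace. The names `human_translate` and `normalize` are hypothetical.

```python
from typing import Callable

# A translator takes source text and returns a translation.
Translator = Callable[[str], str]


def normalize(text: str) -> str:
    """Collapse whitespace so trivial spacing differences don't count as disagreement."""
    return " ".join(text.split())


def human_translate(
    source: str,
    intermediate_a: Translator,
    intermediate_b: Translator,
    expert: Translator,
) -> tuple[str, str]:
    """Return (translation, level) for a page escalated by a native speaker."""
    first = intermediate_a(source)
    second = intermediate_b(source)
    if normalize(first) == normalize(second):
        return first, "intermediate"  # independent agreement: accept it
    return expert(source), "expert"  # disagreement: one expert settles it


expert_calls = []


def expert(src: str) -> str:
    expert_calls.append(src)  # record each (expensive) expert request
    return "expert version"


print(human_translate("Hola", lambda s: "Hello", lambda s: "Hello ", expert))
# ('Hello', 'intermediate')
print(human_translate("Hola", lambda s: "Hello", lambda s: "Hi", expert))
# ('expert version', 'expert')
print(len(expert_calls))  # 1
```

The expert is only invoked on disagreement, which is where the cost savings come from. Requiring a whole page to match exactly is strict, so a real system might compare smaller units such as sentences instead.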
