Why settle for extremely expensive translations or inaccurate machine translations? With Translation Hierarchy, you get the best of both worlds: affordable translations at near-expert accuracy.
Saturday, June 29, 2013
Thursday, June 27, 2013
Translation Hierarchy - The fun part
I like what can be done with Mechanical Turk-like systems. There is also a fun optimization problem here: how do you best choose the amount of work done at each level? In addition, expert translations can be given as feedback to intermediate translators, which could improve their skills over time.
Wednesday, June 26, 2013
Translation Hierarchy - The growth potential
I think there are hundreds of thousands of domains that would want proper translations. I estimate the cost of translating a domain at around $500 on average, and that revenue could reach $10 million a year. At $500 each, that figure corresponds to about 20,000 domain translations per year.
Tuesday, June 25, 2013
Translation Hierarchy - The monetization
Let’s take the example of a large city’s domain, say San Francisco’s. Suppose there are eight thousand pages within the domain. If machine translation confidently translates 95% of the pages, then the remaining 400 pages go to native speakers at a cost of 25 cents per page ($100). If half of those need to be re-translated, then 200 pages go to intermediate translators at a cost of one dollar per page ($200). Of those pages, say 40 translations have disagreement and go to an expert translator at $5 per page ($200). The total cost is then $500 for a high-quality translation of the entire domain. The domain owner could also specify which pages need to be of higher quality than others. Some very important documents may need to be translated perfectly, while general information pages may be fine with a few small errors.
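The arithmetic in this example can be checked with a short Python sketch. The function name and parameter names are made up for illustration, and the 20% disagreement rate is simply the 40-out-of-200 figure from the example expressed as a rate. Amounts are kept in cents to avoid floating-point rounding.

```python
def hierarchy_cost_cents(total_pages, mt_confident_rate, flagged_rate,
                         disagreement_rate, native_cents=25,
                         intermediate_cents=100, expert_cents=500):
    """Cost per level, in cents, for one domain (illustrative only)."""
    # Pages the machine translation is NOT confident about go to native speakers.
    native_pages = round(total_pages * (1 - mt_confident_rate))
    # Pages the native speakers flag go to the intermediate translators.
    intermediate_pages = round(native_pages * flagged_rate)
    # Pages where the intermediate translators disagree go to an expert.
    expert_pages = round(intermediate_pages * disagreement_rate)
    return {
        "native": native_pages * native_cents,
        "intermediate": intermediate_pages * intermediate_cents,
        "expert": expert_pages * expert_cents,
    }


costs = hierarchy_cost_cents(8000, 0.95, 0.5, 0.2)
total_cents = sum(costs.values())
print(f"${total_cents / 100:.2f}")  # prints "$500.00"
```

Changing the rates or per-page prices shows how sensitive the total is to the machine translation's confidence: each level only pays for the pages the level below could not settle.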
Monday, June 24, 2013
Translation Hierarchy - The idea
The most accurate way to get translations is to hire someone who is fluent in both the source and target languages to do the translating. Unfortunately, this is very expensive and doesn’t scale well; after all, there aren’t that many people fluent in many languages. The other extreme is machine translation, like what Google does with Google Translate. This option is extremely cheap (free, actually), but the quality is not very high. What is really needed is something in between.
The best way to solve this problem is to use a hierarchy. All pages on a domain would first be translated by machine translation. Each machine-translated page would come with an estimate of how accurate the translation is. One way to get that estimate is to train several models on different subsets of the training data. Each model translates the page, and the confidence of the translation is the degree to which the models agree.
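As a rough illustration of agreement-based confidence, here is a minimal Python sketch. It assumes the candidate translations from the different models are already available as strings, and it measures agreement as the average pairwise token similarity using the standard library's `difflib.SequenceMatcher`. That similarity measure is my stand-in; the idea above does not prescribe a particular one.

```python
from difflib import SequenceMatcher
from itertools import combinations


def agreement_confidence(translations):
    """Average pairwise similarity (0.0 to 1.0) between candidate translations."""
    if len(translations) < 2:
        raise ValueError("need at least two translations to measure agreement")
    # Compare word sequences rather than raw characters.
    tokenized = [text.split() for text in translations]
    scores = [
        SequenceMatcher(None, a, b).ratio()
        for a, b in combinations(tokenized, 2)
    ]
    return sum(scores) / len(scores)


# Hypothetical outputs from three models for the same source sentence.
candidates = [
    "the library opens at nine",
    "the library opens at nine",
    "the library is open from nine",
]
print(agreement_confidence(candidates))
```

Pages whose score falls below some chosen threshold would be the ones passed on to the next level.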
Pages with low-confidence translations would then be sent to native speakers of the target language. It’s usually fairly easy to see if a page is poorly translated without knowing the source text. One benefit of using target-language speakers is that they don’t need special training to perform their task, so their work can be fairly cheap. If the native speaker finds any problems, they would then escalate the translation to intermediate human translators.
The escalated translation is handed to two translators with some (but not expert-level) experience. If both translators independently come up with the same translation, that text is used. If they disagree, then it’s escalated again to expert-level translation. Only a single expert is needed to translate a page. By using experts only in the rare cases when they are needed, costs can be kept low while quality is kept high.
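The escalation path can be sketched as a single routing function in Python. Every callable passed in (the machine translator, the confidence estimate, the native-speaker check, the two intermediate translators, and the expert) is a hypothetical placeholder for a real system or person, and the 0.8 threshold is an arbitrary assumption.

```python
def route_page(source, machine_translate, estimate_confidence,
               native_speaker_approves, intermediate_a, intermediate_b,
               expert_translate, threshold=0.8):
    """Return (translation, level) where level names who settled the page."""
    draft = machine_translate(source)
    # Level 1: keep the machine translation if it is confident enough.
    if estimate_confidence(source, draft) >= threshold:
        return draft, "machine"
    # Level 2: a native speaker of the target language reviews the draft.
    if native_speaker_approves(draft):
        return draft, "native"
    # Level 3: two intermediate translators work independently.
    first = intermediate_a(source)
    second = intermediate_b(source)
    if first == second:  # exact match, as described; real text may need normalizing
        return first, "intermediate"
    # Level 4: a single expert resolves the disagreement.
    return expert_translate(source), "expert"
```

For example, with stub callables where the confidence is low, the native speaker rejects the draft, and the two intermediate translators disagree, the function returns the expert's text tagged `"expert"`.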
Sunday, June 23, 2013
Translation Hierarchy - The motivation
Getting good translations is hard. Google tries to do it, but its machine translation rarely produces great results. As more non-English speakers come onto the web, proper translations of web text become even more important.