Meta showcases AI translation of unwritten language Hokkien

Source: Tech at Meta
Story flagged by: Jared Tabor

Meta’s Zuckerberg Reveals First Speech to Speech AI Translation System (With Hokkien)

Building an AI speech translation system for Hokkien was no easy task. These tools are usually trained on large quantities of text, but Hokkien has no widely known standard writing system. Furthermore, Hokkien is what’s known as an under-resourced language, which means there isn’t much paired speech data available in comparison with, say, Spanish or English. And with few human English-to-Hokkien translators, it was difficult to collect and annotate data to train the model.

To get around these problems, Meta researchers used text written in Mandarin, a related Chinese language, as an intermediate step. The team also worked closely with Hokkien speakers to ensure that the translations were correct. “Our team first translated English or Hokkien speech to Mandarin text, and then translated it to Hokkien or English, both with human annotators and automatically,” said Meta researcher Juan Pino. “They then added the paired sentences to the data used to train the AI model.”

The researchers will make their model, code, and benchmark data freely available to allow others to build on their work. While the model is still a work in progress and can currently translate only one full sentence at a time, it’s a step toward a future where simultaneous translation between many languages is possible.
