On November 9, 2021, Airbnb announced that it had deployed Translation Engine, which lets users automatically read translations of reviews and descriptions in over 60 languages without having to click a translate button. Bucking the current paradigm, the interface instead gives users a “view original language” button.
Marco Trombetti, CEO of Translated – who has worked with the home rental platform for three years and provided Airbnb with human and machine translation – told Slator: “What’s unique is the fact that for the first time, the two are very symbiotic and integrated. Every fix from the localization team instantly improves machine translation.”
Airbnb runs on ModernMT, the open source project led by Translated, co-founded by Fondazione Bruno Kessler, the University of Edinburgh and the European Commission. ModernMT is essentially an adaptive neural machine translation system with a range of applications, including IP and life science translations.
“Translated initially provided the basic pre-trained models [for Airbnb’s Translation Engine],” said Trombetti, adding that the models are continually improved based on the fixes made by the thousands of linguists who have worked on Airbnb content over the past few years. As previously mentioned, Airbnb had over 100 million words translated by humans in 2019, before the pandemic.
According to Airbnb’s press release, “Translation Engine improves the quality of over 99% of Airbnb listings,” based on a study the company commissioned from a machine translation review company covering the platform’s top 10 languages.
Trombetti said Airbnb had commissioned custom reviews of the platform’s content through “independent, non-Translated parties.” However, he said the over 99% quality improvement is in line with Translated’s internal ratings. “Translated performs monthly reviews of our ModernMT models using our trained Airbnb linguists,” said Trombetti.
He added that while “many other companies have experimented with pre-translation, with a small subset of their content, usually reviews, to my knowledge this is the first time this has been done for all content, and in particular at this scale.”
He pointed out that visitors to the site will not only be able to read the content in their own language, but also find what was previously inaccessible to them. “It’s not just about removing a button; it’s about allowing everyone to explore in a new way,” said Trombetti.
UGC: Complex for AI
Asked about the challenge of training on data drawn from user-generated content (UGC) versus training engines on content created by professional writers or linguists, Trombetti said, “UGC is complex for AI because everyone has a different style.”
“It’s not like training a custom model on very narrow terminology” - Marco Trombetti, CEO, Translated
He explained that because UGC is often written by non-native speakers and, most likely, non-professional content writers, “AI needs a lot of flexibility to learn how to translate well. It’s not like training a custom model on very narrow terminology.”
Trombetti added: “The indirect challenge with UGC is the scale. The UGC scale can often be a million times larger than the content produced by localization teams; and volume peaks are much more unpredictable.”
In addition, he noted that 10 times lower latency is also required to be able to integrate machine translation into the production infrastructure. Therefore, “in human translation, the quality of engineering is really not an issue.” For UGC machine translation, however, “this is the essential asset.”
On top of that, there is the commercial element. The CEO of Translated said: “When you run UGC, you are a horizontal department. You have to interact with many divisions and stakeholders. Thus, the level and complexity of discussions increase. [Airbnb Head of Localization] Salvatore Giammarresi’s leadership, empathy and ability to interact with senior management made it all possible.”