
Machine Translation: The Combination of Machine Learning and Human Intelligence

In this special guest feature, Vasco Pedro, CEO and Co-Founder of Unbabel, discusses the importance of machine translation for natural languages and how it currently lacks the quality companies demand for their content. Dr. Pedro’s company is Unbabel, the Y Combinator-backed startup that combines crowdsourced human translation and machine learning to deliver fast translation services to businesses with human tone and nuance. Vasco previously worked for Google, helping to develop technology for data computation and language at scale, and served as a research faculty member at the Technical University of Lisbon. Vasco holds a PhD in Language Technologies from Carnegie Mellon University in the field of computational semantics. Additionally, Vasco is a Fulbright Scholar, a mentor and advisor to a number of startups, and a serial entrepreneur.

About 75% of the world does not speak English, not even as a second language. Inevitably, this means most companies cannot communicate with their entire audience. Why? Because translation is hard to do and even harder to scale, and pretty much every company that wants to be global has to do it, one way or another.

What might not be obvious is that the problem of these communication barriers is getting worse. When it comes to language, the Internet is diverging. In 1998 the Internet was mostly in English. Today, the Internet has expanded massively, with millions of new pages being created each day. There are now effectively multiple Internets out there, and English represents only 30% of the online content. You and I may live in an English Internet, but there is a Chinese Internet, an Arabic Internet, and a Russian Internet, as well as many others.

For companies, this is a risk, as they have more competition for their consumers’ attention, competition that exists in the consumers’ native language. The need to translate increases. Socially, the divergence of the Internet produces both opportunities (more people have access to content) and challenges: as the communities in each language grow, they also become isolated. This fosters the view that “It’s possible to stay just inside my own language, inside my own world.”

So a big problem is getting bigger. Why has no one solved it?

To put it simply, Machine Translation today lacks the quality companies demand for their content. Even Google spends millions of dollars each year on professional translation services, which are extremely expensive and not scalable. There are only about 500,000 professional translators in the world. Even if they all worked 24 hours per day, they could not possibly translate all of the new content that is created on a daily basis.

The translation industry has been a laggard in adopting technology, possibly because the technologists who create and believe in Machine Translation are mostly not translators themselves. Most companies in the industry grew from small translation agencies into large translation agencies, and have preferred to stick to tried-and-true, human-powered methods. Some companies have made it easier to assign content to a translator, and others have created Translation Management Systems and Translator Marketplaces.

However, as we’ve learned from the Industrial Revolution, it doesn’t work to gather hundreds of workers in a large factory and have them assemble one unit of product at a time. The benefit is in breaking down the process into small pieces, organizing them sequentially in order to create an assembly line, and then producing units at scale.

Such a process is extremely difficult to apply to language. Language is potentially the rawest expression of our intelligence. Computers have the ability to understand the bricks: the words, grammar, and short phrases. The hard part is getting computers to put these components together in a sequence that makes sense. To do this, and to understand translation to this degree, computers would need to be able to understand the human mind. This is the piece that will enable us to crack Artificial Intelligence.

Many of the tasks done by factory workers in the Industrial Revolution are today done by machines inside fully automated factories, supervised by humans who are experts in production.

In the future, translation will also be done fully by machines. Translators will supervise them, teach them new things as language evolves, and perform advanced adaptation tasks such as modifying a company’s offer from one country to another according to local tastes.

In the meantime, the only possible solution is to combine the two things we have today: artificial intelligence and a crowd of translators working online. As of now, this is the only way to translate content fast, reliably, and at scale. In doing this, we are preparing both machines and humans for the future we are already working towards.
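To make the idea concrete, here is a minimal sketch in Python of what such a combination might look like. It is an illustration under stated assumptions, not a description of any particular company's system: content is broken into small segments, a machine produces a draft with a confidence score, and only the low-confidence drafts are routed to a human for post-editing. The names `translate_with_review`, `machine_translate`, `human_post_edit`, and `confidence_threshold` are hypothetical, and the two "toy" functions are stand-ins for a real translation engine and a real crowd of translators.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    source: str
    translation: str
    reviewed_by_human: bool


def split_into_segments(text: str) -> List[str]:
    """Break content into small pieces (naively, on periods; an assumption)."""
    return [part.strip() + "." for part in text.split(".") if part.strip()]


def translate_with_review(
    text: str,
    machine_translate: Callable[[str], Tuple[str, float]],
    human_post_edit: Callable[[str, str], str],
    confidence_threshold: float = 0.8,  # hypothetical cut-off
) -> List[Segment]:
    """Machine-translate every segment; send uncertain drafts to a human."""
    results = []
    for source in split_into_segments(text):
        draft, confidence = machine_translate(source)
        if confidence < confidence_threshold:
            # The machine is unsure, so a human corrects the draft.
            final = human_post_edit(source, draft)
            results.append(Segment(source, final, True))
        else:
            # The machine is confident, so the draft is accepted as-is.
            results.append(Segment(source, draft, False))
    return results


# Toy stand-ins (hypothetical): upper-casing plays the role of "translation",
# and short sentences are treated as easy for the machine.
def toy_machine_translate(source: str) -> Tuple[str, float]:
    confidence = 0.9 if len(source.split()) <= 3 else 0.5
    return source.upper(), confidence


def toy_human_post_edit(source: str, draft: str) -> str:
    return draft + " [edited]"


segments = translate_with_review(
    "Hello world. This sentence is longer and harder.",
    toy_machine_translate,
    toy_human_post_edit,
)
for segment in segments:
    print(segment.reviewed_by_human, segment.translation)
```

In this sketch the short first sentence is accepted straight from the machine, while the longer second sentence is passed to the human editor. The threshold is the design lever: raising it sends more work to people and lowering it leans more heavily on the machine.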

 

