Gengo Debuts Service for Acquiring Advanced Multilingual Machine-Learning Training Data


Gengo, a leader in expert, high-scale crowdsourced translation services, is taking aim at the growing need for high-quality multilingual data to train tomorrow’s advanced artificial intelligence (AI) systems. The company has launched an on-demand platform that gives developers of machine-learning systems access to a wide array of multilingual services, delivered by Gengo’s fast and efficient crowdsourced network of more than 25,000 vetted contributors.

“Without diverse, well-labeled data, machine-learning algorithms simply can’t advance beyond basic capabilities. The algorithms that businesses increasingly depend on to identify relationships, develop understanding, make decisions, and predict outcomes can only be as good as the training data they’re given,” said Matthew Romaine, founder and CEO of Gengo. “AI and machine-learning developers are starving for data and cannot build AI applications without it. Our speed and quality surpass any other service that offers multilingual data for machine-learning training.”

The trusted source for high-scale crowdsourced multilingual data

With 10 years of experience providing large data sets at scale, Gengo is a trusted partner for some of the biggest names in the technology sector. Since 2008, global companies such as Airbnb, BuzzFeed, Ctrip, Expedia, Etsy, eBay, Facebook, Nike, Salesforce, Sony, and TripAdvisor have turned to Gengo for mission-critical language services. Gengo has developed a reputation for fast turnaround on challenging translation tasks:

  • Translates one million words per week across language pairs (each pairing a source language with a target language)
  • Completes customer orders in three hours on average
  • Has translated more than 950 million words for 65,000+ customers to date

Accelerates commercialization of tomorrow’s AI systems

The new service builds on the firm’s proven multilingual translation platform to offer data-curation services for both text and speech, including sentiment analysis, transcription, and content summarization. With this data, software developers at global technology companies can accelerate the training of their AI systems and bring more sophisticated products to market faster.

Charly Walther, VP of product and growth for the new service and a former product manager in Uber’s Advanced Technologies Group, contends that what distinguishes the service is its ability to apply multilingual expert crowds to the difficult 1% of edge cases that other services simply overlook. “If you’re building an AI-based system that centers on language, accuracy is paramount. We can provide highly accurate data for challenging cases that involve sentiment, dialects, slang, and edge cases where context matters, and we can do this both at scale and at speed,” he said.

Addresses the urgent need for rich, multilingual data to train AI systems

The platform can quickly source, clean up, and label data for machine-learning algorithms. By harnessing a highly skilled, multilingual crowd of more than 25,000 fully vetted contributors, it stands in stark contrast to other services available today: it can deliver millions of data points, at high quality, across 37 languages in just a few days.

Advanced services include sentiment analysis, content moderation, and other content-evaluation tasks such as entity extraction, search engine training, and chatbot training. Examples of tasks that can be submitted to the platform include:

  1. Content generation: translation, transcription, copywriting, content summarization, and chatbot training data.
  2. Content categorization: classifying content into appropriate categories, including keyword tagging and categorization of images, product descriptions, or websites, as well as extracting particular words or phrases to determine whether content is positive, negative, or neutral. These services are ideal for content moderation, sentiment analysis, product categorization, image and video tagging, and data annotation.
  3. Content assessment and analysis: reviewing sponsored listings against client-specified guidelines, scoring the quality of machine-translated segments, or fixing errors to produce natural, error-free translations. Applications include ad reviews, machine translation quality evaluation, audio speech analysis, and sales call analysis.

“Working with Gengo gave us access to a large network of skilled crowd workers across 37 languages. This enabled us to collect a wide range of high-quality training data for AI development,” said CrowdWorks CEO Koichiro Yoshida. “I believe Gengo plays a key role in the development of natural language processing AI systems, especially for monolingual countries like Japan.”
