Yandex Data Factory Launches Automatic Image Moderation for Better Control of User-Generated Content

Machine learning and data analytics experts Yandex Data Factory today announced the launch of Automatic Image Moderation, a new service that uses machine learning and computer vision to automate image analysis and classification. With access to Yandex Data Factory’s data scientists, who helped CERN’s Large Hadron Collider design a model that can save up to $4m per annum in data storage costs, the service enables customized, cost-effective and scalable content moderation for websites with large volumes of user-generated images.

Building on advanced computer vision technologies from its parent company Yandex, Russia’s leading search provider, Yandex Data Factory’s automatic service can identify in real time what an image contains and whether user-generated content complies with site guidelines. This may include checking whether the image is appropriate for the service, whether it contains faces or text, and whether it is unique. Because the definition of “inappropriate” varies from country to country and from service to service, Yandex Data Factory builds customized models, training the algorithms on a specific website’s data set to learn that site’s status quo, and offers high-quality recognition of “harmful” categories. Using Yandex’s proprietary global web index, Yandex Data Factory’s Automatic Image Moderation can also identify whether an image has been duplicated across the web, helping to ensure that dating service websites, for example, are not devalued by spam, fake profiles or fraudulent activity.

Yandex Data Factory’s Automatic Image Moderation has already helped an international online dating service slash its image moderation costs, while improving the process’s overall accuracy and reducing fake profiles.

“Websites that deal with user-generated content on a daily basis are painfully aware of the issues and difficulties surrounding content moderation,” said Jane Zavalishina, CEO of Yandex Data Factory. “But typically these services mitigate the risk by hiring human moderators, sometimes supported by crowdsourced APIs, who manually sort through user-generated content and images. This approach is costly, time-consuming, and inefficient, as it cannot be easily scaled to meet load demand. Available automatic image classification APIs are generic and cannot be readily tailored to a website’s specific issues or requirements. By using a custom machine learning service to automate image moderation, these websites are able to tackle their individual image-related needs in a highly reliable and cost-effective way.”

Using crowdsourcing for moderation also means exposing the website’s actual images to external people, which may reveal sensitive content. With automatic moderation, Yandex Data Factory makes it possible to avoid transferring sensitive images to external servers: only image signatures are shared, and a full on-premises deployment option is available.

