Making a Case for the First Open Source Platform for Synthetic Data

In this special guest feature, Yashar Behzadi, Ph.D., CEO and Founder of Synthesis AI, discusses the importance of a community like OpenSynthetics in developing more capable AI models. Yashar is an experienced entrepreneur who has built transformative businesses in AI, medical technology, and IoT markets. He has spent the last 14 years in Silicon Valley building and scaling data-centric technology companies. Yashar has over 30 patents and patents pending and a Ph.D. from UCSD with a focus on spatial-temporal modeling of functional brain imaging.

As AI becomes more prevalent in day-to-day life, the need for more robust and powerful AI models has accelerated. These models require large amounts of diverse and high-quality data to ensure they perform robustly across situations. Obtaining this data is arduous and expensive.  

Additionally, computer vision traditionally relies heavily on supervised learning in which humans label key attributes in an image. However, this method has significant disadvantages. Hand-labeled data is labor-intensive, costly, time-consuming, and prone to human error and bias. Humans are also limited in their ability to label key data attributes such as the 3D position of an object or its interactions with the environment. 

Enter synthetic data. 

Synthetic data is computer-generated image data that models the real world. Technologies from the visual effects industry are coupled with generative neural networks to create vast, diverse, and photorealistic labeled image data. Because a synthetic dataset is generated artificially rather than collected from the real world, training data can be developed at a fraction of the cost and time of current approaches.

Synthetic data is available on demand, which reduces the cost and time-to-market of computer vision models and products, lets practitioners experiment, and cuts the time spent collecting and annotating data.

The emerging technology is gaining steam. According to Gartner, 60% of the data used to develop AI and analytics will be synthetically generated by 2024. Additionally, MIT Technology Review listed synthetic data as one of its top breakthrough technologies of 2022.

The demand for synthetic data is rising, and while more resources are becoming available to help educate the broader community, more can be done. To accelerate adoption, practitioners need a place where they can learn, discuss and share the latest innovations in synthetic data. 

OpenSynthetics is the first open-source platform for synthetic data, created to help educate the broader machine learning and computer vision communities on the emerging technology. It is a centralized hub for datasets, papers, code, and resources on synthetic data, and it aims to bring together researchers and practitioners in academia and industry in a collaborative community that propels the technology forward.

Through OpenSynthetics, AI/ML practitioners, regardless of experience, can share tools and techniques for creating and using synthetic data to build more capable AI models. Whether an individual or organization is beginning their synthetic data journey or fully utilizing it in production systems, they will have access to content relevant to their needs and experience. 

OpenSynthetics welcomes researchers and practitioners across academia and industry to join the community. By contributing and participating, together we will build a knowledge base that helps grow the understanding and adoption of this emerging technology.
