Video Highlights: Accelerating the ML Lifecycle with an Enterprise-Grade Feature Store

Our friend Mike Del Balso, the co-founder and CEO of Tecton who created the Uber Michelangelo machine learning platform, and Tecton customer Geoff Sims, a Principal Data Scientist at Atlassian, gave the talk below at the recent Spark + AI Summit.

Mike gives the introduction and then hands over to Geoff at the 21:30 mark. Here’s the high-level summary:

  • Before Tecton: In-house feature store, observed to be 95-99% accurate, with 2-3 FTEs supporting the service
  • After Tecton: Tecton feature store independently validated to be 99.9% accurate
  • 225,000 improved customer experiences per day purely as a result of the Tecton feature store

Productionizing real-time ML models poses unique data engineering challenges for enterprises that are coming from batch-oriented analytics. Enterprise data, which has traditionally been centralized in data warehouses and optimized for BI use cases, must now be transformed into features that provide meaningful predictive signals to our ML models. Enterprises face the operational challenges of deploying these features in production: building the data pipelines, then processing and serving the features to support production models. ML data engineering is a complex and brittle process that can consume upwards of 80% of our data science efforts, all too often grinding ML innovation to a crawl.

Drawing on experience building the Uber Michelangelo platform and on current work building next-generation ML infrastructure at Tecton.ai, the presentation shares insights on building a feature platform that empowers data scientists to accelerate the delivery of ML applications. Spark and Databricks provide a powerful and massively scalable foundation for data engineering. Building on this foundation, a feature platform extends your data infrastructure to support ML-specific requirements. It enables ML teams to track and share features in a version-controlled repository, process and curate feature values so that there is a single, centralized source of feature data, and instantly serve features for model training, batch predictions, and real-time predictions.
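To make the three responsibilities above concrete, here is a minimal, in-memory sketch of the idea in Python. It is not Tecton's API or the design shown in the talk; every name in it (`FeatureDefinition`, `ToyFeatureStore`, `register`, `materialize`, `get_online_features`, `get_training_rows`) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class FeatureDefinition:
    """A named, versioned transformation from a raw record to a feature value."""
    name: str
    version: int
    transform: Callable[[Dict[str, Any]], Any]


class ToyFeatureStore:
    """Hypothetical in-memory feature store: registry plus one shared value table."""

    def __init__(self) -> None:
        self._registry: Dict[Tuple[str, int], FeatureDefinition] = {}
        self._values: Dict[str, Dict[str, Any]] = {}

    def register(self, feature: FeatureDefinition) -> None:
        # Track and share: definitions are keyed by (name, version) and immutable.
        key = (feature.name, feature.version)
        if key in self._registry:
            raise ValueError(f"{feature.name} v{feature.version} already registered")
        self._registry[key] = feature

    def materialize(self, raw_rows: Dict[str, Dict[str, Any]]) -> None:
        # Process and curate: compute the latest version of every feature once
        # and store the result in a single table keyed by entity id.
        latest: Dict[str, FeatureDefinition] = {}
        for (name, version), feat in self._registry.items():
            if name not in latest or version > latest[name].version:
                latest[name] = feat
        for entity_id, row in raw_rows.items():
            self._values[entity_id] = {
                name: feat.transform(row) for name, feat in latest.items()
            }

    def get_online_features(self, entity_id: str, names: List[str]) -> Dict[str, Any]:
        # Serve: a single-entity lookup, as a real-time model would request.
        row = self._values[entity_id]
        return {name: row[name] for name in names}

    def get_training_rows(self, entity_ids: List[str], names: List[str]) -> List[Dict[str, Any]]:
        # Serve: training and batch reads come from the same table, so they see
        # the same values as the online path.
        return [self.get_online_features(e, names) for e in entity_ids]


store = ToyFeatureStore()
store.register(FeatureDefinition("order_count", 1, lambda r: len(r["orders"])))
store.register(FeatureDefinition(
    "avg_order_value", 1,
    lambda r: sum(r["orders"]) / len(r["orders"]) if r["orders"] else 0.0,
))
store.materialize({"u1": {"orders": [10.0, 30.0]}, "u2": {"orders": []}})

names = ["order_count", "avg_order_value"]
online = store.get_online_features("u1", names)
training = store.get_training_rows(["u1", "u2"], names)
```

The point of the sketch is that training and online reads go through the same stored values, which is one way to avoid the training/serving mismatches that make hand-built pipelines brittle. A production platform would add persistence, point-in-time correctness, and streaming updates, none of which are modeled here.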
