APIs: The Real ML Pipeline Everyone Should Be Talking About


In this special guest feature, Rob Dickinson, CTO of Resurface Labs, suggests that to achieve greater success with AI/ML models through accurate business understanding, clear data understanding, and high data quality, today’s API-first organizations must shift towards real-time data collection. Rob has built all kinds of databases and data pipelines. Keeping the end result in mind, he builds data architectures that focus on the consumption of data, whether that means blazing fast queries against very large datasets or finding the needle in a haystack, ultimately delivering better data access across all purposes and teams. Years at Intel, Dell, and Quest Software shaped his passion for customer input and for finding elegant ways to architect and build scalable software.

Whether data scientist or CEO, everyone hungers for more data. It’s not just a matter of volume, and not simply an exercise in “data viz.” Today’s algorithm-driven organizations want insights as fast as possible: the business markers that AI and machine learning teams strive to deliver.

You can’t do effective machine learning without big data, so organizations must learn to harness the millions (billions?) of daily interactions they have inside and outside their walls. APIs offer an existing and logical pipeline for getting that data into modelling and analytics processes.

To achieve success with AI and ML models, here are a few API-driven principles around business understanding, data comprehension, and data quality.

Machine learning begins with data access

Did Amazon raise the bar too high? The e-commerce giant blazed the path towards making services visible to everyone through APIs, and now every CEO, CFO, and CMO wants to rule them all. But without the scale and resources of Big Tech, data scientists are forever told “the data is coming” by IT teams, leaving C-suite executives boxed in by assumptions and guesswork rather than empowered by real-world patterns.

This is especially painful for organizations building out their API strategy at the same time as their AI and ML expertise. It’s often a lose-lose race between the teams responsible for infrastructure and the data scientists needing more information now.

For non-Amazon organizations, three principles are fundamental to the success of data analytics:

  • Accurate business understanding — The ability to map business needs to specific and measurable problem statements that eventually become goals for the models.
  • Clear data comprehension — The ability to gather, explore, and understand business data, including the identification of patterns, anomalies, and outliers.
  • High data quality — The ability to validate and clean data for completeness and correctness.
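The second and third principles can be sketched in a few lines of Python. This is only an illustration, not a method prescribed here: the record fields (`path`, `status`, `duration_ms`) and the sample values are hypothetical, and the outlier check uses the modified z-score (based on the median absolute deviation) as one common choice among many.

```python
from statistics import median

# Hypothetical schema: fields every captured API record is expected to carry.
REQUIRED_FIELDS = ("path", "status", "duration_ms")


def is_complete(record):
    """A record is complete when every required field is present and not None."""
    return all(record.get(field) is not None for field in REQUIRED_FIELDS)


def split_by_completeness(records):
    """Separate usable records from those that need cleaning or repair."""
    complete = [r for r in records if is_complete(r)]
    incomplete = [r for r in records if not is_complete(r)]
    return complete, incomplete


def flag_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median, so nothing can be flagged
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Made-up sample data for illustration only.
records = [
    {"path": "/cart", "status": 200, "duration_ms": 12},
    {"path": "/cart", "status": 200, "duration_ms": 15},
    {"path": "/cart", "status": None, "duration_ms": 11},  # missing status
    {"path": "/cart", "status": 200, "duration_ms": 14},
    {"path": "/cart", "status": 200, "duration_ms": 13},
    {"path": "/cart", "status": 500, "duration_ms": 950},  # unusually slow call
    {"path": "/cart", "status": 200, "duration_ms": 16},
]

complete, incomplete = split_by_completeness(records)
outliers = flag_outliers([r["duration_ms"] for r in complete])
```

Records set aside as incomplete are candidates for cleaning, while flagged outliers deserve a closer look before they are kept in or dropped from a training set.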

A greater focus on data access also brings safeguards that every organization must address, such as implementing privacy and security standards. These requirements will only grow more complex over time and will restrict how the ML pipeline operates, so the longer a company waits to get them right, the more change and compliance overhead it incurs.

The chances of success in these areas are higher when the barriers to collecting data are lowered, and when the data accurately represents the real-world scenarios being modeled. APIs already contain this information; it’s just a matter of knowing how to capture, store, and secure it.

Fueling the ML pipeline with the right data

Real-time behavioral data is the pathway towards better business understanding and comprehension. It cannot be overstated that any biases or errors in models are not overcome by looking at the model itself; they can only be mitigated by looking at the original source data.

For example, the success or failure of an AI-based personalization engine can only be determined by understanding how customers behave and by adjusting the recommender model with those observations. Current and complete API data gives the business a higher level of observability, which makes it easier to bootstrap AI systems effectively and improves the accuracy of predictions.

To achieve success in real-time API data collection, organizations must:

  1. Dispel the myth that spending more time on data labeling produces more accurate results. The value of timely datasets (real-time, or as close to it as possible) far outweighs the advantages of metadata perfection, especially when segmentation rules are built into the process from the beginning.
  2. Build privacy and security into the ML pipeline, using tools that understand modern privacy concepts natively. By treating these safeguards as a priority from the beginning, at data capture, businesses can avoid the frustrations that come with ad-hoc policies bolted onto generic datastores later.
  3. Adopt automation that eliminates the need for specialists to learn skills outside their domain. For data scientists, this means relying on DevOps teams to deliver real-time data in ways that can be explored; for DevOps, this means relying on tools that can capture and store the details of every API call in a persistent and secure manner and share them easily with other teams.
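As a rough illustration of points 2 and 3, the sketch below wraps an API handler so that every call is recorded and sensitive fields are masked at capture time. It is a minimal example under stated assumptions, not the API of any particular product: `SENSITIVE_KEYS`, `redact`, `capture`, and the `login` handler are all hypothetical names, and a plain Python list stands in for a persistent, secured datastore.

```python
import json
import time

# Hypothetical list of field names treated as sensitive; adjust per API.
SENSITIVE_KEYS = {"password", "authorization", "ssn", "email"}


def redact(value):
    """Recursively mask values stored under sensitive keys."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def capture(handler, sink):
    """Wrap a handler so each request/response pair is logged after redaction."""
    def wrapped(request):
        started = time.time()
        response = handler(request)
        record = {
            "timestamp": started,
            "duration_ms": round((time.time() - started) * 1000, 3),
            "request": redact(request),
            "response": redact(response),
        }
        sink.append(json.dumps(record))  # one JSON document per API call
        return response
    return wrapped


# Hypothetical handler standing in for a real API endpoint.
def login(request):
    return {"status": 200, "body": {"user": request["body"]["user"]}}


log = []  # stand-in for a persistent, access-controlled store
logged_login = capture(login, log)
result = logged_login({
    "path": "/login",
    "headers": {"Authorization": "Bearer abc"},
    "body": {"user": "ada", "password": "hunter2"},
})
```

Because redaction happens before anything is written, the stored records can be shared with data science teams without a separate clean-up pass, while the caller still receives the unmodified response.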

Ultimately, shifting to real-time API data collection to train, validate, and iterate AI and ML models leads to more timely results and fewer gaps filled by assumptions and guesswork. By arming teams with the skills and tools that connect APIs to data science and DevOps, models will be better able to deliver on the promises of accurate business knowledge, clear data understanding, and high data quality.

