Using Machine Learning with Health Data: The Challenges and Pitfalls

Applying Machine Learning (ML) to physiological data poses several challenges. While ML can be effectively used to model well-defined systems, applying it to a system as complex as the human body dictates a much more careful approach.

“The bottom line is that the human body is complex and subtle, and oversimplifying – as common sense sometimes impels us to do – can be hazardous to your health” (Andrew Weil).

Clinicians recognize when a chronically ill patient requires attention by monitoring vital signs, as well as hundreds of other features. The human body is composed of several systems that affect each other, each with its own objectives and control mechanisms. In our models, each system is represented by a vector of features and vital signs that describes its state, together with a control model (such as a feedback loop). For example, the brain controls the level of CO2 in the blood by increasing or decreasing respiration.
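
As a rough illustration of the state-plus-control idea, the sketch below simulates a toy feedback loop in Python: a proportional controller raises ventilation when blood CO2 is above a setpoint, and the extra ventilation clears CO2. Every name and constant here (RespiratoryState, the gain, the production and clearance rates) is an assumption chosen so the numbers are easy to follow. It is not calibrated physiology and not the model used at Spry Health.

```python
from dataclasses import dataclass


@dataclass
class RespiratoryState:
    paco2_mmhg: float         # state variable: arterial CO2 partial pressure
    ventilation_l_min: float  # control output: minute ventilation


# All constants below are illustrative assumptions, not calibrated physiology.
SETPOINT_MMHG = 40.0    # CO2 level the controller tries to hold
BASELINE_VENT = 5.0     # ventilation when CO2 sits at the setpoint
GAIN = 0.5              # extra ventilation per mmHg of CO2 error
CO2_PRODUCTION = 8.0    # metabolic CO2 added per minute (mmHg/min)
CLEARANCE_COEFF = 0.04  # CO2 removed per unit of (ventilation * CO2 level)
MIN_VENT = 0.5          # ventilation never drops below this floor


def control(paco2_mmhg):
    """Proportional controller: breathe more when CO2 is above the setpoint."""
    error = paco2_mmhg - SETPOINT_MMHG
    return max(MIN_VENT, BASELINE_VENT + GAIN * error)


def simulate(initial_paco2, steps=100, dt_min=0.1):
    """Euler-integrate the loop and return the state recorded at every step."""
    paco2 = initial_paco2
    trajectory = []
    for _ in range(steps):
        ventilation = control(paco2)
        trajectory.append(RespiratoryState(paco2, ventilation))
        clearance = CLEARANCE_COEFF * ventilation * paco2
        paco2 += dt_min * (CO2_PRODUCTION - clearance)
    return trajectory


trajectory = simulate(initial_paco2=50.0)
print("start:", trajectory[0])
print("end:  ", trajectory[-1])
```

Starting from an elevated CO2 level, the controller first drives ventilation up and then relaxes it as CO2 settles back toward the setpoint. A real physiological model would couple several such loops and would need measured parameters.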

With our team of ML experts and biomedical engineers, we identified several key areas of focus in our quest to model physiological processes. In some of these areas, the conclusions we reached were counter-intuitive.

Need for traceable conclusions

When someone’s life is on the line, clinicians need to understand how data is manipulated to get to a certain result. Clinicians we spoke with were not satisfied with a solution that only gives a final answer; they wanted to know the underlying factors that led to a specific conclusion. This finding reinforced our focus on mathematically modeling those processes and obtaining a deep understanding of how the system works. We chose not to pursue black-box solutions, even if that meant slightly lower overall accuracy. This approach is the only way to ensure that every decision can be traced back to the vital sign trends that generated it.
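
One simple way to get this kind of traceability is an additive score, where the final number is a sum of per-feature contributions that can be listed next to the result. The Python sketch below shows the idea under stated assumptions: the feature names, weights, and sample values are hypothetical and carry no clinical meaning, and a linear score is only one of several possible traceable model forms.

```python
def explain_score(weights, sample, bias=0.0):
    """Return a linear score plus each feature's contribution to it.

    Contributions are sorted by absolute size, so the factors that drove
    the score most appear first.
    """
    contributions = {name: weights[name] * sample[name] for name in weights}
    score = bias + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked


# Hypothetical weights and vital sign trends, for illustration only.
weights = {"resp_rate_trend": 0.8, "spo2_trend": -1.5, "heart_rate_trend": 0.05}
sample = {"resp_rate_trend": 2.0, "spo2_trend": -3.0, "heart_rate_trend": 10.0}

score, ranked = explain_score(weights, sample)
print(f"score = {score:.2f}")
for name, value in ranked:
    print(f"  {name}: {value:+.2f}")
```

Because the score is just the sum of the listed contributions, a clinician can see which trend pushed the result up or down and by how much.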

Tailored data acquisition

When we started Spry Health, there was no great database of physiological signals we could leverage to create our models. For any training to work, we needed both enough data and enough information in our signal. We came up with a specific sensor configuration and held our own clinical trials with hundreds of patients to gather the volume of data our models required.

We iterated through over 20 revisions of hardware to create a device that works across many different demographics. We understood early on that achieving high signal quality across different patient physiques is a huge challenge. We created an adaptive, wide-area sensor to accommodate those physiological differences. For instance, the location of the radial artery and its dynamic properties vary widely from person to person. We had to find solutions to accurately acquire a signal from it across all types of patients.

Simple is better

It might sound obvious, but when data is pre-processed properly, simple models yield excellent results. Simple models worked better for us when we created machine learning algorithms to predict continuous vital signs as well as the state vector of the patient’s physiology. Correct physiological modeling makes a huge difference and turns a poorly engineered feature into a very relevant one. Physiological modeling can also identify where a feature might be relevant and where it might not. That’s one of the reasons we use an ensemble of simpler models and shy away from deep learning.
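
To make "an ensemble of simpler models" concrete, here is a minimal Python sketch that fits one least-squares line per feature and averages their predictions. The feature names and data are invented for illustration, and unweighted averaging is just one simple way to combine models; the article does not say which ensemble method Spry Health uses.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_ensemble(feature_columns, targets):
    """Fit one simple model per feature column."""
    return {name: fit_line(col, targets) for name, col in feature_columns.items()}


def predict(ensemble, sample):
    """Average the predictions of all the simple models."""
    predictions = [
        slope * sample[name] + intercept
        for name, (slope, intercept) in ensemble.items()
    ]
    return mean(predictions)


# Hypothetical pre-processed features and a target vital sign.
features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "feature_b": [1.0, 3.0, 5.0, 7.0],
}
targets = [2.0, 4.0, 6.0, 8.0]

ensemble = fit_ensemble(features, targets)
print(predict(ensemble, {"feature_a": 5.0, "feature_b": 9.0}))
```

Each member of the ensemble stays small enough to inspect on its own, which fits the emphasis on traceable conclusions.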

Utilize expertise, not only data

Another reason we avoid deep learning is that a doctor can perform much better than any deep learning model. The performance of any learner, human or machine, depends on the amount of data used to train it. Publicly available datasets for ML are fragmented and relatively small. They cannot compare to the extensive training physicians receive in medical school and the decades they spend in research or care delivery. One day, ML datasets might catch up, but until then, we see a lot of value in incorporating known medical and physiological rulesets and indicators, as well as expert opinions, in our models.
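
One way to picture combining expert rules with a learned model is sketched below in Python: a small list of named rules is evaluated alongside a model score, and either can raise a flag, with the reasons recorded. The Rule type, the assess function, the example rule, and both thresholds are hypothetical placeholders. They are not clinical guidance and not Spry Health's actual rules.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    name: str
    applies: Callable[[dict], bool]  # True when the rule fires for a sample


def assess(sample, model_score, rules, score_threshold=0.5):
    """Flag a sample if any expert rule fires or the model score is high.

    Returns (flagged, reasons) so every flag can be traced to its cause.
    """
    reasons = [rule.name for rule in rules if rule.applies(sample)]
    if model_score >= score_threshold:
        reasons.append(f"model score {model_score:.2f} >= {score_threshold:.2f}")
    return bool(reasons), reasons


# Placeholder rule with an arbitrary threshold, for illustration only.
rules = [Rule("low oxygen saturation", lambda s: s["spo2_pct"] < 90.0)]

print(assess({"spo2_pct": 87.0}, model_score=0.2, rules=rules))
print(assess({"spo2_pct": 97.0}, model_score=0.2, rules=rules))
```

Keeping the rules as explicit, named objects lets domain experts review and change them without retraining anything.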

Health data requires a high level of customization

Any implementation of ML algorithms in the real world requires at least some level of customization. In the case of health data, the level of required customization is very high for three reasons: the inherent complexity of the human body, the accessibility and relevance of data sources, and integration into the existing healthcare system. Companies like Spry Health are hard at work solving these problems, and we are already seeing the light at the end of the tunnel. As more data becomes available, the obstacles to using ML and AI will disappear. Right now, we still need to tread carefully.

About the author

Elad Ferber is the CTO and Co-founder of Spry Health and a graduate of Stanford Graduate School of Business. In a previous life, Elad was the Chief of Engineering for SpaceIL, a nonprofit start-up, where he led the team as it grew to over 50 people. Elad also served as an active duty officer in the Israeli Air Force, where he designed and led the development of innovative aviation systems.

