In this special guest feature, Devavrat Shah, professor in MIT’s Department of Electrical Engineering and Computer Science, discusses how organizations must become efficient and able “data science machines” in order to succeed in our new digital world. Devavrat Shah is co-director of the Data Science: Data to Insights course, director of the Statistics and Data Science Center (SDSC), and a core faculty member at the Institute for Data, Systems, and Society (IDSS). He is also a member of MIT’s Laboratory for Information and Decision Systems (LIDS) and the Operations Research Center (ORC).
The big data revolution is set to explode over the next few years, promising to transform the way we work and do business. Companies in every industry around the globe will generate, collect, and store ever-larger amounts of data in order to remain competitive. This reality will require organizations to invest in new tools and technologies. However, the real key to success lies in human talent: a team of professionals with the knowledge and skills to turn the growing volume, velocity, and variety of data into meaningful insights at massive scale. In other words, organizations must become efficient and able “data science machines” in order to succeed in our new digital world. The question is how to get there.
A recent Gartner survey found that almost 60% of technology professionals thought their organizations were not prepared for the changes needed to adopt a digital business approach. A shortage of technical skills was the number one problem. Demand for data scientists is clearly rising, but supply isn’t keeping pace.
Part of the problem is that what the industry calls a “data scientist” today is really a combination of several different roles. Being a data scientist now requires a unique mix of skills spanning data engineering, statistics, and business analysis. To meet this demand, data scientists must keep growing by investing in ongoing education, particularly through institutions with multidisciplinary programs that draw on engineering, the mathematical sciences, and the social sciences.
Beyond building knowledge and know-how in the various disciplines and sub-disciplines, data scientists must keep pace with what is happening in industries outside their own area of expertise. Examining broad trends and recognizing patterns can help determine which tools and technologies are a priority to learn. For example, programming languages such as Python are gaining traction in the big data and analytics space after being used successfully in the scientific computing community. Data scientists will soon need to be well-versed in this ecosystem and familiar with other tools, perspectives, and approaches, so they can identify which methods and models are most appropriate for their use case.
Another critical piece of the puzzle is knowing the common pitfalls of data science: what they are and how to avoid them. Learn from the mistakes of others. For example, many organizations fail to capture the data they need to make the right decisions. Others make high-stakes decisions based on insights skewed by inaccurate or incomplete data. Case in point: a consumer-facing company makes a drastic strategic change based on a large number of negative reviews, without considering that consumers are simply more likely to leave feedback when it is negative. The reviews were analyzed out of context.
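To make the review example concrete, the short Python sketch below uses made-up numbers (hypothetical, not taken from any real company or survey). It assumes 80% of customers are satisfied, but that dissatisfied customers are eight times as likely to post a review. The review feed then looks overwhelmingly negative. The sketch also shows one standard correction, inverse-probability weighting, which only works if the per-group review rates are known or can be estimated; in practice that is the hard part.

```python
# Hypothetical numbers chosen for illustration only; they are not real data.
population = {"satisfied": 800, "dissatisfied": 200}
# Assumed probability that a customer in each group posts a review.
review_rate = {"satisfied": 0.05, "dissatisfied": 0.40}


def expected_reviews(population, review_rate):
    """Expected number of reviews each group contributes."""
    return {group: population[group] * review_rate[group] for group in population}


def share_negative(counts):
    """Fraction of the total that comes from dissatisfied customers."""
    return counts["dissatisfied"] / sum(counts.values())


reviews = expected_reviews(population, review_rate)
naive = share_negative(reviews)      # what the review feed suggests
truth = share_negative(population)   # what the customer base actually looks like

# Inverse-probability weighting: scale each group's reviews by 1 / its review rate
# to estimate how many customers each group of reviewers stands for.
reweighted = {group: reviews[group] / review_rate[group] for group in reviews}
corrected = share_negative(reweighted)

print(f"negative share in reviews:   {naive:.2f}")
print(f"negative share in customers: {truth:.2f}")
print(f"after reweighting reviews:   {corrected:.2f}")
```

With these assumed rates, about two thirds of reviews are negative even though only one customer in five is unhappy; reweighting recovers the 20% figure only because the sketch is handed the true review rates.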
Data scientists of the future will be challenged not only to identify which data to collect and use in analysis, but also to develop data streams that may not yet exist or have never been collected. For example, how can they find and incorporate data from satisfied consumers who simply weren’t motivated to post feedback online? Data science is all about having the data you need. A fundamental challenge facing data scientists is figuring out not just what data to collect, keep, and use, but also what data is missing and where to find it.
In the coming years, big data will become bigger, faster, and more complex. This will, in turn, fuel the need for more intricate predictions and computations at scale, creating demand for next-generation data scientists who can do more than collect, manage, and use all sorts of data in real time. While much of the analysis will be algorithmic and automated, in almost every case humans will still be needed to pull together common threads and draw the conclusions that inform important business decisions. For that reason, organizations must make professional training a priority so that today’s data scientists are prepared for tomorrow.