The Rise of the Citizen Data Scientist: Detente in the Era of Data Wars

To have made it as a Data Scientist is to have reached the summit of the analytics Everest. Although there is no consistent or widely accepted definition of a Data Scientist, the designation is highly sought after by employers. At a high level, Data Scientists are called in to perform the mathematical, statistical and computational wizardry needed to address the business problems that organizations grapple with regularly. There are too many to list, but a small sample includes financial fraud, manufacturing optimization, customer journey evaluation, image recognition and smart home management.

Data Science is hard, and doing it well is harder still. Data Scientists are in short supply, even in tech epicenters like the San Francisco Bay Area, and they are expensive: ever-increasing demand puts their wages well above the median for most professions. Data Science is not a profession that one is anointed into after graduating with a degree. Good Data Scientists are the ones who spend a lot of time in the trenches, working across various aspects of the business to understand a complex operating environment and to grasp the nature of the challenges and opportunities facing the organization. In short, becoming a good Data Scientist requires wide-ranging, hands-on experience, and that takes time.

Does that mean businesses and other organizations with pressing challenges are lost? Does the lack of an available Data Scientist doom them to analytics purgatory? Are they not entitled to deliver an optimal transactional experience to their customers and stakeholders while getting a good return on their investment? Fortunately, there are answers to these questions, and they amount to a call to arms for organizations that cannot afford a Data Scientist but want to regain their analytics mojo.

Buy a solution that takes the mystery out of Data Science work

A few trends show that vendors are innovating fast to deliver self-service analytics solutions. Primarily, these solutions promise to reduce the sprawling mass of work that is Data Science to something better defined and more manageable.

First, many advanced analytics solutions provide users with pre-built algorithms that require no ground-up coding. For example, a text analytics algorithm that extracts important information from unstructured notes need not be written from scratch. A pre-built version can be pointed at the right data sources and given the right parameters, so users don’t have to spend their time figuring out how to code the algorithm and can instead use their business knowledge to ensure it is applied in the right contexts.

A second area of development has made solutions more user-friendly: the rise of Natural Language Processing (NLP)-like capabilities, which enable non-Data Scientists to conduct complex analytics projects. NLP teaches computers to accept input in the natural, spoken language of humans, lowering the communication barrier between people and machines.

Recognize that Data Science is not one monolithic enterprise

We in the industry sometimes think of Data Science as one task, albeit an important one that is performed to deliver critical insights. Nothing could be further from the truth. Data Science is an agglomeration of tasks that can be broken down into multiple steps. While it would be a digression to go into all the components of Data Science, suffice it to say that some tasks, like business problem definition and data cleansing, lend themselves to quicker training than others, like model building.

Given the dearth of people who can conduct high-value analytical modeling, it helps to free the Data Scientist to focus on those tasks and to parcel out the others to people with overlapping skills. For example, a database engineer can curate the information needed to answer a business question and take responsibility for all the pre-processing needed before the data are analyzed. Once the logical components of Data Science work are recognized, some tasks can be allocated to non-Data Scientists, while the truly advanced analytics tasks go to those best qualified to perform them.

Manage Analytics Literacy as a strategic asset

Training and education programs geared toward ever-increasing levels of analytics literacy are a prerequisite in enterprises. A 2016 TDWI report discussed the considerable challenge of obtaining and retaining top analytics talent. Given the acute shortage of qualified people, the sooner an organization invests in internal data science programs, the higher its analytics literacy will be. It isn’t always necessary to invest in highly structured Data Science programs that teach students how to code with mind-boggling levels of complexity. Investing in non-traditional sources of learning that enable just about anyone to leverage their innate curiosity, ask the right questions and deliver a logically defensible set of answers from available data can go a long way toward ensuring that organizations can compete with their peers in being data driven.

The era of the Citizen Data Scientist

Becoming a “true” Data Scientist is a tall order. As TDWI’s Fern Halper rightly says, “…from knowledge of statistics and advanced math to computer science and development. They also need to be critical thinkers who know how to communicate and who understand the business.” So the next best thing is to establish internal best practices that nurture and develop “Citizen Data Scientists.”

Citizen Data Scientists are typically business-savvy individuals who know the challenges and opportunities confronting the organization in depth but are not experts in statistics and data computation. They can implement predictive models, interpret the results of those implementations and figure out ways to operationalize critical insights effectively. Developing and retaining these individuals is easier than attracting the archetypal Data Scientist.

The advent of the era of the Citizen Data Scientist should not be considered a threat to the Data Scientist in the organization. If anything, it helps validate the assertions Data Scientists make about the power of advanced analytics. One of the bigger challenges that Data Scientists have traditionally faced is getting buy-in from organizational leadership. Having more people like Citizen Data Scientists attests to the power of analytics, and the potential impact of this role on the organization will help convince the C-suite of its importance.

Citizen Data Scientists have staked out a middle ground in organizations: they are neither too technical nor too focused on business operations. It is precisely this strategic position that gives them credible access to organizational leadership teams and lets them influence organizational decision-making by showcasing the power of fact-based insights.

Delivering Citizen Data Science Best Practices

Here are a few high-level best practices worth considering for delivering on the promise of the Citizen Data Scientist:

Small is beautiful: Analytics projects often fail or face spontaneous combustion because someone conceives of a massive problem to solve. It is one thing to cure world hunger, an entirely admirable exercise, and quite another to start by curing hunger in your own corner of the world. For advanced analytics to succeed it is best to break problems into smaller constituent pieces and resolve each one before tying them all together into one giant narrative.

Automation: Creating a framework wherein insights can be automatically derived, communicated and operationalized is truly half the work. Many analytics solutions vendors have applications that can be customized for specific business use cases such as fraud monitoring, customer churn tracking, manufacturing optimization and more. In addition to creating purpose-built applications, organizations are creating solutions that automate workflows. Open source tools are increasingly integrated into the process to automate analytics workflows and make them repeatable for widespread adoption.

Multi-genre analytics: Gone are the days when a single technique was enough to deliver solid insights. With the proliferation of data sources and types, it becomes indispensable to thread together multiple analytic techniques to deftly interrogate data. For example, customer churn analysis has to look at the various channels of customer interaction, from call centers to stores to online channels. Each has its own idiosyncratic structure, so text and sentiment analytics must be deployed to look at activity in the call center, log parsing to look at data from web servers, and SQL to slice and dice data from CRM systems. Once this initial analysis of each data silo is complete, all the information needs to be fed into a predictive model to determine a score for each customer.
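
The churn example above can be sketched end to end. What follows is a minimal illustration in Python using only the standard library, not a description of any vendor's product: the sample call notes, the web log format, the CRM table, the keyword list and the weights are all hypothetical, and the final step is a hand-weighted score standing in for the trained predictive model a real project would use.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical sample data standing in for two of the three silos.
CALL_NOTES = {
    "c1": "Customer was angry about billing and wants to cancel",
    "c2": "Happy with the upgrade, thanks the agent",
}
WEB_LOG = [
    '10.0.0.1 - c1 [01/Jan/2017:10:00:00] "GET /cancel-account HTTP/1.1" 200',
    '10.0.0.2 - c2 [01/Jan/2017:10:05:00] "GET /products HTTP/1.1" 200',
    '10.0.0.1 - c1 [01/Jan/2017:10:07:00] "GET /cancel-account HTTP/1.1" 200',
]
# Assumed keyword list; a real sentiment model would replace this.
NEGATIVE_WORDS = {"angry", "cancel", "frustrated", "complaint"}


def build_crm():
    """Create an in-memory CRM table (hypothetical schema) for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE crm (customer_id TEXT PRIMARY KEY, tenure_months INTEGER)"
    )
    conn.executemany("INSERT INTO crm VALUES (?, ?)", [("c1", 6), ("c2", 48)])
    return conn


def sentiment_negativity(note):
    """Text analytics step: share of words in a note that are negative."""
    words = re.findall(r"[a-z']+", note.lower())
    if not words:
        return 0.0
    return sum(w in NEGATIVE_WORDS for w in words) / len(words)


def cancel_page_visits(log_lines):
    """Log parsing step: count visits to the cancellation page per customer."""
    pattern = re.compile(r'^\S+ - (\S+) \[[^\]]+\] "GET (\S+) ')
    visits = Counter()
    for line in log_lines:
        match = pattern.match(line)
        if match and match.group(2) == "/cancel-account":
            visits[match.group(1)] += 1
    return visits


def tenure_by_customer(conn):
    """SQL step: pull customer tenure from the CRM table."""
    rows = conn.execute("SELECT customer_id, tenure_months FROM crm").fetchall()
    return dict(rows)


def churn_scores(notes, log_lines, conn):
    """Combine the three signals into one score per customer (0 to 1).

    The weights are illustrative assumptions, not fitted values.
    """
    visits = cancel_page_visits(log_lines)
    tenure = tenure_by_customer(conn)
    scores = {}
    for cid, months in tenure.items():
        negativity = min(1.0, sentiment_negativity(notes.get(cid, "")) * 5)
        web_signal = min(1.0, visits[cid] / 2)
        new_customer = 1.0 if months < 12 else 0.0
        score = 0.5 * negativity + 0.3 * web_signal + 0.2 * new_customer
        scores[cid] = round(score, 3)
    return scores


if __name__ == "__main__":
    print(churn_scores(CALL_NOTES, WEB_LOG, build_crm()))
```

Each silo gets its own small function, so a Citizen Data Scientist could swap any one of them for a pre-built algorithm without touching the others. Only the final scoring step needs to know how the signals fit together.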

There are many more best practices that will help ensure the analytics richness of organizations, particularly in the context of enabling the widest constituency of individuals in the business. These three are first steps toward realizing the promise of what advanced analytics can deliver for the organization. The constraints so far have been the difficulty of getting the right resources and an inexplicable insistence on hiring seasoned Data Scientists to resolve every business challenge. With the advent of Citizen Data Scientists, the hope and the vision is for organizations to better navigate the complexities of advanced analytics and deliver the insights that have the greatest positive impact on operations, without putting further strain on hard-to-obtain human and other analytics resources.

About the Author

Sri Raghavan is a Senior Global Product Marketing Manager for Teradata Aster, with more than 20 years of experience developing products, marketing and sales initiatives that drive the performance and profitability of organizations across the Big Data Applications, Financial Services, Healthcare, and Management Consulting industries. Sri has a history of supervising data science and analytics projects across industries and big data programs to effectively align technology with business goals and financial objectives. Sri has built, trained and supported top-performing global IT teams and has presented and demonstrated a variety of analytic functionality in conferences across the U.S. and overseas.





  1. Dan Yarmoluk says:

    The citizen data scientist is also referred to as a “data translator” by McKinsey.

    It’s an imperative and complementary to the vast skill set a data scientist has, and the citizen or translator should be the bridge to answer business questions.
