Organizing Data Science Teams For Strong ROI

Miles Johnson

In this special guest feature, Miles Johnson and Sam Hochgraf of IBB Consulting Group discuss how to build small, highly specialized teams of experts that can work collaboratively to support the data science pipeline. Miles Johnson is a principal consultant and Sam Hochgraf is a consultant with IBB Consulting Group, which helps media companies and service providers plan and execute big data initiatives, operationalize analytics teams and develop differentiated analytics strategies.

Look no further than the sports world for proof that all-star talent alone doesn’t guarantee success. Players must work together cohesively, follow a solid strategy and be organized in a way that plays to each member’s strengths.

Data science teams are no different. Companies are tirelessly hiring the best and brightest to support big data efforts. This is crucial. But to get the best ROI out of your team and make it as effective as possible for your organization, there are other key considerations.

Every company’s needs will vary. But many share common goals that can be more effectively achieved with the right structure and roles. In IBB Consulting’s experience, a business-facing data scientist supported by a data science product manager and a lean team of domain experts maximizes return on data science investment. Consider the following roles, responsibilities and processes for your data science team.

Data Analyst

Your data analyst leverages deep vertical experience and an intimate understanding of the data that leads to the right questions. This role is responsible for the “ah-ha” insights that drive enterprise-wide decision making.

Effective data science teams have analysts with deep ETL and relational database experience. As data volume grows, a solid foundation in SQL means analysts can pivot easily to newer distributed tools like Hive and Spark SQL. NoSQL experience is useful, but data doesn’t become valuable to analysts until it’s accessible with a tool that has the look and feel of a conventional relational database.
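As a rough illustration of why a SQL foundation transfers, the sketch below runs a standard aggregate query against an in-memory SQLite database. The `views` table, its columns and its values are hypothetical, made up for this example. A query of this shape (GROUP BY, COUNT(DISTINCT ...), ORDER BY) should run largely unchanged in Hive or Spark SQL, although dialect details can differ.

```python
import sqlite3

# Hypothetical viewing-events table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE views (user_id TEXT, channel TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO views VALUES (?, ?, ?)",
    [("u1", "news", 30), ("u2", "news", 45), ("u1", "sports", 60), ("u3", "sports", 20)],
)

# Plain aggregate SQL: viewers and total minutes per channel.
QUERY = """
SELECT channel, COUNT(DISTINCT user_id) AS viewers, SUM(minutes) AS total_minutes
FROM views
GROUP BY channel
ORDER BY total_minutes DESC
"""

rows = conn.execute(QUERY).fetchall()
for channel, viewers, total_minutes in rows:
    print(channel, viewers, total_minutes)
```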

Proficiency with an open source scripting language like Python or R is critical in deadline-driven situations where an analysis just “has to work.” Data analysts with the skillset to quickly prototype a new dashboard or view, even using Microsoft Excel, are able to engage with stakeholders more effectively.


Data Management & QA

The role of Data Management and QA is to evaluate the integrity and usability of new data sources and account for privacy policy and boundaries on data usage. This role manages enterprise risk and establishes protocols and definitions for knowledge share and reuse that enable the data science team to iterate quickly without questioning how data may be used.

In many organizations the scope of the QA team is too broad to provide targeted analytics testing. Because a single data quality issue in a base dataset can render the results of derived models meaningless, it’s important to have specialized QA staff who understand the data and collaborate with data scientists and engineers throughout the modeling process.
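A minimal sketch of the kind of check a specialized QA resource might run on a base dataset before modeling begins. The function name `profile_rows`, the field names and the sample records are all assumptions made for illustration; a real team would likely track many more metrics.

```python
def profile_rows(rows, key, required):
    """Return simple quality metrics for a list of dict records."""
    seen, duplicates = set(), 0
    missing = {field: 0 for field in required}
    for row in rows:
        k = row.get(key)
        if k in seen:
            duplicates += 1  # the same key value appeared earlier
        seen.add(k)
        for field in required:
            if row.get(field) in (None, ""):
                missing[field] += 1  # absent, null or empty value
    return {"total": len(rows), "duplicate_keys": duplicates, "missing": missing}

# Hypothetical sample records with one duplicate id and two missing values.
sample = [
    {"id": 1, "region": "east", "revenue": 10.0},
    {"id": 2, "region": "", "revenue": 12.5},
    {"id": 2, "region": "west", "revenue": None},
]
report = profile_rows(sample, key="id", required=["region", "revenue"])
print(report)
```

Catching a duplicate key or an unexpected missing-value count at this stage is far cheaper than discovering it after a derived model has been built on the data.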

Working knowledge of relational databases is essential because mainstream statistical software requires relational dataset input. The prevalence of NoSQL databases and RESTful API data sources means that data management and QA resources should also be familiar with JavaScript Object Notation (JSON). A working knowledge of APIs and HTTP is useful for media companies with a fleet of digital properties.
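To make the JSON-to-relational step concrete, here is a small sketch that flattens a nested JSON record into a single row of columns. The payload, its field names and the `flatten` helper are hypothetical; storing arrays as JSON text in one column is just one possible design choice, and a separate child table is often a better fit.

```python
import json

# Hypothetical API response for one video; fields are illustrative only.
payload = json.loads("""
{"video_id": "v42", "title": "Launch", "stats": {"plays": 120, "shares": 7},
 "tags": ["promo", "live"]}
""")

def flatten(record, parent="", sep="_"):
    """Collapse nested objects into a single-level dict of column -> value."""
    flat = {}
    for name, value in record.items():
        column = f"{parent}{sep}{name}" if parent else name
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))  # recurse into nested object
        elif isinstance(value, list):
            flat[column] = json.dumps(value)  # keep arrays as JSON text in one column
        else:
            flat[column] = value
    return flat

row = flatten(payload)
print(row)
```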

Data Engineering

Data engineering builds the infrastructure that supports a media company’s data science pipeline. Depending on analytics maturity, this may include a fault-tolerant ingest pipeline for streaming and file-based data, near real-time analytics and data processing, and integrations with third-party analytics providers as well as advertising, marketing and CRM systems.

We suggest looking for engineers who have architected and built distributed, service-oriented data platforms. Experience using open-source frameworks like Hadoop, Spark and Kafka is advantageous, but the rapid pace of innovation makes an aptitude for learning emerging technologies almost as valuable as a mastery of mainstream solutions. Language skills vary by technology stack; however, engineers able to pivot between backend and frontend development are especially valuable for making insights actionable.

Teams that shorten build-test iterations compress deployment timelines for new analytics and data science features, which increases the ROI of these initiatives. Consider engineers who champion unit testing, test-driven development and continuous integration, so that as much testing as possible is built into the development process.
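As a hedged example of what unit testing can look like for a data pipeline step, the sketch below tests a small cleansing function with Python’s built-in `unittest` module. The function `clean_durations` and its rules are assumptions invented for this illustration. In a test-driven workflow the test cases would be written first, and a continuous integration server would run them on every commit.

```python
import unittest

def clean_durations(values):
    """Drop missing or negative session durations and convert the rest to float."""
    cleaned = []
    for v in values:
        if v is None:
            continue
        v = float(v)
        if v >= 0:
            cleaned.append(v)
    return cleaned

class CleanDurationsTest(unittest.TestCase):
    def test_drops_missing_and_negative(self):
        self.assertEqual(clean_durations([10, None, -5, "2.5"]), [10.0, 2.5])

    def test_empty_input(self):
        self.assertEqual(clean_durations([]), [])

# Run the suite explicitly so the result can be inspected programmatically.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CleanDurationsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```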

Data Scientist

In addition to expert knowledge of statistics and statistical computing, data scientists with some programming and data engineering experience (e.g., Python, SQL, Hive) can drive agile data science teams. Agility is a prerequisite to the test-and-learn culture that has proven to drive rapid product innovation. Although most data munging should be delegated to data engineering, the ability to cleanse data and create base datasets reduces dependencies and decreases turnaround time for proof-of-concept (POC) and experimental work.

One of the key qualities to look for in a data scientist is strong business acumen. This brings benefits that extend through to the rest of the team, and makes all the difference between good and great ROI. Data scientists who understand the business are able to ask the right questions of the data, which drive the right analyses, thereby perpetuating the virtuous cycle. Further, they are effective communicators and storytellers who can convey difficult concepts in ways that are easily consumed by business stakeholders. This is important because an insight isn’t actionable until it’s in the hands of a business operator empowered to effect change.

Data Science Product Manager

An agile data science product manager has two primary responsibilities: prioritize projects to maximize ROI and remove obstacles so that the data science team can operate quickly and efficiently.

Effective product managers set KPIs around data science initiatives to measure and accelerate ROI. An ROI-centric data science product manager does not release a feature “into the wild” until there is a way to measure its impact. This role collaborates with subject matter experts to identify quantitative measures of success and failure, and then ensures that ongoing investment is aligned with projects that are driving value.
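One simple way to quantify a feature’s impact is relative lift in a KPI between a control group and a group exposed to the feature. The sketch below assumes conversion rate as the KPI; the function names and the figures are hypothetical, and it deliberately ignores statistical significance, which a real evaluation would need to address.

```python
def conversion_rate(conversions, visitors):
    """Share of visitors who converted; 0.0 when there were no visitors."""
    return conversions / visitors if visitors else 0.0

def relative_lift(control, treatment):
    """Relative change in conversion rate of treatment versus control."""
    base = conversion_rate(*control)
    test = conversion_rate(*treatment)
    if base == 0:
        raise ValueError("control conversion rate is zero; lift is undefined")
    return (test - base) / base

# Hypothetical figures: (conversions, visitors) for each group.
control = (200, 10_000)    # without the feature
treatment = (230, 10_000)  # with the feature
lift = relative_lift(control, treatment)
print(f"{lift:.1%}")
```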

Productive data science product managers set and manage executive expectations so that the data science team can focus on results while simultaneously managing a product pipeline and being responsive to changing business needs.

Veteran data analysts or individuals with experience in data management or operations are strong candidates because they usually have excellent communication skills and understand the challenges presented by web scale data. These individuals often have past experience collaborating with IT and engineering departments that they can leverage to rapidly advance the data science agenda.

Success By Design

Team structure and composition, coupled with a proven approach for prioritizing high-value data science initiatives, can improve the ROI and output of a data science program. Consider how this approach may be modified in the context of your organization’s culture and goals to realize the full potential of your data science investment.

