What to Ask Yourself when Hiring a Data Scientist


In this special guest feature, Aria Haghighi, VP of Data Science at Amperity, discusses several important questions to ask yourself when hiring a data scientist. Aria is responsible for leading the company’s world-class data science team to expand core capabilities in identity resolution. He has more than 15 years of technology experience, playing key advisory and leadership roles in both startup and enterprise companies. Most recently, Aria was an Engineering Manager at Facebook, where he led the Newsfeed Misinformation team, which uses machine learning and natural language processing to improve the integrity of content on the platform and tackle the prevalence of fake news, hoaxes, and misinformation. He has also held leadership and technical roles at some of the world’s biggest tech companies, including Apple, Microsoft, and Google.

Hiring data scientists is hard. They’re hard to find, since fewer are trained than the market demands, and it’s challenging to properly interview and vet them (especially the first in your organization). These challenges have been written about in several places, but I’ve often run into a different set of concerns that are less discussed and at least as challenging.

What Kind of Data Scientist Should You Be Hiring?

Data science is a broad field, encompassing everyone from individuals who spend all of their time iterating on large-scale production machine learning systems to people who primarily do offline work and visualization. Data scientists can report up through a VP of Engineering, a VP of Product, or their own organization (up to a VP of Data Science or Chief Data Officer).

At the risk of oversimplifying, the two extreme types of data scientist are:

  • Product focused: Less emphasis on novel algorithms and more on understanding business objectives, finding novel insights in data, and spotting opportunities to achieve those objectives. This type of data scientist should have stronger product intuition, curiosity about data patterns, and a general understanding of business value. This person may be weaker on production engineering dimensions.
  • Technical: More emphasis on cutting-edge technical approaches to problems and deeper technical awareness. Aware of state-of-the-art research for common problems (image recognition or customer-feedback-oriented ranking). Typically more likely to work on production systems.

To be clear, neither is a better choice than the other. I think the key question to ask is “how confident are you in the area where you want to make a data science investment?” If you need someone to understand your organization, and their first task is to figure out which investment will yield the best business value, you probably want a product-oriented data scientist. If you’re very confident in the area of investment (you know you need good handwriting recognition as part of your product, or some tweak to a standard problem like image recognition), you’re probably best served by someone on the more technical end.

Obviously, organizations need both types, in the same way that some of your engineering team, typically on the front end, should have good product sense, whereas technical depth is more crucial elsewhere (typically in back-end systems). But most hiring organizations are better served by one or the other at a particular point in time.

How Much (Data) Engineering Do You Expect?

Deriving value from data science typically requires a large investment in logging, cleaning, and organizing data throughout your organization. Often, the data science role is the last one you hire for, once you feel you’ve met these prerequisites. This problem is at the heart of the role mismatch that is a common complaint from data scientists: the sense that you were hired to analyze or solve cool problems with data, but you spend nearly all your time logging, cleaning, and organizing data. The root of this mismatch is that organizations without data scientists rarely understand the infrastructure they need before algorithms can really be the most valuable area of focus.

The good news is that there are many T-shaped data scientists who, out of sheer necessity, have done a lot of data engineering and quality work. The reality is that there is usually no free lunch with skills from potential hires, and the willingness and breadth to do data engineering usually come at the sacrifice of some depth. More importantly, organizations need to be honest with themselves about where they are in the AI hierarchy of needs and adjust both their role description and interview process accordingly. If you need strong data engineering skills to lay a foundation before data science work, that’s totally fine, but you should (a) be transparent about that need and (b) test and vet that skill set during the interview if it will be a good chunk of the job for the next 12-18 months. Doing both of these will ensure a better ongoing fit between your organization and your new data science hire.
