In this special guest feature for our Data Science 101 channel, Smita Adhikary of Big Data Analytics Hires shares her thoughts about how the data science community has changed over the years, with many useful tips for those just entering the field. Smita Adhikary is a Managing Consultant at Big Data Analytics Hires, a talent search and recruiting firm focused primarily on Data Science and Decision Science professionals. Having started her career as a ‘quant’ more than a decade ago building scorecards and statistical models for banks and credit card companies, and having spent many years in management consulting, she has witnessed from very close quarters the transformation that the advent of “Big Data” has brought about in the skill sets desired in ‘quants’. Like most ‘quants’ she holds a Master’s in Economics, and like a lot of management consultants, an MBA from Kellogg School of Management.
Once upon a time, when I started my career in Banking, we used to build scorecards. We were the much revered econometricians using the past to predict the future. We would ferociously debate the merits of logit vs. probit over steaming cups of coffee in office corridors; sometimes we would even indulge in critiquing Heckman’s two-stage estimation when discussing payment projection models for Collections. We were given the luxury of an observation period from history in which the customers’ profiles were assumed to be frozen, and the dependent variable came from a non-overlapping performance window following the observation period. The rules of engagement were set, and neither the lender (i.e., the merchant) nor the customer could interact in the interim to change the outcomes of the game. The only player determining the outcome of the game (e.g., pay/default) was the customer, once s/he was approved by the lender. In order to build these scores, we would patiently wait for 6-12 months from the time the customers were acquired to gather a sufficient length of performance history. This information would then be used to predict the likelihood of ‘good’ and ‘bad’ based on the customers’ profiles from the time of acquisition.
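For readers who never built a scorecard, the setup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer records, the default dates, the 12-month window length, and the helper name `label_customer` are all hypothetical and not taken from any real lender's process. It shows how the ‘good’/‘bad’ dependent variable is derived from a performance window that starts at acquisition, while the profile stays frozen as of the acquisition date.

```python
from datetime import date, timedelta

# Hypothetical toy data: each customer's profile is frozen at acquisition.
customers = {
    "A": {"acquired": date(2014, 1, 15), "income": 52000},
    "B": {"acquired": date(2014, 2, 1), "income": 31000},
    "C": {"acquired": date(2014, 3, 10), "income": 44000},
}

# Hypothetical default events (customer id -> date of default).
defaults = {"B": date(2014, 9, 20), "C": date(2015, 6, 1)}

PERFORMANCE_DAYS = 365  # assumed 12-month performance window


def label_customer(acquired, default_date, window_days=PERFORMANCE_DAYS):
    """Return 1 ('bad') if a default falls inside the performance window, else 0 ('good')."""
    window_end = acquired + timedelta(days=window_days)
    if default_date is not None and acquired < default_date <= window_end:
        return 1
    return 0


# The dependent variable comes only from the performance window;
# a default after the window closes does not count against the customer.
labels = {
    cid: label_customer(info["acquired"], defaults.get(cid))
    for cid, info in customers.items()
}
print(labels)
```

In this toy data, customer C defaults only after the window has closed and is therefore still labeled ‘good’. These labels, paired with the frozen profiles, are what a logit or probit model would then be fitted on.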
Life was good. But then something happened – Big Data happened! It completely changed how the merchant and the customer would interact. Forever.
Cut to the digital era: the customers and the merchant have now started interacting in a dynamic setting where nothing is frozen anymore. There is no so-called ‘performance period’. The customers are free to navigate the merchant’s website any which way they fancy. Also, the merchant can display content and place offers dynamically based on how a given customer interacts with the website. To make matters more complicated, purchase decisions are not necessarily made on the first visit itself. Internet-savvy customers now have all the information at their fingertips to land themselves the best deal. They typically go through the AIDA (Attention-Interest-Desire-Action) journey when contemplating a purchase. In this scenario, the customer’s site navigation on the day of the purchase is the mere execution of a decision that has been made even before the customer lands on the site – the customer has been on the site before; the customer is aware of what is on offer; the customer knows exactly how to get to the page on the site where they can choose the product they desire. In fact, the pages visited on the day of the purchase are often not causal to the purchase, just simply correlated.
So why am I subjecting you to this drivel?
The point I am simply trying to make is that in the new world the focus has dramatically shifted from prediction to classification. The selling and buying is now all happening in a real-time environment where the two players are interacting with each other, and repeatedly. The merchant has the leverage to influence the customer’s behavior through customized offers based on behavioral segmentation and contextual targeting. In essence, the dependent variable here is really immaterial. All the merchant wants to understand is ‘who the customer is’, and that will determine what offer to place. On a more technical note, since the customers are now visiting the merchant’s site several times, the independence of each record ceases to exist – a mortal blow to the much beloved logistic regression. And then fall Caesar? Enter Machine Learning.
All of a sudden the whole world of analytics is now talking about Support Vector Machines, Random Forests, Bagged Regressions et al. – everything is about classification; everything is about adaptively learning and self-evolving algorithms that augment the understanding of the customer with every successive digital footprint.
This makes sense. This is all good. But let us spare a moment to think what this has entailed for the analytics job market.
Now we have a clear demarcation between the “Predictive Modelers” and the “Data Scientists”. The former have been more or less relegated to traditional banking, insurance and telecom companies, where static scores and optimization-based solutions are still pursued (but I am not sure for how much longer!). The latter are the sought-after (just like we were eons ago) whiz-kids ruling the roost at the cool tech companies, and presumably changing the world. This paradigm shift has drastically changed the ‘skills’ requirement in job descriptions: when screening candidates, employers are now specifically looking for ‘Python, R and machine learning’, as against ‘SAS, regression, optimization’ in the days of yore. The seriousness of their intent is cemented by the fact that they are willing to dish out startling salaries (typically 40% or more above those of econometricians with comparable levels of education and experience) for the new-age skills. Believe you me, the employers are not kidding around – if you’ve got the chops, they will pay. What I find most amazing in the current context is the fact that the ‘gold standard’ of analytic excellence, as far as perception goes, has now become much smaller companies – the new-age cool tech startups. Nobody pays much attention anymore if you have a behemoth like Bank of America or a Chase or an Oracle or even a McKinsey on your resume.
The discourse above surely seems to paint a rather bleak picture for the Predictive Modelers out there. In reality, what I have realized, having lately devoted a fair amount of time and research to the topic and having helped people through it as they consider career changes, is that there is a lot of help available on the world wide web (and for FREE!) if one makes the commitment to learn. A seasoned econometrician can quickly become an expert on machine learning by simply enrolling in the fantastic courses offered at Coursera by professors from Johns Hopkins, Stanford and others. The legendary professors Dr. Hastie and Dr. Tibshirani have even made their classic book on the subject available for free download. Just think – Elon Musk launched a company that now builds rockets at 1/5th the cost at which NASA builds them (never mind the recent explosion) … but where did he start? By reading books on rocket science! And here you have a chance to learn from the best. So, rather than feeling depressed, I implore you to make this monumental next move in your life. It’s time to turn a new page in our careers. We will talk on the other side!