Analysis Paralysis – an Overtold Cliché or a Case in Point?


In this special guest feature for our Data Science 101 channel, Smita Adhikary of Big Data Analytics Hires highlights how data scientists sometimes tend to get bogged down in the “how” of a problem rather than the “why” of it, and end up delivering highly predictive, yet essentially meaningless models for the business. Smita Adhikary is a Managing Consultant at Big Data Analytics Hires, a talent search and recruiting firm focused primarily on Data Science and Decision Science professionals. Having started her career as a ‘quant’ more than a decade ago, building scorecards and statistical models for banks and credit card companies, and having spent many years in management consulting, she has witnessed from very close quarters the transformation that the advent of “Big Data” has brought about in the skill sets desired in ‘quants’. Like most ‘quants’ she holds a Masters in Economics, and like a lot of management consultants, an MBA from Kellogg School of Management.

Prologue

A few days ago I ran into an old colleague, let’s call him Jack, who seemed all kinds of distressed. “I just don’t get it! It’s the best possible model you can get with such nasty data. What more can they expect?” The cause of this distress? People “who do not know the A-B-C of modeling,” let’s call them the “business” people, had the audacity to shoot down his model. The situation, in brief, was this: he had been handed a rather sparse omni-channel attribution modeling sample by his client (only 70 weeks of data, amounting to just as many data points). Tapping into his superlative repertoire (read: bootstrapping, bagged regression, principal component analysis), he delivered a final model with an 86% R-squared and every included variable statistically significant. Einstein famously said that “insanity is doing the same thing over and over again and expecting different results.” We data scientists are not insane; that’s why we keep trying different things till we have tortured the data enough to sing to our tune (some call it the final model).

Rude Awakening and Introspection

Come Judgement Day, however, his client did not quite buy into his findings. Why? Because Jack’s model was attributing 87% of the contribution to sales to a single channel (Paid Search); not to mention that the contribution of social media was significant but negative. The client’s reaction? “This doesn’t make much sense in our context. We know PPC is strong, but never that dominant for our business. Also, I can understand that social is weak, but how can it have a negative contribution?” The statistical fit, however strong, was belying common sense and market realities.

Me: “Why do you think PPC alone is so dominant?”

Jack: “Well, they have been steadily ramping up PPC over the last 4 months.”

Me: “Just PPC?”

Jack: “No, the same with social and natural search.”

He showed me some time series plots. Now, I have been away from the thick of such things for a while, but “ice-cream sales and deaths by drowning” had stuck with me from somewhere. Careful not to overstep my boundaries and offend him, I put it mildly: “Spurious correlation?” It turned out to be much more than that.

As far as the spurious correlation argument went, it was clear that the impressions and clicks from PPC, social, and natural search were all moving in the same direction and appeared correlated. But anyone in the media business would know that this was due to higher simultaneous investment in these channels (purely the media manager’s discretion), and not necessarily any indication of an organic relationship between the channels. Then how did Jack’s model not catch this collinearity? Blame it on linearity: the interactions among the variables may well have been non-linear.
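
For illustration, here is a minimal sketch in Python of the kind of collinearity check that would have surfaced the problem early. The column names, the simultaneous ramp-up, and the noise levels are hypothetical stand-ins for Jack’s 70 weekly observations, not his actual data.

```python
# Hypothetical illustration: 70 weeks of channel activity that all ramp up
# together because the media manager raised spend across the board.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
weeks = 70
budget = np.linspace(1.0, 3.0, weeks)  # simultaneous ramp-up across channels

media = pd.DataFrame({
    "ppc_impressions":    budget * 100_000 + rng.normal(0, 5_000, weeks),
    "social_impressions": budget * 40_000 + rng.normal(0, 3_000, weeks),
    "organic_clicks":     budget * 20_000 + rng.normal(0, 2_000, weeks),
})

# Pairwise correlations sit near 1.0 only because spend moved together,
# not because the channels drive one another.
print(media.corr().round(2))

# Variance inflation factors well above ~10 mean a linear regression cannot
# cleanly separate the channels' individual contributions.
X = sm.add_constant(media)
vif = {col: variance_inflation_factor(X.values, i)
       for i, col in enumerate(X.columns) if col != "const"}
print({name: round(value, 1) for name, value in vif.items()})
```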

What was more concerning about the model, however, was the way the adstock functions for each channel were optimized. (Adstock captures the carry-over effect of a media channel. In its simplest form, if you generated 100,000 impressions from a display ad in a particular week and assumed an adstock factor of 50%, you would expect 50,000 of those impressions to still be effective in the following week. In other words, even if you did not run a display campaign in a particular week, 50% of the people who had seen the ad the week before would still remember it, as though they had been exposed to it in the current week.) Jack’s model had applied a different carry-over factor, in exponential-decay form, to each channel. In this pursuit of sophistication, he ended up with a model that was almost impossible to decode as far as channel interactions went.
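
To make the adstock idea concrete, here is a minimal sketch of the simple geometric-decay form described above. The function name and the one-burst example series are mine for illustration; they are not part of Jack’s model.

```python
import numpy as np

def adstock(impressions, carryover=0.5):
    """Simple geometric-decay adstock: each week retains `carryover`
    times the previous week's effective impressions."""
    out = np.zeros(len(impressions))
    for t, x in enumerate(impressions):
        out[t] = x + (carryover * out[t - 1] if t > 0 else 0.0)
    return out

# One burst of 100,000 display impressions, then no further spend:
print(adstock(np.array([100_000, 0, 0, 0]), carryover=0.5))
# -> [100000.  50000.  25000.  12500.]
```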

Thinking Afresh

I think we can all agree that multivariate regression should perhaps not have been the chosen method in this case. A better alternative may have been random forests, because their tree-based algorithm moves away from linear curve fitting. Furthermore, the candidate variables at every tree node are selected randomly from the full set of variables, giving each factor “a fair chance to play.” This feature de facto decorrelates the trees, and bagging the individual trees into the final ensemble reduces variance. Applying this method did get our Jack random-forest variable-importance indices that placed natural search on par with PPC; social was weaker but still significant.
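
As a rough illustration of that alternative, the sketch below fits a random forest to synthetic weekly data and reads off impurity-based variable importances instead of regression coefficients. The feature names, coefficients, and noise levels are assumptions for demonstration only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
weeks = 70

# Hypothetical adstocked channel series.
X = pd.DataFrame({
    "ppc_adstock":     rng.gamma(2.0, 50_000, weeks),
    "organic_adstock": rng.gamma(2.0, 40_000, weeks),
    "social_adstock":  rng.gamma(2.0, 10_000, weeks),
})

# Synthetic sales in which paid and natural search matter about equally
# and social makes a smaller, positive contribution.
y = (0.4 * X["ppc_adstock"] + 0.4 * X["organic_adstock"]
     + 0.1 * X["social_adstock"] + rng.normal(0, 10_000, weeks))

# max_features below the full feature count is what randomizes the candidate
# variables at each split and decorrelates the trees.
forest = RandomForestRegressor(n_estimators=500, max_features="sqrt",
                               random_state=0).fit(X, y)

for name, importance in sorted(zip(X.columns, forest.feature_importances_),
                               key=lambda pair: -pair[1]):
    print(f"{name:16s} {importance:.2f}")
```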

Epiphany

Many of us data geeks have been in Jack’s shoes at one time or another. We get so sucked into the complexity of the problem, and so close to it, that we end up losing sight of the bigger picture. In doing so, we greatly harm how we are perceived by the business people, who readily form the opinion that these nerds may know a lot of funny-looking equations but have no idea how a business makes money. And this brings us to a discussion of the perils of “analysis paralysis.” Data scientists like Jack consider it a highly derogatory term, often used by “insecure MBA-types who’ve built their careers ‘selling’ other people’s creativity via fancy decks”, and that is largely a fair assessment. To be truly bipartisan, however, one must acknowledge that incidents such as the one cited above can create situations where serious doubts are cast on our understanding of the business side. I have seen data scientists go truly overboard trying to fit the best model on tricky data, ending up with highly predictive but essentially meaningless models full of cubic transformations, square roots and the like, in classic examples of overfitting. Are we missing the plot here, focusing on the “how” far more than the “why”?

Epilogue

So, what could data scientists consider?

Beyond a certain level of accuracy, the predictive performance of models enters a “fat zone” where a large number of “good” models can potentially exist. So, is it worth compromising response time to gain 2% of incremental accuracy? In my experience, it is far more valuable to focus on constructing the narrative once you have attained that level of fit. Get all the stakeholders involved. Lay out the entire road map of how the model will be implemented and what returns can be expected from it in the short and long term; above all, keep it simple and anecdotal. Consider a great movie, say Goodfellas: when you watch it, do you really think about the camera placements, screenplay techniques and so on? No, because that is not your concern. You just wanted a great movie experience, and if the makers of the film could not meet that expectation, nothing else really matters. If you think of your management and clients in much the same way, your positioning, and the value they see in you, will increase exponentially. I will leave you with this parting thought: if you cannot have a chat with the bartender about the model you built today, you are probably not doing full justice to your capabilities; and as far as the beta-gammas are concerned, most of you have done those to death at Carnegie Mellon and Stanford anyway.
