Book Review: Weapons of Math Destruction by Cathy O’Neil


Normally the books I review for insideBIGDATA play the role of cheerleader for our focus on technologies like big data, data science, machine learning, AI and deep learning. They typically promote the notion that utilizing enterprise data assets to their fullest extent will lead to the improvement of people’s lives. A good example is the review I wrote early last year on “The Master Algorithm.” But after reading “Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy,” by Cathy O’Neil, I can see that there’s another important perspective that should be considered.

According to O’Neil, Weapons of Math Destruction, or WMDs, can be characterized by three features: opacity, scale, and the damage they cause. WMDs can be summarized in the following ways:

  • An algorithm based on mathematical principles that implements a scoring system that evaluates people in various ways.
  • A WMD is widely used in determining life-affecting circumstances like the amount of credit a person can access, job assessments, car insurance premiums, and many others.
  • A common characteristic of WMDs is that they’re opaque and unaccountable: people aren’t able to understand the process by which they’re being scored and can’t appeal the results if they’re wrong.
  • WMDs cause destructive “feedback loops” that undermine the algorithm’s original goals, which in most cases are positive in intent.

The book includes a compelling series of case studies that demonstrate how WMDs can surface as a result of applying big data technologies to everyday life. Here are some examples:

  • Algorithms used by judges to make sentencing decisions based on recidivism rates.
  • Algorithms that filter out job candidates from minimum-wage jobs.
  • Micro-targeting algorithms used in politics that allow campaigns to send tailored messages to individual voters.
  • Algorithms to assess teacher performance based on student standardized test scores.
  • Algorithms that use demographics data to determine online ads, judge our creditworthiness, etc.

Here is how O’Neil characterizes WMDs in the book:

“The math-powered applications powering the data economy were based on choices made by fallible human beings. Some of these choices were no doubt made with the best intentions. Nevertheless, many of these models encoded human prejudice, misunderstanding, and bias into the software systems that increasingly managed our lives. Like gods, these mathematical models were opaque, their workings invisible to all but the highest priests in their domains: mathematicians and computer scientists. Their verdicts, even when wrong or harmful, were beyond dispute or appeal. And they tended to punish the poor and the oppressed in our society, while making the rich richer.”

O’Neil certainly has the math street cred to write this book. She is a data scientist and author of the popular blog mathbabe.org. She earned a Ph.D. in mathematics from Harvard and taught at Barnard College before moving to the private sector and working for the hedge fund D. E. Shaw. O’Neil started the Lede Program in Data Journalism at Columbia and is the author of a very nice book that I like to recommend to newbie data scientists, “Doing Data Science.”

O’Neil’s main premise is that many algorithms are not inherently fair just because they have a mathematical basis. Instead, they amount to opinions embedded in code. But as a special kind of model, a WMD hides its foundational assumptions in an impenetrable black box. Models like these obscure the source and kind of their input information and model parameters, they rely on proxy data instead of directly observable inputs, and they create invisible feedback loops that make their effects nearly inescapable. The mathematicians who create the algorithms often are unaware of the biases introduced, and sometimes the opaque and powerful algorithms are in effect “secret laws.” O’Neil makes the case that laws protecting consumers and citizens in general are behind the times with respect to the digital age and need to be updated.
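
To make the “opinions embedded in code” idea concrete, here is a minimal, hypothetical sketch of a proxy-based score with a feedback loop. The function name, inputs, and weights are all invented for illustration; they are not taken from O’Neil’s book or from any real scoring system.

```python
# Hypothetical illustration only: a "creditworthiness" score built entirely
# from proxies. The weights encode the modeler's opinions about what matters;
# none of the inputs directly observes whether a person repays their debts.

def proxy_credit_score(neighborhood_income_rank, employment_gap_months, social_ties_score):
    """Return a 0-100 score from proxy inputs (all names are invented)."""
    score = 50.0
    score += 30.0 * neighborhood_income_rank   # proxy: where you live stands in for what you earn
    score -= 1.5 * employment_gap_months       # proxy: gaps in work history stand in for reliability
    score += 10.0 * social_ties_score          # proxy: who you know stands in for how you behave
    return max(0.0, min(100.0, score))

# The feedback loop: a low score restricts access to credit, which tends to
# lengthen employment gaps, which lowers the next score -- the model's output
# feeds back into its own inputs.
print(proxy_credit_score(neighborhood_income_rank=0.2,
                         employment_gap_months=18,
                         social_ties_score=0.3))
```

Nothing in the arithmetic is unfair on its face; the opinions live in the choice of proxies and weights, which is exactly O’Neil’s point.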

As a quant on Wall Street, O’Neil was eager to put her math skills to use. But soon she realized that the hedge fund she was working for was betting against people’s retirement funds, and she became deeply disillusioned. She felt the way math was being used was immoral, and after she left Wall Street she joined the Occupy movement. O’Neil is uniquely qualified to talk about the social and political implications of this kind of math, given her deep knowledge of modeling techniques and her insider’s perspective on how companies use them.

I recently experienced a WMD myself. I attended a local Meetup group here in Los Angeles that was all about the virtues of big data and machine learning. One speaker, a representative from a large, well-known electronic gaming company, told the attendees how he used data to affect the behavior of gamers. The WMD nature became clear when this data science team manager proudly explained how his company developed technology to “addict kids to their game products” using psychological techniques coupled with algorithms. My takeaway at the time was that something wasn’t quite right, but it wasn’t until I read O’Neil’s book that I understood he was describing a WMD.

As a data scientist myself, I found reading O’Neil’s book eye opening. I reflected back on all the data science projects I’ve worked on and wondered how many of them have evolved into WMDs. Moving forward, I intend to do a deeper dive on the projects I engage with, and I think all data scientists should do the same. In light of the potential for harm, it might be a good idea for data scientists to take a “do no harm” oath, a sort of Hippocratic oath for data.

My only concern with the book is that it never really acknowledges that mathematics, as an objective discipline, cannot itself be blamed for contributing to social inequality. In reality, organizations intent on maximizing profits may choose to misuse mathematical models, but big data and algorithms are not inherently predisposed to perpetuating inequality. The tool is not to blame; the user of the tool is.

As a supplement to this book review, here is a nice presentation by the author from the “Talks at Google” series, in which she sounds an alarm on the mathematical models that pervade modern life and threaten to rip apart our social fabric.

 

I think all data scientists should read “Weapons of Math Destruction” in order to add an important filter for judging how their work may be misused. The book recently became available in paperback on September 15, 2017, and includes a special afterword that looks at the failure of the algorithms used by news outlets to accurately predict the 2016 presidential election results, and at the role of Facebook’s algorithms in helping Russian intelligence agencies spread “fake news” to American voters in an attempt to sway the election (17 U.S. intelligence agencies concluded that Russia interfered in the presidential election).

Contributed by Daniel D. Gutierrez, Managing Editor and Resident Data Scientist for insideBIGDATA. In addition to being a tech journalist, Daniel is also a data science consultant, author, and educator, and sits on a number of advisory boards for various start-up companies.

 


 


Comments

  1. Interesting review. How does the “Weapons of Math Destruction” book compare to, say, “Data Science and Predictive Analytics” (https://www.springer.com/us/book/9783319723464)? It appears as if the former is a bit more philosophical, whereas the latter is a bit more practice-oriented. Any thoughts? Thanks for your comments and review.

    • I really love Springer books; my bookshelf is full of ’em! Alas, they no longer provide review copies for editors like me to write reviews. Sad, but true. — Daniel

  2. The really huge question here is “who gets to define whether an algorithm is unbiased, and based on what quality of evidence?”.

    Some critics, schooled in today’s politics, make a priori assumptions that any divergence from a socially preferred output MUST be due to unfair bias. Guilty until proven innocent.

    Since most machine learning technologies cannot explain their internals (a bunch of weights derived empirically that magically work), this provides an easy bogeyman, a black hole of information processing which a company and its data scientists couldn’t explain no matter how much they wanted to, and thus into which the motivated reasoner can project whatever they want.

    Suppose a neural network is trained to use a set of inputs (e.g., entries from a form), combined with subsequent arrest records, to predict recidivism. And suppose that after training this network is more accurate than almost all trained humans at predicting recidivism. Further suppose that the algorithm is completely unaware that the people it evaluates can even be sorted by some [population group] – that concept doesn’t exist in its universe. The network has absolutely no stereotypes, or even statistical knowledge, about [population groups].

    But we know about [population groups], and we can track that data without showing it to the network, and then look at the results sorted by [population group].

    If some [population groups] at a statistical level show higher recidivism rates than other groups, then a network without [population group] bias *should* estimate recidivism at proportionately higher rates too; and vice versa.
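
    The kind of audit described here can be sketched mechanically. The following is a hypothetical illustration with synthetic data and scikit-learn, not a real recidivism model; every feature, outcome, and group assignment is invented.

    ```python
    # Hypothetical audit sketch: train a model that never sees group membership,
    # then compare its predictions across groups that are tracked separately.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 5000

    group = rng.integers(0, 2, size=n)      # tracked outside the model, never an input
    X = rng.normal(size=(n, 2))             # e.g., entries from a form (synthetic)
    X[:, 0] += 0.5 * group                  # invented group-level difference in one observed feature
    y = (rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)   # synthetic outcomes

    model = LogisticRegression().fit(X, y)  # the group label is not among the inputs
    pred = model.predict_proba(X)[:, 1]

    # The audit: sort the results by group even though the model is "unaware" of it.
    for g in (0, 1):
        mask = group == g
        print(f"group {g}: mean predicted rate {pred[mask].mean():.3f}, "
              f"observed rate {y[mask].mean():.3f}")
    ```

    Because the synthetic groups differ on an observed feature, the model predicts a higher average rate for one group even though it never sees the group label, which is exactly the behavior described above.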

    But looking through some social justice lenses, there will be people who decry this relatively unbiased assessment – just because it (accurately) predicts more recidivism among some [population groups] than others.

    Think about how you’d retrain a network to satisfy this kind of social justice outcome. First off, you’d have to introduce [population group] as an input.

    You could modify the training data – for example, pretending that 20% of recidivists from one [population group] didn’t actually get arrested, or that 10% more of some other [population group] did as compared to real data. If you tweaked (falsified) the training inputs sufficiently, you could probably get equal prediction probabilities across [population groups].

    Or you could deliberately include [population group] as an input to the network, but alter the training feedback so that it’s not learning to make the best prediction of recidivism, but of [population group]-adjusted recidivism. For example, you give a smaller reinforcing boost for correctly predicting recidivism for members of one [population group] than for others, and a larger negative adjustment for incorrectly predicting recidivism. The network could be trained to use the [population group] specifically to deliberately underestimate some outcomes and overestimate others, depending on [population group].

    Basically, you’d be training the network to be less connected to the real-world probabilities of recidivism, in order to come up with the results you want. But if you just want to cook the books, it would be simpler to run the network as is, then apply “social justice adjustments” to the outputs, raising the scores for some [population groups] and lowering them for others.
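
    The last two of those routes can be sketched mechanically. The following hypothetical illustration reuses the synthetic setup from the audit sketch above; the group weights and offsets are invented purely to show the mechanism, with simple per-group sample weighting standing in for the asymmetric training feedback described, and it is not an endorsement of either route.

    ```python
    # Hypothetical sketch of the adjustments described above (synthetic data only;
    # the weights and offsets are invented for illustration).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 5000
    group = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 2))
    X[:, 0] += 0.5 * group                                # same synthetic setup as the audit sketch
    y = (rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
    pred = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

    # (a) Include the group label as an input and weight the training examples by
    #     group, so the model is no longer fit purely to the observed outcomes.
    Xg = np.column_stack([X, group])
    w = np.where(group == 1, 0.7, 1.0)                    # invented, asymmetric weights
    reweighted = LogisticRegression().fit(Xg, y, sample_weight=w)

    # (b) Post-hoc adjustment: leave the model alone and shift its outputs by group.
    offset = np.where(group == 1, -0.10, 0.05)            # invented offsets
    adjusted = np.clip(pred + offset, 0.0, 1.0)

    for g in (0, 1):
        m = group == g
        print(f"group {g}: raw {pred[m].mean():.3f}, "
              f"reweighted {reweighted.predict_proba(Xg[m])[:, 1].mean():.3f}, "
              f"post-hoc {adjusted[m].mean():.3f}")
    ```

    Either route moves the scores away from the observed outcomes by construction, which is the trade-off the comment is pointing at.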

    Does it matter? Well, does it matter to anybody if we release those more likely to commit further crimes versus those less likely? Yes, it matters to the communities into which you release them.

    We should care about everybody, but perhaps we should care about the majority of each [population group] who are not criminals but potential criminal prey, at least as much as we do about the criminal minority from that [population group].

    One could argue that if we care about all groups, we should release people (via pretrial bond or early parole) in order of least likely to re-offend – to the best of our ability to predict. If a more accurate, and less biased, method comes along, we should consider using it to, on average, improve outcomes.

    Of course, this prediction score should be only one factor; a human judge may take more subjective factors into account as well.

    However, no such tool will ever be perfect, so it will be easy to cherry-pick anecdotes where it was in error. “Argument by anecdote” is viscerally satisfying, but fairly meaningless, since we can find anecdotes to support any conclusion we desire.

    In that light, the article mentions that one cannot appeal the output of an algorithm, even if it’s wrong. Think that through. How would it be wrong? Suppose it predicts your probability of recidivism (or credit default or whatever) at 60%, given the inputs it has. How can you prove that it’s factually wrong? For example, that your statistical likelihood is really only 30%?

    Of course, if you go on to do whatever was predicted, one could argue that the algorithm was wrong because the “true” probability in your case was 100%; and if you do not, that the “true” probability was 0%. But that’s a fundamental misunderstanding of prediction and statistics.

    In practice, the only way one could show that the algorithm was “wrong” (or at least suboptimal) in some meaningful sense would be to create a more accurate algorithm or analysis, or to show that humans are available who can do better.
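
    One concrete way to make “more accurate” operational is a calibration check: a single 60% prediction can’t be falsified by one outcome, but across many people the predicted rates can be compared with the observed ones. Here is a minimal, hypothetical sketch with synthetic scores and outcomes; nothing in it comes from a real system.

    ```python
    # Hypothetical calibration check: bucket people by predicted probability and
    # compare the average prediction in each bucket with the observed outcome rate.
    import numpy as np

    rng = np.random.default_rng(1)
    predicted = rng.uniform(0.0, 1.0, size=10_000)             # synthetic model scores
    outcomes = (rng.random(10_000) < predicted).astype(int)    # synthetic ground truth

    for lo in np.arange(0.0, 1.0, 0.2):
        in_bucket = (predicted >= lo) & (predicted < lo + 0.2)
        print(f"predicted {lo:.1f}-{lo + 0.2:.1f}: "
              f"mean prediction {predicted[in_bucket].mean():.2f}, "
              f"observed rate {outcomes[in_bucket].mean():.2f}")
    ```

    If the observed rates track the predictions bucket by bucket, the model is at least well calibrated; a competing model or a panel of humans would have to beat that comparison to show the algorithm is meaningfully “wrong.”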

    These are the types of things we can discuss without even knowing the internal weights of some neural network. We can control what inputs it gets, and train it for neutral outputs (did recidivate vs. did not). If so, it will in general be unbiased relative to its real-world inputs.

    And that lack of bias will cause it to be criticised by some folks who want the output to be biased away from the real world, to fit their beliefs and agenda about the real world. We need to guard against having our work “adjusted” to be less honest.

    On the positive side, there are real dangers in trusting machine learning, and it’s important that we open that discussion. But we have to discuss it rationally, without political biases injected into the evaluation process. If we as a society want to inject political bias, it should be done openly at the decision making point – like a law which automatically adds or subtracts some amount from the (unbiased) recidivism score of criminals, based on their [population group].

    The field will turn into a nightmare if we are supposed to force the models themselves to inject a deliberate bias for political purposes, and then pretend that we did not.

    Sometimes when we analyze real world data, it may accurately reflect a world which diverges from what we would personally prefer it to be – or what our models of society might assume. As scientists, or data scientists, our goal needs to be to accurately reflect the real world, and if the powers that be want to inject a “justice bias” in interpreting our results, then let them do so openly and honestly.

    • If an image algorithm is trained on few images of Black men, would it be a surprise that it frequently misidentifies Black men?
      As a so-called Data Scientist, you of all people should know better. The lack of representative data for a population seems to be problematic in statistical analysis as long as it doesn’t reveal bias.

      YOU Sir are not just part of the “Big Data” problem, you ARE the problem!

      – Signed a Black Data Scientist unafraid to face the truth about “Big Data”.

    • Hello Dee, and thank you for your comment. Yes, the misclassification of images of people of color is a prime example of how machine learning can yield bias in decision making. This is one of the examples from O’Neil’s book. Fortunately, this situation and many others are being discussed more and more these days. Solutions are coming, albeit too slowly.

  3. Steve Rinsler says

    Just found this review, triggered by a current NY Times article (30 Jul 2021) on the issues in regulating AI.

    Looking at the comments, it occurs to me that a particular set of issues stands out: a “proper” articulation/specification of the problem of interest or concern, how you encode the question for the AI so that it produces something relevant to that problem, and how you plan to make use of its response.

    I will try to see how/if those issues are addressed in the book.

    • Stephen Rinsler says

      Addendum.
      As someone with an amateur interest in analyses, I wonder if Bayesian techniques are routinely used in AI program development and use to reveal the impact of possibly biased (e.g., erroneous) assumptions?