Python Madness

The Final Four championship games pit North Carolina against Kansas in the men’s tournament and UConn against South Carolina in the women’s. Both games showcase bona fide heavyweights, but the context is the real story: “a changing of the guard” in the women’s game and “the potential for late bloomers” in the men’s.

The Python MVP March Madness 2022 tournament has also arrived at its championship game, with echoes of the real NCAA. Two weeks ago, the tournament began with 32 Python packages matched up in head-to-head, win-or-go-home tournament play. The community voted, round by round, and delivered NumPy and pandas to the final game.

The Popularity of Python in Data Science

Post-game pontificating is human nature and is always part of the fun. So, what do these MVP tournament results tell us about the Python community?

  1. Data science has captured the heart of Python. Beautiful Soup won a nail-biter in the first round over scikit-learn, but in all other cases, the data-science-leaning projects outperformed expectations. When matplotlib beat the popular web framework Flask, the message was heard loud and clear.
  2. Python’s brand now values excitement over utility. That’s a big change! Thirty years ago, Python was developed as the successor to the ABC programming language, arguably to be used as an alternative to awk and Perl scripts. As a result of its embrace of extensibility, however, Python has evolved towards front-of-the-house business value and cutting-edge use cases. With pandas scoring an upset over the requests package at the same time as NumPy beat pip, it’s clear that Python wants the spotlight, not the switchboard.

NumPy versus pandas

The MVP tournament’s final game pits two early favorites against one another. As in the NCAA tourney, these are modern Python’s “blue bloods,” so it’s anyone’s guess who will win.

Experts may cast pandas as a slight favorite, but as in all recent elections, turnout will likely matter. The bookies may simply be reflecting the attitude the Python community has expressed in the tournament thus far. Pandas is the younger of the two packages, having started in 2008 and hitting its 1.0 release in 2020. NumPy, by contrast, traces its roots to the Numeric package of 1995, with its own 1.0 release in 2006. The Python brand is increasingly aligned with themes of innovation, so youth may prove advantageous.

Further, pandas may be seen as higher up the stack than NumPy. The spirit that gave NumPy its wins over pip and pytest may cost it a championship ring. In a direct matchup, NumPy’s pure array operations and linear algebra may fail to capture the imaginative spirit of the current Python community. Though a reliable and vital soldier, NumPy may simply not be exciting enough to hold the title of Python MVP 2022!

Pandas is a higher-level data manipulation layer. Its popular DataFrame and Series structures are the data scientist’s constant companions, seemingly the providers of answers and the introduction to new and powerful friends. If you hang out with pandas all day, it’s hard not to throw your vote that way.
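
For readers who have not used both, the difference in level can be sketched in a few lines. This is only an illustration: the scores, column names, and round labels below are invented for the example and have nothing to do with the actual tournament results.

```python
import numpy as np
import pandas as pd

# Hypothetical game scores (winner, loser), made up for illustration only.
scores = np.array([[72.0, 69.0],
                   [81.0, 77.0],
                   [68.0, 64.0]])

# NumPy: positional arrays, vectorized arithmetic, and linear algebra.
margins = scores[:, 0] - scores[:, 1]   # element-wise subtraction per game
gram = scores.T @ scores                # 2x2 matrix product

# pandas: the same numbers with labels attached, plus table-style operations.
games = pd.DataFrame(scores, columns=["winner_pts", "loser_pts"])
games["round"] = ["semifinal", "semifinal", "final"]
games["margin"] = games["winner_pts"] - games["loser_pts"]

# Group by a label and aggregate, with no manual index bookkeeping.
avg_margin = games.groupby("round")["margin"].mean()
print(avg_margin)
```

The NumPy half works on positions and shapes, while the pandas half works on named columns and groups. Since pandas is built on top of NumPy, the two are teammates far more often than rivals.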

That said, one can’t rule out the impact of personality on the voting in the tournament’s final game. NumPy is personified by its most notable founder, Travis Oliphant (@teoliphant), and pandas by Wes McKinney (@wesm). And, of course, it may be that the community uses this championship bout to weigh in on other projects in these players’ portfolios, surreptitiously expressing views on Mr. Oliphant’s Anaconda and Numba and Mr. McKinney’s Arrow.

It’s a championship game worthy of the name. Cast your vote for Python MVP here!

About the Author

Pete Goddard, CEO of Deephaven Data Labs

Sign up for the free insideBIGDATA newsletter.

Join us on Twitter: @InsideBigData1 – https://twitter.com/InsideBigData1
