This week’s visualization holds a special place in my heart because its data source is the arXiv pre-print server that I used so extensively in my previous life in astrophysics. arXiv is an open website sponsored by the Cornell University Library where scientists can publish their research and make it immediately available for the whole world to read. It has been around since 1991, but in that time it has amassed a staggering 865,000 papers in physics, mathematics, computer science, statistics, quantitative biology and finance. As part of the computer science category, there is a section on machine learning that I use now for my work in data science. Each weekday around 300 new research papers are added to the repository.
Scientists live in a very fast-paced world where hundreds of new papers – each representing months or years of hard work – flash by each day. With such a large volume of information, it would be nice to have some way of organizing it, making it possible to discover papers you might have missed, and also allowing someone new to your field to know where to start.
Paperscape is a visualization of the scientific papers housed in the arXiv repository using an algorithm that simulates the formation of galaxies, replacing stars with scientific papers. The papers are represented as circles, colored according to their subject area and sized according to the number of other papers that cite them. Clicking on a circle displays that paper’s reference, with its links on arXiv, including a pdf of the paper. The layout of the visualization is controlled by physics: each paper is repelled from every other using a simulated inverse-gravitational force and attracted toward all its references by a simulated spring. Users can also see the relative age of each paper and get a sense of the most active areas of research that are housed on the repository.