
How Open Source is Driving New Innovations in Data Analytics

It wasn’t long ago that open source software was on the fringe of cutting-edge technology. The software then was rough, untested and insecure. No longer is this the case. From tiny startups to the largest Fortune 20 companies, open source technology is permeating every corner of the business world.

But how did this happen? Why is open source now viewed as a viable, and even advantageous, way of bringing new innovations to the market?

Successful open source projects with a vibrant global community of contributors provide confidence that all code is thoroughly tested, and that bugs will be fixed quickly. Over time, companies’ concerns about the quality and security of code in open source software have faded. As a result, open source has come to be seen as a mainstream and reliable option for businesses.

Many of the emerging business trends I’ve witnessed are a result of all the data we’re capturing today. Businesses are becoming more and more data driven. Open source is helping businesses navigate how to manage all this data and determine the best ways to use it. For example, the Hadoop computing infrastructure has become widely successful for enabling distributed processing of large data sets. The broad adoption of Hadoop across many businesses speaks to the impressive power and growth capability of the open source model.
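The distributed-processing model at Hadoop’s core is MapReduce: map each record to key/value pairs, group the pairs by key, then reduce each group. The sketch below is a hedged, single-process Python illustration of that pattern, not Hadoop code; the sample records and word-count task are made up for the example.

```python
from collections import defaultdict

# Toy records standing in for a large dataset split across a cluster.
records = ["open source drives innovation",
           "open data drives analytics"]

def map_phase(record):
    # Map: emit a (key, value) pair for each word in the record.
    for word in record.split():
        yield word, 1

def reduce_phase(key, values):
    # Reduce: combine all values seen for one key.
    return key, sum(values)

# Shuffle: group intermediate pairs by key. In Hadoop the framework
# does this across machines; here it is a plain dictionary.
groups = defaultdict(list)
for record in records:
    for key, value in map_phase(record):
        groups[key].append(value)

counts = dict(reduce_phase(k, v) for k, v in groups.items())
print(counts["open"])    # 2
print(counts["drives"])  # 2
```

Because the map and reduce steps are independent per record and per key, the framework can run them in parallel over very large data sets, which is what makes the model scale.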

Although widespread acceptance has taken time, open source is not a new idea. Even before the recent rise of open source communities, AT&T had a role in several major open contributions to the software industry, including Unix, the S statistical language (a precursor to today’s open source R statistical language), and the C programming language. These contributions have spawned generations of programmers. Unix eventually inspired the Linux operating system, which gave companies a non-proprietary option for operating systems. More recently, contributions from web search giants have given rise to the huge popularity of Hadoop and its ecosystem.

For any company, even one with a world-class industrial lab like AT&T, there is undoubtedly more aggregate talent outside the company than inside. Leveraging that talent through engagement with open source projects makes smart business sense. Engaging in open source allows us to collaborate with other great minds to innovate to our fullest potential.

Like many others, we are users of some of the most successful open source tools. At the same time, we firmly believe that open source is a two-way street. Being part of the open source ecosystem is so much more than just downloading and using code. It requires contributing back to the open source community and leading and developing projects.

In one case, we contributed our XACML 3.0 policy-management software to an Apache Incubator project. The technology provides access control for certain systems and information, particularly for network and IT system managers. In another, we are collaborating with the startup Cask to bring near real-time data stream processing software into open source and into the market through Tigon. It’s built on top of Apache Hadoop and Apache HBase, and it lets users run big data analytics on streaming data to address business use cases.
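The idea XACML 3.0 standardizes is attribute-based access control: a policy decision point evaluates a request’s attributes against policy rules and returns Permit or Deny. The snippet below is a minimal Python analogue of that model only; it is not the Apache project’s API, and the policy attributes shown (role, action, resource) are hypothetical.

```python
def evaluate(policy, request):
    """Return 'Permit' if any rule's attributes all match the request.

    A rule is a dict of required attribute/value pairs; a request is a
    dict of the attributes presented by the caller.
    """
    for rule in policy:
        if all(request.get(attr) == value for attr, value in rule.items()):
            return "Permit"
    return "Deny"  # default-deny when no rule matches

# Hypothetical policy: network managers may configure routers.
policy = [{"role": "network-manager",
           "action": "configure",
           "resource": "router"}]

print(evaluate(policy, {"role": "network-manager",
                        "action": "configure",
                        "resource": "router"}))  # Permit
print(evaluate(policy, {"role": "guest",
                        "action": "configure",
                        "resource": "router"}))  # Deny
```

Real XACML adds rule- and policy-combining algorithms, obligations, and an XML policy language on top of this basic request-evaluation loop.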

In addition to Tigon, we’ve released other open source technologies on GitHub. Nanocubes provides interactive visualizations of massive datasets through heatmaps, bar charts and histograms. It’s award-winning software, and all you need is a web browser to see the results.

The spirit of collaboration is alive in RCloud. It’s an open source platform that allows data scientists to share code and collaborate effectively. Drawing from the principles of social networking, RCloud uses “notebooks” stored in the cloud so others can easily repurpose code or tweak it to achieve desired outcomes.

Projects like our XACML 3.0 policy engine, Nanocubes and RCloud help us contribute to the community and innovate in a way that’s not possible by keeping the technology locked up internally. The technologies are available for others looking to analyze and use their data in new and smarter ways.

Open source also exposes us to potential new business opportunities. We may learn about novel technology applications through the exposure to those in other industries and worldviews.

It’s important to note we aren’t doing this alone. We have collaborated on these projects with colleagues in academia and industry. Through contributions with our collaborators, we’re making it easier for others to become part of an ecosystem that contributes new features and enhancements. We’re looking for fellow travelers who are similarly aligned to drive the industry forward in this direction.

Contributed by: Chris Volinsky, Assistant Vice President of Big Data Research at AT&T Labs. Chris leads the Big Data research group, a team of data scientists using big data to help solve some of AT&T’s toughest problems. He joined AT&T in 1997 and brings particular expertise in large-scale data analytics, statistics, recommender systems, social networks, and statistical computation.





