
The Power of Data Is Stuck In Silos

In this special guest feature, Geoff Tudor, VP and GM of Cloud Services at Panzura, discusses how data is rapidly spreading across the enterprise, forming data islands that live in silos across separate infrastructures and sites. This has become a major roadblock for business leaders who want to quickly find and mobilize their data, and who end up spending heavily in search of a solution. Geoff has over 22 years of experience in storage, broadband, and networking. He holds an MBA from The University of Texas at Austin, a BA from Tulane University, and is a patent-holder in satellite communications.

The average enterprise is doubling its data volume every two years, driven largely by machine-generated data from 3D images, 4K video and the Internet of Things.

To solve the challenges posed by this data explosion, most companies are looking to leverage cloud infrastructure to increase corporate agility, reduce cost and drive higher productivity. In fact, the RightScale State of the Cloud Report states that 82 percent of corporations are planning to use multiple clouds, with an average of 4.8 clouds used per enterprise. IDC also estimates that by 2021, $530 billion will be spent on cloud technologies globally, with 90 percent of that spending being on multi-cloud.

Yet along with this multi-cloud future comes a dark side to data. Data is being spread rapidly across the enterprise and across organizations from DevOps, HR and IoT to SaaS applications like Office 365 and Dropbox. As a result, data islands are forming and living in separate silos on separate infrastructure and sites. This has created governance and compliance headaches, making the unified search for data across the enterprise practically impossible.

So, how valuable are your cloud efforts if decision makers cannot quickly find and mobilize trusted data crucial to big data analytics, security, compliance and other key functions? When can we get to a “single source of truth” for big data that allows business and technology leaders to confidently analyze and control key data across all clouds?

The first step in this process depends on accurate data indexing and search. There are billions of files in an average petabyte of enterprise storage, which makes manual data management largely infeasible. As a result, companies spend heavily in search of a solution, with the average company spending $2.5 million a year on data search. The inability to instantly locate, search, index, access and mobilize the most current and accurate data greatly hinders executive insight and action.

Open-Source Enterprise Search

Many vendors have sprung up over the years to address this need, each with different architectures and solutions. Yet amid all the noise, Elasticsearch provides enterprises with a tangible solution through an open, modern and scalable architecture for building enterprise search capabilities.

Based on the powerful Apache Lucene search engine, Elasticsearch has grown to become the most popular open enterprise search engine, with over 350 million downloads to date. It is multi-tenant and distributed, which allows it to scale out horizontally and handle billions of records across multiple indices. It is schema-flexible, meaning you can add new fields to a record dynamically without the rigidity of traditional SQL tables. Search is near-instantaneous, with records becoming available within about one second of being indexed.
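To make the schema-flexible point concrete, here is a minimal sketch in Python that serializes two records with different fields into the newline-delimited JSON body that the Elasticsearch bulk API accepts. The index name `enterprise-files` and every field name are assumptions chosen for illustration, and the sketch only builds the request body; it does not contact a cluster.

```python
import json

def to_bulk_ndjson(index_name, records):
    """Serialize records into a newline-delimited bulk request body.

    Each record becomes two lines: an action line naming the target
    index, followed by the document itself.
    """
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(record))
    # The bulk format requires every line, including the last, to end in a newline.
    return "".join(line + "\n" for line in lines)

# Hypothetical records: the second carries fields the first does not have.
records = [
    {"path": "/projects/site-a/model.dwg", "size_bytes": 52_428_800},
    {"path": "/projects/site-b/scan.mp4", "size_bytes": 8_589_934_592,
     "resolution": "4K", "camera_id": "cam-17"},
]

bulk_body = to_bulk_ndjson("enterprise-files", records)
print(bulk_body)
```

With dynamic mapping left at its defaults, the extra `resolution` and `camera_id` fields would be added to the index mapping when the second document arrives, with no table alteration step beforehand.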

Elasticsearch ushers in a new era of enterprise search capabilities. So much so that a new enterprise search stack, termed “ELK” (Elasticsearch, Logstash and Kibana), is emerging. The solution is changing the game by providing a common framework for indexing an enterprise data set into a single, searchable view. By indexing file shares and data sets, metadata is easily ingested and stored. Because this metadata is a small fraction of the size of the data itself, a repository can be built economically.
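As a rough illustration of how file-share metadata might be gathered before indexing, the sketch below walks a directory tree with the Python standard library and emits one small record per file. The record fields (`path`, `extension`, `size_bytes`, `modified`) are assumptions, not a schema prescribed by any ELK component, and a temporary directory stands in for a real file share.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

def collect_metadata(root):
    """Walk a directory tree and yield one metadata record per file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            info = os.stat(full_path)
            yield {
                "path": full_path,
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": info.st_size,
                "modified": datetime.fromtimestamp(
                    info.st_mtime, tz=timezone.utc
                ).isoformat(),
            }

def demo():
    """Create a throwaway 'share' with one 100 KB file and index its metadata."""
    with tempfile.TemporaryDirectory() as share:
        with open(os.path.join(share, "report.TXT"), "w") as fh:
            fh.write("x" * 100_000)
        return list(collect_metadata(share))

demo_records = demo()
metadata_bytes = len(json.dumps(demo_records[0]))
print(f"file: {demo_records[0]['size_bytes']} bytes, metadata: {metadata_bytes} bytes")
```

Even for this small test file, the serialized record is far smaller than the file it describes, and the gap widens for large video or image files, which is what makes a metadata-only repository economical.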

With this solution, end users can also use faceted search in their data analysis. Faceted search is akin to the kind of search one does when browsing a travel website, refining results based on multiple criteria. For an enterprise, faceted search allows users to find, for example, all files containing a given phrase or word that were authored by a specific user.
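A faceted search of this kind could be expressed in the Elasticsearch query DSL roughly as follows. This is a sketch under stated assumptions: the field names `content`, `author` and `extension` are hypothetical, and `author` and `extension` are assumed to be mapped as keyword fields so that exact-match filters and term counts work on them. The function only builds the request body as a Python dictionary.

```python
import json

def build_faceted_query(phrase, author, max_hits=20):
    """Build a search body: phrase match, author filter, and facet counts."""
    return {
        "size": max_hits,
        "query": {
            "bool": {
                # Full-text condition: the document must contain the phrase.
                "must": [{"match_phrase": {"content": phrase}}],
                # Exact-match refinement, like ticking a box on a travel site.
                "filter": [{"term": {"author": author}}],
            }
        },
        # Terms aggregations return per-value counts used to render facets.
        "aggs": {
            "by_extension": {"terms": {"field": "extension", "size": 10}},
            "by_author": {"terms": {"field": "author", "size": 10}},
        },
    }

query_body = build_faceted_query("quarterly forecast", "jsmith")
print(json.dumps(query_body, indent=2))
```

The `filter` clause narrows the result set without affecting relevance scoring, while the aggregations supply the counts a user interface would show next to each refinement option.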

A number of open-source plug-ins for various enterprise data sources are freely available. In addition to downloading the software and running it internally, there are several ELK stack managed service providers that can provide an accelerated time to value. However you decide to deploy, the possibilities for leveraging the ELK stack in your big data initiatives are boundless.

