
File Storage Faces a Perfect Storm, But There’s a Way Out

In this special guest feature, Peter Godman, Co-founder and Chief Technology Officer at universal-scale file storage company Qumulo, discusses how a combination of conditions has come together to form a perfect storm of storage disruption that existing large-scale file systems will find hard to weather. Peter uses his expertise in distributed file systems and high-performance distributed systems to guide product development and management at Qumulo. He is the inventor of more than 20 granted patents in the areas of file system design, distributed systems, and shared-memory concurrency. As Qumulo’s founding CEO, Peter led the company through fundraising rounds totaling $100M, the delivery of Qumulo Core, and the acquisition of Qumulo’s first hundred customers.

This year marked the 20th anniversary of the publication of Sebastian Junger’s “The Perfect Storm,” the tale of a commercial fishing vessel that was lost at sea after a confluence of problems. Likewise, a combination of conditions has come together to form a perfect storm of storage disruption that existing large-scale file systems will find hard to weather. Consider:

  • IDC predicts that the amount of data created will reach 40 zettabytes by 2020 (a zettabyte is a billion terabytes), and we’ll be awash in more than 163 zettabytes by 2025. That’s 10 times the volume generated in 2016. About 90 percent of this growth will be for file and object storage. This monumental increase will leave companies wondering how they will be able to manage the ever-increasing scale of digital assets.
  • The data explosion is just getting started. Machine-generated data, virtually all of which is file based, is one of the primary contributors to the accelerating data growth. Another factor is the trend toward higher resolution digital assets. Uncompressed 4K video is the new standard in media and entertainment, and the resolution of digital sensors and scientific equipment is constantly increasing. Higher resolution causes file sizes to grow more than linearly. For example, doubling the resolution of a digital photograph in each dimension quadruples its size. As the world demands more fidelity from digital assets, its storage requirements grow.
  • At the same time, huge advances have occurred over the past decade in data analysis and machine learning. These advances have suddenly made data more valuable over time rather than less. Scrambling to adapt to the new landscape of possibilities, businesses are forced into a “better to keep it than miss it later” philosophy – i.e. companies have become data hoarders.
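The arithmetic behind two of the figures above can be checked with a minimal sketch. The 3-bytes-per-pixel value and the two frame sizes are illustrative assumptions, not figures from IDC or from any particular camera or codec.

```python
# Illustrative arithmetic only. BYTES_PER_PIXEL is an assumption
# (8-bit RGB, uncompressed); real formats vary.
BYTES_PER_PIXEL = 3


def raw_image_bytes(width: int, height: int,
                    bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Size in bytes of an uncompressed image with the given dimensions."""
    return width * height * bytes_per_pixel


# Doubling the resolution in each dimension quadruples the pixel count,
# and therefore the uncompressed size.
hd = raw_image_bytes(1920, 1080)
uhd = raw_image_bytes(3840, 2160)  # twice the width, twice the height
ratio = uhd / hd

# A zettabyte expressed in terabytes (decimal units).
TERABYTE = 10**12
ZETTABYTE = 10**21
tb_per_zb = ZETTABYTE // TERABYTE

print(f"size ratio after doubling resolution: {ratio}")
print(f"terabytes per zettabyte: {tb_per_zb:,}")
```

Compressed formats will not scale this cleanly, but the uncompressed case shows why higher fidelity drives storage growth faster than the headline resolution number suggests.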

Against this backdrop of unprecedented data growth, the advent of the public cloud has paralleled the rise of ever-growing data footprints and sophisticated analytical tools. The cloud has overturned many basic assumptions about how storage should work. The cloud means that elastic compute resources and global reach are now achievable without building data centers across the world. Businesses realize that they no longer have to run their workloads out of single, self-managed data centers. Instead, they are moving to multiple data centers, with one or more in the public cloud.

As a result of this vastly altered data landscape, new requirements for file-based storage are emerging. What’s needed now is a system that has no upper limit on the number of files it can manage, no matter their size, and that can run anywhere, whether on-premises or in the cloud.

Without such a system, users of large-scale file storage will continue to struggle to understand what is going on inside their systems and cope with massive amounts of data. They will be hard-pressed to meet the demands for global reach, with few good options for file-based data that spans the data center and the public cloud.

Traditionally, companies faced two problems in deploying file-based storage systems: they needed to scale both capacity and performance. In the world of big data, scale is no longer limited to these two axes. New criteria for scale have emerged. They include the number of files stored, the ability to control enormous data footprints in real time, global distribution of data, and the flexibility to leverage elastic compute in the public cloud in a way that spans the data center as well.

So what should a modern enterprise storage system look like? Here are some ideas:

The ability to scale to billions of files. The notion that capacity is measured only in terms of bytes of raw storage is giving way to a broader understanding that capacity is just as often defined by the number of digital assets that can be stored. Modern file-based workflows include a mix of large and small files, especially if they involve any amount of machine-generated data. As legacy file systems reach the limits in the number of digital assets they can effectively store, buyers can no longer assume that they will have adequate file capacity.

The ability to scale across operating environments, including public cloud. Proprietary hardware is increasingly a dead end for users of large-scale file storage. Today’s businesses need flexibility and choice. They want to store files in data centers and in the public cloud, opting for one or the other based on business decisions only and not technical limitations of their storage platform.

The ability to scale across geographic locations with data mobility. Businesses are increasingly global. Their file-based storage systems must now scale across geographic locations. This may involve multiple data centers, and almost certainly the public cloud. A piecemeal approach and a label that says “Cloud Ready” won’t work. True mobility and geographic reach are now required.

Real-time visibility and control. As collections of digital assets have grown to billion-file scale, the ability to control storage resources in real time has become an urgent requirement. Storage administrators must be able to monitor all aspects of performance and capacity, regardless of the size of the storage system.

Access to rapid innovation. Modern file storage needs a simple, elegant design and advanced engineering. Companies that develop universal-scale file storage will use Agile development processes that emphasize rapid release cycles and continual access to innovation. Three-year update cycles, a result of cumbersome “waterfall” development processes, are a relic of the past that customers can no longer tolerate.

Elastic consumption of file storage. As the needs of lines of business surpass what central IT can provide in a reasonable time frame, access to elastic compute resources has become a requirement. A flexible, on-demand usage model is a hallmark of the public cloud. However, the shift to cloud has stranded users of large-scale file storage, who have no effective way to harness the power the cloud offers.

The engineers who designed scale-up and scale-out file systems two decades ago, around the time “The Perfect Storm” was published, never anticipated the number of files and directories, and the mixed file sizes, that characterize modern workloads. Nor could they foresee cloud computing.

Thus, organizations now find themselves in a scalability crisis as the rapid growth of their file-based data footprint exceeds the fundamental design assumptions of existing storage appliances.

They struggle with products that are difficult to install, hard to maintain, inefficient, and expensive. Not only that, but these products offer no visibility into the data. Getting information about how the system is being used is clumsy and slow. It can take so long to get the information that it is outdated even before the administrator sees it.

Legacy storage appliance vendors have been pivoting to the cloud, but their offerings have limited capacity and no scalable performance. That inflexibility negates the opportunities for elastic resources that are the very reason people are turning to the cloud. Also, none of the solutions provide visibility and control of the data footprint in the cloud, which leads to over-provisioning of capacity, performance or both.

In a modern file storage system, unparalleled reliability, scale and performance are table stakes. A great system goes beyond that and gives companies the global access and data insight they need. It moves data where it’s needed, when it’s needed and at massive scale, and it does all of this with lower cost, higher performance, more reliability and greater ease of use.

It’s vital that companies get off the sinking ship of legacy file systems and adopt this new approach.

 
