Increasing Analytic Project Success


In this special guest feature, Erik Ottem, Director of Product Marketing, Data Center Systems at Western Digital, discusses how to build a better data lake, equipped with elements such as scaling measures, object storage adapters and ultimately enough performance to handle large analytic workloads. Erik is responsible for go-to-market execution, collateral development, product messaging and positioning, sales and channel training, and press and analyst briefings. He has over twenty-five years of experience in high-tech storage sales and marketing covering systems, semiconductors, devices and software for companies such as IBM, Seagate, Agilent, Gadzoox Networks, Violin Memory and Western Digital. Erik earned his Bachelor of Science degree in Plant Science from the University of California, Davis and his Master of Business Administration degree from Washington University in St. Louis.

Digital transformation is remaking business models, companies and entire industries. Look at the impact that online search, news, ads, auctions, and markets have had on traditional business. Digital transformation has resulted in a more agile and responsive environment, and it is all built on data. Lots and lots of data, from all kinds of different sources. Sensing the changing world around them, executives across industries are looking to add analytics to their products, services and operations, and are launching new analytic initiatives to stay competitive.

Oftentimes, the only thing slowing them down is a lack of alignment with business needs. Developing a technical capability is fine, but analytics projects need to be designed to solve a business problem. A poor connection between business needs and analytic projects is the main reason most projects never make it into production. To increase the odds of success, make sure your project is solving a real business problem.

It’s also widely understood in the data science world that more data often improves analytic outcomes, so there is a real incentive to collect and use as much data as possible to optimize your project. All this data needs to reside somewhere. You have likely heard of data lakes: collections of data from different sources that provide a broader base for analytics and better decision making. Data lakes can be a powerful tool, but for best results they should be used thoughtfully.

Your analytic infrastructure can be your friend or your enemy. Too much infrastructure too soon can burden your project with unnecessary costs and increase project risk. From a storage perspective, you may not want to start with all the data you will ultimately want, and the trade-off among price, performance and scale needs to be weighed carefully. Traditional architectures that keep three copies of the data tightly coupled to the compute resources are expensive. If you’re using file storage for your project, performance may degrade as data volumes grow, creating another challenge. There may also be significant data curation and tool-set work to do before you’re ready to actually run jobs on all the data you’ve collected, scrubbed, mounted and protected. With a traditional approach, you’ve already spent a lot of money before you’ve run a single analytic job.
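To make the cost difference concrete, here is a rough, hypothetical back-of-the-envelope calculation. The 3x replication factor matches the traditional tightly coupled approach described above; the 8+4 erasure-coding scheme is only an assumed example of how an object store might protect the same data.

```python
# Rough, hypothetical capacity comparison for 1 PB of usable analytic data.

usable_pb = 1.0

# Traditional tightly coupled architecture: three full copies of the data.
replicated_raw = usable_pb * 3

# Assumed object-store erasure coding with 8 data + 4 parity shards (1.5x).
ec_overhead = (8 + 4) / 8
erasure_coded_raw = usable_pb * ec_overhead

print(f"3x replication needs {replicated_raw:.1f} PB of raw capacity")
print(f"8+4 erasure coding needs {erasure_coded_raw:.1f} PB of raw capacity")
```

Under these assumptions, the same petabyte of usable data ties up twice as much raw storage in the replicated design, before any analytic job has run.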

There is a better way.

Start with a smaller result in mind. Specifically, identify with as much precision as possible the desired business outcome, and the tools and data that might be required to solve one important piece of that puzzle. Deconstruct the business result so you can identify a sub-project that is more easily managed, perhaps with fewer tools, less infrastructure and less data. The objective is to get an early, small win, and then use that success to expand with more resources and data to achieve the overall business goal. This approach also gives business leaders the kind of feedback they need, so you can beat the dismal project success rates that are common today. It will help your credibility and improve the odds of a successful project.

Building the right type of data lake to begin with can help ensure success down the road. Manage risk by building your initial analytics platform with a narrow objective using a data science lab. The nice thing about a data science lab is that it doesn’t need (or want) curated data, so you can save some time and effort while the data scientists do the early work of defining data structures and algorithms. If the business results from this early effort look good, it is time to move to a larger platform, institute governance and adopt a larger-scale data storage approach with object storage.

As your project starts to show results, you will probably be looking for new data sources and tools to expand your project and your success. Object storage is designed for large-scale unstructured data and built for high data durability and scalability, which makes it a great fit for data lakes.
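As a minimal sketch of what that looks like in practice, the snippet below lists and reads data-lake objects through the S3-compatible API that most object stores expose. The endpoint URL, bucket and key names are hypothetical placeholders for illustration, not specifics from this article.

```python
import boto3

# Connect to an S3-compatible object store; the endpoint URL is a
# hypothetical placeholder for whatever store backs your data lake.
s3 = boto3.client("s3", endpoint_url="https://objectstore.example.com")

# List the raw event objects under an assumed bucket and prefix.
response = s3.list_objects_v2(Bucket="datalake", Prefix="raw/events/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Fetch one object's contents for analysis (key name is assumed).
body = s3.get_object(Bucket="datalake", Key="raw/events/part-0000.json")["Body"].read()
```

Because the interface is just HTTP and a flat namespace of objects, the same code keeps working whether the lake holds a few gigabytes in the lab or petabytes in production.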

Data science isn’t just science. Don’t fall into the trap of getting too big too soon. By taking an incremental approach to building up your data models and infrastructure, you increase your odds of success. You can leverage the experience from those early efforts to move to production with a proven approach based on business needs, reducing risk and cost while boosting your standing in the eyes of the business. It can be a great way to keep the funding coming and move both the business and the science forward.

 
