Want to Get More Out of Hadoop? Here Are 5 Ways


In this special guest feature, Ashley Stirrup, CMO at Talend, provides a useful list of five ways to get more out of Hadoop as organizations increasingly look to speed time to market, anticipate and respond to customers’ needs, and introduce new products and services. Ashley Stirrup joined Talend in 2014 as Chief Marketing Officer. In this role, Ashley is responsible for driving market leadership, global awareness, product management and demand generation. Prior to Talend, Ashley held a number of senior leadership positions in marketing and products at leading cloud and software companies, including ServiceSource, Taleo, Citrix and Siebel Systems.

For every business, time is of the essence – a minute can mean millions in some cases. As customers have come to expect businesses to keep up with their every move, operating with data that’s a week, a day or even hours old can be fatal. Hadoop, the big data processing tool, can be businesses’ best friend when it comes to achieving data-based insight in real-time.

As organizations increasingly look to speed time to market, anticipate and respond to customers’ needs, and introduce new products and services, they need peace of mind in knowing that their decisions are based on information that’s fresh and true. For this reason, growing numbers of developers are looking at ways to optimize Hadoop to increase both business insight and competitive advantage.

For business developers looking to sharpen their use of the Hadoop framework, here are 5 tips to get you started:

Be in the moment

It’s one thing to be able to do things in bulk and batch. It’s another thing entirely to be able to do them in real-time. Staying ahead of the pace of business is not about understanding what your customers did on your website yesterday. It’s about knowing what they are doing right now – and being able to influence those customers’ experiences immediately – before they leave your site.

One of the best things about Spark – and Spark Streaming – is that it gives developers one tool set for working in bulk, in batch and in real time. With data integration tools, you can design integration flows across all of these systems with one tool set, so you can pull in data from historical data sources alongside real-time streaming data from websites, mobile devices and sensors.

Bulk-and-batch information may be stored in Hadoop, while real-time information can be stored in NoSQL databases. Regardless of the data source, you can use a single query interface with Spark SQL from mobile, analytic and web apps to search across all data sources for the right information.

Get faster

Overall, combining Hadoop with traditional bulk-and-batch data integration jobs can dramatically improve performance. Simply moving data integration jobs built with MapReduce to Apache Spark will enable you to complete those jobs two and a half times faster. Once you convert the jobs, adding Spark-specific components for caching and partitioning can increase performance an additional five times. From there, increasing the amount of RAM on your hardware will allow you to do more things in memory and actually experience a ten-fold improvement in productivity.

Get smart

So now you can process data in real-time. But are you processing intelligently?

Spark includes machine learning capabilities that make your queries smarter by, for example, allowing you to personalize web content for each shopper. This alone can significantly increase the number of page views. Spark’s machine learning capabilities also enable you to deliver targeted offers, which can help increase conversion rates. So, while creating a better customer experience, you are also driving more revenue – a definite win-win.

For example, it’s possible to use Spark to predict which online customers may abandon their shopping carts – and then present them with incentive offers before they leave the site altogether. You don’t have to be a large retailer to benefit. Simple design tools make it possible for companies of any size to do real-time analytics and deliver an enhanced customer experience.

Still Hand Coding? Cut it out

Everything mentioned in the above tips can be hand-coded for Spark in Java or Scala. But there’s a better way. Using a visual design interface can increase development productivity ten times or more.

When designing jobs, using a visual UI makes it significantly easier to share work with colleagues. People can view an integration job and understand what it is doing – making collaboration straightforward and the ability to re-use development work simple.

Get ahead of the game

You can start right away by using a big data sandbox: a virtual machine with Spark pre-loaded and a real-time streaming use case. If you need it, there’s a simple guide that walks you through a step-by-step process, making it easy to hit the ground running.

