Above the Trend Line: machine learning industry rumor central, is a recurring feature of insideBIGDATA. In this column, we present a variety of short, time-critical news items such as people movements, funding news, financial results, industry alignments, rumors and general scuttlebutt floating around the big data, data science and machine learning industries, including behind-the-scenes anecdotes and curious buzz. Our intent is to provide our readers with a one-stop source of late-breaking news to help keep you abreast of this fast-paced ecosystem. We’re working hard on your behalf with our extensive vendor network to give you all the latest happenings. Heard of something yourself? Tell us! Just e-mail me at: daniel
In some exciting streaming news, LinkedIn, the original developer of Apache Kafka, remains both an active contributor to its development and its largest user. However, while Kafka has become the standard messaging system for large-scale streaming data, it still poses several problems for Kafka operators (SREs): reported metrics (e.g. bytes-in rate, offline-partition-count and under-replicated-partition-count) can be unreliable or inaccurate, which is time-consuming for an SRE to investigate, and occasional bugs don’t manifest until Kafka has been deployed in a real cluster for days or even weeks. That’s why LinkedIn built the Kafka Monitor, a framework for monitoring and testing Kafka deployments in real clusters. It reports critical health metrics and runs validation tests to capture bugs or regressions before they make their way into a deployed cluster. LinkedIn has made Kafka Monitor available on GitHub to help other companies that want to validate and monitor their own Kafka deployments … More vendors continue to team up with the rise of data science: Continuum Analytics, the creator and driving force behind Anaconda, a leading open data science platform powered by Python, welcomes Intel into the Anaconda ecosystem. Intel has adopted the Anaconda packaging and distribution and is working with Continuum to provide interoperability. By offering Anaconda as the foundational high-performance Python distribution, Intel is empowering enterprises to more quickly build open analytics applications that drive immediate business value. Organizations can now combine the power of the Intel® Math Kernel Library (MKL) and Anaconda’s Python-based data science to build the high-performance analytic modeling and visualization applications required to compete in today’s data-driven economies …
As the next batch of data scientists complete their master’s and Ph.D. programs and prepare to enter the workforce, they don’t have many people to turn to for advice on this next stage. The data science skills gap continues to grow: demand for data scientists is more than quadruple the number of people able to fill the roles, and because the field is relatively nascent, the job of a data scientist is always changing. Ashish Thusoo, CEO and co-founder of Qubole, has some thoughts on the workforce these recent grads will enter and the opportunities in front of them. As the head of Facebook’s data team from 2007 to 2011 and an Apache Hive author, Ashish has seen how the industry has evolved over the past decade and how the requirements for the job of data scientist have changed. He had this to say:
“Today’s graduates are the first data-native employees entering the workforce, but the trickle of graduates isn’t anywhere close to the torrent of demand. These graduates will have the opportunity to reshape how access to data and insights are managed in enterprises. Data scientists will be expected to not only be able to analyze big data, but make it understandable and actionable by different departments within the organization. Currently, data science teams work in isolation from the rest of their enterprise. The new batch of data scientists will be much more integrated into their enterprises’ workflows, and will need to be able to think creatively, integrate data into business strategy, and communicate their findings accurately to non-technical users. As data becomes more ubiquitous in every job role, it will be the data science team’s job to provide data in a way that is visually understandable and workable by non-analysts.”
As the marriage of data analytics and mobile technologies continues, we just heard that Tableau Software (NYSE: DATA) has launched Tableau Mobile for Android phones. This is another step in making data a seamless part of life for every business user. Tableau has offered fast, native mobile apps for Apple’s iPhone and iPad and for Android tablets, and Tableau Mobile is now available to people around the world who use phones with Google’s mobile operating system. Tableau describes Tableau Mobile as the fastest and most delightful way to stay on top of data from anywhere. People can go from questions to answers in just a few taps. Fluid navigation lets Tableau customers select, filter and drill down into their data with controls that are automatically optimized for touch. Tableau Mobile for Android phones requires a Tableau Online or Tableau Server account. A free trial of either can be downloaded from www.tableau.com/trial. The Android app can be downloaded from Google Play, Google’s official store and portal for Android apps … Our friend Marius Moscovici, Founder & CEO of Metric Insights, offers his insights into the open source movement:
“The collective unlocking of coding doorways opens opportunities to take advantage of data and compute capacity in new ways. This has birthed a rash of startups, and also forces enterprises to change their success strategies, whether they feel ready to or not. Open-source tools are cheaper, faster and more powerful, and they let companies apply data in unheard-of ways. The new batch of data-driven services will force mass production into a niche. Five years from now, customers will order and receive products and services anywhere. Company workers will change their activities to align with the company data feed. On-demand services like Amazon and Netflix have changed consumer behavior to the point that users expect high levels of service from everyone, and open source is the secret sauce behind getting to know consumers. It’s easy to build a startup around these new realities, but steering an entire corporation on a new heading is a different matter. Smart companies, however, are already adapting.”
On the funding scene, money continues to saturate the big data infrastructure industry, with Qumulo, a leader in data-aware scale-out NAS, announcing that it has closed $32.5 million in a Series C funding round. The oversubscribed round included participation from new investors Allen & Company, Top Tier Capital Partners, and Tyche Partners, and existing venture investors Kleiner Perkins Caufield & Byers (KPCB), Madrona Venture Group, Highland Capital Partners, and Valhalla Partners. To date, Qumulo has raised $100 million. Qumulo will use the proceeds to extend its market presence throughout North America and Europe, and to invest in widening Qumulo Core’s leadership advantage in scale-out data-aware NAS … Keeping a pulse on various market segments, we found that according to a new report published by Allied Market Research titled “World Internet of Things (IoT) Healthcare Market – Opportunities and Forecasts, 2014-2021,” the world internet of things (IoT) healthcare market is expected to reach $136.8 billion by 2021, registering a CAGR of 12.5% between 2015 and 2021. The services and system & software segments collectively occupy a dominant share of the world IoT healthcare market and are expected to drive growth over the forecast period. The patient monitoring application segment is expected to maintain its lead position with $72.7 billion by 2021. The world IoT healthcare market is anticipated to grow at a significant pace, owing to the easy availability of wearable smart devices, the increasing need for stringent regulations and the decreasing cost of sensor technology.
Furthermore, the launch of technologically advanced devices (smart shirts, smart lenses, smart bands and others) and analytics software, rising incidence rates of chronic diseases, surging demand for cost-effective treatment and disease management, better accessibility of high-speed internet and the implementation of favorable government regulatory policies are also expected to fuel the growth of this market. Improvements in healthcare infrastructure in developing economies, increased government support and high R&D investments by major players in developing better IoT infrastructure are expected to offer potential growth opportunities to the market. However, factors such as the high costs associated with IoT infrastructure development, data privacy and security concerns, lack of awareness in developing economies and limited technical expertise are projected to restrain market growth.
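For readers who want to sanity-check the forecast arithmetic, here is a minimal sketch of how a compound annual growth rate (CAGR) relates a starting value to an ending value. The $136.8 billion figure and the 12.5% rate come from the report as quoted above; treating 2015 to 2021 as six compounding periods is our assumption, and the function names are hypothetical, so the implied 2015 base it prints is only a rough back-of-the-envelope estimate, not a number from the report.

```python
def implied_base(final_value: float, cagr: float, years: int) -> float:
    """Back out the starting value implied by a final value and a constant CAGR."""
    return final_value / (1.0 + cagr) ** years


def project(base_value: float, cagr: float, years: int) -> float:
    """Grow a starting value forward at a constant CAGR."""
    return base_value * (1.0 + cagr) ** years


# Figures quoted from the report (USD billions and annual rate).
FINAL_2021 = 136.8
CAGR = 0.125
# Assumption: 2015 -> 2021 is treated as six compounding periods.
YEARS = 6

base_2015 = implied_base(FINAL_2021, CAGR, YEARS)
print(f"Implied 2015 market size: ${base_2015:.1f} billion (estimate)")

# Round trip: projecting the implied base forward recovers the 2021 figure.
print(f"Projected 2021 market size: ${project(base_2015, CAGR, YEARS):.1f} billion")
```

If the report counts the forecast period differently (for example, seven periods starting from a 2014 base), changing `YEARS` shows how sensitive the implied starting point is to that choice.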