Above the Trend Line: machine learning industry rumor central, is a recurring feature of insideBIGDATA. In this column, we present a variety of short time-critical news items such as people movements, funding news, financial results, industry alignments, rumors and general scuttlebutt floating around the big data, data science and machine learning industries including behind-the-scenes anecdotes and curious buzz. Our intent is to provide our readers a one-stop source of late-breaking news to help keep you abreast of this fast-paced ecosystem. We’re working hard on your behalf with our extensive vendor network to give you all the latest happenings. Heard of something yourself? Tell us! Just e-mail me at: daniel
The frenetic pace of our industry continues, keeping us here at insideBIGDATA with our ears on alert for all types of scuttlebutt. Let’s start with new products, services and solutions … MapR Technologies, Inc., the provider of the converged data platform, announced the immediate availability of persistent storage for containers that offers complete state access to files, database tables, and message streams from any location. The MapR Converged Data Platform for Docker includes the MapR Persistent Application Client Container (PACC), which makes it easy for stateful applications and microservices to access data for application agility and faster time-to-value … Unravel Data announced that its full-stack performance intelligence platform for optimizing Big Data operations (DataOps) has been certified with the MapR Converged Data Platform. The MapR Converged Data Platform is an enterprise-grade software solution that unifies big data and open source technologies with fast, native access to global event streaming, real-time database capabilities, and web-scale storage. For data teams working in a MapR environment, whether on-premises or in the cloud, Unravel demystifies performance issues on a stack that typically hosts many app engines such as MapReduce, Hive, Spark, Oozie and others. Unravel has teamed with MapR to make it easier for data teams to run applications on top of the MapR Converged Data Platform. MapR provides the stack, while Unravel aims to ensure that all of the processes and apps on the stack run optimally, and that budding issues are detected and resolved before they proliferate … Glassbox, a digital transformer helping businesses optimize the entire digital lifecycle, announced the launch of its Automatic Insights feature that offers automated anomaly detection and real-time analytics. The feature will build on Glassbox’s automatic tag-less recording capabilities to provide an end-to-end enterprise-wide digital transformation solution. 
Automatic Insights will proactively alert IT teams and marketing executives when there is a problem with online conversion rates, so that they can identify the root cause. The capability shows why users are dropping off a web page, shares sample session replays and provides insights that help enterprises better optimize their customer journeys.
We also learned of a number of new partnerships, alignments and collaborations starting with … Semarchy, the Evolutionary MDM™ firm, and Denodo, a leader in data virtualization software, announcing a strategic partnership to collaborate in marketing, partner and market development, as well as in sales and research and development. The combination of Master Data Management (MDM) and data virtualization capabilities enables Chief Data Officers, data stewards, and other information management professionals to enrich trusted master data records with related contextual information such as transactions and social interactions. By adding data virtualization from Denodo, organizations leveraging Semarchy MDM can gain a more complete view of customers, products, and other master data entities. Integrating additional information that may reside across the cloud and on-premises solutions can enable these organizations to significantly improve business systems, such as reporting, compliance, and call center operations … Tenjin, the mobile marketing infrastructure company, announced partnerships with Looker, the company that is powering data-driven businesses, and Chartio, the cloud-based data exploration solution, to enable app marketers to access, analyze and act on their data more effectively and efficiently. By combining Tenjin’s data warehousing infrastructure, DataVault, with data visualization and exploration tools from Looker and Chartio, app marketers can now run a wide variety of queries on app data from across the entire user lifecycle and generate reports in a number of flexible, intuitive and easily accessible formats. The combined offerings provide app developers with powerful end-to-end Business Intelligence solutions that can be used to acquire more profitable new users and optimize their ad spend …
In M&A news, we learned that Informatica, the provider of data management solutions, announced it has acquired Diaku Ltd., a London-based leader in data governance. The company also introduced Informatica Axon. Billed as the industry’s first fully integrated, 100 percent enterprise data governance solution, Informatica Axon is designed to engage all constituencies, technical and business, to effectively govern an organization’s data. Informatica Axon enables enterprise data governance programs across a wide array of industries, including highly regulated ones such as financial services, healthcare, life sciences, and insurance. Diaku Ltd. is a leader in data governance stewardship applications that empower non-technical business users to overcome complex data governance challenges. Diaku’s flagship application, Diaku Axon, now called Informatica Axon, integrates with Informatica data management solutions for data quality, master data management, big data and cloud to form what the company describes as the only complete, unified data governance offering for any market and any size enterprise.
In the special designations category, we learned that for the fifth consecutive year, Tableau was named a Leader in Gartner’s Magic Quadrant for Business Intelligence and Analytics Platforms (2017). Tableau is one of the top Leaders in this year’s report and the highest in execution, and Gartner touts it as the modern BI market leader, as well as the “gold standard for intuitive interactive exploration.” The company was also praised for its focus on customer experience and success, as well as the software’s overall ease of use … A leader in the data for good movement, DataKind has been named one of Fast Company’s Top 10 Most Innovative Nonprofits for 2017, recognized for its work in using data science to help inform and advance the missions of social change organizations around the world. Part of Fast Company’s annual ranking of the World’s Most Innovative Companies, the list honors leading enterprises and rising newcomers that exemplify the best in nimble business and impactful innovation. To produce the 2017 list, Fast Company reporters surveyed thousands of enterprises across the globe to identify the most notable innovations of the year and trace the impact of those initiatives on business, industry, and the larger culture … Dataiku, the maker of the enterprise software platform for data teams, Dataiku Data Science Studio (DSS), made its debut in the Gartner Magic Quadrant for Data Science Platforms as a visionary. Gartner has positioned Dataiku furthest along the ‘Completeness of Vision’ axis within the visionary quadrant. Dataiku offers an innovative approach to data team collaboration and a vision for how organizations can most effectively deliver value from data science. Dataiku empowers all of the members of a data team, from beginner business analysts to advanced data scientists, to collaborate and build data science solutions in environments that allow them to work most effectively. 
Dataiku also emphasizes the product’s ease of use for companies across a diverse range of industries.
In new customer wins news we’ll start with Clinical NLP provider Linguamatics, and Varian Medical Systems, announcing that Varian will utilize Linguamatics’ natural language processing (NLP) technology as part of the data analytics within Varian’s 360 Oncology care management platform. Varian 360 Oncology care management is a software solution designed to meet the full spectrum of needs in oncology care management for hospitals and cancer centres at the oncology department level. It is capable of tracking physician and cancer specialist referrals, integrating evidence, outcomes data, guidelines and care pathways, coordinating data from multiple sites and settings including patients and external caregivers. Varian will utilize the Linguamatics Health platform, powered by Linguamatics I2E text mining technology, to extract unstructured concepts from within pathology reports and convert them to discrete data elements for analytics reporting within Varian 360 Oncology … The Qt Company (NASDAQ: QTCOM) announced that it will integrate NVIDIA DRIVE™ Design Studio, a state-of-the-art 3D HMI authoring system, into the Qt ecosystem. With the use of 3D technologies increasing significantly across all industries – especially in the automotive, healthcare and industrial automation sectors – innovative 3D design tools have become highly sought after by organizations and independent developers alike. By combining Qt’s current cross-platform framework of software and device development tools with NVIDIA DRIVE Design Studio, Qt is able to provide a world-class 3D design solution for the creation of embedded devices and in-vehicle infotainment (IVI) systems and digital cockpits.
We learned of new vendor financial results … GoodData, whose platform provides actionable insights at the point of work throughout the enterprise and ecosystem to drive better business outcomes, announcing its cloud data lake grew by 280 percent as a result of increased adoption and demand from its global customers. GoodData manages the data integration, enrichment, analytics, and predictive functions all in one end-to-end platform. According to Gartner’s report, Predicts 2017: Cloud Computing Enters Its Second Decade, “by 2021, more than half of global enterprises already using cloud today will adopt an all-in cloud strategy. Organizations are leaving behind the cloud experimentation stage and are looking for strategic relationships with cloud technology providers. Seeking strategic partnerships, large-enterprise customers will look for the breadth of a cloud service provider’s (CSP’s) vision and execution.” GoodData believes its momentum is proof of the demand for comprehensive end-to-end cloud platforms from today’s enterprise. The company’s 280 percent growth in data lake size signifies an increase in commitment and trust from its customers as a cloud analytics platform that enables mission critical applications in real-time.
And finally, our friends over at Alpine Data gave us a short summary of their experience at the recent Spark Summit East 2017 in Boston, where the company presented details about technology it has developed for auto-tuning Spark jobs. Spark can deliver amazing performance, allowing data scientists to apply complex machine learning algorithms to large data sets and quickly deliver actionable insights. However, Spark is extremely sensitive to how a job is configured and resourced, requiring data scientists to have a deep understanding of both Spark and the configuration and utilization of the Hadoop cluster. Incorrectly resourced Spark jobs frequently fail with out-of-memory errors, which leads to inefficient, time-consuming, trial-and-error resourcing experiments by the data scientists. This requirement significantly limits the utility of Spark and hinders its adoption beyond deeply skilled data scientists. Alpine Data’s Spark Auto-tuning technology removes this inefficiency by automatically resourcing and configuring the Spark jobs launched by data scientists. It does not rely on a static configuration; instead, it determines the correct resourcing and configuration at run time based on i) the size and dimensionality of the input data, ii) the complexity of the Spark job, and iii) the availability of resources on the Hadoop cluster. This technology started shipping as part of the Alpine Data Science platform in Fall 2016. Going forward, Alpine plans to use this understanding of Spark resource requirements not only to help launch Spark jobs, but also to dynamically manage the sizing of elastic Hadoop instances, including AWS EMR. 
Alpine Data did not present the details of the algorithm it uses (due to time limitations and IP considerations), but rather used the presentation to start the conversation about the feasibility of Spark auto-tuning using an example algorithm.
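Since Alpine Data did not disclose its algorithm, the following Python sketch is purely illustrative and should not be read as the company’s method. It shows one simple way the three inputs named above (input size and dimensionality, job complexity, and available cluster resources) could be turned into standard Spark settings. The `JobProfile` and `ClusterState` classes, the `complexity_factor` multiplier, and the default of 2 GB per core are all hypothetical assumptions; only the three `spark.executor.*` keys are actual Spark configuration properties.

```python
import math
from dataclasses import dataclass


@dataclass
class JobProfile:
    """Hypothetical description of a job's input and cost."""
    rows: int
    columns: int
    bytes_per_value: int = 8        # assumed: dense 64-bit numeric data
    complexity_factor: float = 3.0  # assumed: working set vs. raw input size


@dataclass
class ClusterState:
    """Hypothetical snapshot of free resources on the cluster."""
    free_memory_gb: float
    free_cores: int
    max_container_memory_gb: float


def suggest_spark_conf(job, cluster, cores_per_executor=4, gb_per_core=2):
    """Return a dict of Spark settings sized to the job and the cluster."""
    # i) size and dimensionality of the input data
    input_gb = job.rows * job.columns * job.bytes_per_value / 1024**3
    # ii) complexity of the job, modeled as a plain multiplier (assumption)
    working_set_gb = input_gb * job.complexity_factor

    # Memory per executor, capped by the largest container the cluster allows.
    executor_memory_gb = math.floor(
        min(cluster.max_container_memory_gb, cores_per_executor * gb_per_core)
    )

    # iii) availability of resources: how many executors fit right now
    by_cores = cluster.free_cores // cores_per_executor
    by_memory = (
        int(cluster.free_memory_gb // executor_memory_gb)
        if executor_memory_gb >= 1 else 0
    )
    if by_cores < 1 or by_memory < 1:
        raise RuntimeError("not enough free resources for a single executor")

    # Executors needed to hold the working set, limited by what is free.
    needed = max(1, math.ceil(working_set_gb / executor_memory_gb))
    instances = min(needed, by_cores, by_memory)

    return {
        "spark.executor.instances": str(instances),
        "spark.executor.cores": str(cores_per_executor),
        "spark.executor.memory": f"{executor_memory_gb}g",
    }


if __name__ == "__main__":
    job = JobProfile(rows=100_000_000, columns=50)
    cluster = ClusterState(free_memory_gb=200, free_cores=40,
                           max_container_memory_gb=16)
    for key, value in suggest_spark_conf(job, cluster).items():
        print(f"--conf {key}={value}")
```

The printed `--conf key=value` pairs are in the form accepted by `spark-submit`. A production tuner would likely also account for memory overhead, shuffle behavior and data skew, none of which this sketch attempts.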