Above the Trend Line: machine learning industry rumor central, is a recurring feature of insideBIGDATA. In this column, we present a variety of short time-critical news items such as people movements, funding news, financial results, industry alignments, rumors and general scuttlebutt floating around the big data, data science and machine learning industries including behind-the-scenes anecdotes and curious buzz. Our intent is to provide our readers a one-stop source of late-breaking news to help keep you abreast of this fast-paced ecosystem. We’re working hard on your behalf with our extensive vendor network to give you all the latest happenings. Heard of something yourself? Tell us! Just e-mail me at: daniel
It’s been another action-packed week in the big data industry and we’re delighted to bring you this latest installment of our tech gossip sheet. Let’s start with some financial results … Numetric, the third-generation Business Intelligence (BI) provider, announced the company has grown by a factor of six in the past year alone, while its customer base continues to grow strongly. The company’s compelling vision of providing lightning-fast and easy-to-use business intelligence tools is gaining traction. In the retail, transportation, and education industries, the young company’s recent investment in sales and marketing is bringing impressive results. Numetric’s rapid growth follows its announcement in late December that the company received $3.75 million in funding led by EPIC Ventures, and that it had appointed Greg Butterfield (Symantec, Omniture, Vivint Solar) as Chairman of the company’s board. The recent funding and addition of Butterfield have enabled a Numetric hiring spree, including members of its executive management team. Numetric will be focused on sales and marketing efforts for the first time in company history … Datawatch Corporation (NASDAQ-CM: DWCH) announced that the Datawatch Monarch self-service data preparation platform is in high demand among healthcare organizations seeking to overcome the common hurdles to data access, reconciliation and reporting. In fiscal year 2016 alone, 118 healthcare organizations turned to Monarch to radically expedite data analysis and fact-based decision making. 
More than 720 hospitals and other healthcare services providers now rely on Monarch to improve the preparation and analysis of patient, physician and financial data and gain insights vital to driving down operational costs, increasing productivity, maintaining regulatory compliance and improving quality of patient care … DataTorrent, a real-time Big Data analytics company founded by the creators of the leading enterprise-grade, open-source batch and stream processing engine Apache Apex, is excited by the unprecedented growth of the Apache Apex community and the traction that DataTorrent is getting with production customers. DataTorrent YoY highlights include: (i) 6x growth of customers in production; (ii) 105% growth of subscription bookings dollars; (iii) 83% growth of in-production Gb. In addition to the company’s exceptional growth, DataTorrent has experienced 0% loss of production customers as a result of its obsessive focus … ThoughtSpot, the search-driven analytics company, announced it exceeded growth targets for its fiscal year ended January 31, 2017 with 270% growth in customers. Legacy BI vendors are losing ground as customers shift their technology budgets toward modern BI platforms focused on making analytics easier for business people – and not just IT data experts … Pivotal announced that Pivotal Cloud Foundry, the company’s flagship software product, has crossed a major milestone, with over a quarter billion dollars in 2016 bookings, accounting for 130% growth in 2016. Pivotal works with over one-third of Fortune 100 companies, and a growing portion of Fortune 200 companies, to help solve difficult engineering problems, create operational efficiency, boost developer productivity and jump-start innovation.
We received a number of comments regarding recent hacker assaults on widely used database systems. First it was MongoDB installs that were targeted by ransomware attacks and now hackers are setting their sights on MySQL databases.
“The evolution of database-targeted ransomware started with MongoDB and transitioned to Elasticsearch. These two products could be installed without any authentication mechanism,” said Travis Smith, Senior Security Research Engineer at Tripwire. “When deployed to the internet with default configurations, the databases were world writable. When installing MySQL, you’re prompted for a password which protects against ransomware attacks. What these attackers are doing is guessing the root password via brute force attacks. In practice, this is a very inefficient attack vector. The adaptation from MongoDB to MySQL can be expected. Databases hold some of the most sensitive information on the internet. Because of this, the value of the data can be exponentially greater than the data traditional ransomware targets. MySQL can provide decent security out of the box, with enhanced protections available quite easily. By issuing the mysql_secure_installation command, users can follow a walkthrough on hardening their installations to protect against attacks like this. A good rule of thumb is protecting the root account with a long and complex password in addition to preventing login from the internet, preferably only allowing local authentications.”
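The rule of thumb in that last sentence can be sketched as a small offline check. This is a minimal illustration only: the function names, the 16-character threshold and the character-class rule are assumptions made for the example, and the account list stands in for rows an administrator might read from the `mysql.user` table (`user`, `host`).

```python
import string

# Hosts treated as "local only" for this sketch (an assumption, adjust as needed).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def password_is_strong(password, min_length=16):
    """Return True if the password is long and mixes at least three character classes.

    The threshold and class rule are illustrative assumptions, not MySQL policy.
    """
    classes = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),
    ]
    return len(password) >= min_length and sum(classes) >= 3


def remote_root_accounts(accounts):
    """Return the (user, host) pairs where root may log in from a non-local host."""
    return [(user, host) for (user, host) in accounts
            if user == "root" and host not in LOCAL_HOSTS]


# Hypothetical rows, shaped like the result of: SELECT user, host FROM mysql.user
example_accounts = [("root", "localhost"), ("root", "%"), ("app", "%")]
flagged = remote_root_accounts(example_accounts)
```

Here `flagged` holds only the root account with the wildcard host `%`, which is the kind of entry the hardening walkthrough is meant to eliminate.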
“Ransomware has become one of the largest cyber threats facing organizations in 2017,” explained Michael Patterson, CEO of Plixer International. “The information in MySQL databases represents a gold mine for hackers. However, before companies think about writing a check to get their data back, they need to verify that hackers actually backed up the data in the first place. In several cases, companies have paid the ransom and never received their data. The most effective protection is to have backups of all critical data to restore compromised systems. Network Traffic Analysis solutions are needed to identify that data exfiltration took place, and then provide the forensic data required to understand the size, scope and impact of the attack. While companies may not be able to defend against every attack, at least they can detect it and remediate the situation as soon as possible.”
The funding spigots seem to be wide open for big data, starting with Confluent, provider of a leading streaming platform based on Apache Kafka™, announcing that it partnered with Sequoia and raised $50 million. Existing investors Benchmark and Index Ventures also participated in the round and Sequoia partner Matt Miller joined the Confluent board. With total financing at $80 million, Confluent will use the new funds to further its vision of placing its streaming platform at the heart of every modern business. Streaming platforms are a result of a fundamental shift in how companies think about data: less as something stored to be processed after the fact and more as something that flows and can be processed continuously, making data the heart of the business itself … Incorta, the real-time analytics platform that makes the traditional data warehouse obsolete, announced it has completed a $10 million Series A round of financing. GV (formerly Google Ventures) led the investment round. Incorta additionally unveiled its Direct Data Mapping Engine which rethinks how data is stored and accessed. Rather than the traditional model of performing expensive and slow joins to combine data from different data sets, Incorta’s analytics engine maps directly from each piece of data to all its related data thereby completely removing the need for costly join operations. Even at massive scale, this reduces query times from hours to seconds to enable real-time analysis. Since data is directly mapped to its related data regardless of its form or structure, there’s no need to transform it into traditional normalized structures. 
This bypasses the need to build a data warehouse and enables companies to reduce the time required to develop highly secure, real-time analytic applications from months to days … Hedvig announced the close of a $21.5 million Series C funding round with new investments from Singapore-based EDBI and Hewlett Packard Pathfinder, part of Hewlett Packard Enterprise (HPE). The round also included expanded investments from Atlantic Bridge Ventures, including its Oman Technology Fund, and contributions from existing investors True Ventures and Vertex Ventures. With a total of $52 million in financing to date, Hedvig will use the latest round of funding to expand into new markets, develop end-to-end cloud and backup solutions for large enterprises and grow its world-class engineering, sales and channel teams … C3 IoT announced a Series E financing round at a $1.4 billion pre-money valuation. Led by Breyer Capital, a global firm with investment interest in long term-oriented entrepreneurs and teams working in artificial intelligence (AI), the funds will be used to fuel C3 IoT’s growth by expanding its product footprint and customer service capacity globally. Investors included Breyer Capital, TPG, Sutter Hill, Wildcat Venture Partners, Pat House, and Thomas Siebel. The amount of the financing was not disclosed. A fast-growing leader in PaaS enterprise software for big data, AI, and IoT applications, C3 IoT applies the sciences of big data, cloud computing, and machine learning to enable a new generation of predictive analytics applications. C3 IoT has more than 20 industrial-scale deployments with more than 100 million connected IoT sensors. The company closed a $70 million Series D equity financing led by TPG Growth in September of 2016.
In big data M&A activity, we learned that Hewlett Packard Enterprise (NYSE:HPE) announced it has entered into a definitive agreement to acquire Nimble Storage, the San Jose, Calif.-based provider of predictive all-flash and hybrid-flash storage solutions. HPE will pay $12.50 per share in cash, representing a net cash purchase price at closing of $1.0 billion. In addition to the purchase price, HPE will assume or pay out Nimble’s unvested equity awards, with a value of approximately $200 million at closing. Flash storage is a fast-growing market and an increasingly important element of today’s hybrid IT environment. The overall flash market was estimated to be approximately $15 billion in 2016 and is expected to be nearly $20 billion by 2020, with the all-flash segment growing at a nearly 17 percent compound annual growth rate. Nimble’s predictive flash offerings for the entry to midrange segments are complementary to HPE’s scalable midrange to high-end 3PAR solutions and affordable MSA products. This deal will enable HPE to deliver a full range of superior flash storage solutions for customers across every segment … Minitab, Inc., which provides quality improvement software to more than 90% of Fortune 100 companies, announced that it has acquired Salford Systems, a leading provider of advanced analytics technology for machine learning, data mining and predictive analytics. For Minitab, the acquisition extends the 45-year-old company’s mission of helping people discover valuable insights into their data by delivering exceptional, easy-to-use software and unparalleled support and service. The integration of Salford into Minitab’s business will benefit existing users of both companies’ products and bring powerful analytic capabilities to new markets … With the announcement that Google has acquired Kaggle, the question that remains is, are more data scientists necessarily better? Pascal Kaufmann, Founder and US CEO of Starmind, does not think so:
“When it comes to AI development it is not about having many scientists, it’s about having the right scientist. This trend we are seeing with massive companies, like Google, is they are hoarding as many data scientists as possible and then hoping they have the right people for whatever projects they are working on. This is the exact opposite approach they should be taking. Google may find itself with too many cooks in the kitchen, which will actually delay AI development, not advance it.”
There’s a lot of speculation surrounding the recently announced IBM/Salesforce partnership. Stephanie Trunzo, COO at PointSource, an IBM partner, has shared her expert opinion on where the value of this new arrangement will lie:
“IBM and Salesforce already have many mutual enterprise level customers, some of them quite large and already looking at cognitive roadmaps (such as GM and OnStar). So, this partnership makes sense from the standpoint of being driven by market demand. Salesforce will benefit greatly from expanding the depth of their platform with more robust capabilities from a smart cognitive play. Watson providing contextual and personalized learnings from CRM means the ability to better target consumers, and even get smarter about how people prefer to be communicated with, uncovering synergies for cross-selling and connecting the dots between relationships. It’s not yet clear that IBM will benefit as much from the partnership as Salesforce, though the entrenchment that Salesforce has might mean Watson becomes more normalized and opens some doors for IBM to inject Watson into client conversations they didn’t previously have access to. However, the value of this partnership will be seen in the results driven by the mutual client collaborations. In other words, what IBM and Salesforce accomplish together is only the kindling – the fire that catches will be what their clients create with the mutual technology.”
We learned of a number of important customer wins for members of the big data ecosystem … ClickFox announced that it has chosen Zoomdata, developers of the fast visual analytics platform for big data, as the embedded data visualization solution for their product line, to better surface business opportunities from customer experience insights. As customer journey use cases are analyzed and unique metrics emerge, such as channel of choice or channel hops, there needs to be a fast and intuitive way to visualize and drill into these metrics and transform them into actionable insights. ClickFox already delivers deep journey analytics capabilities, but customers want to see these unique journey metrics in convenient, interactive dashboards. Rather than building and maintaining its own visualizations, ClickFox turned to Zoomdata … Trifacta, a leader in data wrangling, announced NationBuilder, a leading community organizing software platform for leaders, leverages Trifacta Wrangler Enterprise to routinely standardize roughly 145 million voter records from 50 U.S. states and more than 3,000 counties in order to make the data free and available to national and state level political campaigns. With Trifacta, NationBuilder can more quickly and easily wrangle unruly, non-standardized voter registration data with a distributed, user-friendly solution. A process that used to take two years now can be accomplished in less than 2 months … SAP SE (NYSE: SAP) announced that Duke Athletics has taken fan engagement to a whole new level by choosing SAP® technology and LSI Consulting to revamp its statistics site with a complete archive of Duke men’s basketball statistics since 1906. The new site includes individual game box scores, lead information, player stats, season stats, year-by-year breakdowns, rankings, team records and a host of other features, all powered by SAP HANA® — right in time to watch the Blue Devils go dancing at March Madness this year.
In the people movement category, we found out that … MapR Technologies, Inc., the provider of the converged data platform, announced the appointment of Tom Fisher as Chief Technology Officer. Tom brings over 20 years of advanced technology experience in engineering, operations, and IT. In his role as CTO, Tom will spearhead initiatives around advancing MapR’s aggressive innovation agenda globally. Tom was previously with Oracle where he was a senior executive in engineering and operations for over five years, most recently driving the successful adoption of new and emerging technologies at the company’s top 40 cloud customers globally … MapR also announced the appointment of Simon Dale to lead the organization in the Asia Pacific region. As Vice President, Asia Pacific and Japan, Simon is responsible for MapR business expansion across the region, including sales growth, partner development, strategic planning, and customer engagement. Prior to joining MapR, Simon was a member of the senior executive team at SAP Asia Pacific where he launched and managed several traditional software and cloud services businesses. With a 25-year career in the technology industry, Simon has worked extensively across Asia Pacific and Japan introducing new products and services to markets as well as building and developing new sales teams … Treasure Data, the leading cloud platform to make all data connected, current, and easily accessible, announced the hire of Paul ‘Kip’ James as the company’s first Chief Information Security Officer. In a time when keeping enterprise and client data secure is paramount, Treasure Data welcomes this seasoned IT security expert and retired U.S. Marine Gunnery Sergeant, who has received six medals for technology-related processes and engineering, and whose compliance experience has impacted large enterprises in the secure migration of their data, including NORAD and Lockheed Martin. 
James has also been instrumental in developing and implementing advanced technical programs and resolutions, and will amplify Treasure Data’s stringent security and commitment to data protection and security … Cambridge Semantics, a leading provider of graph-based Smart Data management and exploratory analytics solutions, announced the appointment of Dan Szot as Vice President of Sales for the company’s life sciences division. Szot brings more than 20 years of sales leadership and field operations experience to the Cambridge Semantics team, specializing in enterprise sales, research discovery and clinical applications. With his broad experience in all phases of the pharmaceutical business – from drug ideation to commercialization and product life-cycle management – Szot possesses a keen understanding of the value that cutting-edge data discovery and analytics can offer the industry. Szot joins a rapidly expanding team that has been delivering award-winning, enterprise knowledge graph-based solutions for many of the world’s top pharmaceutical, biotechnology, biomedical technologies and other life sciences firms. Cambridge Semantics’ in-memory, massively parallel, semantic graph-based platform delivers a clear competitive edge to data-driven organizations, while maintaining trust with security and governance, for enterprise-wide data lake and analytic initiatives.
In new partnerships, collaborations and alignments, we heard … Kinetica, provider of the fast, in-memory analytics database accelerated by GPUs, announced it joined the Confluent Partner Program and completed development and certification of its Apache Kafka™ Connector. The connector is available for immediate delivery. Kinetica’s Kafka Connector lets customers read and write data directly between Kafka and Kinetica, allowing organizations to ingest real-time data streams from Apache Kafka and provide a means for analysis and immediate action on incoming data. The Kinetica Connector can be deployed into any Confluent cluster from the Control Center GUI or command line using the Kafka Connect RESTful API. The Kafka Connect API ensures fault tolerant integration between the Kafka topic stream and the Kinetica instance … Kinetica also announced its real-time analytics and visualization solution is immediately available on the Nimbix Cloud. Providing instant results and visualized insights across massive streaming datasets, Kinetica on the Nimbix Cloud can be launched in seconds and is the ideal solution for GPU-accelerated analytics … Narrative Science, a leader in advanced natural language generation (Advanced NLG) for the enterprise, and Sisense, the Business Intelligence (BI) company that’s disrupting the industry by simplifying analytics for complex data, announced a strategic partnership. With this partnership, Sisense is leveraging the Narratives for Business Intelligence API to power Sisense Everywhere, a program that is changing the way business users consume data by bringing it into their natural environments … Tamr, Inc. announced that Hewlett Packard Enterprise (HPE) will resell the company’s patented data unification software to bring innovative new solutions to its customers. Through this agreement, Tamr is now part of the HPE Complete program, making it one of an elite set of technologies that HPE has validated for interoperability and chosen to resell. 
The program adds Tamr solutions to HPE’s price list so that customers can purchase complete HPE and Tamr solutions directly from HPE and its resellers … DataEndure and Komprise Inc. announced a partnership that provides organizations with an assessment of their unstructured data, what can be stored using less expensive methods and what can be deleted. By combining DataEndure with Komprise’s solution that runs as a service, customers can save as much as 70 percent or more of the cost of storing, protecting and accessing data … MapR Technologies, Inc., the provider of the converged data platform, and Outscale, the enterprise-class cloud provider, announced that they will work together to provide a cutting-edge Big Data Platform as a Service (PaaS) utilizing the MapR Converged Data Platform. Available in Europe, North America and Asia from Outscale, the new premium cloud service based on MapR provides a cost-effective and highly flexible platform that can support companies on their big data journey — from initial proof of concept, to prototype and application deployment with unlimited scalability … Trifacta, a leader in data wrangling, announced it has collaborated with Google (NASDAQ: GOOGL) to build Google Cloud Dataprep. Google Cloud Dataprep embeds Trifacta’s intelligent, user-friendly interface and Photon Compute Framework, and natively integrates Google Cloud Dataflow for serverless, auto-scaling execution of data preparation recipes with record performance and optimal resource utilization. Google Cloud Dataprep provides analysts with the ability to intuitively explore and prepare diverse data sets within Google Cloud Platform for a variety of downstream uses including analytics and machine learning. 
Google and Trifacta’s collaboration gives organizations the ability to leverage the full potential of data in Google Cloud Services to drive new sources of business value such as improving operational efficiency, personalizing products and services, and uncovering new insights.
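For readers curious what deploying a connector through the Kafka Connect RESTful API looks like (as mentioned in the Kinetica item above), here is a minimal sketch. Kafka Connect accepts a JSON document with `name` and `config` fields via `POST /connectors`. Everything specific in the example is an assumption for illustration: the connector class name is a made-up placeholder (not Kinetica’s real class), and the host, connector name and topic are hypothetical. The request is only built, not sent.

```python
import json
import urllib.request


def build_connector_request(connect_url, name, connector_class, topics, extra_config=None):
    """Build (but do not send) a POST /connectors request for a Kafka Connect worker."""
    config = {
        "connector.class": connector_class,  # fully qualified connector class
        "tasks.max": "1",                    # number of parallel tasks
        "topics": ",".join(topics),          # topics a sink connector reads from
    }
    config.update(extra_config or {})
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values throughout; the class name below is a placeholder.
request = build_connector_request(
    "http://localhost:8083",
    "kinetica-sink-example",
    "com.example.HypotheticalKineticaSinkConnector",
    ["sensor-readings"],
)
```

Passing `request` to `urllib.request.urlopen` would submit it to a running Connect worker; connector-specific settings such as database URLs would go in `extra_config`, using whatever keys the vendor documents.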
Our antenna was up to receive analysis and comments about the Amazon Web Services outage. Here are a couple of observations:
“Amazon’s outage demonstrates how running in the cloud does not absolve companies from the need to ensure high availability in their operations,” said Michelle McLean, Vice President of Marketing for ScaleArc. “All the companies whose services are impacted now were vulnerable because all their operations ran out of a single Amazon region. Companies should architect their systems to share operations across regions so that if one territory goes down, the company retains operating systems, even if at lower capacity. Architecting for cross-region operations and failover is challenging, particularly at the data tier. Many companies rely on database load balancing software to make it easier to achieve cross-region failover, since the software enables applications to straddle multiple regions. Companies that have architected their operations with this level of resiliency in mind can’t be fully taken offline even during a substantial Amazon outage.”
“Today’s S3 crash will inevitably cost businesses millions of dollars,” said Chip Childers, CTO, Cloud Foundry. “This is why all businesses need a multi-cloud strategy so they can adapt immediately when, inevitably, one of their cloud vendors experiences a failure. It’s not Amazon’s fault, it’s inevitable. #cloudfoundry keeps your cloud options open.”
“Stone-cold simple: SaaS vendors need to assume that cloud infrastructure will be down a few times a year. Cloud-native services should add Multi-Availability Zone and Multi-Region replication for seamless failover,” commented Puneet Chawla, Chief Technology Officer and Co-founder of Workspot. “At Workspot, we’ve spent the last 4 years making sure our service is region and cloud agnostic so that we can fail over to another location when disaster strikes.”
“The lesson for us all from this major AWS outage may very well be to build redundancy for mission critical apps across multiple clouds,” said James Sivis, VP of ITaaS Nerdio.
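The cross-region and multi-cloud failover that these commenters recommend boils down to a simple routing rule: try endpoints in priority order and use the first one that passes a health check. The sketch below shows only that rule. The endpoint URLs, the `pick_endpoint` name and the simulated outage are hypothetical, and a real deployment would also need data replication and a real health probe, which this example does not attempt.

```python
def pick_endpoint(endpoints, is_healthy):
    """Return the first healthy endpoint in priority order, or None if all are down."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


# Hypothetical endpoints: primary region, secondary region, then a second cloud.
ENDPOINTS = [
    "https://us-east.example.com",
    "https://us-west.example.com",
    "https://other-cloud.example.com",
]

# Simulated outage of the primary region; a real check would probe the service.
down = {"https://us-east.example.com"}
active = pick_endpoint(ENDPOINTS, lambda url: url not in down)
```

With the primary marked as down, `active` resolves to the second region, and the service keeps running at whatever capacity that region provides.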
Finally, the following exposé on the AWS outage was contributed by Shayne Higdon, President, Performance and Analytics at BMC Software:
AWS went down … how painful was it for you?
The public cloud providers including Amazon and Microsoft are enabling business and technology advancements everywhere you look — from our world economy to our personal daily lives. But, what happens when a public cloud provider has an outage? How painful is it for YOU? For you as an individual it may only be a minor inconvenience, like not being able to breeze through a TSA line and pull up your electronic boarding pass as you approach the agent. But, if you made a business bet on the promise of greater agility and lower costs, a cloud outage can create the opposite effect – paralyzing your business and lowering revenues. The really scary reality for business leaders is that they cannot answer just how painful the outage was.
“Which services went down and for how long?”
“How much revenue did you lose?”
“What are customers and the market saying about us?”
“Was the problem we had today caused by the cloud outage, or did something else go wrong?”
It may sound crazy, but many enterprises don’t have visibility into how much they have riding on the cloud. Cloud resources are being purchased at an amazing rate, by teams and departments all over the enterprise, to support business applications and drive their digital transformation. This adoption is not easy to track inside a business and the speed of competition has made getting the app released more critical than getting the centralized control once enjoyed by IT. The cloud providers have made it easy, which is great, but it’s created a surprisingly big challenge for Enterprise IT to answer the boss when she asks, “how bad was it for us?”
Today’s Digital Enterprise requires IT to do more than ever before:
- Support new technology faster to enable business agility
- Regain and maintain a centralized view of the increasingly complex landscape
- Make better use of data, to help drive action before problems are felt by users
- Prepare for failure with automated remediation and the ability to shift workloads from cloud to on-premises and back
- Tie outgoing cost and incoming revenue to the transformative services being rolled out
The Cloud promise is real and the bet is a safe one to make, but it does present new challenges. The IT organizations who can address them the fastest will be best positioned to not just support – but drive – their digital enterprise transformation. And instead of asking “how painful was the AWS outage”, they can think about how much money they made when they stayed up and their competitors went down.