
insideBIGDATA Latest News – 2/6/2020

In this regular column, we’ll bring you all the latest industry news centered around our main topics of focus: big data, data science, machine learning, AI, and deep learning. Our industry is constantly accelerating, with new products and services being announced every day. Fortunately, we’re in close touch with vendors from this vast ecosystem, so we’re in a unique position to inform you about all that’s new and exciting. Our massive industry database is growing all the time, so stay tuned for the latest news items describing technology that may make you and your organization more competitive.

Qlik Launches Data Literacy Consulting and Service Offerings to Enable Customer Success

Qlik launched a new Data Literacy Consulting Service and Signature Services designed to help organizations adopt a Data Literacy as a Service approach to creating a data-driven culture. These always-on education, consulting and support services will help organizations drive higher data literacy rates while optimizing the value trapped in their data.

“Around the globe, organizations consistently tell us they believe data literacy is essential to their ability to scale data-driven decision making and increase value from data,” said James Fisher, Chief Product Officer at Qlik. “What they are struggling with is how to best blend the people, process and technology elements necessary for a culture to become truly data-informed. Our new offerings, through a proven structure and expert resources, enable a higher level of customer success when bringing data-driven decisions to every aspect of their business.”

OSS Expands AI on the Fly® Product Line, Adding PCI Express 4.0 Expansion System with Eight NVIDIA V100S Tensor Core GPUs

One Stop Systems, Inc. (Nasdaq: OSS), a leading provider of specialized high-performance computing solutions for mission-critical edge applications, announced the availability of a new OSS PCIe 4.0 value expansion system incorporating the latest NVIDIA V100S Tensor Core GPU. As the newest member of the company’s AI on the Fly® product portfolio, the system delivers data center capabilities to HPC and AI edge deployments in the field or for mobile applications. The 4U value expansion system adds massive compute capability to any Gen 3 or Gen 4 server via two OSS PCIe x16 Gen 4 links, which can support an unprecedented 512 Gbps of aggregate bandwidth to the GPU complex. The expansion system features 10 PCIe 4.0 slots and 4,000 watts of load-sharing power. With eight V100S Tensor Core GPUs installed, it delivers up to 1,040 teraFLOPS of tensor performance and 65.6 teraFLOPS of double-precision performance, accelerating both computational science and data science.

The NVIDIA V100S Tensor Core GPU combines CUDA Cores and Tensor Cores in a unified architecture to enable mixed-precision computing. This is especially useful for AI training, where operations run in FP16 precision and results are accumulated in FP32 precision, delivering significant speedups while preserving accuracy. In addition, NVIDIA V100S GPUs offer FP64 precision for scientific computing applications such as simulations, and INT8 precision for AI inference. The OSS expansion system provides this GPU capability for military, automotive, aerospace and industrial edge applications.
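
Why accumulate in FP32 at all? The following toy emulation, which runs on an ordinary CPU and is not how Tensor Cores are actually programmed, gives some intuition. It uses the half-precision ("e") format of Python's standard `struct` module to round values to FP16. It then compares two dot products of FP16 inputs: one whose running sum is rounded to FP16 after every addition, and one whose running sum is kept in FP32. The input values and vector length are arbitrary choices made only for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, accumulate):
    """Dot product of FP16 inputs; `accumulate` sets the precision of the running sum."""
    total = 0.0
    for x, y in zip(a, b):
        # The product of two FP16 values fits exactly in FP32, so it is not rounded here.
        product = to_fp16(x) * to_fp16(y)
        total = accumulate(total + product)
    return total


n = 10_000
a = [0.01] * n
b = [0.01] * n

fp16_acc = dot(a, b, to_fp16)  # running sum rounded to FP16 after every addition
fp32_acc = dot(a, b, to_fp32)  # running sum kept in FP32 (mixed precision)
print(f"exact ~ {n * 0.01 * 0.01:.4f}, FP16 sum = {fp16_acc:.4f}, FP32 sum = {fp32_acc:.4f}")
```

With an FP16 running sum, each small product eventually falls below half a unit in the last place of the total and is rounded away, so the sum stalls well short of the exact value. The FP32 accumulator stays close to it. That is the accuracy argument for multiplying in FP16 while accumulating in FP32.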

“Our new compute acceleration expansion platform demonstrates how we continue to lead the industry in delivering the latest in high-performance technology for mission critical edge applications,” said OSS CEO Steve Cooper. “By combining our proprietary technology and designs with the latest NVIDIA GPUs, we lead the market in Gen 4 expansion and powering AI on the Fly applications.”

Veego AI Drives Auto-Support into Connected Homes

Veego Software, an Israel-based startup that puts an end to malfunctions in the connected home, announced the Veego Self-Care solution based on artificial intelligence (AI) and other advanced technologies, enabling real-time self-support in the connected home. The solution shifts costly and cumbersome subscriber-support actions from traditional customer care to the vigilant Veego AI, saving service providers vast technical support resources and expenses.

“Until now, Internet Service Providers have had very little visibility and practically no control over the broad range of smart devices that each home acquires and the personalized services used by each individual,” stated Amir Kotler, Veego CEO. “Yet, subscribers hold their service providers responsible when things don’t work.”

SQream Hits Record Analytics Acceleration Levels for Massive Data with Release of SQream DB v2020.1

SQream announced the latest release of its flagship data analytics engine, SQream DB v2020.1. The first release of 2020, SQream DB v2020.1 focuses on rapid integration into existing Hadoop and legacy data warehouse ecosystems. According to the company, SQream DB enables the analysis of significantly more data and dimensions with very fast query times on massive data volumes, revealing previously unobtainable business insights and decision-making capabilities. SQream DB v2020.1 is designed for enterprises with massive data stores that are unable to analyze enough of their data to deliver the new and critical insights that propel their business and drive competitive advantage. Analytics that were previously too long-running, or simply not achievable in the existing ecosystem, are now achievable with SQream DB.

“Enterprises are facing huge challenges in analyzing the exponentially growing data they have stored in Hadoop and legacy data warehouses. They can’t analyze the amount of data they want to, the analytics are taking way too long to be effective, or they need to improve the efficiency of their AI/ML data pipeline,” said Ami Gal, CEO and co-founder of SQream. “With SQream DB v2020.1, companies can rapidly deploy SQream DB into their existing data ecosystem to analyze much more data, much more quickly, providing data scientists and business stakeholders with significantly improved data insights.”

Couchbase Introduces Couchbase Cloud, the Enterprise-Class NoSQL DBaaS for the Multicloud Era

Couchbase, the creator of the enterprise-class, multicloud-to-edge NoSQL database, introduced Couchbase Cloud, a fully managed Database-as-a-Service (DBaaS). Couchbase Cloud lets enterprises pay only for what they use of the most powerful NoSQL database technology available on the market, with their data hosted entirely within their own Virtual Private Cloud (VPC). With this announcement, Couchbase Cloud becomes the first fully managed SQL-on-NoSQL database that supports multiple cloud providers. The service will launch this summer on Amazon Web Services and Microsoft Azure, followed by Google Cloud Platform.

“As enterprises continue to move their applications to the public cloud to increase business agility, they are looking for enterprise-class solutions that can deliver on their business-critical application requirements,” said Scott Anderson, SVP, Product Management & Business Operations, Couchbase. “These requirements include security, high availability, reliability, manageability, performance at scale, and more. And in not just one cloud, but in multiple clouds, across clouds, and in hybrid cloud configurations. Working closely with our enterprise customers over the past two years, we’ve designed a fully-managed Database-as-a-Service offering that meets their most stringent requirements, leveraging the latest generation of open source technologies while retaining all the value of self-managed Couchbase Server, to deliver a uniquely powerful and differentiated offering to the market.”

Collibra Launches Data Lineage, an Automated Data Lifecycle Mapping Capability 

Collibra, the Data Intelligence company, announced the debut of Collibra Lineage, which enables organizations to better understand where data comes from and how it flows and transforms as it moves across the enterprise. The launch also marks the integration of SQLdep, the automated technical lineage provider that Collibra acquired in July 2019. By automatically mapping relationships between data points, Collibra Lineage shows how data sets are built, aggregated, sourced and used and provides complete, end-to-end lineage visualization. Collibra Lineage enables large enterprises to understand the full context of their data and ensure that the most trustworthy data available is used to inform business decisions.
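
End-to-end lineage can be thought of as a directed graph in which each data set points to the data sets it is built from. The minimal sketch below is not Collibra's API or data model. The data set names and the `LINEAGE` mapping are hypothetical, written by hand where a lineage tool would extract them automatically (for example, by parsing the SQL that creates each table). Once the edges exist, answering "where does this come from?" and "what breaks if this changes?" is a breadth-first walk in one direction or the other.

```python
from collections import deque

# Hypothetical lineage edges: each data set maps to the data sets it is built from.
LINEAGE = {
    "revenue_dashboard": ["revenue_by_region"],
    "revenue_by_region": ["orders_clean", "regions"],
    "orders_clean": ["orders_raw"],
    "regions": [],
    "orders_raw": [],
}


def upstream(dataset: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every data set that `dataset` depends on, nearest first (breadth-first)."""
    seen = {dataset}
    order = []
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for source in lineage.get(current, []):
            if source not in seen:
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order


def downstream(dataset: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every data set built, directly or indirectly, from `dataset`."""
    reverse: dict[str, list[str]] = {}
    for target, sources in lineage.items():
        for source in sources:
            reverse.setdefault(source, []).append(target)
    return upstream(dataset, reverse)


print(upstream("revenue_dashboard", LINEAGE))
print(downstream("orders_raw", LINEAGE))
```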

“Data discovery and mapping relationships between data sets is a crucial step in the Data Intelligence journey,” said Jim Cushman, Chief Product Officer for Collibra. “With Collibra Lineage, data citizens have the confidence and trust that their data is accurate, empowering them to better collaborate and innovate with their data.”

Pepperdata Introduces Query Spotlight

Pepperdata, a leader in Analytics Stack Performance (ASP), announced the availability of Query Spotlight, a new product in its big data analytics performance suite. Query Spotlight makes it easy for operators and developers to understand the detailed performance characteristics of their query workloads together with the infrastructure-wide issues that affect those workloads. With this new functionality, query workloads can be tuned, debugged and optimized for better performance and lower costs, both in the cloud and on premises. Query Spotlight provides detailed information on query resource utilization alongside detailed database views. It details execution plan skew, poorly optimized queries and historical runtime variance so operations teams can remediate issues as they arise, and it also highlights hot partitions, outdated table statistics, and other system and storage issues.
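
The announcement does not describe how Query Spotlight computes its findings. The following is therefore only a generic sketch of two of the signals mentioned above, not Pepperdata's method. The function names, the 3-standard-deviation threshold, and the sample timings are all hypothetical. Skew is measured as the ratio of the slowest task to the median task in a stage. Historical runtime variance is measured as a simple z-score of the latest run against past runs of the same query.

```python
from statistics import mean, median, stdev


def skew_ratio(task_seconds: list[float]) -> float:
    """Ratio of the slowest task to the median task in one query stage (1.0 = no skew)."""
    return max(task_seconds) / median(task_seconds)


def is_runtime_outlier(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies more than `threshold` standard deviations from the historical mean."""
    if len(history) < 2:
        return False  # not enough history to estimate variance
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # every past run took exactly the same time
    return abs(latest - mu) / sigma > threshold


# Hypothetical measurements, in seconds.
print(skew_ratio([10, 11, 9, 10, 48]))
print(is_runtime_outlier([12.0, 11.5, 12.4, 11.8, 12.1], latest=19.0))
```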

“Queries represent more than 50% of the analytics workloads in today’s big data environments. Working closely with our customers we have identified key issues they face with queries in their deployments. We are confident that Query Spotlight solves these pain points. Query Spotlight provides targeted insight into query execution, giving customers the answers they need to dramatically increase performance and rapidly decrease costs,” said Ash Munshi, CEO, Pepperdata.

RealityEngines.AI Comes Out of Stealth and Launches Autonomous AI Service to Address Common Enterprise Use-cases

RealityEngines.AI, a San Francisco-based AI and machine learning research startup, is coming out of stealth and launching an autonomous cloud AI service to address common enterprise use-cases. The cloud AI service automatically creates, deploys and maintains deep learning systems in production. The engine handles setting up data pipelines, scheduling retraining of models on new data, provisioning high-availability online model serving from raw data using a feature store service, and providing explanations for the model’s predictions. The service helps organizations with little to no machine learning expertise add state-of-the-art AI to their existing applications and business processes in a plug-and-play fashion.

RealityEngines.AI tackles common enterprise use-cases including user churn prediction, fraud detection, sales lead forecasting, security threat detection, and cloud spend optimization. Customers simply pick a use-case that applies to them and point RealityEngines at their data. The service then processes the data, trains a model, deploys it in production and maintains the system for them. Behind the scenes, RealityEngines.AI searches several thousand neural net architectures to find the best model for the use-case and dataset. According to the company, the resulting models surpass custom models that are hand-tuned by experts and take months to put into production.
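
The company does not say which search strategy it uses. One of the simplest ways to "search several thousand architectures" is random search: sample candidates from a search space, score each one, and keep the best. The sketch below shows that loop with a hypothetical search space. It also substitutes a toy scoring function (`toy_evaluate`, purely illustrative) for the expensive step of actually training and validating each network.

```python
import random

# Hypothetical search space: hidden layers, units per layer, dropout rate.
SEARCH_SPACE = {
    "layers": [1, 2, 3, 4],
    "units": [32, 64, 128, 256],
    "dropout": [0.0, 0.1, 0.3],
}


def sample_architecture(rng: random.Random) -> dict:
    """Draw one candidate architecture uniformly from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def random_search(evaluate, trials: int, seed: int = 0):
    """Evaluate `trials` random architectures and return the best (score, architecture)."""
    rng = random.Random(seed)
    best_score, best_arch = float("-inf"), None
    for _ in range(trials):
        arch = sample_architecture(rng)
        score = evaluate(arch)  # in practice: train on the data set, score on held-out data
        if score > best_score:
            best_score, best_arch = score, arch
    return best_score, best_arch


def toy_evaluate(arch: dict) -> float:
    """Hypothetical stand-in for train-and-validate; scores are 0 at best, negative otherwise."""
    return -(
        abs(arch["layers"] - 3)
        + abs(arch["units"] - 128) / 64
        + abs(arch["dropout"] - 0.1) * 10
    )


best_score, best_arch = random_search(toy_evaluate, trials=200)
print(best_score, best_arch)
```

Random search parallelizes trivially and makes no assumptions about the search space. Production systems often use smarter strategies (Bayesian optimization, evolutionary search, or learned controllers) to cut down the number of expensive training runs.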

ArangoDB Expands Machine Learning Offering with ArangoML Pipeline Cloud

ArangoDB, a leading open source native multi-model database, announced the release of ArangoML Pipeline Cloud, a fully hosted, fully managed common metadata layer for production-grade data science and Machine Learning (ML) platforms. ArangoML Pipeline Cloud runs on ArangoDB Oasis, ArangoDB’s recently released cloud service, and is the latest offering in ArangoDB’s ML extension, ArangoML. ArangoML Pipeline Cloud meets the needs of both data scientists, who are concerned with the quality of the data, feature training, and model results, and DevOps teams, who need to manage which datasets and deployments are in use, how they perform, and how they are being deployed. ArangoML Pipeline Cloud centralizes the metadata produced across the ML pipeline, providing a common interface that shows the relationships among data, features, and model training results, as well as deployment, management, and serving logistics. The ArangoML Pipeline solution is pipeline-agnostic, allowing any combination of pipeline components to be connected. And as a cloud-based service, it can be up and running in just a few clicks.
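
To make the idea of a "common metadata layer" concrete, here is a minimal in-memory sketch. It is not ArangoML's API or schema. The `Run` and `MetadataStore` classes, the endpoint name, the dataset file name, and the metric values are all hypothetical. The point is that once every training run records which data and features went in and what came out, a deployment can be traced back to its inputs and runs can be compared on a shared metric.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Metadata for one training run: which data and features went in, what came out."""
    run_id: str
    dataset: str
    features: list[str]
    metrics: dict[str, float]


@dataclass
class MetadataStore:
    """Hypothetical central store shared by data scientists and DevOps."""
    runs: dict[str, Run] = field(default_factory=dict)
    deployments: dict[str, str] = field(default_factory=dict)  # endpoint -> run_id

    def log_run(self, run: Run) -> None:
        self.runs[run.run_id] = run

    def deploy(self, endpoint: str, run_id: str) -> None:
        if run_id not in self.runs:
            raise KeyError(f"unknown run {run_id!r}")
        self.deployments[endpoint] = run_id

    def lineage(self, endpoint: str) -> Run:
        """Trace a deployed endpoint back to the run, dataset and features behind it."""
        return self.runs[self.deployments[endpoint]]

    def best_run(self, metric: str) -> Run:
        """Compare all logged runs on one shared metric."""
        return max(self.runs.values(), key=lambda r: r.metrics[metric])


# Hypothetical usage with made-up runs and metric values.
store = MetadataStore()
store.log_run(Run("run-1", "churn_2019q4.csv", ["tenure", "plan"], {"auc": 0.81}))
store.log_run(Run("run-2", "churn_2019q4.csv", ["tenure", "plan", "support_calls"], {"auc": 0.86}))
store.deploy("churn-api", "run-1")
print(store.lineage("churn-api").dataset, store.best_run("auc").run_id)
```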

“Common metadata is an often overlooked aspect when building production grade ML pipelines, but is equally as important as good training data,” said Jörg Schad, Head of Engineering and Machine Learning at ArangoDB. “It is not only crucial for DataOps teams when looking for reproducible builds, audit trails, or compliance with privacy regulations, but extremely valuable for data scientists as well — allowing them to easily grasp the lineage of models, what artifacts are involved, and also enabling performance comparisons across different models and approaches.”

Run:AI Contributes to Open Source Data Science Community with Gradient Accumulation Tool

Run:AI, a company providing the first platform for virtualization of AI workloads, has published an open-source mechanism for gradient accumulation in deep learning training. The free open-source project, available on GitHub, will help both veteran data science teams and beginners train with large batch sizes even when GPU memory is limited, improving both the performance and the accuracy of models.

When building a deep learning model, one of the critical hyperparameters that data scientists consider is the “batch size”: how many training examples (e.g., images) the neural network processes in each training iteration. However, the batch size is sometimes limited by the available memory of the GPUs running the model. Deep learning models themselves are becoming bigger and more complex, taking up more GPU memory and further reducing the maximum possible batch size and, with it, the achievable accuracy. One solution to this problem is gradient accumulation, which splits the batch into smaller mini-batches that are run sequentially while their gradients are accumulated. The accumulated gradients are used to update the model parameters only at the end of the last mini-batch. Gradient accumulation is a particularly good option when only a single GPU is available, because the mini-batches can be run sequentially on that single resource.
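
The core idea fits in a few lines. The following is a minimal, framework-free sketch, not Run:AI's implementation, using a one-parameter linear model with mean-squared-error loss. The data and learning rate are arbitrary. It computes one update from the full batch, then computes the same update by accumulating gradients over smaller mini-batches and applying the parameter update only once, after the last mini-batch. Each mini-batch's mean gradient is weighted by its share of the full batch, so the result matches even when the last mini-batch is smaller.

```python
def grad_mse(w: float, xs: list[float], ys: list[float]) -> float:
    """Gradient with respect to w of the mean squared error of the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def step_full_batch(w: float, xs, ys, lr: float) -> float:
    """One gradient-descent step using the whole batch at once."""
    return w - lr * grad_mse(w, xs, ys)


def step_accumulated(w: float, xs, ys, lr: float, mini_batch_size: int) -> float:
    """One gradient-descent step built from several smaller mini-batches."""
    n = len(xs)
    accumulated = 0.0
    for start in range(0, n, mini_batch_size):
        bx = xs[start:start + mini_batch_size]
        by = ys[start:start + mini_batch_size]
        # Weight by mini-batch size so the sum equals the full-batch mean gradient.
        accumulated += grad_mse(w, bx, by) * len(bx) / n
    # Parameters are updated once, after the last mini-batch.
    return w - lr * accumulated


xs = [float(i) for i in range(1, 9)]
ys = [3.0 * x + (0.1 if i % 2 else -0.1) for i, x in enumerate(xs)]
w_full = step_full_batch(0.0, xs, ys, lr=0.01)
w_accum = step_accumulated(0.0, xs, ys, lr=0.01, mini_batch_size=3)
print(w_full, w_accum)
```

Only one mini-batch needs to be held in memory at a time, which is why the technique helps when GPU memory caps the batch size. In frameworks such as PyTorch, the same pattern typically means calling `loss.backward()` on each mini-batch (gradients add up in each parameter's `.grad`), scaling the loss so the sum matches the full-batch mean, and calling `optimizer.step()` only after the last mini-batch.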

“We decided that by open-sourcing our solution, we could help many data scientists and researchers understand the batch-sizing problem they encounter and its main cause, and ultimately solve the issue in a simple way,” said Dr. Ronen Dar, CTO and co-founder of Run:AI.

StreamSets Announces Support for New Microsoft SQL Server 2019 Big Data Clusters

StreamSets®, provider of the industry’s first DataOps platform, announced support and platform integration for Microsoft’s recently announced SQL Server 2019 Big Data Clusters. With this integration, SQL users are empowered to design and operationalize data pipelines for big data workloads without the complexities of coding for big data systems. 

“By building native support for Microsoft SQL Server 2019 Big Data Clusters, the StreamSets DataOps Platform is bridging big data and SQL Server use cases,” said Jobi George, general manager, StreamSets Cloud. “We are excited to continue to bring innovative solutions to our joint Microsoft customers and empower them to deliver data faster and with confidence through our DataOps platform.”

GNY Launches Machine Learning as a Service tool in AWS Marketplace to Make Businesses AI-ready

GNY, a decentralized machine learning (ML) platform, announced the launch of a new Software-as-a-Service (SaaS) tool designed to let businesses check the AI-readiness of their data. The GNY Data Diagnostic analyzes a company’s datasets and detects whether its historical data is strong enough for effective ML, or whether weak or inconsistent datasets are skewing predictions and weakening the business. GNY’s Data Diagnostic team offers everything businesses need to become AI-ready, from education about AI basics and how predictive analytics works to a review of the company’s data collection practices, in order to offer a tailored solution. The result includes a detailed analysis of the historical data and what needs to be done to prepare it for ML, as well as an analysis of the digital architecture’s ability to support the company’s business goals. The data exploration service is offered for a flat fee in the AWS Marketplace, and a free consultation lets clients explore the service with no obligation. AWS Marketplace is a digital catalog with thousands of software listings from independent software vendors that makes it easy to find, test, buy, and deploy software that runs on Amazon Web Services (AWS).

“We are excited to make AI available and affordable for small and medium businesses,” said Cosmas Wong, CEO of GNY. “ML services have only been available to large companies with large budgets. We believe that these services should be available to smaller and medium-sized companies who may be able to leverage the value of their existing datasets.”

InfluxData Announces Availability of Time Series Platform on Google Cloud

InfluxData, creator of the time series database InfluxDB, announced the availability of InfluxDB Cloud on Google Cloud. The strategic collaboration, originally announced in April 2019, is part of a major Google Cloud initiative to make the most powerful open source technologies more accessible to its customer base. InfluxDB Cloud on Google Cloud delivers the leading time series platform to developers and businesses, with unified billing, support and integration with other Google Cloud services, such as Bigtable, Pub/Sub and Stackdriver. InfluxDB Cloud on Google Cloud is a serverless, purpose-built time series database-as-a-service with advanced analytics that provide real-time observability into IoT and DevOps workloads. Modern application development requires specialized tools to drive faster, more agile innovation, and general-purpose databases (both relational and non-relational) can’t handle the scale of these workloads in a cost-effective manner. InfluxDB Cloud empowers users to derive new insights from their data, and with usage-based pricing, customers never have to worry about the added costs of overprovisioning or risk their application being unavailable when they need it.

“Data has gravity, and open source technologies are increasingly critical for building next-generation applications,” said Evan Kaplan, CEO of InfluxData. “Through this strategic partnership, Google Cloud is creating a marketplace for developers to thrive — with options for specialized workloads and access to the most powerful open source tools at their fingertips. It’s a continuation of our shared commitment to deliver solutions that help drive new data insights and applications.”

VocalZoom Introduces Autonomous Sensors for Industrial Internet of Things

VocalZoom (VZ), a leading provider of vibration sensors for Industry 4.0, launched its groundbreaking Autonomous Sensors for the Industrial Internet of Things (IIoT). Combining contactless, high-resolution vibration sensor technology with built-in data processing and wireless communications, the company’s Autonomous Sensors offer low-cost and fast deployment of a wide range of monitoring applications for IIoT environments, providing manufacturers with crucial operational and technical insights. VZ’s laser sensors measure the 3D motion and vibration of any surface, enabling industrial manufacturers to monitor the real-time health and performance of engines, turbines, pumps and more. Their compact, contactless form lets the sensors work on hot, wet, and moving surfaces, and they can even analyze mechanical health through glass. VZ’s Autonomous Sensors also include built-in real-time data processing and decentralized data logic powered by Ucontrol’s uPC platform, offering a standalone solution for edge processing and data analytics. The system can connect to a computer on the manufacturer’s production line or to an internal or external cloud.
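
The announcement does not detail what processing runs on the sensors. As a generic illustration of edge processing for vibration monitoring (not VocalZoom's or Ucontrol's algorithm), a device can reduce each window of raw samples to a single root-mean-square (RMS) level and report only when that level crosses an alarm limit, instead of streaming every sample to the cloud. The sample rate, window length, test frequency and alarm level below are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 1000  # hypothetical sensor sample rate


def rms(samples: list[float]) -> float:
    """Root-mean-square amplitude of a window of vibration samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def check_window(samples: list[float], alarm_rms: float) -> tuple[float, bool]:
    """Edge-side check: reduce a raw window to one number and flag it if it exceeds the limit."""
    level = rms(samples)
    return level, level > alarm_rms


def sine_window(amplitude: float, freq_hz: float = 50.0, n: int = 1000) -> list[float]:
    """Synthetic test signal standing in for one second of sensor readings."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE_HZ) for t in range(n)]


print(check_window(sine_window(1.0), alarm_rms=0.5))  # strong vibration
print(check_window(sine_window(0.2), alarm_rms=0.5))  # normal operation
```

For a pure sine wave, the RMS level is the amplitude divided by the square root of two. Real condition-monitoring systems typically add frequency-domain analysis on top of this, because different faults show up at different frequencies.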

“We believe data from next-generation sensors will shape the future of industrial IoT by reducing the cost of implementation, solving new use cases, and helping factories become smarter and more efficient,” said Tal Bakish, VZ CEO. “Our Autonomous Sensors revolutionize the Industrial IoT by providing a complete monitoring and edge processing solution in a single, compact package that saves manufacturers time and money while providing the most accurate monitoring possible.” 

The Future of Graph Databases Is Here: Introducing Neo4j 4.0

Neo4j, a leader in graph technology, announced the general availability of Neo4j 4.0, a significant product release in the graph technology market. The 4.0 release of the Neo4j graph database addresses the broad and complex challenges of application development in the decade to come, including unlimited scalability, intelligent data context and robust enterprise-grade security. Enterprise organizations are already using Neo4j 4.0 to build intelligent applications that leverage the increasingly dynamic, interconnected nature of data.

“I believe Neo4j 4.0 will set the pace for all graph database technology in the next decade and beyond,” said Emil Eifrem, CEO and Co-Founder of Neo4j. “With Neo4j 4.0, we made a massive, audacious engineering investment to raise the bar on the scale, performance and security that can be expected from a graph database – and from databases in general. Our customers challenge us with new use cases for graph technology which require unlimited scale, as well as development and deployment flexibility, all while maintaining security and privacy. These are hard problems to address, but Neo4j 4.0 is already delivering on this promise.” 
