State of the Art Natural Language Processing at Scale

The two-part presentation below from the Spark+AI Summit 2018 is a deep dive into key design choices made in the NLP library for Apache Spark. The library natively extends the Spark ML pipeline APIs, which enables zero-copy, distributed, combined NLP, ML & DL pipelines that leverage all of Spark’s built-in optimizations.

insideBIGDATA “Ask a Data Scientist” Series

Welcome to the series of articles sponsored by Intel – “Ask a Data Scientist” from insideBIGDATA’s popular Data Science 101 channel. These articles are among our site’s most popular resources for newbie data scientists. The 12 articles listed below came from reader-submitted questions of varying levels of technical detail and were answered by a practicing data scientist – sometimes by me and other times by an Intel data scientist.

Field Report: DataWorks Summit 2018

In this field report I wanted to give you a sense of what the vendor ecosystem was saying at DataWorks Summit: their corporate message, if you will. Each company had a somewhat different slant, of course, which aligned with its products and services, but there was also a lot of commonality. Almost everyone had some tie-in to the industry’s current buzz – AI, machine learning and deep learning. This was perfect for me as a practicing data scientist myself. Let’s get started with some vendor snapshots …

Be on Top of Key Data Analytic Trends

Emily Washington: ‘Businesses are increasingly evaluating ways to streamline their overall technology stack… to successfully leverage big data and analytics.’ Technology trends in data analytics are driving rapid growth across the industry. Discover more here.

AI for Pharma R&D – Creating Anti-cancer Drugs Faster, Reducing Process from Years to Days

The cost and process of developing anti-cancer drugs have been an extreme challenge for decades. Today one company, AccutarBio, is harnessing the power of AI to accelerate drug discovery and reform the current “hit-to-lead” drug discovery scheme. The company recently received $15 million in funding (including money from Chinese AI/facial recognition company YITU) and is now partnering with Amgen.

Advancements in Dynamic and Efficient Deep Learning Systems

We’re seeing much hype in the marketplace about the potential of AI, especially with respect to computer vision systems and their ability to accelerate the development of everything from self-driving cars to autonomous robots. To create more dynamic and efficient deep learning systems that don’t compromise accuracy, IBM Research is exploring novel computer vision techniques from both a hardware and software angle.

AI Study: The Coasts Are Excited, The Midwest & The South Are Not

Conversica, a leader in conversational AI for business, released the results of its Regional AI Adoption study that polled residents across the United States to see how they are currently using AI and how excited they are about the prospects of artificial intelligence helping to solve local problems. Results of the study show that Americans feel optimistic about AI and hope that it will someday help to address homelessness, relieve traffic and cure diseases—although priorities vary by location and generation.

2018 State of Embedded Analytics Report

The 2018 State of Embedded Analytics Report by Logi Analytics explores the top benefits of embedded analytics, the latest trends, advantages of different development methods, and what the future of analytics looks like. Logi surveyed more than 500 people who shared their perspectives on how they are embedding analytic capabilities to meet ever-changing market needs.

Best of arXiv.org for AI, Machine Learning, and Deep Learning – May 2018

In this recurring monthly feature, we will filter all the recent research papers appearing in the arXiv.org preprint server for subjects relating to AI, machine learning and deep learning – from disciplines including statistics, mathematics and computer science – and provide you with a useful “best of” list for the month.

Data Science 101: Handling Missing Data (Revisited)

I recently received the following question on data science methods from an avid reader of insideBIGDATA who hails from Taiwan. I think the topics are very relevant to many folks in our audience so I decided to run it here in our Data Science 101 channel. The issue of missing data is one most data scientists see quite frequently.