I recently caught up with Jim McHugh, GM of the Deep Learning Group for NVIDIA, from the floor of Strata + Hadoop World 2016 NYC to discuss analytics, the overall AI evolution and how there’s a lot of synergy between the two. Jim leads the teams who are responsible for the NVIDIA DGX-1, the world’s first AI supercomputer in a box. His responsibilities include product management, product marketing and partner solutions. Jim is focused on executing strategies to deliver GPU-based computing solutions for the data center. Jim has more than 25 years of experience as a marketing and business executive with many technology leaders including Apple, Cisco, and Sun. He has a deep understanding of business drivers, market/customer dynamics, and technology-centered products for AI, enterprise and data center applications.
Daniel D. Gutierrez – Managing Editor, insideBIGDATA
insideBIGDATA: Please give me a run down on what NVIDIA is doing here at the conference and maybe some of the discussions you’re having.
Jim McHugh: This has been an interesting combination of shows for us because in one case, I did the keynote at O’Reilly AI yesterday, and was on stage just after Yann LeCun and a bunch of interesting people. Google was represented, there was someone working on AI for emotion, etc. Here, we’re talking all about accelerated analytics and how that’s going to affect more and more of what’s going on in big data. When I say “accelerated analytics,” I’m referring to a lot of the accelerated databases like Kinetica, MapD, SQream, and BlazingDB, as well as the analytics components which allow you to access data in milliseconds. A normal query takes about 10 seconds, but now we’re talking about reloading a page with about 10 to 15 queries in milliseconds; that would take 10 seconds per query on a normal x86. Here, we’re really doing an ecosystem type of show. I have all my partners in. We did an event Monday night, and we talked about analytics. I’ll be on stage later today, and I’m going to have three of the companies come up on stage and do demos with me. It’s really the ability to see the reality of it that captures people’s attention. They can’t believe how fast they’re able to navigate through the data; it really changes things. It overcomes some of the bottlenecks and the workarounds that people have tried in the big data space for a while.
insideBIGDATA: What other conversations have you been having?
Jim McHugh: I think it’s exciting, because this is changing, even here at Strata. The last couple of years it’s been about Hadoop, and then Spark was the big thing, and people were trying to accelerate there. But I think people found that they were still stymieing the creativity of their analysts if they literally had to wait a minute before going on to the next query. It becomes wearisome. So I just think it’s exciting to watch people’s eyes when they say, “Oh, can you really explore that fast?” It’s almost like the speed of thought as you start going through it. It allows you to try things that aren’t just answering the question you were given. You’re actually exploring, and that makes it really cool.
Another component is, I don’t know if you’ve heard of a company called Graphistry. They were at GTC in April. What they are is a visual graph company, and here what we’re doing with them is more security demos. They’re able to look at security logs. Let’s say you’re using Splunk: you could point Graphistry at Splunk, and with a few lines of code add the visualization capability. It allows you to quickly go through the security alerts for the day, and there could be hundreds of thousands of those, and visually correlate them where it groups them naturally. Then you can figure out whether an alert is just noise coming from the firewall, or an internal sort of scan. You can quickly get to the core activity of the day that you need to be looking into. Again, that one is really interesting and it’s getting a lot of attention, because people have all these log files and they could do a report, but now they can actually visualize trouble spots and keep dissecting and going deeper from a visual standpoint. It’s been great to have Graphistry with us, demonstrating with us. The idea of looking through your Splunk log files has totally changed.
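Graphistry’s actual integration is visual and GPU-accelerated, but the underlying triage idea McHugh describes – correlating a flood of alerts until the day’s core activity stands out from firewall noise – can be sketched in a few lines. The sketch below is purely illustrative: the field names (`src_ip`, `alert`) and the sample data are hypothetical, and this is not Graphistry’s or Splunk’s API.

```python
# Illustrative only: group security alerts by shared source IP so
# correlated events cluster together, roughly the way a visual graph
# tool groups related alerts for triage. Field names are hypothetical.
from collections import defaultdict

def correlate_alerts(alerts):
    """Return (src_ip, [alerts]) pairs, largest cluster first."""
    clusters = defaultdict(list)
    for event in alerts:
        clusters[event["src_ip"]].append(event["alert"])
    # The biggest cluster is the likely "core activity of the day".
    return sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)

sample = [
    {"src_ip": "10.0.0.5", "alert": "port scan"},
    {"src_ip": "10.0.0.5", "alert": "failed login"},
    {"src_ip": "10.0.0.5", "alert": "failed login"},
    {"src_ip": "192.168.1.9", "alert": "firewall noise"},
]
top_ip, top_alerts = correlate_alerts(sample)[0]
```

At the scale McHugh mentions (hundreds of thousands of alerts), the value of a GPU-backed visual graph is precisely that the grouping and the drill-down happen interactively rather than in a batch report like this one.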
insideBIGDATA: What role do you see your partners playing in this “AI driven analytics” area?
Jim McHugh: I think a great way of looking at “AI driven analytics” would be as an up-and-coming buzzword about the customer’s digital business and getting access to information. The first thing I see these partners doing is accelerating the analytics component. The other side of the equation is doing it at a fraction of the cost, since some approaches to scaling out come with a lot of hidden costs. Scaling out means I need more of everything just to get compute. I can’t just get more computers, because I have to bring along the fans, the interconnects, the chassis, all that kind of stuff. That’s the first step that the partners bring to the table.
The second step is to address a concern: deep learning is coming fast and furious, and it’s changing everything, but people coming in from a particular industry want to understand how its conclusions were reached. They use graph-analytic type approaches to understand the data. So you get the correlations, but with 100x more data, and it gives you an understanding of how the model came to its conclusion. We have applications that allow you to track it; for image recognition you can go down into the neural net and see which neuron kicked it off. But for your everyday business analyst who wants to understand it, I think these visual graphs from Graphistry are a great way of doing it.
So, these are the two things I see. First, they’re accelerating it, and second, they give you a visual component. The final step will be how they move into AI. I’ve had some conversations along these lines with MapD and Kinetica about how they’re going to start doing some joint initiatives, either using TensorFlow or Torch, and see how that plays out. So our partners are exploring the boundaries of how they can accelerate access to the data that’s going to be used by some of the frameworks as well.
insideBIGDATA: Do you think it’s too early right now along this AI driven analytics line of thinking? Is it too early for specific use cases?
Jim McHugh: We have a number of customers with actual use cases, like USPS, PG&E, Verizon, EMC and others. There’s also a very large retailer that is all over this technology. They love the acceleration they’re getting. They made it quite clear that the cost savings they’re getting is helping to pay for the infrastructure alongside it, so the idea of starting with a POC is a no-brainer for this company. Stop for a moment and consider why that’s happening – if you’re in retail and you have a large amount of data coming in, you want to track everything from your inventory to status, and you have to do quick reporting. If you were using in-memory databases before, either it was too expensive given the amount of memory you need, or you were getting slower queries, or you were spending a lot of money on something like SAP HANA for in-memory, where the cost is quite high when you start scaling it out.
There are lots of use case examples coming online. Our booth here has been quite crowded because people are stopping by to see this. Honestly, what I really like about Strata + Hadoop World East is that it’s a real customer-focused sort of show, whereas sometimes on the West Coast it ends up with vendors talking to vendors. There are a lot of people stopping by, seeing the demos, asking where they can get more information. We have peer presentations running and people are stopping for them. It’s also interesting how, if we start doing a deep learning presentation, the crowds start collecting. So there is definitely interest in AI and deep learning in the big data space.
insideBIGDATA: Are you working with Spark?
Jim McHugh: Yes, we’re looking at acceleration of Spark and ways we can do some collaboration with Databricks in particular. Have you heard of a project out of Berkeley called BIDMach? It’s really similar to MLlib as part of Spark, but I’ll stop short of saying it will be a replacement. As the acceleration gains of BIDMach become clear, it is getting attention at companies, and it definitely got our attention because it’s all on GPUs. You’re seeing people here talking about it and thinking about how they’re going to accelerate MLlib, and we’re happy to work with them on that as well.
insideBIGDATA: Looking out a year from now, where do you think NVIDIA is going to be with this new message of AI driven analytics?
Jim McHugh: Sure, I think what you’re going to start hearing is people talking about how it’s time to become an “AI enterprise.” I see that coming as a buzzword, or a mantra, which I guess is a better way of saying it. It’s the idea of your business accessing, processing, and taking advantage of data by using artificial intelligence, deep learning or machine learning. That time has arrived. I believe this is the year we’re going to cross into zettabytes. So with the amount of data that’s at our disposal and what people are trying to work with, they’re looking for new ways to manage it all. With the whole idea of drowning in your “data deluge,” people are just looking for something new. If they have to sit through another presentation on the four Vs of big data, they’re going to just stop. Now this gives them the opportunity to start saying, “Hey, we can do more with this data. We can actually access it faster, and we’re going to start using artificial intelligence. We’re going to start using machine learning and deep learning to really start taking advantage of the data.” That’s the whole mentality: you’re no longer in a data deluge, you’re actually data hungry. You’re opening an insatiable desire for data, and I think that’s where we’re going.
And when we get everybody talking that way, it’s amazing. We have this industry where people come to NVIDIA and say, “We need to accelerate. We need to get our applications accelerated on NVIDIA. We need to start having GPU acceleration, because frankly the doubling of traditional processor performance every 18-24 months just isn’t happening anymore.” And when we come out with a new architecture, like you saw at GTC, we’re getting incredible gains. People can’t build their business and plan their platforms on accelerating by only 12% every two years. So they’re coming to us to get acceleration. Personally, as someone who is in this space and knows a lot of the ecosystem, I love the fact that I have companies coming to see me now saying, “How do we accelerate? How do we take advantage of GPUs? What does it take to get there?” Once that starts up, you see the tipping point. In a keynote this morning, Jen-Hsun – who is in Europe, because GTC Europe is going on right now – announced our new partnership with SAP. So now SAP has seen the value of acceleration, and they see the value of AI and deep learning as well. GPUs are coming into the data centers, into the applications that people have been running for quite some time. I find it exciting, because it’s pretty much moving at the speed that deep learning has taken off over the last couple of years. This is a shift where people now know they need acceleration for other things.