Interview: KDD2019 Co-General Chairs Ankur Teredesai & Vipin Kumar

During my trip to KDD2019 in August, I had the pleasure of sitting down to chat with the co-chairs of the conference, Ankur Teredesai and Vipin Kumar. In the interview that follows, we discuss the growth of the KDD conference over the years, and also its changing focus. You can read my KDD2019 Field Report HERE.

insideBIGDATA: Please give our audience a brief introduction including your role here at KDD and also a little bit about your day job.

Ankur Teredesai: Sure, my name is Ankur Teredesai. I am the chief technology officer and co-founder of a company called KenSci. I’m also a Professor of Computer Science & Systems at the School of Engineering & Technology, University of Washington Tacoma where I’ve been teaching since 2006. As part of my educational career, I’ve spent around 25 years focused on machine learning, practicing data science. I’ve really seen the revolution from statistics to big data, to AI, to data science and I’ve thoroughly enjoyed it. It’s a great place to be and I’ve been part of the security community so that’s my day job and my role. I lead the Healthcare AI community as well in shaping and forming our perception of how we make healthcare decisions more data-driven and how to change the perception of physicians that AI is here to help and not replace them. I’m on that big mission.

At KDD, I was the information director for the past 10 years. Essentially I was the spokesman for KDD. I did all the web development and made sure that KDD communications were in order and coordinated. So I’ve grown with the community as the community has grown, and I’ve learned more about this conference and how to handle different aspects of it. Two years ago there was a little bit of a gap where folks were not sure who was going to lead the conference in Anchorage, because we had already decided on Anchorage as a venue. They asked me if I would step up and take that role.

Vipin Kumar: My role would be similar to Ankur’s because we are trying to make sure we put a team together, get out of the way, and let them run the show. I guess the history goes back more than 25 years and we do have professional staff to help so this is a big organization. In my day job, I am a professor of computer science at the University of Minnesota. I’ve been in academia for almost 36 years and working in this field all along. The field’s names have changed from time to time. We used to call it artificial intelligence. Then it became data mining. Then it became predictive analytics, then data science and now it’s back to being called machine learning and AI. So labels have changed but we keep doing the same thing. The generations of computer algorithms have become more powerful every decade. We’re seeing the times changing now, where everybody in the world is interested in AI and how they can apply it.

insideBIGDATA: I think we’ve all seen a number of so-called “AI winters” where the technology didn’t fulfill the promises, right?

Vipin Kumar: Or sometimes people expect too much. I remember around ’84, or ’85, which was the previous big hype of AI. People were purchasing computers worth $100,000 just to run AI algorithms. So it would be in today’s terms, a three, four hundred thousand dollar machine. So nobody buys AI on their desk at that cost. So the promise was overhyped. And it also didn’t deliver. After a couple of years, it died down. But then the field of data mining came up. It was much more focused. It wasn’t promising intelligence. It was promising predictive analytics. It sort of jump-started, in many ways, as the field of this conference. It had a big role of jump-starting, I would say, the next resurgence. It naturally became more closely related with big data and data science and then machine learning. It took 20 years to reach this point. So we feel a lot of these ups and downs.

Ankur Teredesai: I frame it like a triad of three forces coming together. The first one is the advent of cloud and making compute very accessible and cheap to a certain extent. And now we have the compute power that it is not $100,000 but $3,000.

Vipin Kumar: And that machine was 100 times more expensive but no more powerful than what we have.

Ankur Teredesai: We sent a man to the moon with less compute power than what we have in our iPhones, or even watches today. The second force that I see has fundamentally changed or transformed, is the reliability of data, especially in highly regulated markets. So it was impossible for early data science or machine learning developers or scientists to have access to regulated data sets like health care or finance or criminal justice or banking, etc. And that has fundamentally changed the way that we look at AI for wellness ethics. So they reviewed the policy but at the same time more openness to explore issues. The third force, I believe, is really regulation and policy. So there’s much more awareness that without proper infrastructure and investments from the government in shaping the policies on AI, it’s going to be all done in the wild. So those three forces have really come together. A really concrete example of that would be the Affordable Care Act back in the day. You had the HITECH Act that forced all the health care systems to digitize themselves and make medical records electronic. Then that formed a generation of data collection within the systems. With the Affordable Care Act, there was a huge incentive to now make that data actionable. The Affordable Care Act was not so much about patients, honestly, it was more about making sure that the decisions that health care systems are making are more accountable. So that policy shaped it. Combine that with the availability of compute power to actually handle and manage that data, and transforming it so that it was ready for machine learning and AI, plus the availability of the cloud.

insideBIGDATA: How has this sort of transformation in our industry affected the conference, KDD? How has it evolved since 1995 when the first KDD happened?

Vipin Kumar: Yes, but then it sort of dates back to 1989, with smaller workshops. So 25 years of conference, five years before that the smaller workshops.

So it sort of started in the late ’80s and many of the sparks that you can see from the field go back to the late ’80s, and if you go back to 1970 and earlier, so this community of artificial intelligence, people were trying to build a machine that truly could become intelligent. People were trying to investigate, even back in those early days, how do you build intelligent machines? Then a group of people started thinking about these algorithms that could do pattern recognition and look at the images and then find things in them. So these algorithms were sort of more– they were not looking for intelligence, they were trying to get something done. So at that time, this community then was pushed out of the AI umbrella and they started this conference. A lot of the talks here, a lot of the work on AI, so the business community has sort of come back together, I believe, with deep learning frameworks. Then, in the late ’80s, some statisticians and some algorithm designers independently developed simple algorithms. That started showing promise and that sort of started this trend of – what can we do with algorithms? And that sort of started this new trend of data mining. What can we do with big scale data? Different generations of algorithms have come about.

People have been analyzing data for as long as we have been alive. It’s like astronomers having to look at the sky and trying to figure out what’s happening up in the heavens. But then the statistician came along and they started analyzing data, doing the science of data. But the generation of algorithms that have come about, say every 10 years, you can say for this decade this was the highlight. Every day you see new innovations, and I think the confluence we see today of machine learning is comprised of all of these developments.

I’ll give you one more example. In 1950 one of the founders of computer science Herbert Simon predicted that within 20 years computers would beat humans at chess. And then 1970 came and nothing happened. Computers were still struggling. The first time a computer was able to beat a chess champion was in 1995 with IBM’s Deep Blue with top class people working on it for decades. It was considered to be a huge milestone where we could fulfill the promise of Herbert Simon so many years ago. This algorithm had a lot of expertise downloaded from the chess experts into the computer program that knew nothing about chess other than rules.

We have to realize that there have been developments in computing, a tremendous amount of computing power which nobody could have imagined in the last 40 years, and also data availability, and data regulation. But the generation of algorithms, and this is what this community is about in the sense that the evolution came because the computer science community kept developing faster and faster computers. That credit can be fully claimed by the field of computer science because that’s what computer scientists were trained to do. And the second thing is that we are trained in our profession to come up with new tricks, new algorithms, and new recipes.

insideBIGDATA: Given the acceleration of our industry and I think the acceleration has just increased in the last five years to an incredible extent. How does that play out?

Vipin Kumar:  There has been acceleration happening every decade that I can point to. This one is just so amazing.

insideBIGDATA: So how do you translate that acceleration into content for the KDD conference? I mean, how do you feel a sense of how the industry is changing, to make sure that you offer content at the show to attract attendees and please them, and make sure that they get what they came for?

Vipin Kumar: One way to think about it is that– I attended this conference in the ’90s, versus I’m attending this conference going back to ’95, ’96. So the question would be what kinds of things were being talked about in the ’90s or what kind of things were being talked about in 2000 or in 2010, versus now. I have all the proceedings going back to the beginning on my bookshelf. One thing that you would notice is that today there is not a single aspect of our life that’s not being touched by these algorithms. You name it, and I will give you an example. If I can’t, Ankur will find an example. I mean, you just think of anything – there are areas we would have thought this technology would never touch, ever, but now we’re finding applications there.

insideBIGDATA: I think that’s new. I mean, the fact that it’s so pervasive. Every industry. Every walk of life is being touched by it. I don’t think that’s ever happened before.

Ankur Teredesai: The only thing I would add to that is there are conferences, and there are conferences. The one thing that is unique about KDD as a conference is the early founders of KDD and folks who attended including women and others. So I’ve been involved in KDD for the last 17 years or so, and what I loved about the community the first time I attended the conference was it’s a great home for both applied and theoretical researchers. So that early interaction with folks who are real-world, application-minded brought that agility to the conference, where we didn’t pin ourselves down to saying, “Hey, data mining is one thing, and that’s all that we wanted to do.” We kept the doors open for evolving that community, to shape it side by side with the industry. We always had industry participation, but the ratios used to be different: it was primarily an academic conference that had a few industry participants. And then over the years, especially the last decade, we’ve seen a change in the numbers as we’ve gone a lot toward industry.

So if you think about the structure of the conference, you have the research track and you have your applied track. The applied data science track is very impactful because this is where the industry gets to share things that are in progress as well as deployed. So there’s a huge emphasis on deploying the algorithms that are being developed in research. So that’s why in my talk yesterday, I was focusing on how do we go there. So we have focused the last decade on going from research to industry faster. But the question that I want to encourage the audience to think about is, “Should we focus on going faster, or is it time now to focus on doing it better?” Sometimes those two can be very orthogonal. And the position that I’m taking today is I’m putting a stake in the ground saying we have understood how to go faster. Now, we need to invest significant energy to go deeper, and get better at translating the results from research to industry. That’s one aspect of it.

The second aspect of it is this recent introduction of accepting that data mining, data science, AI, and machine learning are starting to get very verticalized. There is significant domain expertise that is needed in order to solve a problem end-to-end. So Vipin has worked in multiple domains in his career, from collaborating with environmental folks to healthcare folks to advertising folks to search engine folks. And that’s very characteristic. And I have done a similar type of journey, from advertising to social networks to healthcare, etc. And what we have done now that is very interesting and different is, we have added this concept of “team days” at the conference so that you can start small movements and be inclusive of inviting epidemiologists, inviting Earth and geospatial sciences researchers, inviting folks that are working with other data science topics like deep learning, but still find a home in KDD.

So I see that as long as we continue to foster that spirit of diversity, of topical thinking, and be more broad, this community will thrive.

insideBIGDATA: I think you’ve succeeded in communicating that message of the quality of split between industry and academia. Because I was at the conference lunch earlier today and I was talking to some attendees. I asked “What is your perspective of KDD?” One gentleman basically repeated what you just said. He reported that he’s gone to other conferences like NIPS and a few others and he said “This one is kind of evenly split with industry.” So that was coming from some random attendee, which is just pretty cool. So just briefly, referring to your crystal ball, what do you think will happen with the conference in these next few years?

Ankur Teredesai: I think there’s going to be an amazing growth in this conference. More than the conference, I feel very proud that we have set up a community in the right direction because the conference is just the tip of the iceberg. There is the whole iceberg of community underneath it that helps ensure that the best minds in the world who are working in this field submit their papers to this venue. The process of reviewing those papers and making sure that they are high quality, meet the bar, by ensuring that we continue the double-blind review process and make it fair and open to everyone, not just those scientists who are well funded with deep pockets and have access to specially controlled data sets. It comes out in the paper. So investing in that is going to be the next big challenge for the community.

In terms of growth, I think we have no doubt that next year in San Diego, we’re going to see 5,000 people, hopefully. It’s just like any startup in any industry. The first thousand people are the hardest to get. Then once you have a whole community of your first 500 to 1,000 people, the acceleration from 1,000 to 2,000 is troublesome; a lot of people come and go, but you sustain some sort of a movement for three, four years. When you reach a size of 3,000 in a logistically difficult place to get to such as Anchorage, people have expended time and resources to reach this place. And we had to cap the registration a month ago with 3,200 attendees. So we are there, and for next year and beyond.
