
Interview: Dr. Danielle Dean, Senior Data Scientist Lead at Microsoft


In this podcast interview, I caught up with Dr. Danielle Dean, Senior Data Scientist Lead at Microsoft in the Algorithms and Data Science Group within the Cloud and Enterprise Division, to find out about her experience at Microsoft and get her take on the upward trajectory of AI and deep learning that we’re seeing in the industry today. She currently leads an international team of data scientists and engineers who build predictive analytics and machine learning solutions for external companies using the Cortana Intelligence Suite. Before working at Microsoft, Danielle was a data scientist at Nokia, where she produced business value and insights from big data through data mining and statistical modeling on data-driven projects that impacted a range of businesses, products and initiatives. Danielle completed her Ph.D. in quantitative psychology with a concentration in biostatistics at the University of North Carolina at Chapel Hill in 2015. She obtained her Master’s in quantitative psychology in 2012, and graduated from the University of Massachusetts Amherst in 2010 with two Bachelor’s Degrees, in Psychology and BDIC in Organizational Behavior & Statistical Analysis with a minor in Mathematics/Statistics.

Daniel – Managing Editor, insideBIGDATA

 

insideBIGDATA: Welcome to today’s insideBIGDATA Podcast. I am Daniel Gutierrez, insideBIGDATA’s managing editor and resident data scientist. Today, I am here with Dr. Danielle Dean, a senior data scientist lead at Microsoft in the Algorithms and Data Science Group within the Cloud and Enterprise Division. Danielle, it’s great to have you with us.

Danielle Dean: It’s so great to be here today. Thanks for having me.

insideBIGDATA: Well, why don’t we just dive right in? Danielle, please tell us a little bit about your background and your experience at Microsoft. I understand you come from the field of Quantitative Psychology and Biostatistics. That’s really interesting, because I have some friends who have some similar backgrounds, and they’ve transitioned into data science. So it appears that this may be more common than people might think.

Danielle Dean: Yeah, definitely. I think one of the really fun parts about data science is people come from such vastly different backgrounds. So I’ll say, when I was growing up, I had no clue what I wanted to become, and Data Scientist wasn’t even really a thing until a few years ago anyway. When I was an undergraduate, I was actually a psychology major as my primary focus, although I studied mathematics as well. I actually created my own major in organizational behavior through statistical analysis. I was really interested in how people think and learn, and in analyzing individual behavior, but I was also really interested in the mathematics and statistics. Then I realized we can actually use math and statistics to analyze human behavior and how things work and function on a larger scale. So I started shifting my focus and my academic work into, “How can we use statistics to study psychological phenomena, how people develop, how things work over time?” and into models and frameworks for that.

So I shifted in my Ph.D. to study quantitative psychology, which is really a lot of the statistics and modeling side of how to actually use data to analyze behavior and how people change and how people work. So I did a lot of work on the statistics side and then using biostatistics as well, with things like survival analysis models for how long things happen, and event history models and so forth. But then I got into data science through an internship at Nokia analyzing lots of different data, and patterns, and behavior, and how things work with internal projects at Nokia, doing data science and working with people with a lot of different backgrounds on projects there, doing things like warranty analytics, influence sales, and a lot of different projects where we could use data science.

And then I shifted to working at Microsoft, where I do a lot of work using analytics to, again, solve a lot of different customer problems. So I merged over time from looking at human behavior and using statistics, to ending up doing data science. I think one of the fun parts about doing data science is the different backgrounds and experiences you find in working with people of different backgrounds.

insideBIGDATA: Danielle, that’s great. It’s really fascinating to me as a data scientist myself, to see how people transition into the field so I’m glad you made it! But can you tell us a little bit more about your current projects at Microsoft?

Danielle Dean: I have a really fun role at Microsoft, where I’m actually in the product organization. Microsoft is obviously building a lot of analytics products, so our whole Azure platform with the cloud infrastructure, as well as SQL Server R Services and Microsoft R Server. We’ve built all these analytics products, but obviously, we want our products to be really useful and impactful to our end customers. So my role at Microsoft is working with external companies, external customers, ISVs, partners and so forth and building out custom solutions for different business problems.

A couple of quick examples: working with companies doing predictive maintenance, trying to understand how we can utilize all the data available to understand when something might fail in the future so that we can proactively schedule maintenance ahead of time so that things don’t actually fail. Other problems are in, for example, healthcare, trying to understand when things will go wrong in the future, or to help with financial billing and better scheduling of customers and so forth. So there are a lot of different use cases that different companies want us to solve. What we do on an everyday basis is work with customers building out these solutions using Microsoft analytics products. But then almost as importantly, we’re using that work to make sure that our products really can solve customer problems. So we share that feedback directly with the product teams to make sure that the products improve. So it’s a really fun role because I both get to build custom stuff, build analytic solutions with customers, and then also work with the product teams to make sure those products are improved over time so that as we move forward we can continue solving bigger and even greater problems.

insideBIGDATA: Wow, you really have your plate full with some very interesting use cases. Now, I understand you’re participating in the AI immersion workshop being held next month in Seattle, and the topic is, “Applying AI at Scale.” I find this area fascinating. Can you tell us a little bit about how Microsoft is approaching planet scale AI?

Danielle Dean: So this is really fascinating because there’s so much you can do today, with cloud infrastructure and with the ability to really scale out processing, things that you really couldn’t do in the past, but the cloud’s really opening it up for you. I’d say there’s really a big spectrum of things that Microsoft is doing. There are ready-to-use AI apps, so solutions that Microsoft is building directly or our customers are building with their end customers. Then there’s also ready-to-use AI, so things that are enabled for developers to utilize, things like translation services or sentiment analysis, pre-built APIs that are ready to use, but are built with the power of the cloud, so that they scale out to what is needed by the end customer. And then there’s also the infrastructure to actually create your own AI application, along with the ability to train at scale, the ability to create new models at scale, the ability to score at scale. So really there’s a spectrum of things from really finished applications, to services that are ready for developers who might not know too much about AI or data science themselves, to the infrastructure to build this stuff yourself.

So a lot of what we’re doing at events like the AI immersion workshop is to show people how you can utilize that range, whether you want to use services that are already built. As one example, a Microsoft Translator app, where you can speak in one language and then out comes another language, is really opening up new opportunities for global interaction. So that’s AI services. Then in AI applications, developers who want to build in AI functionality – maybe they have their own application and they want to build translation into that, or sentiment analysis. So enabling developers to utilize AI, which behind the scenes is built to be done at scale using that infrastructure. And then also a lot of opportunities on how to create your own AI, so some of the things that we’re doing at that AI immersion workshop are showing how you can use services like Azure Batch Shipyard to actually really both train AI applications at scale and score at scale.

A lot of these really sophisticated deep learning models, they really need a lot of data and a lot of processing power, things like GPU and so forth, in order to create good models that fit well for the application. It’s the power of the cloud and infrastructure that’s available, the ability to spin up both so that you have the scale and power that you need, so that you don’t waste your time and money on having the infrastructure available at all times. You can use the power of the cloud to really scale up and only use it when you need to. So at those types of workshops, we’re really trying to showcase the spectrum of what’s available and then people can use what they need, depending on what they’re looking for.

insideBIGDATA: That sounds like a very exciting workshop. And based on what you just said, it’s very much in line with what I keep hearing about how AI has really come into its own, after languishing for a couple of decades. It’s good to see. It’s real, and it’s happening. Very exciting. But let’s change gears a bit, and on a more philosophical note, what are your thoughts about diversity in the field of data science? And how do you see more women becoming data scientists?

Danielle Dean: Yeah, that’s a great question. I personally, in my life, have been super fortunate to have amazing mentors. To take one example, my mom is actually also in the field of big data and AI. And so I, personally, find the role of mentors and role models to be super important. And I think that is one way we can really increase diversity, just having a lot of role models and mentors out there. As an example, I work with the Girls Who Code organization, to show people who are really young in their careers what you can do through these types of careers. I think a lot of what we can do can be done through this role modeling and mentoring.

One really awesome part about data science is that we get to work with people from very different backgrounds, and you can come into data science from very different backgrounds. You don’t have to have a traditional computer science background in order to do that. Take me as an example, coming from quantitative psychology and biostatistics. And I think one really nice part about data science is that, because we can draw from other fields that have traditionally had more women than computer science, there are more women in data science than I’ve seen in some of these other fields. In my team at Microsoft, we’re actually about 50% women, so it’s been really awesome to work with a much more diverse group than is traditionally seen in technology, and in computer science in general. So I think one thing we can do is create more role models, and then draw from other fields, and then hopefully over time, more and more women and underrepresented individuals can be seen in these types of fields.

insideBIGDATA: That’s great to hear about how well-represented women are at Microsoft. And we’re seeing that here too at insideBIGDATA. More and more of our contributors are women, and it’s great to see that level of diversity. Now, I believe you wrote a chapter for a recent Microsoft eBook called Data Science with Microsoft SQL Server 2016, and your topic was predictive maintenance in IoT. So can you give us a quick synopsis of what you wrote about?

Danielle Dean: Yeah. So in this chapter, I was talking specifically about how you can use SQL Server R Services to do predictive maintenance type concepts, and so I’ll just briefly describe predictive maintenance. Predictive maintenance is not a new concept by any means. Think about your car. You might want to take it to the dealership to do an oil change, maybe every six months or 5,000 miles or whatever is recommended for your car. But we don’t actually want to do maintenance based on those little indicators. In the world of the Internet of Things (IoT) and the world of enhanced data, we have so much more data at our fingertips. We know when people needed service. We have so many more indicators, so many more sensors, built into all these devices, all these things going on around us.

So, how do we utilize all that data to proactively maintain things in a much better way? This can be done in everything from aerospace to manufacturing lines, to even things that are used in service. And so, in the chapter, I’m describing a lot of what you have to do when you’re considering predictive maintenance, starting from: what type of data do you need to collect? What are the labels? For example, when were the historical failures that you’re going to train the machine learning model to predict? How do you actually know if you have data that’s of good enough quality? What are the different types of use cases, and what can you do with them? And then of course, how do you actually approach it using example technology, in this case, SQL Server R Services? How do you do that processing?

In this case, we’re showing how you can actually run the machine learning models inside of the database, and how you can actually consume those from end applications. So taking it from the beginning of the data science process and, “How do you create the use case? How do you create the business scenario? What data do you need?” all the way to how you implement and utilize the end scenario.
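
Editor’s note: The labeling question Danielle raises (which historical failures should the model learn to predict?) can be illustrated with a small sketch. This is not code from the eBook chapter and it does not use SQL Server R Services; it is a minimal Python illustration under stated assumptions. The function name label_readings, the idea of measuring time in "cycles", the 20-cycle warning horizon, and the sample data are all hypothetical.

```python
from bisect import bisect_left


def label_readings(reading_cycles, failure_cycles, horizon):
    """Label each reading 1 if a failure occurs within `horizon` cycles of it.

    reading_cycles: cycle numbers at which sensor readings were taken
    failure_cycles: cycle numbers of historical failures (any order)
    horizon: how far ahead, in cycles, the model should warn (an assumption)
    """
    failures = sorted(failure_cycles)
    labels = []
    for cycle in reading_cycles:
        # Locate the first recorded failure at or after this reading.
        idx = bisect_left(failures, cycle)
        next_failure = failures[idx] if idx < len(failures) else None
        will_fail_soon = (
            next_failure is not None and next_failure - cycle <= horizon
        )
        labels.append(1 if will_fail_soon else 0)
    return labels


# Hypothetical machine: a reading every 10 cycles, failures at cycles 35 and 90.
readings = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
labels = label_readings(readings, failure_cycles=[35, 90], horizon=20)
print(list(zip(readings, labels)))
```

Readings labeled 1 are the ones a classifier would be trained to flag, so that maintenance can be scheduled before the failure rather than on a fixed calendar. The choice of horizon is a business decision: it should be long enough to schedule the repair but short enough that the warning is still meaningful.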

insideBIGDATA: Sounds like a great eBook, if for nothing more than your chapter, so thanks for that rundown. Well, that’s my last question, Danielle. I’d like to thank you for all the great insight you’ve provided and I’m really glad you could join us today.

Danielle Dean: Thank you so much for having me.

 
