At TrueCar It’s Time to Invest and Be Innovative with Big Data


HADOOP SUMMIT 2015 RECAP

At the recent Hadoop Summit 2015 in San Jose, I had the opportunity to sit down with Russell Foltz-Smith, VP of Data Platform with TrueCar, Inc., to discuss his company’s use of data in general and Hadoop specifically. The company’s data platform is based on the Hortonworks Hadoop distribution along with Spark. In this interview, we’ll get to the bottom of how this data-driven company has found success with Hadoop and Spark. TrueCar is an automotive pricing and information website for new and used car buyers and dealers. The company gives buyers information on what others have paid for cars, upfront pricing, and a network of TrueCar dealers.

Daniel – Managing Editor, insideBIGDATA

insideBIGDATA: Can you give me a little background on the company and how you’re using data?

Russell Foltz-Smith: TrueCar started as a company called Zag in 2005 for auto buying online. A couple of years later we launched to provide a hassle-free car buying experience online.

Data is core to the company: you get confidence and transparency because you get access to the data. And that’s true for consumers, dealers, partners, car makers, banks, insurers, etc.

For our business, revenues are directly proportional to data growth. The more transparency you can drive, the more transactions you drive. You’re eliminating friction because you’re giving people data, so they tend to complete transactions much faster and have a more robust transaction. What I mean by that is – an automotive purchase is a complex transaction – typically you’ll have a trade-in, financing, insurance. A lot of transactions bundled into one. More data means there’s less chance the transaction goes sideways.

About four years ago, I was building up some technology for our trade products and about two years in we realized we needed to change course with what we were doing with data. While we were collecting a lot of data, the technology we put in place in 2005 clearly was not going to scale up. Our mobile traffic was starting to take off. In the mobile world data tends to expand very rapidly because the user’s expectations for real-time updates grow exponentially. We wanted to know where the user was geographically, like at a dealer, but also at what point in the transaction process.

As a result we were going to need a data platform at a scale that was 20 to 30 times what we had in place. At the time, I looked out and said, we’re going to be living at the edge, but we’re going to have to go to something like Hadoop and we’re going to have to do it now.

insideBIGDATA: When did your technology transition take place?

Russell Foltz-Smith: In mid-2013 we jumped head first into Hadoop. I told our executive management about our situation and that there were no shortcuts. We were going to have to train up a staff, hire people, and invest in this technology. It was going to be a radical transformation because we were used to doing things the way they were done in the early 2000s.

So in the last two years, our data has grown 24 times – extraordinary recent growth.

insideBIGDATA: Was there any correlation with growth of data and revenue growth?

Russell Foltz-Smith: Absolutely, it allowed us to roll the rest of the business operations without worrying about the data. The way I always tried to build up a data platform was to make sure it’s not the limiting factor. I wanted to make sure we were not enforcing business decisions because we don’t have the capability, or because we can’t give people access to the data, or because it’s too slow. So unlocking that became a force multiplier, since we’ve been experiencing 30% year-over-year growth on revenue in the 10 years the company has been in business.

When we first started with Hadoop we had 6,000 certified dealers in our network, and we’re nearing 11,000 now. So just that growth alone, servicing those customers, with all the intelligence you must give them, and bringing in their data, it couldn’t have happened otherwise.
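As a rough back-of-the-envelope check on the figures quoted above (24-fold data growth in two years, 30% year-over-year revenue growth over 10 years, and a dealer network going from 6,000 to nearly 11,000), the short Python sketch below converts them into comparable rates. The function names are illustrative, and the calculation assumes smooth, constant compounding, which the interview does not claim.

```python
def annualized_factor(total_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_factor over `years`."""
    return total_factor ** (1.0 / years)


def compounded(rate: float, years: int) -> float:
    """Total multiple after growing at a constant `rate` for `years` years."""
    return (1.0 + rate) ** years


# Figures quoted in the interview; the smoothing is an assumption.
data_per_year = annualized_factor(24, 2)   # yearly multiplier behind 24x in 2 years
revenue_multiple = compounded(0.30, 10)    # 30% per year sustained for 10 years
dealer_growth = 11_000 / 6_000 - 1         # fractional growth, 6,000 -> ~11,000

print(f"data: about {data_per_year:.1f}x per year")
print(f"revenue: about {revenue_multiple:.1f}x over 10 years")
print(f"dealers: about {dealer_growth:.0%} growth")
```

Under those assumptions, data volume grew by roughly a factor of five per year, far faster than either revenue or the dealer count, which is consistent with the point that the platform had to be built so that data would not become the limiting factor.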

insideBIGDATA: How did you go about choosing your Hadoop distribution?

Russell Foltz-Smith: We had a very short evaluation period. The criteria were very straightforward for those making the decision. We’re very pro-open source due to talent acquisition, because it involves a transferable skillset and also lacks vendor lock-in, which is important because we’ve had to change our technology several times in a 10-year period. The way we deliver our service is fairly complex, and if it’s made more complex by vendors then that’s a problem.

With Hortonworks it was a partnership approach instead of just buying products. We wanted a seat at the table if we were to make this big 5-, 10-, 15-, 20-year bet. We wanted to contribute to where the technology was going.

In addition, the selection was based on technology. Do we believe the technology components they have in place and will be putting in place are viable? We didn’t want proprietary management tools. Plus, the core technology must continue to evolve in terms of the raw Hadoop engine aspects. Also, we really appreciated their dedication to Hive and Stinger. We felt their implementation was more practical.

Lastly, frankly we just liked the Hortonworks people. We considered taking the Apache distributions directly and played with that a bit, but we wanted the support of a vendor ultimately. We were cautious early on but the training was exceptional for us.

Having Hortonworks go public a couple of months after we did meant our two companies had the same level of understanding while going through something similar. They understand our growing pains and needs – you have to move fast. We’re both in very competitive spaces.

insideBIGDATA: How do you feel about Hortonworks working to integrate Spark technology?

Russell Foltz-Smith: I think it’s great. We started playing with it and had a couple of technical conversations with them a couple of years ago. We know we have needs in this area – real time with machine learning, so we knew we’d have to pick a framework at some point. So this is an exciting development because it will be folded immediately into what they’re doing with the platform, and because we already have use cases with it, and we have familiarity with it. So this is a great development for us.

With Spark, we have some data science needs, but overall on any given day we’re pulling together over 12,000 data sources within the industry, and we must triangulate among them all to get at the value of the vehicle and to understand which transactions occurred where. We use machine learning to connect all the different dots because in the lifecycle of a car and the lifecycle of the transactions, information tends to change quickly. The primary deployment of machine learning is the core of our business since it affords scalability.

insideBIGDATA: Do you have Spark in production right now?

Russell Foltz-Smith: We have back-end business operations and front-end technology that powers the website. We do not power anything real time on the website with Spark, but with back-end processes we do. We’re using machine learning with MLlib. I expect Spark to be part of our front-end technology in the future.

insideBIGDATA: What does your crystal ball have in store for your technology future?

Russell Foltz-Smith: We’re looking at a fully mobile experience for us. We’re going to have to be an intelligent service that can be called upon in any way, including non-linear ways. We’re developing this platform that has “Google Now”-like functionality just purely devoted to automotive where less and less input from the user is required. That’s where we’re trying to lead our data platform.

We feel it is the time to invest and be innovative once you go public.





Speak Your Mind



  1. Gerry Kurl says

    I saw this brain beast at the Hadoop Summit. He made big data basic, sensual and nuclear all at the same time. His view on where we can all take this inspired me to start my own research on spatial data spheres influencing prison gang related circular mind roots. If you can solve cars, I can solve gang shanking in the prison yard through prescriptive security tactics. Thanks Dude 🙂