3 Questions With Michael Cavaretta, Data Scientist at Ford Motor Co.
Michael Cavaretta is technical leader of predictive analytics for Ford Motor Co., in Dearborn, Mich. He was hired by Ford 15 years ago, while he was working as a consultant for Churchill Systems doing data mining and statistical analysis. He holds a PhD in computer science, with an emphasis on artificial intelligence, from Wayne State University in Detroit.
What is Ford’s history with data science?
When I was hired, that was the first explosion of data mining on the scene, and the beginnings of business intelligence. The research arm of Ford decided this was an area they wanted to investigate, and hired me to lead that group. Back when I first started at Ford, we used statistical and machine learning techniques. We didn’t have big data technology like Hadoop and Hive and Pig, where you can store huge quantities of data and analyze it using parallel computers. Datasets were smaller, and it was much more traditional business intelligence. We’d use SQL queries to try to analyze things. Some of those are things we still do today. But the datasets are much bigger, and the ability to get data from purchasing and mash it up with marketing and mash it up with manufacturing has really been enhanced in the past couple of years.
How is Ford using data science?
We started with warranty analysis, then were looking at quality, manufacturing, QA, purchasing, even HR. In my time here, we’ve worked with just about every area of the company. You’d be hard-pressed to find an area we haven’t worked in. We’re kind of like an Ernst & Young, but just inside Ford as internal consultants.
We’ve had a number of analyses where we’re looking at social media and how people talk about their cars. We’ve taken a look at the automatic lift gate for the back end, and whether we should continue to have it as a feature. Should we just flip the glass, keeping the gate shut, or should we have the whole gate open? We looked at the comments and we found that, for the people who have the automatic lift gate, they really like that feature and would miss it.
We did a similar analysis having to do with three-blink automatic turn signals. The data was a little bit confusing when you looked at the survey responses, so we mined people talking about cars on the Internet. And people liked the three-blink feature once they got used to it. It took them a little while, but then they really enjoyed it. So we augment more traditional market research stuff with social media.
How does a company know when it needs a data scientist, and how should it find one when it does?
First we have to think about what we mean by “data scientist.” It’s an ill-defined term for the most part, with people rebranding themselves.
To know if you need one or not, ask: What’s your business? If you’re a data broker, that’s what you do; the value of your company is based on the data you have. You’d better make sure you have a lot of data scientists and they have a good support team around them. If you’re a Fortune 10 company like Ford, you probably have enough data hanging around to get value from that data on things like improving your own internal processes.
But if you’re talking SMBs, you don’t need that. Maybe you need consulting to get you going in the right direction, and then your business analysts can keep the ball rolling. There’s a site for hosting data analytics competitions, Kaggle. It has hundreds of data scientists who compete with each other to make better predictions and classify data in a certain way. But the problem with putting it up as a contest is that it’s not going to work a hundred percent of the time, because the data scientists may say, “This data we can use,” and it may be data you’re not able to get in real time.
Data science involves computer and programming experience, the ability to work with statistics and visualization, and some knowledge of the business domain. That’s a tall, tall order. Finding someone with experience in all those elements is really tough. You can have those different skills spread across the data science team and get along pretty well. Maybe the focus should not be on the data scientist, but on the data science team.