Hello Uberites! Henry here, Uber data guy, throwing down another post for #uberdata. We recently launched in NYC, and for us that means mo money, mo problems.

Take estimated arrival times (ETAs), for example. When we launch a new city, we simply don’t have historical data to draw estimates from. That’s a problem: ETAs are woven into virtually every corner of our supply chain and dispatch systems, and we also show them to riders so they can make decisions based on wait times.

So we don’t have estimates at city launch, but Google does. Google has services that predict travel times, and that’s what we used to start in NYC. Unfortunately, during our first week in NYC we found that Google’s ETA predictions were, on average, off by 3.6x the actual pickup time. Thanks, New York crosstown traffic and congestion!

Now, we at Uber are like, smart and stuff, so we’re working on using our own algorithms instead of the Google API for ETAs. And as our data shows, we’re better at it:

[Chart: Uber vs. Google]

Yup, we're doin' better.

We measure our predictor’s accuracy using the mean squared error (MSE); the lower the error, the better. And as the next graph shows, as we accumulate rides, we’re getting even better by the day:

[Chart: MSE, Google vs. Uber]

Over time, the gap between our predictor's accuracy and Google's widens in our favor. Math!
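Mean squared error averages the squared gap between each predicted ETA and the observed pickup time, so a single badly missed pickup costs more than several small misses. Here is a minimal Python sketch, assuming per-ride predicted and actual ETAs in minutes. The function name and all numbers are hypothetical, made up for illustration; this is not Uber's code or data.

```python
from typing import Sequence


def mean_squared_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average of squared differences between predicted and actual ETAs.

    The result is in squared units (minutes^2 if inputs are minutes); lower is better.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("need at least one observation")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical pickup times in minutes, invented for this example only.
actual_minutes = [4.0, 6.0, 3.0, 8.0]
predictor_a = [5.0, 6.0, 2.0, 9.0]     # small misses of about one minute
predictor_b = [10.0, 12.0, 9.0, 20.0]  # consistently large misses

print(mean_squared_error(predictor_a, actual_minutes))  # 0.75
print(mean_squared_error(predictor_b, actual_minutes))  # 63.0
```

Squaring the errors is what makes a predictor that is occasionally way off (say, stuck in crosstown traffic) score much worse than one that is consistently close.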

To be completely fair, we’re not claiming we’re better than Google. Our domain is more restricted: we have reliable, experienced drivers from whom we can pull real-time data. Besides, Google never made claims of accuracy (“it’s for planning purposes, blah blah blah”), and its APIs are great for almost everything else, such as geocoding street addresses into latitude/longitude coordinates and back again. In other words, we love Google APIs for everything except accurate ETAs.

ETAs are just one part, albeit an important one, of our pool of #uberdata projects. As we improve other areas, such as demand prediction or supply positioning, ETA accuracy can lag behind. In fact, some methods that improve actual wait times may negatively impact our predictor. So it’s a give-and-take, and we iterate on each of the core projects to make the system truly Uber in the long run.

Regardless, expect all things Uber to become better, if not more accurate, as ridership continues to shoot through the roof.

‘Til next time.

#uberdata is a series of posts by the Uber Engineering Team. We’re highlighting cool and amazing things that data and maths can show us to make Uber a better product. And occasionally make us chuckle.