Uberdata: Rain’s effect on Uber ridership
What up humans?! Bradley Voytek here to bring you the first of what will be many #uberdata posts. So far you’ve read some great posts by Travis and Austin about the awesome and amusing Uber happenings. Those are about Uber’s business and consumer service.
This is about science.
We here at Uber are fans of #bigdata… but being Uber, we don’t just stop at #bigdata. We bring the #uberdata.
Before we delve too deeply, however, we wanted to establish a few basics. Lay the groundwork and get comfortable. Consider this post an introduction for all of us and all of you.
The first thing we wanted to do was take a look at our ridership over the first few months we’ve been in business. Remember, we’ve only been around for about 7 months.
Check out that growth! Very nice.
This plot shows the cumulative number of rides taken since our founding, broken down by week, up to the end of February. Each bar represents the total rides taken up to the end of that week. However, we wanted to separate out weekdays (Sunday-Thursday) from weekends (Friday and Saturday). In this plot you can compare the total number of weekday rides (light blue, in back) we’ve given with the total number of weekend rides (dark blue, in front). Note that the scale on the y-axis for these two charts is exactly the same.
Using this metric and this type of plot, you can clearly see that we give almost as many rides during the two weekend days as we do during the other five days of the week combined. It’s important to point out here that we begin our days at 4am; this means that Sunday at 2am (when the bars close in San Francisco) is counted as part of Saturday.
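To make the 4am rule concrete, here is a minimal sketch of how a ride timestamp could be assigned to a "business day" and tagged as weekday or weekend. The function names and the example date are made up for illustration; this is not our production code, just the idea.

```python
from datetime import datetime, timedelta

DAY_START_HOUR = 4  # from the post: each day begins at 4am


def business_date(ts):
    """Return the calendar date of the 4am-to-4am day that contains ts."""
    return (ts - timedelta(hours=DAY_START_HOUR)).date()


def is_weekend(ts):
    """True when ts falls on a Friday or Saturday business day."""
    # date.weekday(): Monday=0 ... Friday=4, Saturday=5, Sunday=6
    return business_date(ts).weekday() in (4, 5)


# Hypothetical example: 2am on a Sunday (arbitrary date chosen for illustration)
last_call = datetime(2011, 2, 27, 2, 0)
print(business_date(last_call), is_weekend(last_call))
```

Shifting the timestamp back by four hours before taking the date is the whole trick: the 2am Sunday ride lands on Saturday, while a 5am Sunday ride stays on Sunday.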
Apparently people really like to roll out in style with us on Friday and Saturday nights!
We can look at this slightly differently by breaking it down into day-by-day averages:
Okay, so that’s the beginning. We’re seeing legitimate exponential growth, likely from a whole host of factors. But there are a few really big contributors to the number of completed rides on Uber. We call them systemic events: holidays, special events, and weather. Each of these is fairly significant, but the weather really stands out.
Granted, New Year’s Eve was a crazy busy night, but that deserves a whole other post on its own…
So what effect does weather have in our overall growth? The prevailing notion around Uber has been that people took more rides when it rained. I wanted to see if that was true. Mathematically.
One issue is that Uber’s only been in business since June and, as you know, the weather in San Francisco through November was exceptionally great. So when the rest of the country was about to experience snowmageddon/icepocalypse, those of us in SF were throwing back a few cold ones in Dolores or spending afternoons with bloodies at Zeitgeist.
But sadly, that didn’t last, and now our all-too-familiar springtime weather is returning:
In order to even start looking at weather effects, the first thing we had to do was find daily weather statistics. Thanks to the NOAA, by the way, for having a PDF-based API. AWESOME. That was fun.
But what’s our hypothesis? The simplest would be to say that, on days that it rains, ridership increases. But increases over what?
Beyond what we’d expect to see if it hadn’t rained that day.
So how can we look at this, given the data we have? We can’t just compare rainy days and non-rainy days or else we’d potentially get a false positive result. Why? Well it turns out it didn’t rain too much between June and about mid-November, but after that it started to rain a lot more. Check it out:
This graph shows three things. First, it shows our growth over time. Again, this also shows that our ridership is growing exponentially with or without rain. In fact, according to this chart, by winter 2014 Uber will be giving rides to the entire population of Earth every day. True story. It’s science. Clearly the Uber Singularity is at hand! (Not really, I just don’t understand exponential growth (yes I do (but Wall St. doesn’t seem to (LISP)))).
The dots are colored red and blue: red dots are the weeks prior to mid-November (before the serious rain) and blue dots are the weeks after the rain starts.
The top inset bar chart shows the total rainfall for these two time periods (the red bar before the rain, blue bar after the rain). Like I said, there’s a big difference. In fact, we got 10 times more rain in the last 3.5 months than we did in the first 5.5 months. The bottom inset bar chart shows the median number of rides taken during these two time periods (same coloring). Also a big difference.
But did the rain cause our growth? Unlikely. Does the rain influence day-to-day ridership? Almost certainly.
How do we check that statistically?
If we compare the average number of rides on days with rain versus the number of rides on days without rain, we get a huge difference, but that difference would be highly biased by our business growth, meaning it would probably show up regardless of whether or not there was rain.
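Here is a toy simulation of that bias. Everything in it is synthetic and invented for illustration (the growth rate, the number of days, the rain pattern); none of it is Uber data. Rain has zero effect on rides in this fake world, yet the naive comparison still reports a big "rain effect" simply because the rainy days all come late, when ridership is higher.

```python
import math


def mean(values):
    return sum(values) / len(values)


# Synthetic setup (all numbers invented): 270 days, rides grow ~1% per day,
# rain has NO effect on rides, and it only rains in the last 100 days.
days = range(270)
rides = [100 * math.exp(0.01 * d) for d in days]
rained = [d >= 170 and d % 3 == 0 for d in days]

rainy = [r for r, wet in zip(rides, rained) if wet]
dry = [r for r, wet in zip(rides, rained) if not wet]

# Naive comparison: average rainy day vs. average dry day
naive_lift = mean(rainy) / mean(dry) - 1
print(f"naive 'rain effect': {naive_lift:.0%}")
```

The apparent lift here is pure growth, which is why a fair comparison has to hold the point in time roughly constant.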
We tackled this issue in a number of ways and got similar results for each. For simplicity, we’re going to focus on the most intuitive model. We took a look at the number of rides we got on every day that it rained. Then we estimated how many rides we would have gotten on that day had it not rained, and compared the two. So if we got 500 rides on a rainy Tuesday, but our estimate said we should have gotten only 250 rides, then we counted that as a 100% increase due to rain.
How did we get the estimate? Well, let’s take our rainy Tuesday example. Let’s say we got 500 rides. If we got 200 rides the Tuesday before that and 300 rides the Tuesday after that, our estimate would suggest that we should see about 250 rides ((200 + 300) / 2). But instead we saw 500! (In practice, we can get a span of estimates and do some sophisticated resampling statistics to verify our estimates and results, but we don’t need to get into that for now.) And of course, we only did this analysis if it wasn’t raining the Tuesday before or after; that would have biased our results.
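The rainy Tuesday example can be sketched in a few lines. This is a minimal version of the idea, not the full analysis: the function name, the dictionary layout, the dates, and the rainfall amount are all assumptions made up for the example, and the resampling step mentioned above is left out.

```python
from datetime import date, timedelta


def rain_lift(rides, rain, day):
    """Fractional increase in rides on `day` over the neighbor-week estimate.

    rides: dict mapping date -> number of rides
    rain:  dict mapping date -> rainfall amount (missing means no rain)
    Returns None when the comparison cannot be made fairly.
    """
    before = day - timedelta(weeks=1)
    after = day + timedelta(weeks=1)
    if day not in rides or before not in rides or after not in rides:
        return None  # not enough data around this day
    if rain.get(before, 0) > 0 or rain.get(after, 0) > 0:
        return None  # a rainy neighbor would bias the estimate
    estimate = (rides[before] + rides[after]) / 2
    if estimate == 0:
        return None
    return rides[day] / estimate - 1


# The example from the text: 200, 500, 300 rides on three consecutive Tuesdays
# (dates and rainfall amount are arbitrary placeholders)
rides = {date(2011, 2, 8): 200, date(2011, 2, 15): 500, date(2011, 2, 22): 300}
rain = {date(2011, 2, 15): 0.8}
print(rain_lift(rides, rain, date(2011, 2, 15)))  # (500 / 250) - 1
```

Comparing against the same day of the week one week earlier and later keeps both the day-of-week pattern and the growth trend roughly in check, since the two neighbors straddle the rainy day.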
Well then, what effect does rain have? What did we find? As an example, here are the rides on the three most rainy days:
The dark bars in front are our estimate of rides based on the average of the prior and following weeks. The lighter bars in back are the actual number of rides we gave! The percentage values in white are how many more rides we gave on that day compared to our estimates.
But check it out: on the ends we see that our first and third rainiest days were weekdays (a Thursday and a Wednesday, respectively). Apparently if it rains on weekdays, we see a modest but significant increase in rides.
That middle bar is the second rainiest day, and it happened to be a Saturday. Here we see a three-fold increase! Crazy!
On average, across all rainy days, we see a 29% increase in rides over our estimate. Unsurprisingly, the more it rains, the more of an increase we see (that is, rain quantity and # of rides are correlated).
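For the curious, here is roughly how those two summary numbers could be computed once each rainy day has a rainfall amount and a percent increase over its estimate. The data points below are placeholders invented for the sketch (they are not our real numbers), and plain Pearson correlation is an assumption about what "correlated" means here.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Placeholder data, invented for illustration: one entry per rainy day
rainfall = [0.1, 0.3, 0.5, 1.0]    # rainfall amount per rainy day
lift = [0.05, 0.20, 0.25, 0.60]    # fractional increase over the estimate

average_lift = mean(lift)
rain_vs_lift = pearson(rainfall, lift)
print(f"average lift: {average_lift:.1%}, correlation: {rain_vs_lift:.2f}")
```

A correlation near 1 would mean heavier rain goes hand in hand with a bigger jump in rides; a value near 0 would mean the amount of rain barely matters once it is raining at all.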
So what have we learned?
It would seem that business growth, weather, and day of the week all contribute to how many rides we give. When we include all of these factors in a single model, we are able to explain 82% of the day-to-day variance in the number of riders we see.
My guess is that the last 18% is explained by other black swan events such as holidays. Or by our drunk friends who are fed up with trying to fight 700 other drunkards for a cab in the Mission at 2:10am. All we need to add to our models are data about the sobriety levels of our riders. Start leaving more informative tweets, people! There’s science to do!
The moral of the data is: if you’re out partying on Saturday, and the weather sucks, and you’re not living in the past, you’re really in the minority if you’re not using Uber. So save yourself the headache and summon an Uber car as soon as your watering hole does last call. You’ll be much happier.