How Crime Location Knowledge is a Proxy for Uber Demand
Today is Uber: Freakonomics edition.
We show how knowing where crimes occur (specifically prostitution, alcohol, theft, and burglary) can improve Uber’s demand prediction models.
As you know, Uber Team Science (below) is busy nerding it up around the Uber offices all day, adding numbers together, pouring colored liquids into beakers, that sort of thing. One of the most important jobs we do (second only to keeping our Uber Science mutants securely locked up!) is to accurately predict demand, so that you get a car when you want one. One way to predict demand is to know when people want to ride with us. Another is to know where people will want rides. Location matters to us because positioning supply in the right places lets us reduce pickup times. (For example, we see an increase in trips in SoMa near AT&T Park before and after Giants games.)
The first technical challenge was finding an easy, intuitive way to break a city into discrete “places” to begin with. Mathematically this isn’t necessary, but it’s important for communicating the data internally. Thanks to Zillow, we were able to extract the complex boundaries of the neighborhoods in each Uber city. Check out the 34 neighborhoods in San Francisco.
First, not all neighborhoods are part of this map. The Tender Nob may be the trendy place to be, but if we had to figure out the boundaries of every sub-neighborhood in San Francisco, we’d basically be measuring the length of the British coastline (nerd joke). Even with the neighborhoods we have, figuring out whether a geographic point falls inside one of those complicated shapes is tricky, but we got this. The first thing we did was look at how many trips we’ve done per neighborhood in SF.
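The post doesn’t say how we decide which neighborhood a pickup belongs to. A common approach is the ray-casting point-in-polygon test. The sketch below assumes each Zillow boundary has already been loaded as a plain list of (longitude, latitude) vertices. The helper names (`point_in_polygon`, `trips_per_neighborhood`) are hypothetical. Treating longitude and latitude as flat x/y coordinates is a reasonable approximation at city scale.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x/y
Polygon = Sequence[Point]    # outer ring only; last vertex need not repeat the first


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray casting: shoot a ray to the right of pt and count edge crossings.

    An odd number of crossings means pt is inside. Points lying exactly on an
    edge may go either way, and holes/multipolygons are not handled here.
    """
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can cross the ray
        # (this also guarantees y1 != y2, so the division below is safe).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def neighborhood_of(pt: Point, hoods: Dict[str, Polygon]) -> Optional[str]:
    """Return the first neighborhood whose boundary contains pt, or None."""
    for name, poly in hoods.items():
        if point_in_polygon(pt, poly):
            return name
    return None


def trips_per_neighborhood(pickups: List[Point], hoods: Dict[str, Polygon]) -> Counter:
    """Count pickups per neighborhood, skipping points outside every boundary."""
    counts: Counter = Counter()
    for pt in pickups:
        name = neighborhood_of(pt, hoods)
        if name is not None:
            counts[name] += 1
    return counts
```

With a few dozen neighborhoods, a linear scan over polygons is fine. For millions of pickups, you’d likely add a bounding-box prefilter or a spatial index before running the exact test.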
We hypothesized that crime should be a proxy for non-residential population density. According to data from San Francisco Crimespotting (source: Stamen Design), there have been 75,488 crimes in San Francisco since Uber’s launch on June 1, 2010. These crimes are broken down into 12 categories: murder, robbery, aggravated assault, simple assault, arson, theft, vehicle theft, burglary, vandalism, narcotics, alcohol, and prostitution. Let’s map crime by neighborhood (deeper, darker red means more crime). If it looks kind of like the trips map to you, that’s because the two are decently correlated (r = 0.56, p < 0.001). (For you math sticklers: crime and trip counts are log-distributed across neighborhoods, so all correlations are Spearman rank correlations, but log-log Pearson correlations give approximately the same results.)
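For the math sticklers, here’s a minimal sketch of the two statistics mentioned above. Spearman’s rank correlation is the Pearson correlation of the ranks, with tied values given the average of the ranks they span. The log-log variant is the Pearson correlation of the logged counts. The inputs are assumed to be per-neighborhood counts in matching order. P-values are left out here: in practice you’d get them from a stats library (for example, `scipy.stats.spearmanr`).

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def log_log_pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of log counts (all counts must be positive)."""
    return pearson([math.log(a) for a in x], [math.log(b) for b in y])
```

Because ranks don’t change under any monotone transform, Spearman gives the same answer whether or not you log the counts first. That’s why it holds up well on heavily skewed data.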
Neighborhoods with more crime have more people…and more Uber rides.
But are some crimes better predictors of rides than others? We looked at the correlation between the number of each type of crime and the number of trips we’ve done in each neighborhood. Every type of crime except murder, vehicle theft, and arson was positively correlated with the number of trips. After correcting for multiple comparisons, 4 crimes remained significantly correlated (p < 0.05, Bonferroni corrected).
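A Bonferroni correction is simple to apply once you have one raw p-value per crime type. With m tests, each p-value has to beat alpha / m rather than alpha. If the correction covered all 12 categories, that threshold would be 0.05 / 12 ≈ 0.0042. The sketch below assumes the per-crime p-values were computed elsewhere. The function names are hypothetical.

```python
from typing import Dict, List


def bonferroni_significant(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Names of the tests that survive a Bonferroni correction.

    Comparing each raw p against alpha / m is equivalent to comparing the
    adjusted value min(1, p * m) against alpha.
    """
    m = len(p_values)
    threshold = alpha / m
    return sorted(name for name, p in p_values.items() if p < threshold)


def bonferroni_adjust(p_values: Dict[str, float]) -> Dict[str, float]:
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}
```

Bonferroni is conservative: it controls the chance of even one false positive across all the tests. That makes the crimes that survive it a fairly safe bet.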
In other words:
Areas of San Francisco with the most prostitution, alcohol, theft, and burglary also have the most Uber rides. Be safe, Uberites!
Of course, this isn’t causal. Uber riders are not causing more crime. Right, guys? This effect probably reflects population density in terms of where people socialize: the more people hanging out in an area, the more prostitution, alcohol, and theft there is. Makes sense.
Now, let’s go back to the timing question. We know that Uber rides vary by hour and day of week. What about crime? Across all crimes, there’s not much variation in the total number of crimes from day to day. Within a day, however, there are a lot of ups and downs: the number of crimes peaks between 6 and 8pm.
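Finding day-of-week and hour-of-day patterns like these mostly comes down to tallying timestamps. Here is a minimal sketch. It assumes each crime report carries a local-time `datetime`, and the helper names are hypothetical. `peak_window` returns the start hour of the busiest run of consecutive hours, so a 6–8pm peak would show up as 18 with `width=2`.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def hourly_and_weekday_counts(timestamps: Iterable[datetime]) -> Tuple[Counter, Counter]:
    """Tally events by hour of day (0-23) and by weekday (0=Monday ... 6=Sunday)."""
    by_hour: Counter = Counter()
    by_weekday: Counter = Counter()
    for ts in timestamps:
        by_hour[ts.hour] += 1
        by_weekday[ts.weekday()] += 1
    return by_hour, by_weekday


def peak_window(by_hour: Counter, width: int = 2) -> int:
    """Start hour of the `width`-hour window (wrapping past midnight) with the most events.

    Ties go to the earliest start hour.
    """
    return max(range(24), key=lambda h: sum(by_hour[(h + k) % 24] for k in range(width)))
```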
But there was one surprise. One crime, beyond all the rest, had a distinctly BIG peak on one particular day. Prostitution. On Wednesday nights. This was so surprising that we double-checked the effect by looking at crimes in Oakland, too. Oakland Crimespotting had even more data: 152,730 crimes in the database since January 1, 2008. We got the same effect.
Now mind you, at this point we’re straying from the Uber ride-prediction path. Crime is a good proxy for the “activity” of a city, but the timing of crimes doesn’t really correlate with our ride patterns. Why Wednesday nights?! I even stopped to ask 2 Berkeley cops whether they knew why prostitution peaked at this time (seriously). They had no idea. But then someone pointed out to me that Social Security and welfare checks arrive on the second, third, and fourth Wednesdays of each month.
Oh man. Now we’re into dangerous, politically charged territory. Keep in mind we’re only talking about 4-5 prostitution crimes each Wednesday, which is pretty low considering these cities have populations ranging from the hundreds of thousands to the millions. Well, it turns out that there are significantly more prostitution crimes on the second Wednesday of each month than on the first (p < 0.01). One possibility is that on the second Wednesday, people get their checks after two weeks without any income. The first Wednesday: no checks. Second Wednesday: cash in hand! It might be that any time there’s an influx of cash into a city, there’s also a bump in prostitution crimes. That’s harder to check, but worth following up on.
We don’t see this effect for any other type of crime. Just prostitution. And although we’re talking about a difference of just a few extra prostitution cases on average, we have so much data that we can get a good assessment of this effect’s statistical significance.
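The second-versus-first-Wednesday comparison breaks into two steps. First, label each Wednesday by its position in the month. The nth occurrence of a weekday always falls on days 7(n−1)+1 through 7n, so the ordinal is `(day - 1) // 7 + 1`. Second, test whether the counts differ. The post doesn’t name the test behind p < 0.01, so this sketch swaps in a one-sided permutation test on daily counts. The `daily_counts` mapping (date → number of prostitution reports) and all function names are hypothetical.

```python
import random
from datetime import date
from typing import Dict, List, Sequence


def nth_weekday_of_month(d: date) -> int:
    """1 if d is the first such weekday in its month, 2 if the second, and so on."""
    return (d.day - 1) // 7 + 1


def wednesdays_by_ordinal(daily_counts: Dict[date, int]) -> Dict[int, List[int]]:
    """Group per-day counts for Wednesdays by which Wednesday of the month they are."""
    groups: Dict[int, List[int]] = {}
    for d, n in daily_counts.items():
        if d.weekday() == 2:  # Monday is 0, so Wednesday is 2
            groups.setdefault(nth_weekday_of_month(d), []).append(n)
    return groups


def permutation_p_value(a: Sequence[int], b: Sequence[int],
                        n_iter: int = 10_000, seed: int = 0) -> float:
    """One-sided permutation test for mean(a) > mean(b).

    Shuffles the pooled counts, re-splits them into groups of the original
    sizes, and reports how often the shuffled difference in means is at least
    as large as the observed one.
    """
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = sum(pooled[:len(a)]) / len(a) - sum(pooled[len(a):]) / len(b)
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_iter + 1)
```

Usage would look like `groups = wednesdays_by_ordinal(daily_counts)` followed by `permutation_p_value(groups[2], groups[1])`. A permutation test makes no normality assumption, which suits small, lumpy counts like 4-5 cases a night.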
This is one of the coolest things about working for a data-driven company like Uber: on the surface we’re a technology company revolutionizing transportation, but under the hood there are so many ways to look at our data. Sometimes that freedom to play leads to interesting results that aren’t immediately relevant to the core of our business. This finding is a perfect example of the fascinating insights you can get when you combine big, seemingly disparate datasets. By trying to figure out where to position our cars, we got a peek at the ebb and flow of the life and crimes of San Francisco.
Expect more of these kinds of posts in the next couple of weeks. We’ve got a lot of cool stuff in store, I promise you!