Uberdata: Mapping the San Franciscome
What up humans?! Bradley Voytek here again to talk some networks and neuroscience at you in today’s edition of #uberdata.
In this post I map San Francisco’s “flow” and look at where people go and when. For example:
* Where do men or women go on weekend nights?
* Where are San Francisco’s iPhone users hanging out?
* Where do people work and play?
Lately a surprising number of people have asked me what the hell a neuroscientist is doing working with Uber.
Other than the fact that the people are amazing and the problems are challenging, believe it or not there’s a lot of overlap in the technical skills required to solve problems in both domains. One of the new tricks I learned with Uber’s head of engineering Curtis Chambers was a project with my wife (that is currently under peer review) to automatically create a map of the brain’s networks.
I can translate some of those skills back to Uber to see how our riders flow from one part of San Francisco to another.
Here are San Francisco’s location networks, showing the probability that a ride starts in one neighborhood and ends in another.
There’s a lot of information in this image, so I’ll walk you through it. You can explore the full-sized map in more detail here.
What you’re seeing are 35 of San Francisco’s neighborhoods outlined in grey. At the centroid of each neighborhood I’ve plotted a circle, the size of which represents the proportion of rides that flow into that neighborhood. The circles are colored according to which statistically-identified subnetwork they belong to. Every neighborhood that sends a ride out has a line of the same color as the source neighborhood connecting it to its destination. The weight of each line represents the proportion of rides that go from the source neighborhood to its target. Technically speaking this is a weighted digraph.
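For the curious, here is a minimal Python sketch of how edge weights like these could be computed. It assumes each ride is just a (source, destination) pair and that the weight is the share of a neighborhood's outgoing rides that end at each target. The function name and the toy rides are made up for illustration; this is not the actual pipeline behind the map.

```python
from collections import Counter, defaultdict

def transition_probabilities(rides):
    """Turn (source, destination) pairs into {source: {destination: probability}}."""
    counts = Counter(rides)            # rides per (source, destination) edge
    out_totals = Counter()             # total rides leaving each source
    for (src, _dst), n in counts.items():
        out_totals[src] += n
    probs = defaultdict(dict)
    for (src, dst), n in counts.items():
        probs[src][dst] = n / out_totals[src]
    return dict(probs)

# Made-up toy rides, NOT real Uber data.
rides = [
    ("SoMa", "SoMa"), ("SoMa", "Mission"), ("SoMa", "Mission"),
    ("SoMa", "Marina"), ("Mission", "SoMa"), ("Marina", "SoMa"),
]
probs = transition_probabilities(rides)
print(probs["SoMa"])
```

Each source's outgoing probabilities sum to 1, which is what makes the line weights comparable across busy and quiet neighborhoods.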
Haha okay, okay, let’s clear up some of that clutter and look at this differently.
If instead we look at the raw number of rides moving between neighborhoods you’ll see that the map looks very different. Now almost all the action is going on between neighborhoods in a radius around downtown.
39% of all SF Uber rides move between just 6 neighborhoods:
* Financial District
* Western Addition
This tells you a lot about San Francisco and about Uber. “Like what?” you might be saying.
Well, statistically speaking, SoMa is the main hub of San Francisco in that it has the largest node strength (instrength + outstrength).
More simply: people are more likely to catch a ride into or out of SoMa than any other neighborhood. They are also more likely to stay within SoMa than any other neighborhood.
What starts in SoMa has a 17% probability of staying in SoMa. That’s not as catchy as Vegas’ slogan, but it’s more accurate.
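If you want to see what node strength looks like in practice, here is a rough Python sketch. It assumes the network is stored as a dictionary of edge weights keyed by (source, destination), and it counts a self-loop toward both instrength and outstrength, which is one common convention and not necessarily the one used for the map. The helper names and the toy numbers are invented; I picked them so the SoMa self-loop comes out at 17%.

```python
from collections import Counter

def node_strengths(flows):
    """Return {node: instrength + outstrength} for a weighted digraph."""
    instrength = Counter()
    outstrength = Counter()
    for (src, dst), weight in flows.items():
        outstrength[src] += weight   # weight leaving the source
        instrength[dst] += weight    # weight arriving at the destination
    nodes = set(instrength) | set(outstrength)
    return {node: instrength[node] + outstrength[node] for node in nodes}

def stay_probability(flows, node):
    """Share of a node's outgoing weight that loops back to the same node."""
    out_total = sum(w for (src, _dst), w in flows.items() if src == node)
    return flows.get((node, node), 0) / out_total if out_total else 0.0

# Made-up toy weights, NOT real Uber data.
flows = {
    ("SoMa", "SoMa"): 17, ("SoMa", "Mission"): 50, ("SoMa", "Marina"): 33,
    ("Mission", "SoMa"): 40, ("Mission", "Marina"): 10, ("Marina", "SoMa"): 20,
}
strengths = node_strengths(flows)
hub = max(strengths, key=strengths.get)   # node with the largest strength
print(hub, stay_probability(flows, "SoMa"))
```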
SoMa is part of a larger, more diffuse subnetwork that includes Potrero Hill, the Mission, the Castro, Noe Valley, Western Addition, Haight-Ashbury, Twin Peaks, Bayview, and Parkside. This means that those neighborhoods are all more likely to send rides between one another than they are to send rides to neighborhoods outside of their subnetwork.
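Here is a small Python sketch of what that statement means. It does not find the subnetworks (this post doesn't go into how they were identified). It only takes a grouping as given and measures what fraction of each group's outgoing rides stay inside the group. The function name, the group labels, the grouping of the Marina with Pac Heights, and the numbers are all assumptions for illustration.

```python
def within_share(flows, membership):
    """Fraction of each subnetwork's outgoing weight that stays inside it."""
    inside = {}
    total = {}
    for (src, dst), weight in flows.items():
        group = membership[src]
        total[group] = total.get(group, 0) + weight
        if membership[dst] == group:
            inside[group] = inside.get(group, 0) + weight
    return {group: inside.get(group, 0) / total[group] for group in total}

# Made-up toy weights and a hypothetical two-group partition.
flows = {
    ("SoMa", "Mission"): 30, ("Mission", "SoMa"): 25, ("SoMa", "Marina"): 10,
    ("Marina", "Pac Heights"): 20, ("Pac Heights", "Marina"): 15,
    ("Marina", "SoMa"): 5,
}
membership = {"SoMa": "A", "Mission": "A", "Marina": "B", "Pac Heights": "B"}
shares = within_share(flows, membership)
print(shares)
```

A share well above one half for every group is the pattern described above: rides mostly stay within their own subnetwork.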
Enough rambling. Let’s get to the good stuff.
What else can we learn? What about that more interesting thing I said at the beginning about where men and women hang out? Where the ladies at?
Well first, we need to devise a way to statistically assess whether there are more women in a neighborhood than we’d expect. How?
The easiest way is to work from the assumption that there are no differences in the way populations are distributed, and then from there find neighborhoods where that assumption fails.
For example, here’s a plot showing the number of rides done per neighborhood for men plotted against the same data for women:
Each red dot represents a neighborhood and the diagonal dashed line is the predicted linear relationship between number of rides per neighborhood by gender (yes, gender, not sex…). We used Rapleaf’s Name to Gender API to assess the likelihood of a rider’s gender given their name, only accepting a match if the probability was >= 95%. So someone with the name of Leslie remains unclassified because there’s only a 94.1% chance the name is from a female, whereas a boy named Sue would be misclassified as female with a 99.2% probability.
Any deviation above this line means that a neighborhood has more women taking rides into it than what we would expect given the number of men that take rides there. Deviations below that line are places where we see more men than we would expect given the number of women (actually, technically, places where we see fewer women than we would predict given the number of men).
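To make the "more than expected" idea concrete, here is a Python sketch under some assumptions: the dashed line is an ordinary least-squares fit of women's rides on men's rides, and the deviation is reported as a percentage of the predicted value. This post doesn't spell out the exact fitting method, so treat both choices, along with the function names and the toy counts, as illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def percent_deviation(men, women):
    """Percent by which women's rides differ from the fitted prediction."""
    names = sorted(men)
    slope, intercept = fit_line([men[n] for n in names],
                                [women[n] for n in names])
    deviations = {}
    for name in names:
        expected = slope * men[name] + intercept
        deviations[name] = 100.0 * (women[name] - expected) / expected
    return deviations

# Made-up ride counts for four hypothetical neighborhoods.
men = {"A": 100, "B": 200, "C": 300, "D": 400}
women = {"A": 100, "B": 260, "C": 240, "D": 400}
deviations = percent_deviation(men, women)
print(deviations)
```

In this toy data, neighborhood B sits above the line (more women than predicted) and C sits below it (fewer women than predicted).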
What’s the gist?
* There are 35% more women in the Marina and 47% more women in Pac Heights on weekend nights than expected.
* Conversely, there are 23% more men in SoMa, 16% more in the Castro, and 14% more in the Financial District.
So if you’re looking for a guy, head to SoMa on a Friday night. If you’re looking for a lady, check out the Marina or Pac Heights!
We can map these kinds of binary discrepancies for several variables. Check it:
Every blue dot shows an imbalance. The larger the dot, the more unbalanced the relationship is. To summarize in words:
* Android users are all about the Haight, the Mission, Nob Hill, and SoMa.
* iPhone users are up in the Financial District, the Castro, North Beach, Russian Hill, and the Marina.
* During normal business hours everyone’s headed into the Financial District…
* …but on weekend nights it’s all about the parties: the Mission, the Marina, Downtown, the Castro, and Western Addition.
* Riders with the lowest ratings are in the Western Addition, the Castro, and the Mission. (Are you all being nice to our drivers? What’s up?!)
* Our highest-rated riders are hanging out in SoMa, Downtown, North Beach, and the Marina. (Thank you all!)
Sure, this isn’t as enticing as hookers and booze, but there’s some really cool stuff we can learn about San Francisco here. Not only can we get a pulse on the ins and outs of the city, but we can get a feel for how people move from place to place.
Where do people go for fun after work and on weekends? What parts of the city are most tightly connected? A rider is far more likely to head into SoMa from Downtown than they are to head over to Golden Gate Park. There’s a lot more at play here than just population or population density.
We can see how San Francisco works and plays. We can see how the city comes alive. To me, this:
is just as beautiful as this:
And that’s one of the reasons why a neuroscientist is working for Uber.