Mapping the San Franciscome
Bradley Voytek here again to talk about networks and neuroscience in today’s edition of #UberData.
In this post I map San Francisco’s “flow” and look at where people go and when. For example:
- Where do men or women go on weekend nights?
- Where are San Francisco’s iPhone users hanging out?
- Where do people work and play?
Lately a surprising number of people have asked me what a neuroscientist is doing at Uber. Well, the technical skills needed to solve problems in both domains overlap a lot. I can bring those skills to Uber to, for example, see how San Francisco riders flow from one part of SF to another.
Here are San Francisco’s location networks, showing the probability that a ride starts in one neighborhood and ends in another:
There’s a lot of information in this image, so I’ll walk you through it. You can explore the full-sized map in more detail here.
What you’re seeing are 35 of San Francisco’s neighborhoods outlined in grey. At the centroid of each neighborhood I’ve plotted a circle, the size of which represents the proportion of rides that flow into that neighborhood. The circles are colored according to the statistically-identified subnetwork they belong to.
Every neighborhood that sends a ride out has a line of the same color as the source neighborhood connecting it to its destination. The weight of each line represents the proportion of rides that go from the source neighborhood to its target. Technically speaking, this is a weighted digraph.
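For the curious, here's a rough sketch of how a weighted digraph like this can be built from a list of rides. It's a minimal Python illustration, not the code behind the map: the function name and the sample rides are made up, and it assumes each ride is simply a (start, end) pair of neighborhood names.

```python
from collections import Counter, defaultdict

def transition_probabilities(rides):
    """Return {(source, target): share of the source's rides that end in target}."""
    counts = Counter(rides)            # raw rides per (source, target) pair
    out_totals = defaultdict(int)      # total rides leaving each source
    for (source, _target), n in counts.items():
        out_totals[source] += n
    return {pair: n / out_totals[pair[0]] for pair, n in counts.items()}

# Made-up rides, for illustration only (not Uber data).
sample_rides = [
    ("SoMa", "SoMa"), ("SoMa", "SoMa"), ("SoMa", "Mission"), ("SoMa", "Marina"),
    ("Marina", "SoMa"), ("Marina", "Marina"),
]
edge_weights = transition_probabilities(sample_rides)
print(edge_weights[("SoMa", "SoMa")])
```

Each source neighborhood's outgoing weights sum to 1, which is what lets you read a line's weight as a probability.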
Let’s clear up some of that clutter and look at this differently. If we look instead at the raw number of rides moving between neighborhoods, the map looks very different. Now almost all the action is going on between neighborhoods in a radius around downtown.
39% of all SF Uber rides move between just 6 neighborhoods:
- Financial District
- Western Addition
This tells you a lot about San Francisco and about Uber. Statistically speaking, SoMa is the main hub of San Francisco in that it has the largest node strength (instrength + outstrength). More simply: people are more likely to catch a ride into or out of SoMa than any other neighborhood. They are also more likely to stay within SoMa than any other neighborhood. What starts in SoMa has a 17% probability of staying in SoMa. That’s not as catchy as Vegas’ slogan, but it’s more accurate.
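Node strength is easy to compute once you have edge weights. Here's a small sketch, assuming the weights are raw ride counts and that a ride staying within a neighborhood counts toward both its instrength and its outstrength (a common convention, and an assumption on my part). The counts are invented.

```python
from collections import Counter

def node_strength(edge_weights):
    """Sum each node's outgoing and incoming edge weights."""
    strength = Counter()
    for (source, target), weight in edge_weights.items():
        strength[source] += weight   # contributes to outstrength
        strength[target] += weight   # contributes to instrength
    return strength

# Invented ride counts, for illustration only.
ride_counts = {
    ("SoMa", "SoMa"): 3,
    ("SoMa", "Mission"): 2,
    ("Mission", "SoMa"): 4,
    ("Marina", "Mission"): 1,
}
strength = node_strength(ride_counts)
hub = max(strength, key=strength.get)
print(hub, strength[hub])
```

The hub is simply the neighborhood with the largest total.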
SoMa is part of a larger, more diffuse subnetwork that includes Potrero Hill, the Mission, the Castro, Noe Valley, Western Addition, Haight-Ashbury, Twin Peaks, Bayview, and Parkside. This means that those neighborhoods are all more likely to send rides between one another than they are to send rides to neighborhoods outside of their subnetwork.
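This sketch doesn't find subnetworks (that takes a proper community-detection method). It only checks the property described above: given an assignment of neighborhoods to subnetworks, what share of rides stays inside a subnetwork? The names, counts, and grouping below are hypothetical.

```python
def within_group_share(ride_counts, membership):
    """Fraction of all rides whose start and end share a subnetwork."""
    total = sum(ride_counts.values())
    inside = sum(
        n for (source, target), n in ride_counts.items()
        if membership[source] == membership[target]
    )
    return inside / total

# Hypothetical counts and grouping, for illustration only.
toy_counts = {("A", "B"): 6, ("B", "A"): 2, ("A", "C"): 1, ("C", "D"): 1}
toy_membership = {"A": 0, "B": 0, "C": 1, "D": 1}
share = within_group_share(toy_counts, toy_membership)
print(share)
```

A share well above what a random grouping would give suggests the grouping captures real structure in the flow.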
What else can we learn? First, we can devise a way to statistically assess whether there are more women or men in a neighborhood than we’d expect. The easiest way is to work from the assumption that there are no differences in the way populations are distributed, and then from there find neighborhoods where that assumption fails. For example, here’s a plot showing the number of rides done per neighborhood for men plotted against the same data for women:
Each red dot represents a neighborhood and the diagonal dashed line is the predicted linear relationship between number of rides per neighborhood by gender.
We used Rapleaf’s Name to Gender API to assess the likelihood of a rider’s gender given their name, only accepting a match if the probability was >= 95%. So someone with the name of Leslie remains unclassified because there’s only a 94.1% chance the name is from a female, whereas a boy named Sue would be misclassified as female with a 99.2% probability.
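The thresholding step looks roughly like this. This is not the Rapleaf API: the lookup table and function are hypothetical stand-ins, filled with the two probabilities quoted above.

```python
THRESHOLD = 0.95  # only accept a match at or above this probability

def classify(name, lookup, threshold=THRESHOLD):
    """Return the guessed gender, or None if unknown or not confident enough."""
    guess = lookup.get(name)
    if guess is None:
        return None
    gender, probability = guess
    return gender if probability >= threshold else None

# Hypothetical lookup table standing in for the name-to-gender service.
lookup = {"Leslie": ("female", 0.941), "Sue": ("female", 0.992)}
print(classify("Leslie", lookup), classify("Sue", lookup))
```

Unclassified riders are dropped from the comparison rather than guessed at.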
Any deviation above the dashed line means that a neighborhood has more women taking rides into it than we would expect given the number of men who take rides there. Deviations below the line are places where we see more men than we would expect given the number of women (technically, places where we see fewer women than we would predict given the number of men).
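Here's one way to turn those deviations into "percent more than expected" numbers. It's a sketch under assumptions, not the exact analysis: it fits a least-squares line through the origin (my assumption about the form of the predicted relationship), and the ride counts are invented.

```python
def percent_excess(men, women):
    """Percent by which women's rides exceed the fitted prediction, per neighborhood."""
    names = list(men)
    # Least-squares slope for a line through the origin: sum(x*y) / sum(x*x).
    slope = sum(men[n] * women[n] for n in names) / sum(men[n] ** 2 for n in names)
    excess = {
        n: 100 * (women[n] - slope * men[n]) / (slope * men[n]) for n in names
    }
    return slope, excess

# Invented ride counts, for illustration only.
men = {"A": 100, "B": 200, "C": 100}
women = {"A": 100, "B": 200, "C": 150}
slope, excess = percent_excess(men, women)
for name, pct in excess.items():
    print(name, round(pct, 1))
```

Positive values sit above the line (more women than predicted) and negative values sit below it.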
What’s the gist?
- There are 35% more women in the Marina and 47% more women in Pac Heights on weekend nights than expected.
- Conversely, there are 23% more men in SoMa, 16% more in the Castro, and 14% more in the Financial District.
So if you’re looking for a guy, head to SoMa on a Friday night. If you’re looking for a lady, check out the Marina or Pac Heights!
We can map these kinds of binary discrepancies for several variables:
Every blue dot shows an imbalance. The larger the dot, the more imbalanced the relationship is. To summarize in words:
- Android users are all about the Haight, the Mission, Nob Hill, and SoMa.
- iPhone users are up in the Financial District, the Castro, North Beach, Russian Hill, and the Marina.
- During normal business hours everyone’s headed into the Financial District…
- …but on weekend nights it’s all about the parties: the Mission, the Marina, Downtown, the Castro, and Western Addition.
- Riders with the lowest ratings are in the Western Addition, the Castro, and the Mission. (Are you all being nice to drivers? What’s up?!)
- The highest-rated riders are hanging out in SoMa, Downtown, North Beach, and the Marina. (Thank you all!)
- A rider is far more likely to head into SoMa from Downtown than they are to head over to Golden Gate Park.
There’s a lot more at play here than just population or population density. We can see how San Francisco works and plays. We can see how the city comes alive. To me, this:
is just as beautiful as this:
And that’s one of the reasons why a neuroscientist is working for Uber.