
Uberdata: Mapping the San Franciscome


What up humans?! Bradley Voytek here again to talk some networks and neuroscience at you in today’s edition of #uberdata.

In this post I map San Francisco’s “flow” and look at where people go and when. For example:
* Where do men or women go on weekend nights?
* Where are San Francisco’s iPhone users hanging out?
* Where do people work and play?

Lately a surprising number of people have asked me what the hell a neuroscientist is doing working with Uber.

Other than the fact that the people are amazing and the problems are challenging, believe it or not there’s a lot of overlap in the technical skills required to solve problems in both domains. One of the new tricks I learned with Uber’s head of engineering, Curtis Chambers, went into a project with my wife (currently under peer review) to automatically create a map of the brain’s networks.

I can translate some of those skills back to Uber to see how our riders flow from one part of San Francisco to another.

Here are San Francisco’s location networks, showing the probability that a ride starts in one neighborhood and ends in another.

There’s a lot of information in this image, so I’ll walk you through it. You can explore the full-sized map in more detail here.

What you’re seeing are 35 of San Francisco’s neighborhoods outlined in grey. At the centroid of each neighborhood I’ve plotted a circle, the size of which represents the proportion of rides that flow into that neighborhood. The circles are colored according to the statistically identified subnetwork they belong to. Every neighborhood that sends a ride out has a line of the same color as the source neighborhood connecting it to its destination. The weight of each line represents the proportion of rides that go from the source neighborhood to its target. Technically speaking, this is a weighted digraph.
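For the curious, here is a minimal sketch of how the edge weights and circle sizes of a weighted digraph like this one could be computed. The ride log is made up, and the function names `transition_probabilities` and `inflow_share` are illustrative assumptions, not the code behind the map.

```python
from collections import Counter, defaultdict

# Hypothetical ride log (not real Uber data): (pickup, dropoff) neighborhoods.
rides = [
    ("SoMa", "SoMa"),
    ("SoMa", "Mission"),
    ("SoMa", "Downtown"),
    ("Downtown", "SoMa"),
    ("Mission", "SoMa"),
    ("Mission", "Mission"),
]

def transition_probabilities(rides):
    """Return {source: {target: share of source's rides that end in target}}."""
    counts = Counter(rides)
    out_totals = Counter(src for src, _ in rides)
    probs = defaultdict(dict)
    for (src, dst), n in counts.items():
        probs[src][dst] = n / out_totals[src]
    return dict(probs)

def inflow_share(rides):
    """Share of all rides that end in each neighborhood (the circle size)."""
    in_totals = Counter(dst for _, dst in rides)
    return {hood: n / len(rides) for hood, n in in_totals.items()}

edges = transition_probabilities(rides)   # line weights
circles = inflow_share(rides)             # circle sizes
```

Each source's outgoing weights sum to 1, which is what makes the line weights comparable across busy and quiet neighborhoods.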

Got that?

Haha okay, okay, let’s clear up some of that clutter and look at this differently.

If instead we look at the raw number of rides moving between neighborhoods you’ll see that the map looks very different. Now almost all the action is going on between neighborhoods in a radius around downtown.


In fact…

39% of all SF Uber rides move between just 6 neighborhoods:
* SoMa
* Downtown
* Financial District
* Mission
* Marina
* Western Addition

This tells you a lot about San Francisco and about Uber. “Like what?” you might be saying.

Well, statistically speaking, SoMa is the main hub of San Francisco in that it has the largest node strength (instrength + outstrength).

More simply: people are more likely to catch a ride into or out of SoMa than any other neighborhood. They are also more likely to stay within SoMa than any other neighborhood.

What starts in SoMa has a 17% probability of staying in SoMa. That’s not as catchy as Vegas’ slogan, but it’s more accurate.
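Here is a rough sketch of what "node strength" and "probability of staying" mean in practice. The ride counts below are invented for illustration (I only set SoMa's self-loop so it reproduces the 17% figure), and `node_strength` and `stay_probability` are hypothetical helper names.

```python
# Hypothetical ride counts (not real data): flows[source][target].
flows = {
    "SoMa":     {"SoMa": 17, "Mission": 40, "Downtown": 43},
    "Mission":  {"SoMa": 30, "Mission": 10, "Downtown": 10},
    "Downtown": {"SoMa": 35, "Mission": 5,  "Downtown": 10},
}

def node_strength(flows):
    """Strength = instrength + outstrength for every neighborhood.

    Rides that start and end in the same neighborhood count toward both.
    """
    out_strength = {src: sum(targets.values()) for src, targets in flows.items()}
    in_strength = {}
    for targets in flows.values():
        for dst, n in targets.items():
            in_strength[dst] = in_strength.get(dst, 0) + n
    hoods = set(in_strength) | set(out_strength)
    return {h: in_strength.get(h, 0) + out_strength.get(h, 0) for h in hoods}

def stay_probability(flows, hood):
    """Share of rides starting in `hood` that also end in `hood`."""
    return flows[hood].get(hood, 0) / sum(flows[hood].values())

strength = node_strength(flows)
hub = max(strength, key=strength.get)      # neighborhood with the most traffic
soma_stay = stay_probability(flows, "SoMa")
```

With these toy numbers SoMa comes out as the hub, and 17 of its 100 outgoing rides stay put.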

SoMa is part of a larger, more diffuse subnetwork that includes Potrero Hill, the Mission, the Castro, Noe Valley, Western Addition, Haight-Ashbury, Twin Peaks, Bayview, and Parkside. This means that those neighborhoods are all more likely to send rides between one another than they are to send rides to neighborhoods outside of their subnetwork.
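One common way to put a number on "more likely to send rides to each other than to outsiders" is modularity for directed, weighted networks. This post doesn't spell out which method produced the subnetworks, so treat the following as a sketch of the general idea under that assumption, using a made-up four-neighborhood network with placeholder names A through D.

```python
def directed_modularity(flows, membership):
    """Modularity of a partition of a weighted digraph.

    flows[i][j] is the ride count from i to j; membership[i] is i's group.
    Compares observed within-group flow to what the in/out strengths
    alone would predict.
    """
    total = sum(n for targets in flows.values() for n in targets.values())
    out_s = {src: sum(t.values()) for src, t in flows.items()}
    in_s = {}
    for targets in flows.values():
        for dst, n in targets.items():
            in_s[dst] = in_s.get(dst, 0) + n
    q = 0.0
    for i in membership:
        for j in membership:
            if membership[i] != membership[j]:
                continue
            observed = flows.get(i, {}).get(j, 0)
            expected = out_s.get(i, 0) * in_s.get(j, 0) / total
            q += observed - expected
    return q / total

# Hypothetical network: A and B trade lots of rides, as do C and D.
flows = {
    "A": {"B": 10, "C": 1},
    "B": {"A": 10, "D": 1},
    "C": {"D": 10, "A": 1},
    "D": {"C": 10, "B": 1},
}
good = directed_modularity(flows, {"A": 0, "B": 0, "C": 1, "D": 1})
bad = directed_modularity(flows, {"A": 0, "C": 0, "B": 1, "D": 1})
```

A grouping that matches the real flow pattern scores higher than one that cuts across it; a clustering algorithm searches for the grouping with the best score.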

Enough rambling. Let’s get to the good stuff.

What else can we learn? What about that more interesting thing I said at the beginning about where men and women hang out? Where the ladies at?

Well first, we need to devise a way to statistically assess whether there are more women in a neighborhood than we’d expect. How?

The easiest way is to work from the assumption that there are no differences in the way populations are distributed, and then from there find neighborhoods where that assumption fails.

For example, here’s a plot showing the number of rides done per neighborhood for men plotted against the same data for women:

Each red dot represents a neighborhood and the diagonal dashed line is the predicted linear relationship between number of rides per neighborhood by gender (yes, gender, not sex…) We used Rapleaf’s Name to Gender API to assess the likelihood of a rider’s gender given their name, only accepting a match if the probability was >= 95%. So someone with the name of Leslie remains unclassified because there’s only a 94.1% chance the name is from a female, whereas a boy named Sue would be misclassified as female with a 99.2% probability.

Any deviation above this line means that a neighborhood has more women taking rides into it than we would expect given the number of men who take rides there. Deviations below the line are places where we see more men than we would expect given the number of women (actually, technically, places where we see fewer women than we would predict given the number of men).
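To make "more than expected" concrete, here is a small sketch that fits a straight line to rides by men versus rides by women and reports each neighborhood's percent deviation from that line. I'm assuming an ordinary least-squares fit for illustration; the ride counts and the placeholder neighborhood names A through D are made up.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def percent_deviation(men, women):
    """Percent by which women's rides exceed (+) or fall short of (-) the fit."""
    hoods = sorted(men)
    slope, intercept = fit_line([men[h] for h in hoods],
                                [women[h] for h in hoods])
    deviations = {}
    for h in hoods:
        expected = slope * men[h] + intercept
        deviations[h] = 100 * (women[h] - expected) / expected
    return deviations

# Hypothetical ride counts per neighborhood (not real data).
men = {"A": 100, "B": 200, "C": 300, "D": 400}
women = {"A": 50, "B": 130, "C": 150, "D": 200}
deviations = percent_deviation(men, women)
```

In this toy data, neighborhood B sits above the line (more women than the fit predicts) and A sits below it.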

What’s the gist?

* There are 35% more women in the Marina and 47% more women in Pac Heights on weekend nights than expected.
* Conversely, there are 23% more men in SoMa, 16% more in the Castro, and 14% more in the Financial District.

So if you’re looking for a guy, head to SoMa on a Friday night. If you’re looking for a lady, check out the Marina or Pac Heights!

We can map these kinds of binary discrepancies for several variables. Check it:

Every blue dot shows an imbalance. The larger the dot, the more unbalanced the relationship is. To summarize in words:

* Android users are all about the Haight, the Mission, Nob Hill, and SoMa.
* iPhone users are up in the Financial District, the Castro, North Beach, Russian Hill, and the Marina.
* During normal business hours everyone’s headed into the Financial District…
* …but on weekend nights it’s all about the parties: the Mission, the Marina, Downtown, the Castro, and Western Addition.
* Riders with the lowest ratings are in the Western Addition, the Castro, and the Mission. (Are you all being nice to our drivers? What’s up?!)
* Our highest-rated riders are hanging out in SoMa, Downtown, North Beach, and the Marina. (Thank you all!)

Sure, this isn’t as enticing as hookers and booze, but there’s some really cool stuff we can learn about San Francisco here. Not only can we get a pulse on the ins and outs of the city, but we can get a feel for how people move from place to place.

Where do people go for fun after work and on weekends? What parts of the city are most tightly connected? A rider is far more likely to head into SoMa from Downtown than they are to head over to Golden Gate Park. There’s a lot more at play here than just population or population density.

We can see how San Francisco works and plays. We can see how the city comes alive. To me, this:

is just as beautiful as this:

San Francisco at Night

And that’s one of the reasons why a neuroscientist is working for Uber.

BramCohen 5 pts

Does that thick yellow line correspond to a single frequent customer?

voytek 5 pts

BramCohen not at all, though there are certainly fewer people moving into and out of Crocker-Amazon than other neighborhoods!

KMineoGarcia 5 pts

I would like to see how this model changes in volume and distribution during periods of extended rain, if we ever see any!

voytek 5 pts

KMineoGarcia it's raining now! We've actually written about the effects of rain on rides in SF before: http://blog.uber.com/2011/03/12/uberdata-uber-for-style-and-comfort/

tjfaust 5 pts

Zebbyj's a hater. This data is awesome. Maybe it's just confirming what a lot of us already knew, but it's still tremendously important to verify or validate those intuitive beliefs with cold, hard data. Thanks, Bradley!

voytek 5 pts

tjfaust I'd like to know how one could find statistical clusters without doing the math. But yeah, haters gonna hate, and it was a fun analysis whether or not anyone else thinks so. :)

Zebbyj 6 pts

Still not clear why this takes a neuroscientist. First and foremost, any experienced driver of this city can tell you this info. Secondly, when an app already gives this data (which any good app should), all it needs is put into a mapping program. Über is nothing but a silly hype machine. It's really really silly actually.
