What up humans?! Bradley Voytek again with another round of (hopefully) interesting #uberdata asking a few questions today:

“How is San Francisco’s Financial District like New York?” and “What neighborhood tells us the most about DC’s lifestyle?”

And hopefully providing a few data-derived answers.

Obviously the Uber nerd-collective are into maps. I mean, mapping is kind of important to what we’re building at Uber. But as a brain guy I usually don’t think in terms of space as much as I do in terms of time: temporal correlations, autoregressive models, causal relationships, time-frequency analyses, and so on. What’s happening in the brain and when.

Today, I’ll be talking about what Uber’s temporal patterns of demand tell us about the neighborhoods of the cities we service. Maybe it will be easier to just show you what I’m talking about. This is what San Francisco’s demand looks like, broken down by hour of week:

Uberdata: San Francisco demand curve

Now compare that to New York:

Uberdata: San Francisco v. New York demand curve

Right away you can see there’s something different; the differences aren’t huge, but they’re there.

Our ridership in New York is more heavily skewed toward weekdays whereas San Francisco demand jumps up on weekends.
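For the curious, here is a minimal sketch of how an hour-of-week demand curve like the ones above could be built. This is not the actual Uber pipeline: the request timestamps are made up, the function names are hypothetical, and normalizing each curve to sum to 1 (so cities of different sizes can be compared) is my assumption.

```python
from datetime import datetime

import numpy as np

HOURS_PER_WEEK = 7 * 24  # 168 bins, Monday 00:00 through Sunday 23:00


def hour_of_week(ts: datetime) -> int:
    """Map a timestamp to its hour-of-week bin (0 = Monday midnight)."""
    return ts.weekday() * 24 + ts.hour


def demand_curve(timestamps) -> np.ndarray:
    """Count requests per hour-of-week bin, then normalize to sum to 1."""
    counts = np.zeros(HOURS_PER_WEEK)
    for ts in timestamps:
        counts[hour_of_week(ts)] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


# Hypothetical request times, invented purely for illustration.
requests = [
    datetime(2012, 9, 7, 22, 15),   # a Friday, 10pm
    datetime(2012, 9, 8, 23, 5),    # a Saturday, 11pm
    datetime(2012, 9, 10, 8, 30),   # a Monday, 8am
    datetime(2012, 9, 14, 22, 40),  # the next Friday, 10pm
]
curve = demand_curve(requests)
```

Folding every week onto the same 168 bins is what lets two Fridays at 10pm land in the same bucket, so the curve describes a typical week rather than any particular one.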

Of course, now that we’ve been in business for two years and are still growing like crazy, we can get much more granular. Instead of looking at differences between cities, we can start to look at differences between neighborhoods. 271 of them across nine major US cities, to be exact:

  • Boston
  • Chicago
  • DC
  • LA
  • New York
  • Philadelphia
  • San Diego
  • San Francisco
  • Seattle

Now that we can get more fine-grained we can begin to observe some pretty clear neighborhood-by-neighborhood differences. Again, a nerd-picture is worth a thousand nerd-words, so have a look at two neighborhoods in San Francisco — the Mission and the Financial District:

Uberdata: San Francisco Mission v Financial District demand curves

Check out how daily demand in the Mission peaks later in the day–after work hours–whereas demand in the Financial District peaks toward the end of the work day. The big difference, of course, is that the Mission has a lot more demand on Saturdays.

Now look at how San Francisco’s Financial District compares to New York’s Financial District:

Uberdata: San Francisco v New York Financial Districts demand curves

I love this stuff!

San Francisco’s Financial District is more Manhattan-like than it is San Francisco-like!

(And, of course, by “more like” I mean in terms of Uber demand, which is a proxy for activity within that neighborhood.)

In fact, we can quantify how <city>-like or not <city>-like any given neighborhood is. That is, we can ask, “how San Francisco-like is the Mission, really?” and “how much more like New York is the Financial District than it is San Francisco?”

And we can do this for every neighborhood. What do we find?

Cities have “stereotypical” neighborhoods that strongly match the flow of their home cities, and some neighborhoods that just don’t seem to belong to their home city. They’re outliers.

“But wait a minute!” you might say, donning your +3 internet troll-hat of one-upmanship, “you’re correlating a variable with another variable that includes the first! If one neighborhood contributes more overall power to the signal of the city average, of course it will correlate with it better!”

“By Jove!” I might retort if I didn’t worry about this crap so much already. “You’re right! Thank you Dr. Needs-to-show-the-internet-how-smart-I-am.”

So yeah. I corrected for that so as not to get you all riled up. Happy now Internet Math Patrol? You made me write 3 more lines of code.

The concern here is that some neighborhoods have more demand and thus contribute more to the overall city demand. One way to address this is to correlate each neighborhood’s demand curve with the city’s demand curve after removing that neighborhood’s own contribution from it. Which, as you can tell from the imaginary argument in my head that I just subjected you all to, is what I’ve done.
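If you want to see what that correction might look like, here is a rough sketch. It assumes the city curve is simply the sum of its neighborhood curves and uses a plain Pearson correlation; the function name and the synthetic data are mine, not the real code or the real numbers.

```python
import numpy as np


def leave_one_out_correlations(demand: np.ndarray) -> np.ndarray:
    """Score how city-like each neighborhood is.

    demand has shape (n_neighborhoods, n_hours) for a single city.
    Each neighborhood is correlated with the city total computed
    WITHOUT that neighborhood, so it cannot correlate with itself.
    """
    city_total = demand.sum(axis=0)
    scores = np.empty(demand.shape[0])
    for i, neighborhood in enumerate(demand):
        rest_of_city = city_total - neighborhood
        scores[i] = np.corrcoef(neighborhood, rest_of_city)[0, 1]
    return scores


# Synthetic city: three neighborhoods share a daily rhythm, one runs opposite.
rng = np.random.default_rng(0)
hours = np.arange(168)
rhythm = np.sin(2 * np.pi * hours / 24)
typical = [w * (1 + rhythm) + rng.normal(0, 0.05, 168) for w in (5.0, 3.0, 2.0)]
outlier = 1 - rhythm
demand = np.vstack(typical + [outlier])

scores = leave_one_out_correlations(demand)
```

In this toy city the three typical neighborhoods score close to 1 and the out-of-phase one scores negative, which is the “like” versus “unlike” distinction in miniature.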

The most stereotypically “like” neighborhood for each city is:
• San Francisco: North Beach
• New York: Chelsea
• Seattle: Capitol Hill
• Chicago: Near North Side
• Boston: Back Bay – Beacon Hill
• DC: Dupont Circle
• LA: Mid-City West

Now, in contrast…

The most stereotypically “unlike” neighborhood for each city is:
• San Francisco: Crocker Amazon
• New York: Washington Heights
• Seattle: South Park
• Chicago: Montclare
• Boston: West Roxbury
• DC: Deanwood
• LA: Southeast LA

We can also extract “types” of demand curves: are there neighborhoods that are more active on weekends and others that are clearly work-week hotspots? One simple mathematical technique for identifying stereotyped patterns in data is principal component analysis. The details aren’t too important, so let’s just jump to the results: there are two “types” of demand curves that account for 93% of the variance in overall demand. Here’s what they look like:

Uberdata: PCA demand curves weekend/weekday

Essentially you’ve got one rising demand curve that peaks on evenings and Friday and Saturday nights (red) and one workday/workweek curve that diminishes on weekends (blue). We can then ask, for each city, which neighborhood is the most “weekend-like” and which is the most “weekday-like” (that is, how strongly does each neighborhood correlate with each of these two curves)?
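For the nerds who do care about the details, here is one way the PCA step could be sketched with nothing but NumPy. I’m assuming a matrix with one row per neighborhood and one column per hour of week, centered across neighborhoods before the decomposition; the exact preprocessing behind the plot above isn’t spelled out here, and the data below is synthetic.

```python
import numpy as np


def principal_curves(demand, n_components=2):
    """Return the top principal components and their share of variance.

    demand has shape (n_neighborhoods, n_hours).
    """
    centered = demand - demand.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return vt[:n_components], explained[:n_components]


def correlate_with(demand, curve):
    """Pearson correlation of every neighborhood curve with one reference curve."""
    return np.array([np.corrcoef(row, curve)[0, 1] for row in demand])


# Synthetic neighborhoods: random mixes of a workweek bump and a weekend bump.
rng = np.random.default_rng(1)
hours = np.arange(168)
day, hour = hours // 24, hours % 24
workweek = (day < 5) * np.exp(-((hour - 17) ** 2) / 8.0)
weekend = (day >= 4) * np.exp(-((hour - 22) ** 2) / 8.0)
weights = rng.random((20, 2))
demand = weights @ np.vstack([workweek, weekend]) + rng.normal(0, 0.01, (20, 168))

components, explained = principal_curves(demand)
scores = correlate_with(demand, components[0])
```

One caveat: the sign of a principal component is arbitrary, so a strongly negative score can be just as meaningful as a strongly positive one until you’ve decided which direction counts as “weekend.”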

So if we could build the perfect “party city” consisting only of the neighborhoods from each city that correlate most with the weekend curve, this is what it would look like:
• San Francisco: North Beach
• New York: SoHo
• Seattle: First Hill
• Chicago: Near North Side
• Boston: South Boston
• DC: Dupont Circle
• LA: Santa Monica

And now again, in contrast…

The lame all work/no play city would be:
• San Francisco: Financial District
• New York: Garment District
• Seattle: Overlake
• Chicago: O’Hare
• Boston: East Boston
• DC: Deanwood
• LA: Westchester

But this is looking at how neighborhoods relate to cities. What about how they relate to one another? Well, given that we’re working with 271 neighborhoods, we’re talking about running 36,585 pairwise correlations (271 × 270 / 2), which is messy to display. So I’ve pared the data down to just the strongest relationships, which you can play around with by clicking the image below.

Uberdata - Neighborhood Correlations

interactive plot

built with d3.js
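As a sanity check on that pair count, and to show roughly how the “strongest relationships” might be pulled out, here is a small sketch. Keeping the top k pairs is my assumption (the actual cutoff used for the plot isn’t stated), and the neighborhood names and data are placeholders.

```python
import numpy as np


def strongest_pairs(demand, names, top_k=5):
    """Return the top_k most correlated neighborhood pairs as (a, b, r)."""
    corr = np.corrcoef(demand)  # (n, n) correlation matrix
    # Upper triangle without the diagonal: each unordered pair exactly once.
    i_idx, j_idx = np.triu_indices(len(names), k=1)
    values = corr[i_idx, j_idx]
    order = np.argsort(values)[::-1][:top_k]
    return [(names[i_idx[o]], names[j_idx[o]], float(values[o])) for o in order]


# Number of unordered pairs among 271 neighborhoods.
n_neighborhoods = 271
n_pairs = n_neighborhoods * (n_neighborhoods - 1) // 2

# Placeholder data: "A" and "B" follow nearly the same curve, "C" and "D" do not.
rng = np.random.default_rng(2)
base = rng.random(168)
demand = np.vstack([
    base,
    2 * base + 0.001 * rng.random(168),
    rng.random(168),
    rng.random(168),
])
names = ["A", "B", "C", "D"]
top = strongest_pairs(demand, names, top_k=2)
```

Note that correlation ignores scale, so “B” matches “A” even though it has twice the demand. That is the point: we’re comparing the shape of the week, not the volume.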

Of course this is all academic. The thing that makes cities like San Francisco great is their diversity. I’m sure living in Uber Party City (UPC) would eventually have to get old. Right?