How Lyft discovered OpenStreetMap is the Freshest Map for Rideshare

Clare Corthell
Published in Lyft Engineering
7 min read · Feb 9, 2021

Mark Huberty & Clare Corthell, Lyft Mapping

Roads to the new terminal at LaGuardia Airport opened in July 2020, evidenced by aggregate Lyft traffic (left). OpenStreetMap data had been updated (right) shortly after opening, faster than many other maps.

Lyft Mapping study shows crucial OpenStreetMap road attributes are fresh and high quality in 30 North American cities, as compared to groundtruth.

Lyft moves people: from home to work, work to play, play to rest, through cities and beyond. Maps play a critical role, helping Lyft figure out where drivers and riders are, how best to connect them, and how long it will take to reach the destination.

Lyft Mapping is built on top of OpenStreetMap (OSM). This global map database is used by millions of people around the world for combatting climate change, tracking agricultural land use, disaster recovery, refugee response, academic research, and much more. After 16 years of growth, OSM is the biggest crowdsourced repository of human geospatial knowledge, and many companies now use it to power applications like logistics platforms, social media, and gaming. But is this map suitable for supporting the rideshare experience? Is it the best option available? Can Lyft support the OSM community and contribute to making the map better? Though we had a strong intuition that OSM offered a complete road network, we didn’t know how well the map matched the real world, so we ran a study.

tl;dr

After three months, thousands of miles driven, and a lot of skilled work from our data curation team, we’re happy to report positive findings:

  1. OpenStreetMap has a very high-quality road network in 30 large North American cities.
  2. The OpenStreetMap community can be credited with maintaining the map at a reliably fresh standard in these areas.
  3. This survey design is potentially valuable for any study of map quality relative to groundtruth.

In this post, we’ll review how we did this (see the paper), the findings, and takeaways for Lyft and the mapping community.

Survey Design: How accurate is the map?

Measuring the entire map (whether each road exists, is annotated correctly, and is up to date) is hard. Lyft operates in over 300 markets in the United States and Canada, from dense, old urban areas like New York City to sprawling suburban metropolises like Phoenix or Los Angeles. To get the ground truth required to assess map quality, we could send a surveyor to every intersection to record whether the map is correct. The US Census Bureau takes this kind of exhaustive approach; in fact, it hires so many people every 10 years that it singlehandedly changes the unemployment rate. But this approach would be incredibly slow and expensive. We needed a survey design that balanced our desire for regional specificity with cost and logistical feasibility.

An example of our sampling plan in Greater New York City. Mapillary dispatched cars to each of the S2 polygons and collected imagery along all driveable roads in each.

Sampling + Remote Sensing = Feasible Study

We knew that we needed a sampling-based approach to make this tractable. But a pure random sample wouldn’t work; sending people to randomly sampled intersections around a city would be incredibly time-consuming. Instead, we looked to public health and remote sensing for a solution. Health researchers often face a similar problem: how to send a limited number of survey workers to homes to evaluate disease prevalence or health outcomes. They solve this with cluster sampling:

  1. Sample spatial units, such as a city block
  2. Sample households from that block

This two-stage process simplifies life for survey workers. A survey taker can go to one city block and visit multiple households at a time, maximizing the information they get from one trip. This sort of survey design isn’t as statistically efficient as a pure random sample, but what it lacks in purity it makes up for in logistical simplicity and cost effectiveness.
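The two-stage idea can be sketched in a few lines of Python. This is a minimal illustration, not the sampling code used in the study: the function name `two_stage_sample`, the cell and intersection identifiers, and the sample sizes are all hypothetical, and every cell is assumed to be equally likely to be chosen.

```python
import random


def two_stage_sample(units_by_cell, n_cells, n_per_cell, seed=0):
    """Draw a two-stage cluster sample.

    Stage 1 picks `n_cells` spatial cells at random (equal probability,
    an assumption of this sketch). Stage 2 picks up to `n_per_cell`
    units inside each chosen cell.
    """
    rng = random.Random(seed)
    # Stage 1: sample spatial units (sorted so the draw is reproducible).
    chosen_cells = rng.sample(sorted(units_by_cell), n_cells)
    sample = {}
    for cell in chosen_cells:
        units = units_by_cell[cell]
        # Stage 2: sample units within the cell; take all if the cell is small.
        k = min(n_per_cell, len(units))
        sample[cell] = rng.sample(units, k)
    return sample


# Hypothetical toy data: 6 cells, each containing 10 intersections.
units_by_cell = {
    f"cell_{c}": [f"cell_{c}/ix_{i}" for i in range(10)] for c in range(6)
}
sample = two_stage_sample(units_by_cell, n_cells=3, n_per_cell=4)
for cell, units in sample.items():
    print(cell, units)
```

Setting `n_per_cell` at or above the largest cell size turns stage 2 into a full census of each sampled cell, which is closer to how imagery was collected here: along all driveable roads in each sampled polygon.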

Example intersection image from the Mapillary forward-facing camera. We can see that turn signage and road name signage are all clearly visible.

We mimicked this methodology and added a twist: remote sensing. Rather than sending surveyors in person, we partnered with Mapillary (now part of Facebook) to collect high-quality imagery from our spatial samples. A team of Lyft Map Data Curators then used these images to study whether OSM matched the real world.

With the curator-reviewed map in hand, standard statistical techniques gave us our answer — based on the sample, how good was OSM quality, and how much did it vary by region? The survey package for the R statistical programming language provided all the tools to estimate nationwide and regional quality based on our sampling design. Detailed estimates follow at the end of this post, and in our public paper.
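To give a feel for what such an estimate involves, here is a simplified Python sketch of a ratio estimator for a proportion under cluster sampling, with a linearization standard error. It is not the study’s analysis, which used the R survey package; it assumes equal selection probabilities, no stratification, and no finite-population correction, and the function name `cluster_proportion` and the counts in `toy_clusters` are made up for illustration.

```python
import math


def cluster_proportion(clusters):
    """Estimate a proportion and its standard error from cluster counts.

    `clusters` is a list of (n_correct, n_reviewed) pairs, one per
    sampled cell. Assumes equally weighted clusters (sketch only).
    """
    n = len(clusters)
    if n < 2:
        raise ValueError("need at least two clusters to estimate variance")
    total_correct = sum(y for y, _ in clusters)
    total_reviewed = sum(m for _, m in clusters)
    # Ratio estimator: pooled share of reviewed features that are correct.
    p_hat = total_correct / total_reviewed
    # Linearization variance: spread of per-cluster residuals y_i - p_hat * m_i.
    resid_sq = sum((y - p_hat * m) ** 2 for y, m in clusters)
    variance = n / (n - 1) * resid_sq / total_reviewed ** 2
    return p_hat, math.sqrt(variance)


# Hypothetical counts: (features correct, features reviewed) per sampled cell.
toy_clusters = [(48, 50), (19, 20), (37, 40), (29, 30)]
p_hat, se = cluster_proportion(toy_clusters)
# Approximate 95% interval using a normal critical value (an assumption;
# with few clusters a t-based interval would be wider).
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"estimate={p_hat:.3f} se={se:.4f} interval=({low:.3f}, {high:.3f})")
```

The variance is driven by how much clusters disagree with one another rather than by the raw number of features reviewed, which is why regional estimates built from fewer clusters carry more uncertainty than the nationwide figure.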

This process of sampling, rapid imagery collection, and curation allowed us to study cities across North America in three months; the study was completed in March 2020.

Findings: High Map Accuracy across 30 Cities*

We found that core features of OpenStreetMap roads are correct more than 95% of the time relative to what exists in the real world. Data critical to safe navigation, such as left turn restrictions, are correct more than 85% of the time. Nationwide, these estimates are precise to within 5% sampling uncertainty. The regional uncertainty varies more based on region-level dynamics, visible in the figures at the end of this post.

As the saying goes in mapping, perfection is unattainable; the map goes out of date the moment it is published, because the real world is always changing. But as of March 2020, OSM map data showed only minor differences from the real world.

*See full list of cities in data plots below

Takeaways

These findings are encouraging, because they show that Lyft is running on a map that accurately represents reality. This gives us greater confidence that we typically won’t predict a Lyft route that includes an illegal turn or sends a driver the wrong way down a street. At the same time, the study pinpoints areas of opportunity.

  • OpenStreetMap is in the maintenance or “gardening” phase of map curation for many features and geographies. This means that discrepancies with groundtruth are usually due to recent changes in the real world, rather than gaps in the extent of the map or longstanding unmapped features. This study shows 30 cities in maintenance mode for the features noted.
  • Areas for investment remain, including onramp signage and lane annotation in specific cities. This is useful information to help mappers, including the Lyft Mapping Curation Team, direct their efforts to improving the map for everyone who uses OpenStreetMap. Studies like this help us narrow focus and build stronger programs for finding errors.
  • Sample-based field surveys are both faster and less costly than a groundtruth census. The approach of sampling + remote sensing could be useful for further studies on different tags, geographies, or use cases. We hope others will be encouraged to use these methods.
  • Sensor networks combining imagery (like Mapillary) and telemetry (fleets like Lyft) can improve OpenStreetMap, in safe and anonymized ways, by surfacing errors and changes. For example, see How Lyft Creates Hyper-Accurate Maps from Open-Source Maps and Real-Time Data. As mappers know well, sensors are a valuable tool for everyone to use to improve maps — and their accelerating ubiquity is making improvement very accessible.
  • Keeping the map fresh and up-to-date is a matter of finding the needles in the haystack. Evidence beyond this study (such as the title image) shows that the community is tracking construction, natural disaster impacts, and new changes (like new bike lanes!) and mapping them as they happen.

The Next Destination

These findings are valuable in themselves: they tell us that Lyft runs on a great map. As OpenStreetMap has demonstrated, we should never underestimate the potential of crowd collaboration. With every edit, addition, modification, and discussion, the community of editors and organizations has created a complete map of North American cities in OpenStreetMap. As Lyft Mapping continues to research how best to contribute collaboratively to OpenStreetMap, we’ll look further afield, beyond these cities. Stay tuned!

If you want to be a part of fine-tuning our complex mapping systems, join us!

Full Report: 30 North American Cities

Visuals below are fully documented in the paper: bit.ly/lyftosmqualitystudy

The Basics: Road Class, Lanes, Directionality

Road class and directionality accuracy is estimated at more than 95%, while lane metadata are less accurate and more heterogeneous across regions.

Legality: Turn Restrictions

Turn restrictions matter for rideshare for two simple reasons: we don’t want drivers making illegal turns, and we don’t want to expose passengers to unsafe routes. Unlike roads, turn restrictions can’t be seen in satellite imagery, nor found in the municipal road data released by most cities. Mapillary’s ground-level imagery proved critical to understanding OSM completeness here.

Estimated correctness rate of turn restrictions in 30 North American cities. “Correct” in this instance refers to whether the turns are correctly modeled in OSM as either allowed or constrained. The vertical red line marks 90%.

Knowing where to go: Freeway Signs

Example freeway onramp signage. The onramp signage shows the freeway shield and travel directions for two onramps. In contrast, offramp signage provides significantly less verbose information than the destination sign.

Estimated proportion of offramp signs with correct name tagging in OSM. We find offramp signs are tagged consistently and correctly in OSM. Washington, DC is an outlier with relatively few freeway exits.

Estimated proportion of onramps with correct name tagging in OSM. We see that overall onramp tagging averages less than 90% correctness, with some regions falling far below that. However, quality appears highly heterogeneous: some regions have very noisy, very low quality data, while others appear to have almost perfect data.

Destination signs signal upcoming exits and their destinations.

Ride Safely!

Acknowledgements & Thanks: Lyft Basemap Team, Spencer Jacquish, Lisa Rendez-Webb, Ryan Cook & Janine Yoong (Mapillary), Andy Hyuhn, Alex Kazakova, Lyft Data Curation. Contact us at osm-questions@lyft.com.
