About Bike Crash Mapper

Data Sources

The California Highway Patrol (CHP) collects data from local police departments on all serious vehicular collisions. I downloaded the CHP's 2014 data and filtered it for incidents that involved bicycles. Since most incidents did not include geocoordinates, I used Google's Geocoding API to find the latitude and longitude of each crash (see below for the limitations of this approach).
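The geocoding step can be sketched roughly as follows. This is a minimal illustration, not the app's actual code. It assumes each crash is described by a primary road, a secondary road and a city, and that these are joined into an intersection string like "MAIN ST & 1ST AVE, BERKELEY, CA". The function names and that address format are hypothetical. The endpoint, the `address`/`key` parameters and the `status`/`results[0].geometry.location` response fields follow Google's public Geocoding API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(primary_road, secondary_road, city, api_key):
    """Build a request URL for an intersection (hypothetical address format)."""
    address = f"{primary_road} & {secondary_road}, {city}, CA"
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})


def parse_geocode_response(payload):
    """Return (lat, lng) of the first result, or None if geocoding failed."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode(primary_road, secondary_road, city, api_key):
    """Perform the network request (needs a valid API key)."""
    url = build_geocode_url(primary_road, secondary_road, city, api_key)
    with urlopen(url) as resp:
        return parse_geocode_response(json.load(resp))
```

Keeping the parser separate from the network call makes it easy to log failed lookups (a non-"OK" status) for the manual review described under Limitations.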

Data on the population size and number of bike commuters in each city and county come from the US Census Bureau. I used their API to get 2013 American Community Survey 5-year estimates for these values (tables B01003 and B08301, respectively).
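A Census API query for these two tables might look like the sketch below. It assumes the 2013 ACS 5-year endpoint, variable `B01003_001E` for total population and `B08301_018E` for the "Bicycle" line of the means-of-transportation table, and California's state FIPS code `06`. Pass `"place:*"` instead of `"county:*"` for cities. The function names are hypothetical. The API returns a JSON array whose first row is the header, with values as strings.

```python
from urllib.parse import urlencode

ACS5_2013_URL = "https://api.census.gov/data/2013/acs5"
# B01003_001E: total population; B08301_018E: workers who commute by bicycle
VARIABLES = ["NAME", "B01003_001E", "B08301_018E"]


def build_acs_url(geography="county:*", state_fips="06"):
    """Build a query for every county (or place) in one state."""
    params = {"get": ",".join(VARIABLES), "for": geography, "in": f"state:{state_fips}"}
    # Keep ':', '*' and ',' readable instead of percent-encoding them
    return ACS5_2013_URL + "?" + urlencode(params, safe=":*,")


def parse_acs_rows(rows):
    """Turn the API's header-plus-rows array into {name: {...}}."""
    header, *data = rows
    name_i = header.index("NAME")
    pop_i = header.index("B01003_001E")
    bike_i = header.index("B08301_018E")
    return {
        row[name_i]: {"population": int(row[pop_i]), "bike_commuters": int(row[bike_i])}
        for row in data
    }
```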

I identified "Danger Zones" by finding areas where at least 4 crashes had occurred within 250 meters of each other. Thanks to Mindy Huang for suggesting and helping me design a recursive algorithm to find these zones.
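The idea can be illustrated with a simple greedy clustering. This is a swapped-in technique, not the recursive algorithm mentioned above. The text doesn't say whether 250 meters is a radius around a center or a maximum pairwise distance. This sketch assumes a 250 m circle centered on one crash, so two crashes in the same zone can be up to 500 m apart. Crashes are assumed to be (latitude, longitude) pairs in degrees, and distances use the haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def find_danger_zones(crashes, radius_m=250, min_crashes=4):
    """Greedy sketch: repeatedly pick the crash with the most neighbors within
    radius_m, record that group as a zone if it is large enough, then remove it."""
    remaining = list(range(len(crashes)))
    zones = []

    def neighbors(i):
        return [j for j in remaining if haversine_m(crashes[i], crashes[j]) <= radius_m]

    while remaining:
        center = max(remaining, key=lambda i: len(neighbors(i)))
        group = neighbors(center)
        if len(group) < min_crashes:
            break  # no remaining circle holds enough crashes
        zones.append(group)
        remaining = [j for j in remaining if j not in group]
    return zones
```

This brute-force version is quadratic or worse in the number of crashes; a spatial index would be needed to run it comfortably over all of California's incidents.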

Limitations

Since not all collisions involve a police response or are reported to the CHP, this app does not include every bike crash in California. Still, it should contain enough data to provide a decent picture of where crashes typically occur.

Due to issues with how the location of each crash was formatted, the Geocoding API struggled to accurately locate some of the incidents. I've tried to correct obviously incorrect locations, but didn't have time to manually review all 13,000 incidents. Therefore, the locations shown on the map may not always be correct; refer to the text description for the most accurate location.

The map shows the intersection that the raw data used as a location reference. Many collisions, however, did not occur at the intersection itself but some distance from it (for example, 200 feet west of it). Again, refer to the text description for the most accurate location.

The Census data only counts people who commute to work by bike, so it may not accurately represent how many people regularly bike in a given community. Therefore, treat the "collisions per 1,000 bike commuters" figure as only a rough indication of the ratio of collisions to total bikers.
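The rate itself is simple arithmetic: the number of collisions divided by the number of bike commuters, scaled to 1,000. The sketch below assumes that places reporting zero bike commuters should show no rate at all rather than divide by zero. How the app actually handles that case isn't stated, and the function name is hypothetical.

```python
def collisions_per_1000_commuters(collisions, bike_commuters):
    """Collisions per 1,000 bike commuters, or None if no commuters are reported."""
    if bike_commuters <= 0:
        return None
    # Multiply first so round figures stay exact in floating point
    return collisions * 1000 / bike_commuters
```

For example, 30 collisions in a county with 12,000 bike commuters gives 2.5 collisions per 1,000 bike commuters. Because recreational riders are missing from the denominator, the true ratio of collisions to all bikers is probably lower.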

Both the 4-crash and 250-meter cutoffs were chosen arbitrarily and may not be the best way to identify particularly dangerous locations. Moreover, my approach only flags incidents that cluster within a circle, not those spread along a stretch of road.

Technical Stack