Tuesday, May 27, 2014

Cycling in Europe: Beyond statistics

Recently Strava released a global heatmap of 77,688,848 bicycle rides and 19,660,163 runs from the Strava dataset. Creating a visualization of that size was more of an engineering challenge than anything else. Last year the code was cleaned up and became the Personal Heatmaps feature on Strava. This time it has been refactored to handle the large dataset by reading from mmapped files stored on disk.

To start out, the world is broken up into regions represented by zoom level 8 tiles. Each of these regions has a file containing a sorted set of key/value pairs, where the key is the pixel zoom and quadkey and the value is the count of GPS points. The quadkeys ensure that all the data for a tile is stored sequentially in the file. Pixels with no GPS points are excluded, and only every 3rd zoom level is stored in the file. The values for missing zooms can be found by adding the 4 to 16 values from higher zoom levels. Skipping zoom levels saves a bit of disk space, but it also preloads into memory the region of the file needed for deeper zooms.

Well, if this is all way too technical, then go to this page and start playing.
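For readers who do want the technical part, here is a minimal sketch of why quadkeys keep a tile's data together and how a skipped zoom level can be rebuilt by summing. It is not Strava's code: the function names `quadkey` and `count_at` are made up for illustration, the quadkey digits follow the common Bing Maps convention (which the heatmap may or may not use), and a plain in-memory sorted list stands in for the mmapped file.

```python
from bisect import bisect_left


def quadkey(x: int, y: int, zoom: int) -> str:
    """Build a quadkey: one base-4 digit per zoom level, mixing x and y bits.

    Assumes the Bing Maps digit convention (x bit adds 1, y bit adds 2).
    """
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = (1 if x & mask else 0) + (2 if y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)


def count_at(sorted_keys, counts, prefix):
    """Sum the counts of every stored key that starts with `prefix`.

    Because the keys are sorted, all descendants of `prefix` sit next to
    each other, so one binary search plus a short scan is enough.
    """
    i = bisect_left(sorted_keys, prefix)
    total = 0
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        total += counts[i]
        i += 1
    return total


# Hypothetical GPS point counts for three pixels stored at zoom 3.
# Empty pixels are simply absent, as in the post.
stored = {(2, 4): 5, (3, 5): 7, (0, 0): 1}
pairs = sorted((quadkey(x, y, 3), n) for (x, y), n in stored.items())
keys = [k for k, _ in pairs]
counts = [n for _, n in pairs]

# A pixel one zoom up covers at most 4 stored pixels, two zooms up at most 16.
print(count_at(keys, counts, quadkey(1, 2, 2)))  # pixel (1, 2) at zoom 2
print(count_at(keys, counts, quadkey(0, 1, 1)))  # pixel (0, 1) at zoom 1
```

A pixel at a skipped zoom is just a shorter quadkey, so its count is the sum over the stored keys that share that prefix. Since those keys are contiguous in sorted order, reading them touches one small, sequential region of the file.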
