sample project

Geographic Clustering Based on POIs

January 01, 2017

Geographic clustering of Manhattan and Paris

This sample project is based on data from OpenStreetMap and Foursquare, which we associate with the neighborhoods of the borough of Manhattan and of the city of Paris.

We aggregate points of interest (POIs) by type and count how many venues are present in each neighborhood. Based on this data, we run an unsupervised machine learning algorithm to cluster the neighborhoods.

How do we do this?

We have two data sources: OpenStreetMap (OSM) and Foursquare.

  • We assume that we already have the tables Ways and Nodes from OSM. They contain all the information about streets and buildings in Paris or Manhattan, along with their locations.

  • We retrieve Foursquare data from its public API.

We will use the census block dataset from the New York City data portal as a grid for the borough of Manhattan.

We will use the IRIS dataset from the French statistics institute (INSEE) as a grid for the city of Paris. IRIS units represent “small neighborhoods” encompassing 1,800 to 5,000 inhabitants.

All the POIs retrieved from OSM and Foursquare can be associated with a specific neighborhood of Manhattan or Paris.
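As an illustration of how a POI can be attached to a neighborhood, here is a minimal sketch of a point-in-polygon test using ray casting. It is not taken from the project itself: the function names, the toy square polygons, and the planar treatment of coordinates are all assumptions made for the example. A real project would more likely rely on a geospatial library or database.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a horizontal ray crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_neighborhood(
    point: Point, neighborhoods: Dict[str, List[Point]]
) -> Optional[str]:
    """Return the first neighborhood whose polygon contains the point, if any."""
    for name, polygon in neighborhoods.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Hypothetical toy grid: two adjacent unit squares standing in for real blocks.
neighborhoods = {
    "block_A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "block_B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
```

Calling `assign_neighborhood((0.5, 0.5), neighborhoods)` returns the block containing that point, and `None` is returned for a POI that falls outside every polygon.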

We will compute features for each neighborhood based on aggregations of the POIs we retrieved. We aggregate businesses and locations based on their type. For example, for each zone we have the number of food-related locations from both OSM and Foursquare.
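The aggregation described above can be sketched as a simple count per neighborhood, source, and category. The records, category names, and the `source_category` column naming below are hypothetical and chosen only for illustration. The actual project performs this step in its own data preparation flow.

```python
from collections import Counter, defaultdict

# Hypothetical POI records: (neighborhood id, data source, POI category).
pois = [
    ("block_A", "osm", "food"),
    ("block_A", "foursquare", "food"),
    ("block_A", "foursquare", "food"),
    ("block_A", "osm", "shop"),
    ("block_B", "osm", "food"),
    ("block_B", "foursquare", "nightlife"),
]


def build_features(records):
    """Count POIs per neighborhood for each (source, category) pair."""
    counts = defaultdict(Counter)
    for neighborhood, source, category in records:
        counts[neighborhood][f"{source}_{category}"] += 1
    # Use one shared, sorted column list so every row has the same layout.
    columns = sorted({col for counter in counts.values() for col in counter})
    table = {
        neighborhood: [counter[col] for col in columns]
        for neighborhood, counter in counts.items()
    }
    return columns, table


columns, features = build_features(pois)
```

Each neighborhood ends up with one row of counts, with zeros where a category is absent, which is the shape a clustering algorithm expects as input.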

We then create a segmentation with a k-means clustering algorithm.
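To make the clustering step concrete, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm). It is an illustration under assumptions, not the project's implementation: the feature values, the choice of `k=2`, and the random initialization from existing points are all hypothetical. In practice one would typically use a library implementation and consider rescaling the features first.

```python
import random
from typing import List


def kmeans(points: List[List[float]], k: int, seed: int = 0,
           max_iter: int = 100) -> List[int]:
    """Return a cluster label for each point using Lloyd's algorithm."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct input points (a simplification).
    centroids = [list(points[i]) for i in rng.sample(range(len(points)), k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # Assignments are stable, so the algorithm has converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # An empty cluster keeps its previous centroid.
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical features per neighborhood: [food, shop, nightlife] counts.
features = {
    "block_A": [12, 3, 1],
    "block_B": [11, 4, 0],
    "block_C": [1, 2, 9],
    "block_D": [0, 3, 10],
}
clusters = dict(zip(features, kmeans(list(features.values()), k=2)))
```

On this toy data, the food-heavy blocks end up in one cluster and the nightlife-heavy blocks in the other. The numeric label given to each cluster depends on the initialization.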

Exploring the project

We recommend that you follow the links to the project that corresponds to the city you know best. This way, you will have a better understanding of how well the clustering algorithm works!

Explore this sample project

  • Flow (Manhattan)

    Start by looking at the flow to visualize the different steps of the project. The preparation steps appear in yellow and the modeling steps in green.

  • Flow (Paris)

    The flow for the city of Paris is very similar: most of the data preparation and modeling steps are identical.

  • Details (Manhattan)

    Find out how we proceed for the borough of Manhattan by reading the detailed description of the project.

  • Details (Paris)

    The project for the city of Paris also features a detailed description. The only difference is the choice of block-level polygons, provided by INSEE.

  • Dashboard (Manhattan)

    See what the clustered map of Manhattan looks like.

  • Dashboard (Paris)

    See the clustered map for the city of Paris.

Ready to enter Dataiku DSS?

If you have never used DSS, it may be worth familiarizing yourself with DSS concepts first.

This sample is already available in your DSS!

From your DSS home page, click on "Sample projects".

If your DSS server doesn't have Internet access, you can download this sample and import it manually (click on "Import project").

Don't have Dataiku DSS yet? Try for free now
