howto

PostGIS is a powerful extension of PostgreSQL for working with spatial data.

This How-To will help you get started manipulating geographical data with PostGIS.

First of all, you need to have a PostgreSQL database with PostGIS running on your instance. If you are new to Dataiku you can check out the Connecting to PostgreSQL in Dataiku DSS post.

A ** geometry** is a representation of a geographical object (possibly complex) used by PostGIS to perform fast and efficient data manipulation. Most PostGIS functions take geometries as arguments, so we need to understand how to convert other coordinates into geometries.

The most common geographical data are points. So, how do we convert a standard (longitude,latitude) tuple to a PostGIS geometry? With the ST_MakePoint function. It takes the longitude, latitude, and altitude and returns a geometry of type `point`

.

Here is an example of such a request:

In order to use your new geometries with other PostGIS functions, you need to specify the coordinate system (SRID) of your points with the ST_SetSRID function. The most widely used system is SRID=4326; that is, GPS coordinates). If you have no idea where your data comes from, it’s probably this one.

So our request becomes:

Sometimes you may want to convert your data to a specific coordinate system. It is possible with the ST_Transform function, which moves the coordinates of a geometry from its current system to another one.

For example, if we want to project our data into the French ‘Lambert Nord standard’ system (SRID=27561) we can modify our request like this:

If you want to convert from your geometry to the classic tuple (longitude,latitude), you can do this by using the ST_AsText function, which takes a geometry and returns a text (known as “WKT” format).

For example:

You can also use the ST_X and ST_Y functions to directly recover the longitude an latitude of your points.

Once you know how to make points it is simple to build lines. You just have to use the ST_MakeLine function, which takes two points (see previous section) and builds the line between them.

You can also give an array of points and the function will draw the line which interpolates those points.

And then there are several interesting functions to characterize this line. You can:

- get the the length with ST_Length
- get the starting point with ST_StartPoint
- get the ending point with ST_EndPoint
- get the number of points in the line with ST_NPoints

The last main geometry you have to know is the polygon. As you can guess, it delineates an area. There are many ways to build a polygon from points, lines, other polygons… here we will cover two methods.

The first one is the simplest; it creates a polygon from a closed line. This is the ST_MakePolygon function, which draws lines from point to point, with the first and last points the same.

Here is an example:

The other method is more complicated, but also useful. The aim is to build the smallest convex polygon containing a collection of points. In order to do that, you first need to build a geometry from a collection of points with the function ST_Collect and then use the ST_ConvexHull function to create the polygon.

Here is an example:

Now that you know how to make geomtries, here is a non-exhaustive list of useful functions you can use to manipulate them:

- ST_Within checks if a geometry (for example a point) is contained in another one (for example a polygon)
- ST_DWithin checks if a geometry is within a specified distance of another one (be careful with the units! If you want to speak in meters you have to cast your geometry into a geography)
- ST_Distance calculates the distance between two geometries (same remark for the units)
- ST_Area calculates the area of a polygon (same remark for the units)
- ST_Centroid returns a point which is the centroid of your geometry