Working with Geospatial Data in Python from DataCamp

3. Projecting and transforming geometries

3.1 Coordinate Reference Systems

3.1.1 Geographic vs projected coordinates

The CRS attribute stores the information about the Coordinate Reference System in which the data is represented. In this exercises, we will explore the CRS and the coordinates of the districts dataset about the districts of Paris.

# Import the districts dataset
districts = geopandas.read_file("paris_districts.geojson")

# Print the CRS information
print(districts.crs)
# {'init': 'epsg:4326'}

# Print the first rows of the GeoDataFrame
print(districts.head())
   id           district_name  population                                           geometry
0   1  St-Germain-l'Auxerrois        1672  POLYGON ((2.344593389828428 48.85404991486192,...
1   2                  Halles        8984  POLYGON ((2.349365804803003 48.86057567227663,...
2   3            Palais-Royal        3195  POLYGON ((2.339465868602756 48.86213531210705,...
3   4           Place-Vendôme        3044  POLYGON ((2.331944969393234 48.86491285292422,...
4   5                 Gaillon        1345  POLYGON ((2.336320212305949 48.8679713890312, ...

Indeed, this dataset is using geographic coordinates: longitude and latitude in degrees. We could see that the crs attribute referenced the EPSG:4326 (the code for WGS84, the most common used geographic coordinate system). A further rule of thumb is that the coordinates were in a small range (<180) and cannot be expressing meters.


3.2 Working with coordinate systems in GeoPandas

3.2.1 Projecting a GeoDataFrame

The Paris districts dataset is provided in geographical coordinates (longitude/latitude in WGS84). To see the result of naively using the data as is for plotting or doing calculations, we will first plot the data as is, and then plot a projected version.

The standard projected CRS for France is the RGF93 / Lambert-93 reference system (referenced by the EPSG:2154 number).

GeoPandas and matplotlib have already been imported, and the districts dataset is read and assigned to the districts variable.

# Print the CRS information
print(districts.crs)
# {'init': 'epsg:4326'}

# Plot the districts dataset
districts.plot()
plt.show()

# Convert the districts to the RGF93 reference system
districts_RGF93 = districts.to_crs(epsg=2154)

# Plot the districts dataset again
districts_RGF93.plot()
plt.show()

 The plot using longitude/latitude degrees distorted the shape of Paris quite a bit.

3.2.2 Projecting a Point

In the previous chapter, we worked with the Eiffel Tower location. Again, we provided you the coordinates in a projected coordinate system, so you could, for example, calculate distances. Let’s return to this iconic landmark, and express its location in geographical coordinates: 48°51′29.6″N, 2°17′40.2″E. Or, in decimals: latitude of 48.8584 and longitude of 2.2945.

Shapely geometry objects have no notion of a CRS, and thus cannot be directly converted to another CRS. Therefore, we are going to use the GeoPandas to transform the Eiffel Tower point location to an alternative CRS. We will put the single point in a GeoSeries, use the to_crs() method, and extract the point again.

# Construct a Point object for the Eiffel Tower
from shapely.geometry import Point
eiffel_tower = Point(2.2945, 48.8584)

# Put the point in a GeoSeries with the correct CRS
s_eiffel_tower = geopandas.GeoSeries([eiffel_tower], crs={'init': 'EPSG:4326'})

# Convert to other CRS
s_eiffel_tower_projected = s_eiffel_tower.to_crs(epsg=2154)

# Print the projected point
print(s_eiffel_tower_projected)

0    POINT (648237.3015492002 6862271.681553576)
dtype: object

3.2.3 Calculating distance in a projected CRS

Now we have the Eiffel Tower location in a projected coordinate system, we can calculate the distance to other points.

The final s_eiffel_tower_projected of the previous exercise containing the projected Point is already provided, and we extract the single point into the eiffel_tower variable. Further, the restaurants dataframe (using WGS84 coordinates) is also loaded.

# Extract the single Point
eiffel_tower = s_eiffel_tower_projected[0]

# Ensure the restaurants use the same CRS
restaurants = restaurants.to_crs(s_eiffel_tower_projected.crs)

# The distance from each restaurant to the Eiffel Tower
dist_eiffel = restaurants.distance(eiffel_tower)

# The distance to the closest restaurant
print(min(dist_eiffel))
# 303.56255387894674

Because our data was now in a projected coordinate reference system that used meters as unit, we know that the result of 303 is actually 303 meter.

3.2.4 Projecting to Web Mercator for using web tiles

In the first chapter, we did an exercise on plotting the restaurant locations in Paris and adding a background map to it using the contextily package.

Currently, contextily assumes that your data is in the Web Mercator projection, the system used by most web tile services. And in that first exercise, we provided the data in the appropriate CRS so you didn’t need to care about this aspect.

However, typically, your data will not come in Web Mercator (EPSG:3857) and you will have to align them with web tiles on your own.

GeoPandas, matplotlib and contextily are already imported.

# Convert to the Web Mercator projection
restaurants_webmercator = restaurants.to_crs(epsg=3857)

# Plot the restaurants with a background map
ax = restaurants_webmercator.plot(markersize=1)
contextily.add_basemap(ax)
plt.show()

3.3 Spatial operations: creating new geometries

3.3.1 Exploring a Land Use dataset

For the following exercises, we first introduce a new dataset: a dataset about the land use of Paris (a simplified version based on the open European Urban Atlas). The land use indicates for what kind of activity a certain area is used, such as residential area or for recreation. It is a polygon dataset, with a label representing the land use class for different areas in Paris.

In this exercise, we will read the data, explore it visually, and calculate the total area of the different classes of land use in the area of Paris.

# Import the land use dataset
land_use = geopandas.read_file('paris_land_use.shp')
print(land_use.head())

# Make a plot of the land use with 'class' as the color
land_use.plot(column='class', legend=True, figsize=(15, 10))
plt.show()

# Add the area as a new column
land_use['area'] = land_use.area

# Calculate the total area for each land use class
total_area = land_use.groupby('class')['area'].sum() / 1000**2
print(total_area)
                       class                                           geometry
0               Water bodies  POLYGON ((3751386.280643055 2890064.32259039, ...
1  Roads and associated land  POLYGON ((3751390.345445618 2886000, 3751390.3...
2  Roads and associated land  POLYGON ((3751390.345445618 2886898.191588611,...
3  Roads and associated land  POLYGON ((3751390.345445618 2887500, 3751390.3...
4  Roads and associated land  POLYGON ((3751390.345445618 2888647.356784857,...

class
Continuous Urban Fabric             45.943090
Discontinuous Dense Urban Fabric     3.657343
Green urban areas                    9.858438
Industrial, commercial, public      13.295042
Railways and associated land         1.935793
Roads and associated land            7.401574
Sports and leisure facilities        3.578509
Water bodies                         3.189706
Name: area, dtype: float64

3.3.2 Intersection of two polygons

For this exercise, we are going to use 2 individual polygons: the district of Muette extracted from the districts dataset, and the green urban area of Boulogne, a large public park in the west of Paris, extracted from the land_use dataset. The two polygons have already been assigned to the muette and park_boulogne variables.

We first visualize the two polygons. You will see that they overlap, but the park is not fully located in the district of Muette. Let’s determine the overlapping part.

# Plot the two polygons
geopandas.GeoSeries([park_boulogne, muette]).plot(alpha=0.5, color=['green', 'blue'])
plt.show()

# Calculate the intersection of both polygons
intersection = park_boulogne.intersection(muette)

# Plot the intersection
geopandas.GeoSeries([intersection]).plot()
plt.show()

# Print proportion of district area that occupied park
print(intersection.area / muette.area)
# 0.4352082235641065

3.3.3 Intersecting a GeoDataFrame with a Polygon

Combining the land use dataset and the districts dataset, we can now investigate what the land use is in a certain district.

For that, we first need to determine the intersection of the land use dataset with a given district. Let’s take again the Muette district as example case.

The land use and districts datasets have already been imported as land_use and districts, and the Muette district has been extracted into the muette shapely polygon. Further, GeoPandas and matplotlib are imported.

# Print the land use datset and Notre-Dame district polygon
print(land_use.head())
print(type(muette))

# Calculate the intersection of the land use polygons with Notre Dame
land_use_muette = land_use.geometry.intersection(muette)

# Plot the intersection
land_use_muette.plot(edgecolor='black')
plt.show()

# Print the first five rows of the intersection
print(land_use_muette.head())
                            class                                           geometry
0  Industrial, commercial, public  POLYGON ((3751385.614444552 2895114.54542058, ...
1                    Water bodies  POLYGON ((3751386.280643055 2890064.32259039, ...
2       Roads and associated land  POLYGON ((3751390.345445618 2886000, 3751390.3...
3       Roads and associated land  POLYGON ((3751390.345445618 2886898.191588611,...
4       Roads and associated land  POLYGON ((3751390.345445618 2887500, 3751390.3...

<class 'shapely.geometry.polygon.Polygon'>

0    ()
1    ()
2    ()
3    ()
4    ()
dtype: object

3.4 Overlaying spatial datasets

3.4.1 Overlay of two polygon layers

Going back to the land use and districts datasets, we will now combine both datasets in an overlay operation. Create a new GeoDataFrame consisting of the intersection of the land use polygons wich each of the districts, but make sure to bring the attribute data from both source layers.

# Print the first five rows of both datasets
print(land_use.head())
print(districts.head())

# Overlay both datasets based on the intersection
combined = geopandas.overlay(land_use, districts, how='intersection')

# Print the first five rows of the result
print(combined.head())
                            class                                           geometry
0  Industrial, commercial, public  POLYGON ((3751385.614444552 2895114.54542058, ...
1                    Water bodies  POLYGON ((3751386.280643055 2890064.32259039, ...
2       Roads and associated land  POLYGON ((3751390.345445618 2886000, 3751390.3...
3       Roads and associated land  POLYGON ((3751390.345445618 2886898.191588611,...
4       Roads and associated land  POLYGON ((3751390.345445618 2887500, 3751390.3...


   id           district_name  population                                           geometry
0   1  St-Germain-l'Auxerrois        1672  POLYGON ((3760188.134760949 2889260.456597198,...
1   2                  Halles        8984  POLYGON ((3760610.022313007 2889946.421907361,...
2   3            Palais-Royal        3195  POLYGON ((3759905.524344832 2890194.453753149,...
3   4           Place-Vendôme        3044  POLYGON ((3759388.396359455 2890559.229067303,...
4   5                 Gaillon        1345  POLYGON ((3759742.125111854 2890864.393745991,...



                       class  id district_name  population  \
0               Water bodies  61       Auteuil       67967   
1    Continuous Urban Fabric  61       Auteuil       67967   
2  Roads and associated land  61       Auteuil       67967   
3          Green urban areas  61       Auteuil       67967   
4  Roads and associated land  61       Auteuil       67967   

                                            geometry  
0  POLYGON ((3751395.345451574 2890118.001377039,...  
1  (POLYGON ((3753253.104067317 2888254.529208081...  
2  POLYGON ((3751519.830145844 2890061.508628568,...  
3  (POLYGON ((3754314.716711559 2890283.121013219...  
4  POLYGON ((3751619.112743544 2890500, 3751626.6...

3.4.2 Inspecting the overlay result

Now that we created the overlay of the land use and districts datasets, we can more easily inspect the land use for the different districts. Let’s get back to the example district of Muette, and inspect the land use of that district.

GeoPandas and Matplotlib are already imported. The result of the overlay() function from the previous exercises is available as combined.

# Print the first rows of the overlay result
print(combined.head())

# Add the area as a column
combined['area'] = combined.area

# Take a subset for the Muette district
land_use_muette = combined[combined.district_name=='Muette']

# Visualize the land use of the Muette district
land_use_muette.plot(column='class')
plt.show()

# Calculate the total area for each land use class
print(land_use_muette.groupby('class')['area'].sum() / 1000**2)
                       class  id district_name  population  \
0               Water bodies  61       Auteuil       67967   
1    Continuous Urban Fabric  61       Auteuil       67967   
2  Roads and associated land  61       Auteuil       67967   
3          Green urban areas  61       Auteuil       67967   
4  Roads and associated land  61       Auteuil       67967   

                                            geometry  
0  POLYGON ((3751395.345451574 2890118.001377039,...  
1  (POLYGON ((3753253.104067317 2888254.529208081...  
2  POLYGON ((3751519.830145844 2890061.508628568,...  
3  (POLYGON ((3754314.716711559 2890283.121013219...  
4  POLYGON ((3751619.112743544 2890500, 3751626.6...  


class
Continuous Urban Fabric             1.275297
Discontinuous Dense Urban Fabric    0.088289
Green urban areas                   2.624229
Industrial, commercial, public      0.362990
Railways and associated land        0.005424
Roads and associated land           0.226271
Sports and leisure facilities       0.603989
Water bodies                        0.292194
Name: area, dtype: float64

Leave a comment