Lesson 1. Point aggregation#
Aggregation of Cell Towers at country level worldwide#

Introduction#
In this Lesson, we are going to aggregate a global dataset of Cell Towers at country level. Commonly, the aggregation points-polygons or polygons-points is done using Spatial Join which is incorporated in the common Python library Geopandas for spatial analysis. Geopandas has improved considerably and the aggregation runs quite fast. But, as we are exploring parallelization, we will test the library Dask which is designed to use the computational resources in parallel. Also, there is an integration to Geopandas which is Dask-Geopandas that will be used to compare the time processing with and without parallelization.
This exercise is designed for teaching purposes of Spatial Data Science with High Performance Computing (HPC) at Aalto University. Thanks to the computational resources provided by CSC this exercise was tested in the Puhti supercomputer using Jupyter sessions.
Objective#
To improve the processing time of a spatial join at a global level using Dask-Geopandas
Datasets#
Cell Towers from OpenCellID#
The database of Cell Towers was downloaded from OpenCellID and stored in Allas for your availability. The compressed file is ~1 GB and decompressed is ~5 GB. You will download it directly on your supercomputer.
Country admin level#
The country border layer was downloaded from Natural Earth and stored in Allas for your availability. The Geopackage file is ~4 MB.
Output#
The outputs of the spatial join are:
(Polygon) Country borders with numbers of Cell Towers
(Points) Cell Towers with attribute based con overlapping country
GIS env#
This lessons uses Geoconda 🌎
HPC resources#
CSC Puhti supercomputer:
Partition: small
CPU Cores: 8
Memory (GB): 64
Local Disk (GB): 10
Download dataset from Allas#
The datasets for this practice are stored in Allas and can be downloaded to your local using the next commands. Simply, run the next cells.
!wget -N https://a3s.fi/L2-CellTowers/L2-CellTowers-data.csv.gz
--2025-09-23 09:47:54-- https://a3s.fi/L2-CellTowers/L2-CellTowers-data.csv.gz
Resolving a3s.fi (a3s.fi)... 86.50.254.18, 86.50.254.19
Connecting to a3s.fi (a3s.fi)|86.50.254.18|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘L2-CellTowers-data.csv.gz’ not modified on server. Omitting download.
!wget -N https://a3s.fi/L2-CellTowers/L2-CountryAdmin-data.gpkg
--2025-09-23 09:47:54-- https://a3s.fi/L2-CellTowers/L2-CountryAdmin-data.gpkg
Resolving a3s.fi (a3s.fi)... 86.50.254.19, 86.50.254.18
Connecting to a3s.fi (a3s.fi)|86.50.254.19|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘L2-CountryAdmin-data.gpkg’ not modified on server. Omitting download.
Decompress the Cell Tower data.
!gzip -d -f -k L2-CellTowers-data.csv.gz
Hands-on coding#
Follow the instructions and run every cell in the supercomputer.
Importing Python libraries#
Be sure that you have installed the dask_geopandas in your environment. Get familiar with this library reading a bit the Dask-Geopandas Documentation
import os
# -- for GIS
import pandas as pd
import geopandas as gpd
import dask_geopandas as geodask
import dask
import pyarrow
import numpy as np
# -- for Visualization
import matplotlib.pyplot as plt
import pydeck
import datashader as ds
import datashader.transfer_functions as ds_function
from datashader.utils import export_image
from functools import partial
import colorcet as cc
import time
import warnings
warnings.simplefilter("ignore")
# results folder
if not os.path.exists('output'):
os.makedirs('output')
Read country layer#
We will use the country’s administrative border for the Spatial Join. Beforehand, we do a test using a single country with the spatial operation within.
# read country layer
world_admin = gpd.read_file('L2-CountryAdmin-data.gpkg')
# clean and get needed columns
world_admin = world_admin.dropna()
world_gdf = world_admin[['iso3', 'name', 'continent', 'geometry']]
world_gdf.head()
| iso3 | name | continent | geometry | |
|---|---|---|---|---|
| 0 | UGA | Uganda | Africa | MULTIPOLYGON (((33.9211 -1.00194, 33.92027 -1.... |
| 1 | UZB | Uzbekistan | Asia | MULTIPOLYGON (((70.97081 42.25467, 70.98054 42... |
| 2 | IRL | Ireland | Europe | MULTIPOLYGON (((-9.97014 54.02083, -9.93833 53... |
| 3 | ERI | Eritrea | Africa | MULTIPOLYGON (((40.13583 15.7525, 40.12861 15.... |
| 5 | MNG | Mongolia | Asia | MULTIPOLYGON (((116.71138 49.83047, 116.64665 ... |
Country visualization#
We will create a global view of the countries using Pydeck with interactivity.
# add continent color
cc_admin = {'Africa':[12, 213, 204],
'Asia':[221, 76, 8],
'Europe':[8, 173, 225],
'Americas':[12, 211, 39],
'Oceania':[194, 11, 212]
}
# in a new column
world_gdf['cc_admin'] = world_gdf.continent.apply(lambda x: cc_admin[x])
# add country name
names = gpd.GeoDataFrame()
names["geometry"] = world_gdf.geometry.centroid
names["name"] = world_gdf.name
%%time
# view state pydeck
view_state = pydeck.ViewState(latitude=40, longitude=0, zoom=0)
# set height and width variables
view = pydeck.View(type="_GlobeView", controller=True)#, width=800, height=800)
layers = [
pydeck.Layer(
"GeoJsonLayer",
id="world-admin",
data=world_gdf,
stroked=True,
filled=True,
get_fill_color="cc_admin",
get_line_color=[230, 230, 230],
line_width_min_pixels=1
),
pydeck.Layer(
"TextLayer",
data=names,
get_position="geometry.coordinates",
get_size=10,
get_color=[70, 70, 70],
get_text="name",
),
]
deck = pydeck.Deck(
views=[view],
initial_view_state=view_state,
layers=layers,
map_provider=None,
# Note that this must be set for the globe to be opaque
parameters={"cull": True},
)
# deck.to_html("output/map.html")
CPU times: user 505 ms, sys: 15.9 ms, total: 521 ms
Wall time: 529 ms
deck