Scaling Geospatial Analysis with Python
Gridded geospatial datasets, such as those from climate models, Earth system models, and remote sensing platforms, are among the largest datasets available today, easily reaching terabytes in size and growing rapidly. In the first part of the geospatial data processing workshop series, we introduced Xarray as one of the most commonly used Python packages for handling these data. We also briefly touched on the Dask library for parallelization and out-of-core computation.
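As a reminder of the pattern from Part 1, chunking an Xarray dataset backs its variables with lazy Dask arrays, so computations are deferred until explicitly executed. This is a minimal sketch using a small synthetic temperature dataset; a real workflow would instead open a file lazily, e.g. `xr.open_dataset("data.nc", chunks={"time": 100})` (the filename and chunk sizes here are illustrative, not from the workshop materials):

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a gridded climate dataset (time, lat, lon).
ds = xr.Dataset(
    {"temp": (("time", "lat", "lon"), np.random.rand(10, 4, 8))},
    coords={
        "time": np.arange(10),
        "lat": np.linspace(-90, 90, 4),
        "lon": np.linspace(0, 360, 8, endpoint=False),
    },
)

# Chunking converts the in-memory arrays into lazy Dask arrays.
dsc = ds.chunk({"time": 5})

mean_lazy = dsc["temp"].mean("time")  # builds a task graph, no work yet
result = mean_lazy.compute()          # executes the graph via Dask
```

The key idea is that `mean_lazy` costs almost nothing to create: Dask only records the operations, and `.compute()` runs them chunk by chunk, which is what makes larger-than-memory datasets tractable.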
Objective
The main objective of this Part 2 workshop is to dive deeper into distributed geospatial data analysis using Dask. We will build upon the foundations of Part 1 and focus on scaling computational workflows to efficiently process large geospatial datasets on HPC systems.
This workshop provides hands-on experience working with large geospatial raster datasets using HPC resources provided by the RCC. Participants will learn how to efficiently load, process, analyze, and visualize large-scale gridded datasets from satellite sensors and climate models at scale. By the end of this workshop, attendees will be able to develop parallel computational workflows that can handle terabyte-sized geospatial datasets.
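The scaling idea the workshop builds toward can be sketched with plain Dask arrays: the data is split into chunks, each chunk is reduced independently and in parallel, and partial results are combined. This runs with Dask's default threaded scheduler; on an HPC system the same code can instead be pointed at a distributed cluster (e.g. one created with dask-jobqueue). The array shape and chunk sizes below are illustrative:

```python
import dask.array as da

# A chunked random array: 16 chunks that can be reduced independently.
x = da.random.random((8000, 2000), chunks=(1000, 1000))

mean = x.mean()                 # lazy: builds a task graph
value = float(mean.compute())   # executes chunk-wise in parallel
# For uniform random data the mean is close to 0.5.
```

Attaching a `dask.distributed.Client` before calling `.compute()` would send the same task graph to cluster workers instead of local threads, which is how workflows like this scale to terabyte-sized datasets.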
Prerequisites
- Basic Python programming knowledge
- Familiarity with Xarray from Part 1 workshop (recommended but not required)
- Access to the RCC HPC environment (instructions will be provided prior to workshop)
All materials, including Jupyter notebooks, example datasets, and setup instructions, will be provided to participants one week before the workshop.
Wednesday, May 21, 2025 - 13:00 to 15:00