Hydroinformatics Blog - Running research models for hands-on hydrology education
Posted Sep 14, 2022
Organized by the CUAHSI Informatics Standing Committee. Contributions are welcome; please contact Veronica Sosa Gonzalez.
By: Bart Nijssen, Allan & Inger Osberg Professor and Chair in Civil and Environmental Engineering at the University of Washington
Andrew Bennett, postdoc in Hydrology and Atmospheric Sciences at the University of Arizona
Models are an integral part of hydrologic research and investigation. Some hydrologists spend much of their time in the development and application of hydrological models, while others focus their research on field investigations. But all of us employ models to synthesize observations and test hypotheses. These models take many different forms, from statistical relationships, to lumped catchment models, to process-based spatially-distributed models, with an increasing interest in data-driven deep learning models.
Because of the ubiquitous role that models play in hydrology, it is important for students, particularly at the graduate level, to gain hands-on experience with model selection, configuration, application, and analysis. But in practical terms this can be challenging. Research models often have steep learning curves and minimal user interfaces, and they require computing environments (operating systems and libraries) that may differ from what students have on their laptops or desktops. This creates hurdles for both the students and the instructor. Students get frustrated and discouraged when they get bogged down because they cannot get the models to run, and instructors have no easy way to troubleshoot all the software challenges that students encounter.
Over the past few years, we have made efforts to develop and implement solutions that allow graduate students to use a research hydrological model as part of their coursework. The solutions we have come up with build on common tools that have been built by the hydrology and data science communities. We combine a JupyterHub with Python-based Jupyter notebooks and a Python-based model API to run a hydrological research model named the Structure for Unifying Multiple Model Alternatives (SUMMA; Clark et al., 2015). We have used this configuration as part of graduate courses at the University of Washington and the University of Saskatchewan, as part of WaterHackWeeks, and as part of a snow course in CUAHSI’s Virtual University. For some of these courses we have used a JupyterHub run by CUAHSI.
Method and Infrastructure
SUMMA was developed as a hydrological modeling platform that would allow modelers to select and configure model process representations and the spatial representation of the modeling domain (Clark et al., 2015). The model is programmed in Fortran and relies on a few open-source software libraries, such as NetCDF, which can make it difficult for novice users to build a model executable. We have made it possible for users to skip the build step by creating a conda recipe that allows them to install SUMMA from the conda-forge channel using the conda package manager, which many already use to manage their Python installations.
SUMMA itself does not have much of a user interface. It consists simply of a command-line executable, which takes a model configuration file as one of its arguments. This model configuration file points to additional input files, which contain the model parameters, the model initial state, and the time-varying boundary conditions (mostly atmospheric conditions). These files are either in plain text or NetCDF format. Model output files are all in NetCDF format. For many users it can be challenging to configure a model instance from scratch, manipulate model inputs to conduct model experiments, and analyze model outputs. To facilitate this process, we developed a Python wrapper for manipulating, running, managing, and analyzing SUMMA model setups (pysumma; discussed for example in Choi et al., 2020). This wrapper can also be installed with conda (using the conda-forge channel), and it is sufficiently powerful and flexible to support the model workflow of more experienced users as well. pysumma has basic functionality to manipulate model input files, run model simulations, and plot and analyze model output, and it can also be used to perform model sensitivity experiments in which model options and parameters are systematically varied.
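To make the idea of a sensitivity experiment concrete, the following is a minimal sketch of how a set of runs can be enumerated when several model options are varied systematically. It uses only the Python standard library and does not call pysumma or SUMMA. The option names and values in `decision_options`, and the helper functions `decision_product` and `run_name`, are illustrative assumptions rather than a description of pysumma's own interface; consult the SUMMA and pysumma documentation for the options that are actually available.

```python
from itertools import product

# Hypothetical model options to vary. The names and values are placeholders
# for illustration; check the SUMMA documentation for valid choices.
decision_options = {
    "snowIncept": ["lightSnow", "stickySnow"],
    "stomResist": ["BallBerry", "Jarvis"],
}


def decision_product(options):
    """Return one dict per combination of the listed option values."""
    names = list(options)
    combos = product(*(options[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def run_name(config):
    """Build a readable, unique label for one configuration."""
    return "++".join(f"{key}={value}" for key, value in config.items())


# Map each run label to the configuration it represents.
experiments = {run_name(c): c for c in decision_product(decision_options)}

for name, config in experiments.items():
    print(name, config)
```

Each entry in `experiments` describes one model run. In a course setting, a loop over these entries would hand each configuration to the model wrapper and collect the outputs under the matching label for later comparison.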
For course applications, we have combined the above functionality to create custom environments that students could access via a JupyterHub and in which they conducted all their model experiments in Python-based Jupyter notebooks. This solved some of the problems we laid out in the first section. Students were not required to install any software on their local machines and could run SUMMA in a controlled environment in the cloud. This also reduced the burden on the instructor. Because everyone worked in the same computing environment, it was also easier to troubleshoot problems, since errors were reproducible. Students were provided with packaged model setups and sample notebooks that allowed them to focus on hydrology more than software management. Providing packaged model setups has allowed us to design specific exercises relating to multiple hydrologic processes including snow modeling, land-atmosphere interactions, and streamflow forecasting. This prepackaged setup is particularly important for short courses, hackweeks, and other intensive course environments, where there is little time to troubleshoot and where students participate with widely varying backgrounds.
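One way to take advantage of a shared environment is to start each notebook with a quick check that the expected tools are present. The sketch below is an illustration, not part of our course material: the function `check_environment` is hypothetical, and the executable name `summa.exe` is an assumption about how the model is installed, so adjust both to match the environment at hand. It only looks for the executable on the search path and for the module in the Python installation; it does not run the model.

```python
import importlib.util
import shutil


def check_environment(executables=("summa.exe",), modules=("pysumma",)):
    """Report whether each executable and Python module can be found.

    The default names are assumptions for illustration only.
    """
    report = {}
    for exe in executables:
        # shutil.which returns None when the program is not on the PATH.
        report[f"executable:{exe}"] = shutil.which(exe) is not None
    for mod in modules:
        # find_spec returns None when a top-level module is not installed.
        report[f"module:{mod}"] = importlib.util.find_spec(mod) is not None
    return report


report = check_environment()
for item, found in report.items():
    status = "ok" if found else "MISSING"
    print(f"{status:8s}{item}")
```

Because every student runs the same check in the same environment, a missing item points to a problem with the environment itself rather than with an individual laptop.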
Results and Conclusion
We have used the above setup successfully for a number of courses over the last few years. But we would be remiss if we left the impression that all our problems are solved. Despite our best efforts, challenges remain. SUMMA is a research model and is subject to change on a regular basis. It can be difficult to maintain consistency between SUMMA and pysumma or between the latest version of SUMMA and model setups that were developed using earlier SUMMA versions. This is challenging in a teaching schedule in which a course is taught maybe once a year and where SUMMA or pysumma (or both) have undergone significant changes between successive course offerings. We have repeatedly had to scramble just before a course was taught to make sure that everything was aligned as expected. We have also developed Binder instances to introduce new users to SUMMA and pysumma, but these instances remain fragile and break regularly because of various system updates. Despite these challenges, the JupyterHub, conda, pysumma, and SUMMA combination has allowed us to introduce graduate students to research hydrological models and to perform meaningful model experiments.
Additional Resources
SUMMA:
Source code: https://github.com/CH-Earth/summa
Documentation: https://summa.readthedocs.io/en/latest/
Conda-forge: https://anaconda.org/conda-forge/summa
pysumma:
Source code: https://github.com/UW-Hydro/pysumma
Documentation: https://pysumma.readthedocs.io/en/latest/
Conda-forge: https://anaconda.org/conda-forge/pysumma
Cloud-based tutorial: https://notebooks.gesis.org/binder/v2/gh/UW-Hydro/pysumma/develop
This cloud-based tutorial contains four notebooks that showcase some of pysumma’s functionality. It uses a SUMMA setup to perform and analyze a brief model simulation and then demonstrates how to use pysumma to modify SUMMA settings. Note that it can take a few minutes for the Binder instance to launch.
Acknowledgments
A lot of people have contributed over the years to the development of these course setups, either by providing data sets, by providing sample notebooks, by teaching the courses, or by using the setups as part of their coursework. In particular, we’d like to thank Jessica Lundquist and Nicoleta Cristea at the University of Washington, Martyn Clark and Wouter Knoben at the University of Saskatchewan, and Young-Don Choi and Jonathan Goodall at the University of Virginia.
References
Choi, Y.-D., J. L. Goodall, J. M. Sadler, A. M. Castronova, A. Bennett, Z. Li, B. Nijssen, S. Wang, M. P. Clark, D. P. Ames, J. S. Horsburgh, H. Yi, C. Bandaragoda, M. Seul, R. Hooper and D. G. Tarboton, 2020: Toward open and reproducible environmental modeling by integrating online data repositories, computational environments, and model application programming interfaces. Environmental Modelling and Software, doi:10.1016/j.envsoft.2020.104888.
Clark, M. P., B. Nijssen, J. Lundquist, D. Kavetski, D. Rupp, R. Woods, J. Freer, E. Gutmann, A. Wood, L. Brekke, J. Arnold, D. Gochis and R. Rasmussen, 2015: A unified approach for process-based hydrologic modeling: Part 1. Modeling concept. Water Resources Research, doi:10.1002/2015WR017198.