Possible role of anthropogenic climate change in the record-breaking 2020 Lake Victoria levels and floods

Heavy rainfall in eastern Africa between late 2019 and mid 2020 caused devastating floods and landslides throughout the region. These rains drove the levels of Lake Victoria to a record-breaking maximum in the second half of May 2020. The combination of high lake levels, consequent shoreline flooding, and flooding


Introduction
Between late 2019 and mid 2020, eastern Africa experienced heavy rainfall that led to flooding and landslides across the region, displacing over a million people according to some sources¹ and causing hundreds of casualties². In 2019, the rainy season of October, November, and December (OND, known as the short rains) was one of the heaviest seen in the region in the last 3 decades (Wainwright et al., 2021a). Wet conditions compared to the climatological average continued into the 2020 rainy season of March, April, and May (MAM, known as the long rains), causing additional floods and landslides in 2020. The heavy rains aggravated one of the most serious desert locust outbreaks the region has seen in decades. Moreover, this occurred concurrently with the COVID-19 pandemic, setting the stage for a perfect storm of compounding impacts on people's lives and livelihoods³. Lake Victoria, the second largest freshwater lake in the world, shared between Kenya, Uganda, and Tanzania, also received above-average precipitation. The lake's levels began to rise in late September 2019 and reached record-breaking levels in mid May 2020, thereby exceeding the previous maximum levels measured in 1964 (Fig. 1). From April 2020, floods were reported in the Lake Victoria basin, both along the lake shores and in the floodplains of rivers flowing into the lake. For example, in Kenya, an estimated 40 000 people were displaced when the Nzoia River burst its banks in early May 2020⁴. In Tanzania, in the Kagera and Mara basins, 5000 people were displaced due to flash and river floods between March and May 2020⁵. In Uganda, lake shoreline flooding affected the cities of Entebbe and Kampala⁶, and over 3800 people were displaced from the lake islands of the Mayuge District⁷. Some media⁸ and government reports (e.g. Government of Kenya and UNDP, 2021) linked the heavy precipitation and floods to anthropogenic climate change, but the connection has not been scientifically investigated with extreme event attribution methods so far.
This study aims to investigate whether human-induced climate change contributed to the probability and magnitude of the flooding and record-breaking lake levels observed in 2020 in the Lake Victoria basin by following an established protocol for probabilistic extreme event attribution (Philip et al., 2020). Event attribution studies classically define an extreme event based on its meteorological driver. For example, previous attribution studies have mostly defined flood events based on accumulated precipitation amounts (e.g. Otto et al., 2018b; Philip et al., 2018a). Some notable exceptions have extended the analysis, defining the event based on hydrological variables instead (e.g. Pall et al., 2011; Schaller et al., 2016; Philip et al., 2019). Here, we expand on the classical framework by focusing on an impact-relevant variable, namely by defining the flood event based on lake levels.
The eastern Africa region is comparatively under-studied in relation to flood attribution, with most previous studies having focused on drought events, generally finding either no attributable role of anthropogenic climate change (e.g. Uhe et al., 2018; Philip et al., 2018b; Otto et al., 2018a; Kew et al., 2021) or a significant increase in the likelihood of drought events (e.g. Funk et al., 2016, 2019; Marthews et al., 2019; Kimutai et al., 2023), depending on the specific location, framing, and variable being attributed in the study. One study has analysed the flood-inducing heavy long rains seasons that occurred in Kenya in 2012, 2016, and 2018, finding no significant trend attributable to human-induced climate change (Kimutai et al., 2022). To our knowledge, this study is the first to use water balance or hydrological modelling to attribute flood events in the region.
To study the floods, we follow a three-step methodology. First, we estimate the flooded area and number of people impacted through a remote sensing analysis. We then use a water balance model for Lake Victoria to reconstruct historical lake levels and identify which water balance terms drove the 2020 flooding. Finally, we use the water balance model as an impact model within a probabilistic extreme event attribution framework to detect the role played by anthropogenic climate change in the observed rapid rise in lake levels. We compare our estimate of impact with emergency databases and media and government reports and frame the results from statistical attribution within the context of previous research on changing hydro-climatic conditions in the region and on other possible drivers of the floods.

Event definition
In this study, we focus on lake levels to define the 2020 flood event, as (i) lake levels are closer to flooding impacts compared to accumulated precipitation amounts, which are the proximate meteorological driver of the event, and (ii) the lake levels were record-breaking in 2020, making headline statements in media reports and raising public interest. Furthermore, since tributary river floods are aggravated by backwater effects when lake levels are high (WMO et al., 2004), we assume that (iii) the lake levels are a proxy for the flooding of tributary rivers. Finally, (iv) the long historical time series of lake level measurements allows for more robust statistical attribution statements.
In the 8 months between September 2019 and May 2020, lake levels rose by 1.44 m, reaching the record-breaking level of 13.46 m measured in situ on 17 May 2020 (Fig. 1). Of this rise, 84 % (1.21 m) occurred in the 6 months between November 2019 and May 2020. We define the 2020 flood event as a 6-month rate of change in levels as extreme as that observed in the lead-up to May 2020. By using the rate of change in lake levels instead of absolute lake levels, we focus on signals in seasonal and year-to-year variability and limit the influence of decadal trends. The choice of a 6-month time window reflects the balance between, on the one hand, limiting the influence of decadal trends, and, on the other hand, defining the event in a way that represents the slow accumulated response of lake levels to seasonal accumulations of precipitation (Khaki and Awange, 2021). We test the sensitivity to these choices in Sect. 3.3 and Appendix Sect. B3.
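The event definition can be illustrated with a minimal Python sketch. It assumes a gap-free daily series of lake levels and approximates the 6-month window by 180 days; the function names `rate_of_change` and `annual_block_maxima` are hypothetical and are not taken from the study's code.

```python
def rate_of_change(levels, window=180):
    """Change in lake level (m) over the previous `window` days.

    `levels` is a list of daily lake levels. Days without a full
    look-back period are returned as None.
    """
    return [
        None if i < window else levels[i] - levels[i - window]
        for i in range(len(levels))
    ]


def annual_block_maxima(values, years):
    """Largest non-missing value in each calendar year.

    `values` and `years` are parallel lists (one entry per day).
    """
    maxima = {}
    for value, year in zip(values, years):
        if value is None:
            continue
        if year not in maxima or value > maxima[year]:
            maxima[year] = value
    return maxima
```

Applied to the observed series, the block maximum for 2020 would correspond to the 1.21 m rise reported above, provided the window ends in mid May 2020.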

Previous variations in lake levels
Lake level fluctuations are the result of the lake's water balance, which consists of precipitation on the lake surface (∼ 70 %) and inflow from tributary rivers (∼ 20 %-30 %) as input terms and evaporation from the lake surface (∼ 70 %-80 %) and outflow from the Nalubaale dam complex in Jinja (∼ 20 %-30 %) as output terms (Vanderkelen et al., 2018a). Lake precipitation and inflow control seasonal and interannual lake level variability, as evaporation and outflow are characterized by lower variability (Sene et al., 2021). Outflow from the lake is managed as a function of lake levels following the Agreed Curve (Sene, 2000; see Sect. 2.1.2). Lake Victoria's levels have varied by over 3.2 m since the beginning of instrumental measurements in the late 19th century (Fig. 1). Seasonal variations in lake levels are generally small compared to interannual variations (Sene et al., 2021). In 1954, the first dam of the Nalubaale dam complex, which controls the lake outflow and is located near Jinja, Uganda, was completed (Sutcliffe and Petersen, 2007). Subsequently, a remarkable spike in lake levels occurred in the early 1960s, which has been attributed to an increase in eastern African precipitation that affected the levels of multiple lakes in the African Great Lakes region (Sene et al., 2021; Kite, 1981). A period of generally declining lake levels occurred from the mid 1960s to the mid 2000s, which was linked to a combination of low precipitation and excessive release from the lake's dam (Vanderkelen et al., 2018b; Sene et al., 2021). From then on, levels show a generally positive trend and increased by approximately 3 m between 2006 and 2020. A particularly rapid increase in levels occurred between late 2019 and mid 2020, and the levels measured in May 2020 broke the previous 1964 record by approximately 7 cm.

Precipitation variability, extremes, and model representation in eastern Africa
The Lake Victoria basin is located in the African Great Lakes region and characterized by a bimodal rainfall distribution pattern, with rains concentrated in the "long rains" season in March, April, and May and the "short rains" season in October, November, and December (Thiery et al., 2015; Vanderkelen et al., 2018a). The region exhibits strong interannual variability in precipitation, influenced by the El Niño-Southern Oscillation (ENSO) and the Indian Ocean Dipole (IOD) (Nicholson, 2017; Ummenhofer et al., 2009; Black, 2005; Palmer et al., 2023). The spatial distribution of precipitation in the basin is influenced by topography and the presence of the lake, with high accumulated precipitation amounts and a tendency for hazardous night-time thunderstorms over the lake surface (Thiery et al., 2016; Van de Walle et al., 2020). The heavy 2019 short rains season in eastern Africa was linked to a strong positive IOD event (Wainwright et al., 2021a; Nicholson et al., 2022; Khaki and Awange, 2021), with anomalies in sea surface temperatures leading to weakened westerlies in the Indian Ocean and wetter than usual conditions in eastern Africa (Wainwright et al., 2021a; Black, 2005; Nicholson, 2017). Global and regional climate models generally project an increase in average annual precipitation amounts over eastern Africa with climate change (e.g. Rowell et al., 2015; Akurut et al., 2014; Dunning et al., 2018; Souverijns et al., 2016; Olaka et al., 2019), particularly during the short rains (Palmer et al., 2023), as well as an increasing frequency of extreme positive IOD events (Cai et al., 2014, 2018). At the same time, there is evidence of biases in coupled climate models in representing seasonal precipitation in eastern Africa, particularly with respect to the long rains (see Discussion Sect. 4; Wainwright et al., 2019; Palmer et al., 2023; Ayugi et al., 2021). Nonetheless, since our study is not restricted to the long rains season, and since coupled global climate models (GCMs) remain invaluable tools to simulate factual and counterfactual (i.e. in the absence of anthropogenic climate change) climate conditions in the most complete way (Otto, 2017), extreme event attribution studies of hydrological changes in the region using coupled GCMs and other modelling setups can still contribute to improving our understanding of ongoing changes in the region (e.g. Philip et al., 2018b; Kew et al., 2021; Kimutai et al., 2022, 2023).

Figure 1. Lake Victoria level time series combining measurements for 1896-1948, daily in situ measurements from the WMO Hydrometeorological Survey (1948-1992), and satellite-derived 10-daily measurements from the Database for Hydrological Time Series of Inland Waters (DAHITI) (in m a.s.l.) converted to in situ levels.

Remote sensing imagery and population data
The spatial extent of the flooding in the Lake Victoria Basin is estimated by applying the HASARD flood detection algorithm (Sect. 2.2.1) to remote sensing imagery from the Sentinel-1 and Sentinel-2 missions of the Copernicus programme of the European Union. We analyse Sentinel-1 level 1 ground range-detected C-band synthetic aperture radar (SAR) over a 3-month window from early April to the end of June 2020 (5 April 2020-1 July 2020). This period is centred around 17 May 2020, when lake levels reached their record high, and spans the period of reported flooding impacts in media reports and emergency and disaster databases, such as the Emergency Events Database (EM-DAT) of the Centre for Research on the Epidemiology of Disasters (CRED). SAR imagery is well suited for flood detection, as it provides imagery throughout day and night in all weather conditions (Chini et al., 2020). The imagery, collected in Interferometric Wide Swath mode, has a spatial resolution of 5 by 20 m, and a combined cycle revisit time of 6 d at the latitude of Lake Victoria. In addition, the algorithm uses optical imagery from the Sentinel-2 mission for the same period and spatial extent as a secondary data source. Sentinel-1 and Sentinel-2 imagery is accessed and processed through the Geohazards Exploitation Platform (GEP) operated by Terradue and developed in the framework of the European Space Agency Thematic Exploitation Platforms (TEP) and the Web Advanced Space Developer Interface (WASDI) operated by WASDI (Luxembourg) with Earth observation (EO) services developed by LIST (Luxembourg). To correct for permanent waterbodies that are erroneously identified as flooded, we use the waterbody mask of the Copernicus Global Digital Elevation Model at 30 m resolution (COPDEM GLO-30; Fahrland et al., 2020).
High-resolution gridded population data are obtained from the WorldPop database for Kenya, Uganda, and Tanzania (Appendix Fig. C1). The dataset is based on 2020 census data from the three countries, disaggregated based on building footprints and ancillary geospatial datasets (top-down constrained data; Stevens et al., 2015; WorldPop, 2018), and has a spatial resolution of 3 arcsec (approximately 100 m at the Equator).

Lake level observations
A time series of lake level measurements from 1896-2021 is assembled from different sources. For the period 1 January 1948-1 August 1996, daily measurements recorded in situ at Jinja are available from the World Meteorological Organization (WMO) Hydrometeorological Survey (hereafter Hydromet; WMO-UNDP, 1974). The data gaps in the years 1977 (whole year), 1978 (9-31 August), 1979 (15-31 December), 1979 (1 January-9 May), 1981 (1 October-31 December), and 1982 (15 July-2 December) are filled through linear interpolation. From 27 September 1992 to 2021, satellite-derived measurements are obtained from the Database for Hydrological Time Series of Inland Waters (DAHITI) at an approximately 10-daily resolution. In situ Hydromet measurements are converted to absolute levels in metres above sea level with a geoid datum and are corrected to match satellite-derived DAHITI measurements by adding the remaining average difference between the two datasets for the overlapping period (1992-1996), as in Vanderkelen et al. (2018a). The total resulting geoid correction applied to the Hydromet time series to obtain absolute lake levels in metres above sea level is 1123.32 m.
Furthermore, the near-daily in situ lake level measurements are supplemented by a monthly time series from the UK Centre for Ecology and Hydrology (Sene et al., 2021;Sutcliffe and Petersen, 2007) for the period 1896-1948.We thus create a single 127-year lake level time series, which we use in the observational attribution analysis.We test the attribution statement for sensitivity to the different temporal resolutions of the older and more recent data and find a similar attribution signal when the data are artificially upscaled to monthly resolution.

Observed global mean temperatures
As a measure of anthropogenic climate change, we use a time series of global mean surface temperature (GMST) obtained from the National Aeronautics and Space Administration (NASA) Goddard Institute for Space Studies (GISS) surface temperature analysis (GISTEMP; Hansen et al., 2010; Lenssen et al., 2019). The time series is expressed as an anomaly relative to the 1951-1980 global average. A 4-year running mean low-pass filter is applied to remove higher-frequency variability and signals linked to ENSO, as recommended in Philip et al. (2020).
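A minimal sketch of such a running-mean filter is given below. The text does not state whether the 4-year window is centred or trailing; the sketch assumes a trailing window (the current year and the three preceding years), and the function name `running_mean` is hypothetical.

```python
def running_mean(values, window=4):
    """Trailing running mean over `window` consecutive annual values.

    Assumption: a trailing window is used. Entries without a full
    window are returned as None.
    """
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)
        else:
            smoothed.append(sum(values[i - window + 1 : i + 1]) / window)
    return smoothed
```

The smoothed annual GMST anomaly is the series that would then serve as the covariate in the statistical analysis.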

Observational data for water balance terms
The water balance of Lake Victoria is modelled using an updated version of the model described in Vanderkelen et al. (2018a) using observational data for the period 1983-2020.
As input data, the water balance model (WBM) employs daily precipitation over the lake and basin, daily lake evaporation, and a time series of dam outflow.
Daily observational gridded precipitation data are obtained for 1983-2020 from the satellite-derived dataset Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks - Climate Data Record (PERSIANN-CDR; Ashouri et al., 2015) at a 0.25° spatial resolution (approximately 28 km at the Equator). This dataset has been shown to perform better than other satellite-derived and reanalysis-based products in the study area (Nicholson and Klotter, 2021). Missing data occur mostly in the first decades of the dataset (419 total missing days spread across 32 years, of which 95 % are between 1983 and 1999 and 5 % are between 2007 and 2014; Appendix Fig. C2), with a third of the missing days concentrated in the 2-year period 1983-1984, and thus we restrict our analysis of the precipitation anomaly to the period 1985-2020 (see Appendix Sect. B1).
Daily inflow, evaporation from the lake surface, and outflow from the dam are calculated as in Vanderkelen et al. (2018a). Inflow is estimated from precipitation based on land use, soil type, soil hydrological characteristics, and antecedent moisture conditions using the USDA curve number method (USDA-SCS, 2004; see further details in Appendix Sect. B2). Evaporation from the lake surface is estimated based on the latent heat flux term simulated by the regional climate model COSMO-CLM² forced with ERA5 reanalysis data over the African Great Lakes region for the period 1996-2008 (Thiery et al., 2015, 2016). The latent heat flux is converted to an evaporated water amount by dividing the flux term by the latent heat of vaporization of water, held constant at 2.5 × 10⁶ J kg⁻¹. A yearly climatology of evaporation is calculated by averaging each day across all calendar years, and the resulting climatology is held constant for all WBM simulation years. Outflow is obtained from measurements at the Jinja-Nalubaale dam complex, and in periods without observations it is estimated using the Agreed Curve equation. This relationship prescribes the volume of water that should be released each day from the dam as a function of lake levels and is the object of international agreements between Uganda and downstream countries. The relationship aims to balance water availability at the lake and downstream in the Nile Basin with hydropower requirements at Jinja by mimicking natural outflow. Mathematically, the Agreed Curve is expressed as follows (Sene, 2000):

$$Q_\mathrm{out} = 66.3\,(L - 7.96)^{2.01}, \tag{1}$$

where $Q_\mathrm{out}$ is outflow at Jinja (m³ s⁻¹) and $L$ indicates in situ lake levels (m). The outflow time series for the period 1950-2006 from Vanderkelen et al. (2018a) is extended for the periods 1948-1950 and 2006-2020 using the Agreed Curve and from March 2020 to December 2021 with daily outflow measurements made at Jinja. The time series is overall similar to the theoretical amount prescribed by the Agreed Curve but shows deviations in certain periods (Appendix Fig. C3).
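The two conversions described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's implementation: the Agreed Curve coefficients (66.3, 7.96, 2.01) are those commonly quoted from Sene (2000), a water density of 1000 kg m⁻³ is assumed to turn the evaporated mass into a depth, and the function names are hypothetical.

```python
LATENT_HEAT_VAPORIZATION = 2.5e6  # J kg^-1, held constant as in the text
WATER_DENSITY = 1000.0            # kg m^-3 (assumption, not from the text)
SECONDS_PER_DAY = 86400.0


def latent_heat_flux_to_evaporation(flux_w_m2):
    """Convert a latent heat flux (W m^-2) to evaporation (m d^-1)."""
    mass_flux = flux_w_m2 / LATENT_HEAT_VAPORIZATION  # kg m^-2 s^-1
    return mass_flux / WATER_DENSITY * SECONDS_PER_DAY


def agreed_curve_outflow(level_m):
    """Agreed Curve outflow at Jinja (m^3 s^-1) for an in situ level (m).

    Coefficients as commonly quoted from Sene (2000); treat them as an
    assumption of this sketch.
    """
    if level_m < 7.96:
        raise ValueError("Agreed Curve is undefined below a level of 7.96 m")
    return 66.3 * (level_m - 7.96) ** 2.01
```

For example, a latent heat flux of 100 W m⁻² corresponds to roughly 3.5 mm of evaporation per day under these assumptions.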
All gridded input data to the WBM are cropped to the study area (5° S-2° N, 28-36° E) and remapped to the resolution of the WBM (0.065°, ∼ 7 km) using second-order conservative remapping.

Climate model data for water balance terms
To isolate the effect of anthropogenic climate change on lake level variations, we force the WBM with daily precipitation simulated by a subset of global climate models (GCMs) participating in CMIP6 and the Detection and Attribution Model Intercomparison Project (DAMIP). Simulations from six models are used, namely CanESM5, CNRM-CM6-1, GFDL-ESM4, IPSL-CM6A-LR, MIROC6, and MRI-ESM2-0, with one ensemble member each (see experiment descriptions in Table A1). The data have previously been bias adjusted and statistically downscaled to a spatial resolution of 0.5° (∼ 55 km at the Equator) within the Inter-Sectoral Impact Model Intercomparison Project Phase 3b (ISIMIP3b) using the trend-preserving ISIMIP3BASD method (Lange, 2019a, 2020, 2021) and the W5E5 observational dataset, which is a bias-adjusted version of ERA5 (Lange, 2019b; Cucchi et al., 2020).
To simulate lake levels under "factual" climate conditions, the WBM is driven by GCM simulations with all historical forcings included (hereafter referred to as hist simulations), whereby observed trends of atmospheric greenhouse gas concentrations, from both anthropogenic and natural sources, are prescribed. Historical climate simulations (1850-2014) are complemented with simulations under the Shared Socioeconomic Pathway and Representative Concentration Pathway SSP3-RCP7.0 for the period 2015-2020. Lake levels in a "counterfactual" hypothetical world without anthropogenic climate change are simulated by driving the WBM with simulations from the same GCMs, with only natural forcings, such as solar variability and volcanic emissions (hereafter referred to as hist-nat simulations) for the period 1850-2020. For each GCM experiment, an annual time series of simulated global mean surface temperature (GMST) with a 4-year moving average low-pass filter is derived and is used as a covariate in the statistical analysis. All gridded data are remapped to the WBM resolution using second-order conservative remapping.

Flood detection
We use the automated flood mapping algorithm HASARD (Chini et al., 2017) to identify flooded areas in the period of interest based on remote sensing imagery. The algorithm compares successive pairs of SAR images to detect per-pixel changes in the amplitude of the backscattered signal that indicate an area has been flooded. Flood maps are automatically combined to create a multi-temporal binary flood map showing the maximum cumulative flood extent, and optical imagery is used to corroborate SAR-derived flood maps (Chini et al., 2017, 2020).
We apply HASARD on SAR Sentinel-1 and optical Sentinel-2 imagery with standard parameters (Ashman coefficient 2.4, HSBA depth −1, minimum blob size 150; see Chini et al., 2017, for details) over the 3-month interval from April to June 2020. Flood mapping initially detects large amounts of spurious flooding, including large parts of the lake surface that are identified as flooded because waves change the surface roughness and hence the backscatter amplitude between subsequent satellite images. The flood maps are therefore post-processed and analysed in several steps. First, we remove permanent water erroneously identified as flooded using the COPDEM GLO-30 permanent waterbody mask. Second, spuriously identified flooding outside the area of interest is removed with a buffer that only retains information within 50 km from the lake shores and within the lake basin, resulting in an area of approximately 72 000 km² that is analysed for potential flooding. To calculate flooded area, flood maps are reprojected to the UTM zone 36S projection. Third, the cumulative binary flood map is remapped using nearest-neighbour remapping from its native 20 m horizontal resolution to the 100 m resolution of the population maps and is multiplied with gridded population data to obtain the number of people affected. Fourth, we perform a case study on the highly impacted basins of the Nzoia and Yala rivers in Kenya. For the case study, SAR imagery is visually analysed using multi-temporal false-colour composites. Finally, we compare the estimated flooded area and number of people affected with grey literature, newspaper reports, and EM-DAT. All remote sensing analysis was carried out on the WASDI and GEP platforms.
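The remapping and population overlay can be sketched as follows. This is a simplified illustration: it assumes that the 20 m flood mask and the 100 m population grid are perfectly aligned, so that each population cell covers exactly 5 × 5 flood pixels, and it takes the flood pixel nearest to the centre of each population cell. The function names are hypothetical, and the real analysis operates on georeferenced rasters rather than nested lists.

```python
def nearest_neighbour_coarsen(mask, factor=5):
    """Remap a fine binary mask to a grid `factor` times coarser.

    Each coarse cell takes the value of the fine pixel nearest to its
    centre (assumes aligned grids, e.g. 20 m pixels within 100 m cells).
    """
    rows = len(mask) // factor
    cols = len(mask[0]) // factor
    offset = factor // 2  # index of the central fine pixel in each cell
    return [
        [mask[r * factor + offset][c * factor + offset] for c in range(cols)]
        for r in range(rows)
    ]


def affected_population(flood_mask, population):
    """Sum the population of all cells flagged as flooded (mask value 1)."""
    return sum(
        flooded * people
        for mask_row, pop_row in zip(flood_mask, population)
        for flooded, people in zip(mask_row, pop_row)
    )
```

Because a whole 100 m cell is flagged from a single 20 m pixel, this approach can over- or underestimate exposure in partially flooded cells.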

Water balance model
Lake levels are simulated using an updated version of the WBM described in Vanderkelen et al. (2018a), whereby the water balance is calculated as follows:

$$\frac{\Delta L}{\Delta t} = P - E + \frac{Q_\mathrm{in} - Q_\mathrm{out}}{A}, \tag{2}$$

where $L$ (m) indicates lake levels, $P$ (m d⁻¹) is over-lake precipitation, $E$ (m d⁻¹) is evaporation from the lake surface, $Q_\mathrm{in}$ (m³ d⁻¹) is inflow from tributary rivers, $Q_\mathrm{out}$ (m³ d⁻¹) is outflow from the Nalubaale dam complex, and $A$ (m²) is the lake area. The model runs at daily resolution ($\Delta t$ is equal to 1 d). Each term in the model is calculated in metres of lake level equivalent, assuming a constant lake area of approximately 66 800 km².
To simulate observed lake levels for the period 1983-2020, the model is forced with observed over-lake precipitation, inflow based on observed basin precipitation, outflow time series, and model-based lake evaporation.The model is evaluated against observed lake levels and is used to determine the driving water balance terms of the 2020 flood event.
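A minimal sketch of the daily water balance update is shown below. It assumes the balance takes the form ΔL/Δt = P − E + (Q_in − Q_out)/A, with P and E in m d⁻¹, inflow and outflow in m³ d⁻¹, and a constant lake area of 66 800 km²; the function names are hypothetical and the sketch omits the gridded input handling of the actual model.

```python
LAKE_AREA_M2 = 66_800e6  # constant lake area of 66 800 km^2 in m^2


def step_lake_level(level, precip, evap, inflow, outflow,
                    area=LAKE_AREA_M2, dt=1.0):
    """Advance the lake level (m) by one time step of `dt` days.

    precip and evap are in m d^-1; inflow and outflow are in m^3 d^-1
    and are converted to lake level equivalent by dividing by `area`.
    """
    return level + dt * (precip - evap + (inflow - outflow) / area)


def simulate_levels(initial_level, precip, evap, inflow, outflow):
    """Run the daily water balance over parallel lists of forcings."""
    levels = [initial_level]
    for p, e, q_in, q_out in zip(precip, evap, inflow, outflow):
        levels.append(step_lake_level(levels[-1], p, e, q_in, q_out))
    return levels
```

In a run where outflow follows the Agreed Curve, the outflow for each day would be computed from the previous day's level instead of being supplied as a fixed list.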

Statistical attribution methods
To estimate the role of anthropogenic climate change in the 2020 floods, we follow the probabilistic extreme event attribution methodology described in Philip et al. (2020) and van Oldenborgh et al. (2021). The steps include (i) event definition, (ii) probability and trend calculation from observations, (iii) model validation, (iv) multi-model multi-method attribution, and (v) synthesis of attribution statements. More details on the methodology are given in the Supplement.
i. Event definition. We define the 2020 event in a univariate class-based way as the 6-month increase in levels observed between November 2019 and May 2020 (see also Sect. 1.1). Based on this definition, the attribution variable used in this study is

$$\frac{\Delta L}{\Delta t}, \quad \text{for } \Delta t = \text{previous 180 d}. \tag{3}$$

ii. Probability and trend calculation from observations. We calculate the return period of the flood event as the inverse of the probability of exceeding the $\Delta L/\Delta t$ magnitude observed in 2020 and estimate whether a change in return period due to anthropogenic climate change is detectable in observations. To this end, we first generate a "daily" time series of the attribution variable $\Delta L/\Delta t$ from observed lake levels for the period 1896-2020 by applying the time window $\Delta t$ with a daily moving window. Next, we extract the annual block maxima of this time series and fit them to a non-stationary generalized extreme value (GEV) distribution, described by the location ($\mu$), shape ($\xi$), and scale ($\sigma$) parameters. We model non-stationarity by applying the shift fit method described in Philip et al. (2020). This method assumes that the shape and scale parameters are constant, while the location parameter is modelled as a linear function of the smoothed GMST covariate ($T$), which is taken as a proxy for anthropogenic climate change:

$$\mu = \mu_0 + \mu_1 T. \tag{4}$$

We estimate the parameters of the linear model ($\mu_0$ and $\mu_1$), together with the shape and scale parameters, using maximum likelihood estimation. We then calculate the values of the location parameter in a "current" ($\mu_\mathrm{new}$) and a "pre-industrial" climate ($\mu_\mathrm{ref}$), defined, respectively, based on the GMST in 2020 and 1900:

$$\mu_\mathrm{new} = \mu_0 + \mu_1 T_{2020}, \qquad \mu_\mathrm{ref} = \mu_0 + \mu_1 T_{1900}. \tag{5}$$

Based on this fit, we calculate the return period, probability ratio, and change in magnitude of the flood event.

The probability ratio (PR) expresses the change in the probability of exceeding the magnitude observed in 2020 between the pre-industrial climate ($p_\mathrm{ref}$) and the current climate ($p_\mathrm{new}$):

$$\mathrm{PR} = \frac{p_\mathrm{new}}{p_\mathrm{ref}}. \tag{6}$$

The change in magnitude expresses the difference between the magnitude of lake level rise observed in 2020 and the magnitude of lake level rise that has the same return period in a pre-industrial climate. To quantify uncertainty, 95 % confidence intervals (CI) for distribution parameters, PR, and change in magnitude are computed through bootstrapping using 1000 members with replacement.
iii. Model validation. Historical climate model simulations are evaluated by comparing their representation of the seasonal cycle and spatial pattern of precipitation in the Lake Victoria basin with observations. We then force the WBM with precipitation coming from historical climate model simulations for the period 1850-2020. Outflow is calculated using the Agreed Curve, and the observational lake evaporation climatology is held constant (see Sect. 2.1.4). The resulting lake levels are used to compute the annual block maxima time series of the variable $\Delta L/\Delta t$, which is subsequently fitted to a non-stationary GEV distribution in the same way as for observed lake levels but using a GCM-derived GMST time series as a covariate. The parameters of the resulting fits are compared to the observation-derived parameters. Following the method in Ciavarella et al. (2021), we exclude the GCMs for which the simulated precipitation results in very different GEV fits compared to the observational fits, namely where the shape and scale parameters do not overlap within confidence intervals with the observation-derived parameters.
iv. Multi-model attribution. To estimate the change in the return period of the flood event based on GCMs, we additionally fit non-stationary GEV distributions with the shift fit method to lake levels derived from hist-nat simulations as well as historical simulations. To account for model biases in simulated event magnitude, we identify the $\Delta L/\Delta t$ magnitude for which the return period in the historical GCM simulations matches the return period of the 2020 event, as recommended as a simple bias correction method in Philip et al. (2020). For every GCM simulation, we calculate the probability ratio and the change in magnitude using the same definitions for a current (GMST in 2020) and a pre-industrial climate (GMST in 1900) as in the observational analysis. Finally, for each model, we combine the results from historical and hist-nat simulations. To this end, we first calculate the PR between the probability of observing the event in a current climate in historical and hist-nat simulations. We then calculate the change in magnitude of an event with the same return period as the 2020 event in a current climate in historical and hist-nat simulations.

v. Synthesis of attribution statements. The observational and model-based results are combined into overall estimates of the probability ratio and magnitude change with their 95 % confidence intervals, following Philip et al. (2020). To this end, the probability ratios and magnitude changes obtained in step (iv) are first averaged for all GCMs, assuming these are lognormally and normally distributed, respectively, using an "unweighted" synthesis methodology to avoid artificially reducing uncertainties. The resulting model-derived average is then averaged with the estimate obtained from observations in step (ii), which is treated as a separate sample that contributes to the final result. This means all climate models are collectively given the same weight as observations and that observations play a relatively large role in the final synthesis result. The synthesis step is carried out using the KNMI-WMO Climate Explorer.
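The core quantities of steps (ii) to (v) can be illustrated with a short Python sketch. It assumes that the GEV parameters have already been estimated (the maximum likelihood fit and the bootstrap are not shown), uses the shift fit in which only the location parameter depends on GMST, and reduces the synthesis to point estimates without confidence intervals. Averaging the model mean and the observational estimate in log space is an assumption of this sketch, and all function names are hypothetical.

```python
import math


def gev_exceedance(x, mu, sigma, xi):
    """Probability that a GEV(mu, sigma, xi) variable exceeds x."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:  # Gumbel limit for a shape parameter of zero
        return 1.0 - math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:  # x lies outside the support of the distribution
        return 1.0 if xi > 0 else 0.0
    return 1.0 - math.exp(-t ** (-1.0 / xi))


def probability_ratio(x, mu0, mu1, sigma, xi, t_new, t_ref):
    """PR = p_new / p_ref under the shift fit mu = mu0 + mu1 * T."""
    p_new = gev_exceedance(x, mu0 + mu1 * t_new, sigma, xi)
    p_ref = gev_exceedance(x, mu0 + mu1 * t_ref, sigma, xi)
    return p_new / p_ref  # undefined if p_ref is zero


def magnitude_change(mu1, t_new, t_ref):
    """Shift of any fixed-return-period magnitude between two climates.

    With constant scale and shape, the whole distribution is displaced
    by mu1 * (T_new - T_ref).
    """
    return mu1 * (t_new - t_ref)


def synthesize_pr(model_prs, obs_pr):
    """Unweighted synthesis of PR point estimates.

    Model PRs are averaged in log space (lognormal assumption), and the
    model average is then averaged with the observational estimate,
    here also in log space (assumption of this sketch).
    """
    model_mean_log = sum(math.log(pr) for pr in model_prs) / len(model_prs)
    return math.exp(0.5 * (model_mean_log + math.log(obs_pr)))
```

The return period in a given climate is the inverse of the exceedance probability returned by `gev_exceedance`, so a PR above 1 corresponds to a shorter return period in the current climate than in the pre-industrial one.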

Results
In this section, we first analyse the precipitation anomaly that drove the 2020 floods and estimate the number of people impacted by the floods. We then carry out a sensitivity analysis of the event definition and analyse what water balance terms drove the lake level rise. Subsequently, we estimate the change in probability and magnitude of the flood event from observations, evaluate the WBM and GCMs, and carry out a multi-model attribution analysis. Finally, we present the synthesis of observational and GCM-derived attribution results.

Meteorological driver of the floods
The 2020 floods were driven by heavy precipitation in 2019 and 2020, which was above-average in nearly the entire study area (Fig. 2). The highest precipitation anomalies occurred over the lake, with values up to 493 mm yr⁻¹ (averaged over both years) above the climatological mean, which corresponds to a 38 % positive anomaly (Fig. 2b, e). Averaged over Lake Victoria and its basin (outline shown in Fig. 2a), precipitation between May 2019 and May 2020 was consistently above average relative to the climatology (Fig. 3a).
The OND short rains season of 2019 ranks second wettest after 1997; the January and February dry season of 2020 ranks second wettest after 1998; and the MAM long rains season of 2020 ranks fourth wettest, after 2018, 1988, and 1990 (Fig. 3b-d). Whereas none of the individual seasons was record-breaking in 2019 or 2020, accumulated precipitation during the 3-year period leading up to the flood event was above average (Fig. 4a), with 2020 ranking as the wettest year in the basin since 1985, and the 2-year period 2019-2020 and the 3-year period 2018-2020 breaking the record by an even greater margin (Fig. 4b).
Regression analysis shows generally weak trends in accumulated yearly and seasonal precipitation amounts (Appendix Fig. C4).A weak and non-robust positive temporal trend is visible in accumulated yearly precipitation over the lake and its basin between 1985 and 2020, linked to a neg-ative trend in the MAM long rains season, counterbalanced by a positive trend in the OND short rains season and weak positive trends in the January-February and June-September dry seasons.Considerable scatter is present around all trends, and there is larger uncertainty in precipitation amounts in the 1980s and 1990s due to more missing data in these early decades, which makes it difficult to robustly carry out trend analysis or compare precipitation in different years.Accumulated precipitation in the basin, in particular during the short rains, is strongly positively correlated with the Indian Ocean Dipole index during the same months (Appendix Fig. C5).

Estimation of flooded area and affected population
Based on remote sensing analysis, a total area of approximately 642.5 km² in the lake basin within 50 km of the lake shores is estimated to have been affected by flooding between April and July 2020 (Fig. 5). This corresponds to approximately 0.9 % of the 50 km buffer around the lake shores. Key areas identified as flooded include the basins of the Nzoia and Yala rivers and the Kisumu and Homa Bay Counties in Kenya; the floodplains of large rivers (including the Mara, Grumeti, Simiyu and Kagera rivers) in Tanzania; and shoreline and wetland locations near Masaka, Entebbe, and Kampala and along the coasts of lake islands in Uganda. Flooding is also detected along the shoreline of most of the lake. Within 50 km of the shores of Lake Victoria, a total of 29 070 people are estimated to have been affected by flooding between April and June 2020, which corresponds to about 0.12 % of the total population living in this area (23 million people). The affected population is identified throughout the area in both coastal and inland locations near river floodplains.
Detailed visual analysis of SAR images for the Nzoia River basin, which was reported as heavily affected in media, shows important flooding between April and May 2020. The area is mostly non-flooded on 2 April (Appendix Fig. C6a) and starts to show early signs of flooding in late April, followed by important flooding on 8 May 2020 (Appendix Fig. C6b, d). By 20 May, large parts of the floods have receded, but some traces are still visible along the floodplain and in the southern and south-eastern sections of the area (Appendix Fig. C6c). Overlaying the area detected as flooded by the HASARD algorithm in the Nzoia basin between April and June 2020 with gridded population data allows us to identify where people were affected by flooding (Fig. 6).
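The overlay of a flood mask with gridded population data can be illustrated with a minimal sketch. It assumes that both grids are already co-registered on the same cells; the function name and the small example grids are hypothetical and do not represent HASARD output or the population dataset used in this study.

```python
def affected_population(flood_mask, population):
    """Sum the population of all grid cells flagged as flooded."""
    total = 0.0
    for mask_row, pop_row in zip(flood_mask, population):
        for flooded, people in zip(mask_row, pop_row):
            if flooded:
                total += people
    return total

# Hypothetical 3x3 grids: 1 marks a flooded cell, population is people per cell.
flood_mask = [[0, 1, 0],
              [0, 1, 1],
              [0, 0, 0]]
population = [[10.0, 25.0, 5.0],
              [0.0, 40.0, 12.5],
              [8.0, 3.0, 1.0]]

affected = affected_population(flood_mask, population)
total_population = sum(sum(row) for row in population)
share_percent = 100.0 * affected / total_population
print(f"affected: {affected} people ({share_percent:.1f} % of the example area)")
```

In such an overlay, any flooded cell that the algorithm misses (for instance in built-up areas) directly lowers the affected-population estimate.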
Estimates of population affected by flooding in the Lake Victoria area and the larger eastern Africa region vary widely between media, grey literature, disaster response reports, and the disaster database EM-DAT (Table 1). In part this is because they refer to different geographical areas and time periods. The estimate of people affected by flooding over the larger eastern Africa region in 2019-2020 spans from 700 000⁹ to over 2 million people¹⁰. The disaster database EM-DAT reports over 980 000 affected people and 326 casualties including all flooding events in Uganda, Kenya, and Tanzania for the period between April and November 2020. Filtering the EM-DAT entries to retain only those that include parts of the regions in our study area (outline in Fig. 5) results in over 830 000 affected people and 292 casualties, with the highest number of people affected in Kenya. However, these EM-DAT entries include many administrative units that are far from Lake Victoria and therefore unrelated to our study area (Guha-Sapir et al., 2022, Table 1).
Analysis of media sources covering the studied regions gives an estimate of approximately 32 500 to 54 800 affected people aggregated over the three countries, which broadly agrees with our remote-sensing-based estimate (Table 1). For instance, in Kenya, media sources from May 2020 report 3000 people left homeless in the Budalangi constituency of Busia County¹¹ (Fig. 5 box 1). As an effect of the Nzoia River flood in early May 2020 alone, UN OCHA reports at least 40 000 people were made homeless¹². Media sources report 400 families still displaced in August 2020 due to the Nzoia floods¹³. Later in the year, in October 2020, the Kenya Red Cross Society reported 7000 homes affected by Lake Victoria backflow in the Budalangi Constituency of Bu- [...] ¹⁸ (Fig. 5 east of box 13), whereas important flooding was not identified using HASARD in these islands. In Tanzania, disaster response sources report approximately 5000 people impacted in the Kagera and Musoma regions¹⁹.
⁹ Wikipedia, https://en.wikipedia.org/wiki/2020, last access: 1 March 2024.

Event definition
Table 1. Estimates of the number of people affected by flooding in 2020 in the Lake Victoria basin and larger eastern Africa region, compiled from different sources.

As outlined in Sect. 1.1, we focus on the rate of change in lake levels (ΔL/Δt) instead of on absolute lake levels to define the event, choosing a time window (Δt) of intermediate length corresponding to 180 d, and subsequently extract annual block maxima of the ΔL/Δt time series. The 2020 event thus defined corresponds to a lake level increase of 1.21 m that occurred in the 180 d leading up to 17 May 2020, and is the third most extreme event since 1897, ranking after 1998 (1.39 m) and 1962 (1.30 m; Fig. 7). Lake levels usually [...] (Fig. 7b). We test the sensitivity to this choice of event definition in Sect. B3.
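The event definition (a 180 d rate of change followed by annual block maxima) can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the input is a gap-free daily series, and each 180 d window is assigned to the calendar year of its end date, which may differ from the block definition used in the study. Function names and the synthetic data are hypothetical.

```python
from datetime import date, timedelta

def rate_of_change(levels, window=180):
    """Return (end_index, level change over `window` days) for a daily series."""
    return [(i, levels[i] - levels[i - window]) for i in range(window, len(levels))]

def annual_block_maxima(dates, levels, window=180):
    """Largest `window`-day rise per calendar year of the window's end date."""
    maxima = {}
    for i, change in rate_of_change(levels, window):
        year = dates[i].year
        if year not in maxima or change > maxima[year]:
            maxima[year] = change
    return maxima

# Synthetic daily lake levels (m): flat, then a 1 m step on day 200.
start = date(2020, 1, 1)
dates = [start + timedelta(days=k) for k in range(400)]
levels = [0.0] * 200 + [1.0] * 200
maxima = annual_block_maxima(dates, levels)
print(maxima)
```

Because windows ending in consecutive years can share days, blocks may overlap in time, which is the issue addressed by the sensitivity test on overlapping blocks.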

Water balance modelling: model evaluation
The water balance model forced with observational data reproduces the observed lake levels reasonably well (Fig. 8).
The model generally captures the timing of increasing and decreasing levels, but sometimes underestimates or overestimates the magnitude of these variations, resulting in a mean bias of 0.06 m and a root-mean-square error of 0.45 m. The large and consistent overestimation from 2005 to 2015 could be due to the modelled outflow, which was assumed to follow the Agreed Curve from 2005 on (Fig. C3), while in this period, the real outflow likely exceeded the Agreed Curve, resulting in lower lake levels (Vanderkelen et al., 2018a). Nevertheless, the model does not show systematic wet or dry biases, which justifies its use for the attribution analysis. Moreover, as the attribution variable is based on lake level variations, biases in absolute levels are less relevant.
The lake level peak in May 2020 is reproduced by the model, but underestimated by 0.41 m (Figs. 8 and C8a). Between May 2018 and January 2020, the model reproduces observational lake levels well, but from then on it consistently underestimates lake levels (Appendix Fig. C8a, b). The divergence between modelled and observed levels is fastest between January and May 2020 (Appendix Fig. C8c).
For the 180 d rate of change in lake levels, the WBM generally reproduces the time series derived from observations (Fig. 9) but tends to attenuate extremes (Fig. 9a, b, e). Accordingly, the distribution of ΔL/Δt shows fewer extreme high and low values compared to observations (Fig. 9d). Furthermore, the modelled seasonality of ΔL/Δt is slightly shifted in time, leading observations by about 10 d to 1 month (Fig. 9c). In 2020, the maximum 180 d increase in levels is shifted in time in the WBM simulation compared to observations: in the former it is modelled between September 2019 and March 2020 (with a magnitude of 0.94 m), whereas in the latter it was observed between November 2019 and May 2020 (with a magnitude of 1.21 m). Nonetheless, the annual block maxima ΔL/Δt time series derived from modelled lake levels leads to an estimate of the rank of the 2020 event that is high and similar to observations, with the 2020 event ranking second after 1998 (Appendix Fig. C9).
Given (i) the overall skill of the observation-driven WBM simulation, (ii) the similarity of the rank of the 2020 event in the modelled and observed time series, and (iii) the application of a simple bias correction (Sect. 2.2.3), we conclude that the WBM can be trusted to attribute the 2020 event in combination with observed lake levels.
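The two evaluation metrics quoted above, mean bias and root-mean-square error, can be computed as in the following sketch. The sign convention (model minus observation) is an assumption, and the function names and example values are hypothetical.

```python
import math

def mean_bias(modelled, observed):
    """Mean of (model - observation); positive means the model is too high."""
    return sum(m - o for m, o in zip(modelled, observed)) / len(observed)

def rmse(modelled, observed):
    """Root-mean-square error between modelled and observed values."""
    squared = [(m - o) ** 2 for m, o in zip(modelled, observed)]
    return math.sqrt(sum(squared) / len(observed))

# Hypothetical lake levels (m): errors of opposite sign cancel in the bias
# but not in the RMSE.
observed_levels = [11.0, 11.0, 11.0, 11.0]
modelled_levels = [11.5, 10.5, 11.5, 10.5]
print(mean_bias(modelled_levels, observed_levels))  # errors cancel
print(rmse(modelled_levels, observed_levels))       # errors do not cancel
```

This illustrates why a small mean bias can coexist with a considerably larger RMSE when over- and underestimations alternate over time.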

Water balance modelling: analysis of drivers
The input terms of the lake's water balance reflect the seasonal cycle of precipitation in the basin, with peaks in over-lake precipitation and inflow in the MAM and OND rainy seasons (Fig. 10a). Annually averaged based on the 1983-2020 period, over-lake precipitation supplies 125 mm per month (+75.7 %) and is approximately balanced by an evaporative loss of 123 mm per month (−74.7 %). Inflow provides 40 mm per month (+24.3 %) of input, and 42 mm per month (−25.3 %) is lost through outflow, agreeing with estimates in Vanderkelen et al. (2018a). Lake precipitation has the highest interannual variability (Fig. 10a).
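As a quick consistency check, the relative shares of the water balance terms can be recomputed from the rounded monthly values quoted above. This is only an arithmetic sketch with hypothetical variable names; because it starts from rounded numbers, the resulting percentages differ from the published ones by up to about 0.2 percentage points.

```python
# Long-term mean water balance terms in mm per month, as quoted in the text.
precipitation = 125.0   # over-lake precipitation (input)
inflow = 40.0           # tributary inflow (input)
evaporation = 123.0     # lake evaporation (loss)
outflow = 42.0          # outflow (loss)

total_in = precipitation + inflow
total_out = evaporation + outflow

shares = {
    "precipitation": 100.0 * precipitation / total_in,
    "inflow": 100.0 * inflow / total_in,
    "evaporation": 100.0 * evaporation / total_out,
    "outflow": 100.0 * outflow / total_out,
}
net_change = total_in - total_out  # mean monthly storage change, mm

for term, share in shares.items():
    print(f"{term}: {share:.1f} %")
print(f"net change: {net_change} mm per month")
```

With these rounded values, inputs and losses balance, consistent with a lake water balance that closes on average over the period.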
Over-lake precipitation and inflow were generally above average between May 2019 and April 2020 (Fig. 10b-c and Appendix Fig. C10a). Both were particularly anomalous in October 2019 (Fig. 10b), when lake precipitation was a nearly 4 SD anomaly (330 mm) and inflow was a 3.5 SD anomaly (122 mm lake level equivalent) compared to the long-term mean for the month, and they both broke records since 1983. In the WBM simulation, the maximum 6-month ΔL/Δt ending in 2020 occurs between September 2019 and March 2020, with a magnitude of 0.93 m. This deviates from observations, where the maximum rise happens between November 2019 and May 2020 and has a magnitude of 1.21 m, which is further discussed in Sect. 3.4.1. Between September 2019 and March 2020, accumulated over-lake precipitation and inflow reached levels similar to their total annual long-term average (Appendix Fig. C10b). Lake precipitation saw an anomaly of +0.59 m (+72 %), inflow of +0.26 m (+93 %), and outflow of +0.01 m (+40 %), in lake level equivalents compared to the climatological average, resulting in a positive residual of approximately +0.75 m (Appendix Fig. C11). This is smaller than the full magnitude of the modelled event (+0.93 m) because 19 % of the 2020 event corresponds to the climatological average rise in lake levels for the period from September to March (+0.18 m), whereas 81 % of the event (+0.75 m) was due to anomalous precipitation and inflow, which were only partially balanced by above-average outflow following the rise in lake levels. Lake precipitation and inflow contributed 70 % and 30 %, respectively, to the anomalous lake level rise. Since this is similar to the historical proportion between the two input terms in the lake's water balance in climatology (see Sect. 1.2), in relative terms these can be understood to have contributed equally to the anomalous rise, although precipitation was a greater contributor in absolute terms.
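The split of the modelled event into a climatological and an anomalous part can be retraced with the rounded values quoted above. In this sketch, normalizing the precipitation and inflow contributions by the sum of the two input anomalies is an assumption about how the 70 % and 30 % figures were obtained; variable names are hypothetical, and the results match the published percentages only to within about 1 percentage point because of rounding.

```python
# Values in metres of lake level equivalent, as quoted in the text.
event_rise = 0.93            # modelled 6-month rise, Sep 2019 to Mar 2020
climatological_rise = 0.18   # average Sep-Mar rise in the climatology
precip_anomaly = 0.59        # over-lake precipitation anomaly
inflow_anomaly = 0.26        # inflow anomaly

anomalous_rise = event_rise - climatological_rise

climatological_share = 100.0 * climatological_rise / event_rise
anomalous_share = 100.0 * anomalous_rise / event_rise

# Assumed normalization: share of each input in the total input anomaly.
input_anomaly = precip_anomaly + inflow_anomaly
precip_share = 100.0 * precip_anomaly / input_anomaly
inflow_share = 100.0 * inflow_anomaly / input_anomaly

print(f"climatological: {climatological_share:.0f} %, anomalous: {anomalous_share:.0f} %")
print(f"precipitation: {precip_share:.0f} %, inflow: {inflow_share:.0f} %")
```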

Observational analysis: return period and trend analysis
The 2020 observed increase of 1.21 m is estimated to have a return period of 63.2 years in the current climate (CI 27-395 years; Fig. 11a-b). This implies that if we have no prior information on circulation, sea surface temperatures, dam management, or further increases in GMST, there is a 1.6 % chance each year of experiencing a 180 d lake level increase of 1.21 m in today's climate. The wide confidence interval means that such an event could be expected to occur every few decades, or it could be quite a rare event, expected to occur only every few hundred years. In a pre-industrial climate, the event has an estimated return period of 104 years (CI 43-1097 years), which results in a probability ratio of 1.7 (CI 0.3-3.9), indicating that the event is estimated to be 1.7 times as likely in the current climate compared to a pre-industrial climate. The confidence interval, however, does not exclude 1, meaning that uncertainty includes the possibility that no detectable change in the likelihood of the event has occurred. In a pre-industrial climate, lake levels would have risen 0.11 m (0-0.23 m) less than observed, with uncertainty including the possibility of no attributable change. Observational results for key distribution parameters and return periods are shown in Tables 2 and 3. The estimated return period of the event in the current climate is taken as the return period to calculate a model-specific magnitude threshold that represents the flood event in each climate model historical and hist-nat simulation pair. Because some non-homogeneity is introduced in the time series due to a different temporal resolution of lake level observations in 1896-1948 (monthly) and 1948-2021 (daily to 10-daily), we test the sensitivity of the observational attribution to this by artificially reducing the resolution of the entire lake level time series from daily to monthly and repeating the return period estimates. The results are robust, giving similar estimates of the return period of the event in the current climate (best estimate of 63.5 years, CI 27-426 years), and of the probability ratio (best estimate 1.4, CI 0.2-3.4) and magnitude change (best estimate +7 cm, CI −4 cm to +20 cm) compared to a pre-industrial climate.
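The relation between return periods, annual exceedance probabilities, and the probability ratio can be made explicit with a short sketch. Function names are hypothetical. Note that a ratio recomputed from the rounded return periods (about 1.65) differs slightly from the published best estimate of 1.7, which is derived from the fitted distribution rather than from rounded values.

```python
def annual_probability(return_period_years):
    """Annual exceedance probability of an event with the given return period."""
    return 1.0 / return_period_years

def probability_ratio(rp_current, rp_preindustrial):
    """Ratio of current to pre-industrial annual exceedance probability."""
    return annual_probability(rp_current) / annual_probability(rp_preindustrial)

def chance_within(return_period_years, n_years):
    """Probability of at least one exceedance in n independent years."""
    return 1.0 - (1.0 - annual_probability(return_period_years)) ** n_years

p_now = annual_probability(63.2)        # current climate
pr = probability_ratio(63.2, 104.0)     # current vs. pre-industrial
print(f"annual chance today: {100 * p_now:.1f} %")
print(f"probability ratio from rounded return periods: {pr:.2f}")
print(f"chance of at least one such event in 30 years: {100 * chance_within(63.2, 30):.0f} %")
```

The last function assumes independent years and a stationary climate, so it is only a rough guide to multi-decadal risk.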
Furthermore, we test the sensitivity of our estimates to the presence of overlapping blocks in the annual block maxima time series (see Sect. B3 and Appendix Fig. C7). We exclude the overlapping blocks by removing any year with a block ending between October and December. Results give similar estimates of the return period of the event in the current climate (best estimate of 64.8 years, CI 27-467 years) and of the probability ratio (best estimate 1.4, CI 0.2-3.4) and magnitude change (best estimate +8 cm, CI −4 cm to +23 cm) compared to a pre-industrial climate.

GCM evaluation
All GCMs, when used to force the WBM, underestimate the magnitude of a 63-year event compared to observations (Table 2). Nonetheless, since the WBM simulations also show this bias when driven by observational precipitation, this could be due to a bias introduced by using the WBM as well as GCM biases in representing precipitation. The location and scale parameters of all distribution fits agree well with each other and with observations (Table 2). While the observational fit results in a slightly positive shape parameter, all GCM-driven fits result in negative shape parameters. Nonetheless, the shape parameter is also slightly negative in the observationally driven WBM simulation, and the confidence intervals of the shape parameters of models and observed lake levels overlap for all models, except for MIROC6, which shows a very negative parameter. For this reason, we reject MIROC6 and exclude this model from further analysis. Both the seasonal cycle of basin precipitation (Appendix Fig. C14) and the spatial pattern (Appendix Figs. C12 and C13) are reasonably represented by all models.

Multi-model attribution
The attribution signal is similar in observed lake levels and historical climate model simulations. Based on WBM simulations driven with historical GCMs and applying a shift fit, a 1-in-63-year event in the current climate is modelled to be slightly rarer in a pre-industrial climate in all models, with best estimates of the pre-industrial return period ranging from 66 to 142 years. This leads to best estimates of probability ratios between the current and pre-industrial climates that are slightly above unity, ranging from 1.1 to 2.2 across historical simulations (Table 3, Method 2). Nonetheless, none of the confidence intervals for the probability ratios exclude unity. Similarly, all GCMs indicate an increase in the magnitude of the event between a pre-industrial and a current climate, with best estimates ranging from approximately 0.01 m to approximately 0.08 m. Nonetheless, the confidence intervals for the change in magnitude of individual models all include zero, suggesting that uncertainty due to natural variability is high.
The non-stationary fits based on counterfactual WBM simulations driven with precipitation from hist-nat (natural forcing only) GCM simulations show probability ratios near unity and magnitude changes close to 0 (Table A2), indicating that there is no trend in the likelihood of the event due to natural forcings. When combining the historical and hist-nat simulations for each model, the best estimate is that the event has been made more likely and that the magnitude has slightly increased due to anthropogenic climate change (Table 3, Method 3). Nonetheless, all confidence intervals include the possibility of no attributable change, indicating large natural variability. Furthermore, the hist-nat simulations of CanESM5 and MRI-ESM2-0 have infinite upper bounds in the confidence intervals of the return period of the event in a current climate without anthropogenic climate change. This suggests that the event could be extremely unlikely in a counterfactual world but also that the uncertainty of a return period estimate based on these models is very high (Table A2). As a result, the upper bound of the probability ratio estimated by combining historical and hist-nat simulations of these two models is also infinity (Table 3, Method 3). To synthesize the results of observations and all models, we cap the upper bound of the confidence interval of the PR from both models to 10 000, assuming anything higher than this to be an overestimation.
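How a shift fit yields a probability ratio, and why an infinite value can arise for a negative shape parameter, can be illustrated with a minimal sketch. It assumes the standard GEV parameterization with a location parameter that shifts linearly with GMST; all parameter values are hypothetical and are not the fitted values of this study, and the function names are ours.

```python
import math

def gev_exceedance(x, mu, sigma, xi):
    """P(X > x) for a GEV with location mu, scale sigma and shape xi."""
    if xi == 0.0:  # Gumbel limit
        return 1.0 - math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        # Outside the support: above the upper bound (xi < 0) the event is
        # impossible; below the lower bound (xi > 0) it is certain.
        return 0.0 if xi < 0.0 else 1.0
    return 1.0 - math.exp(-t ** (-1.0 / xi))

def shift_fit_pr(x, mu0, alpha, sigma, xi, gmst_now, gmst_then):
    """Probability ratio when only the location shifts: mu = mu0 + alpha * GMST."""
    p_now = gev_exceedance(x, mu0 + alpha * gmst_now, sigma, xi)
    p_then = gev_exceedance(x, mu0 + alpha * gmst_then, sigma, xi)
    if p_then == 0.0:
        return math.inf if p_now > 0.0 else math.nan
    return p_now / p_then

def cap_upper_bound(value, cap=10_000.0):
    """Cap an unbounded ratio, as done for the synthesis in the text."""
    return min(value, cap)

# Hypothetical parameters: event of 1.21 m, GMST anomaly 1.2 K now vs. 0 K then.
pr_finite = shift_fit_pr(1.21, 0.5, 0.1, 0.2, -0.2, 1.2, 0.0)
pr_unbounded = shift_fit_pr(1.21, 0.5, 0.3, 0.2, -0.5, 1.2, 0.0)
print(pr_finite, pr_unbounded, cap_upper_bound(pr_unbounded))
```

In the second example the event lies above the upper bound of the counterfactual distribution, so its counterfactual probability is zero and the ratio is unbounded until capped.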

Hazard attribution synthesis
Synthesizing observations and models, the best estimate is that the event is approximately 1.8 times as likely in the present-day climate compared to a pre-industrial climate (CI 0.8-15.8; Fig. 12). Models and observations generally agree on a slightly positive best estimate for a PR but with a confidence interval that always includes unity. Further, the intra-model uncertainty due to internal variability is larger than the inter-model uncertainty due to model disagreements. The upper bound of the confidence interval of the probability ratio is determined by the chosen limit to the unbounded confidence intervals of the probability ratios of CanESM5 and MRI-ESM2-0, and it thus should be interpreted with caution. Further, the best estimate is that the magnitude of the event has been slightly increased by climate change and that in a pre-industrial climate an event with a 63-year return period would have led lake levels to rise by 7 cm less than observed. Nonetheless, the confidence interval ranges from no attributable change in magnitude to a possible 14 cm attributable increase in lake levels, which would correspond to 9350 m³ of water.

https://doi.org/10.5194/esd-15-225-2024 Earth Syst. Dynam., 15, 225-264, 2024

Table 2. Validation results based on seasonal cycle, spatial pattern, and fitted scale σ and shape ξ parameters, with 95 % confidence intervals in brackets. Results are shown for observed lake levels for the period 1897-2020 (observations), lake levels simulated by the WBM driven by observational precipitation for the period 1983-2020 (observational WBM), and lake levels simulated by the WBM driven by GCM simulations. For observations and the observational WBM the magnitude of the 2020 event is shown. For GCMs the magnitude of a 63-year event in the current climate estimated based on a non-stationary GEV fit is shown.

Although the best estimates indicate a slight increase in the likelihood and magnitude of the event in the current climate compared to a pre-industrial or counterfactual climate, the confidence intervals of the synthesized PR and magnitude change both include the possibility of a null signal. This indicates that uncertainty due to natural variability is large, and results include the possibility that there is no detectable change in the likelihood or magnitude of the event that is attributable to anthropogenic climate change. Further, the uncertainty estimated through bootstrapping is a measure of natural variability, but neglects epistemic model uncertainty, for example that related to the impact of anthropogenic climate change on atmospheric dynamics, and neglects the uncertainty linked to potential confounding factors that are not included in the statistical modelling applied here. This could point to a potentially larger true uncertainty than quantified here. Nonetheless, for a variable related to seasonal precipitation accumulations, which is less directly associated with the thermodynamical effects of anthropogenic climate change than short-duration precipitation extremes, and with no conditioning on modes of climate variability applied, the general agreement between models is conspicuous and points to

Discussion
The 2020 flooding in the Lake Victoria basin was a high-impact event, which affected tens of thousands of people. Not only the lake shorelines but also tributary rivers flooded. People were impacted both by being displaced and by damage to infrastructure and sources of livelihood. The event occurred while floods and landslides were affecting the wider eastern Africa region, and impacts were compounded by COVID-19 and a locust outbreak that damaged crops (Salih et al., 2020). The event was driven by heavy precipitation that lasted nearly a year and was linked to a positive IOD event, which is known to intensify OND short rains in eastern Africa (Wainwright et al., 2021a). The floods and their impacts were likely also influenced by land use patterns, the type and number of infrastructure and dykes present on rivers, the management of the Lake Victoria dam complex, and people's exposure due to the location of settlements in flood-prone areas. Given this complexity, the attribution carried out here is necessarily a partial study of the event. Nonetheless, it represents a first step towards disentangling the multiple drivers of the event and quantifying the role of anthropogenic climate forcing.
Areas identified as flooded through remote sensing analysis in this study overlap well with areas reported as affected in news and disaster response sources. The flood mapping adds spatial detail to sources that otherwise provide mostly county, district or regional-level information. There are, however, several ways in which the remote sensing analysis could be refined. [...] (2017, 2019, 2020). Since built-up areas are densely populated, underestimating floods in these areas likely leads to underestimating the number of people affected. Next, much of the identified flooding occurred in farmed areas in floodplains, suggesting the floods had an impact on economic activity, which is not taken into account when defining impact only based on resident population affected. Furthermore, the HASARD algorithm overestimates flooding over open waterbodies through the detection of waves on the water surface that temporarily increase surface roughness. This spurious flood signal is partly removed by using permanent waterbody masks, but some overestimation of flooding could still be present, in particular around the lake shoreline. These sources of error could be estimated by comparing HASARD-derived flood maps with high-resolution optical imagery over a small study area.

Table 3. Estimated return periods, probability ratios, and magnitude changes of the flood event in a current and a pre-industrial climate based on observed lake levels for the period 1897-2020 (observations), lake levels simulated by the WBM driven by observational precipitation for the period 1983-2020 (observational WBM), and factual (historical) and counterfactual (hist-nat) climate model simulations. In Methods 1 and 2 "current" corresponds to a 2020 climate, while "pre-industrial" corresponds to a 1900 climate. In Method 3 "current" corresponds to a 2020 climate in historical simulations, while "pre-industrial" corresponds to a 2020 climate in hist-nat simulations. Only models that passed the evaluation are shown.

Data
The WBM performs well in the observational period, with the water balance of the lake closing without applying a residual term, in the same way as in Vanderkelen et al. (2018a, b). Our WBM simulations show that the rapid rise in lake levels was driven by anomalous precipitation and inflow, accumulated between late 2019 and mid 2020. The modelling setup does not account for several factors that could be additional drivers. First, land use along rivers that are tributaries of the lake was reported in the media as a compounding factor due to decreased vegetation cover causing increased erosion, sediment transport, and siltation of river channels and higher peak discharge amounts (Mati et al., 2008; Mugo et al., 2020). The WBM uses land cover data prescribed from the Global Land Cover 2000 project (Mayaux et al., 2003) to calculate runoff from precipitation, but as this is not transient, the impact of land use and land cover change on runoff is not accounted for. For instance, we do not include potential changes such as wetland encroachment that could increase runoff into the lake. Second, the modelling setup assumes lake evaporation follows a climatology during the modelled period and thus omits interannual variations in lake evaporation. Third, other possible drivers of the flood extent and its impacts include human dam management, including of infrastructure along tributary rivers, which are not represented in our model, and outflow from the dam complex at Jinja, Uganda, for which data are not fully available for the 2019-2020 period. Finally, impacts are determined by the exposure and vulnerability of settlements and economic activities, with those located close to the lake shores, within wetlands, or in river floodplains more likely to be affected. The extent to which exposure and vulnerability changes drove flood impacts in 2020 is not quantified here.
The underestimation of the lake level rise simulated by the WBM between late 2019 and mid 2020 corresponds to a bias whereby the WBM mutes the magnitude of the most extreme 6-monthly variations in lake levels. For 2020, this bias could be due to (i) an underestimation of true precipitation amounts in the PERSIANN-CDR data product; (ii) uncertainties in the curve number method leading to an underestimation of true inflow; (iii) an overestimation of true evaporation from the lake surface; (iv) an overestimation of true outflow, which could have been below Agreed Curve levels; or (v) variations in other water balance terms (e.g. groundwater) that are not accounted for in the WBM but might lead the WBM to underestimate peaks in ΔL/Δt. Since observational outflow was used for the period March-May 2020, an overestimation of outflow could contribute to the model bias in the first months of 2020 but is unlikely to be the main cause of the 2020 bias.
In terms of the event definition, the 180 d rate of change in lake levels was found to be a good compromise between representativity of the event and limiting the influence of decadal trends compared to raw lake levels, and allowed us to move beyond an attribution of a meteorological variable to the attribution of an impact-relevant variable (Otto, 2016). Nonetheless, the variable relates only indirectly, through backflow effects, to tributary river floods, which caused a large part of the impacts in 2020. Moreover, an increased frequency of high ΔL/Δt events can be caused by increased interannual variability in seasonal precipitation, which, if not preceded by already high lake levels, would not necessarily represent a high-impact flooding event. Further, lake levels preceding the ΔL/Δt event would be influenced by evaporation rates, particularly during dry seasons, which do not vary in our study but might change under climate change. Furthermore, as discussed in Sect. B3, the daily variable does not fully meet the theoretical assumptions of extreme value theory, since it is not independent and identically distributed. Moreover, while some annual blocks extracted from the observations were found to be overlapping, our results were found to be robust, and we find a similar attribution signal when the overlapping blocks are excluded from the analysis (Sect. 3.5). Finally, while we cannot readily assume that our annual block maxima time series is in the asymptotic tail of the distribution of maxima, similar objections can be raised to a number of extreme event attribution studies that study slow-onset extremes (e.g. Philip et al., 2018b; Kew et al., 2021), and while these limitations are recognized they do not impede us from providing useful information on these events (see discussions in, e.g. Philip et al., 2020; van Oldenborgh et al., 2021).
Possible sources of non-stationarity not linked to anthropogenic warming must be considered. Decadal variability linked to atmospheric dynamics and modes of climate variability such as the IOD can introduce a non-stationarity that might be unforced and not linked to anthropogenic warming and that can therefore act as a confounding factor in our analysis (Shepherd, 2014, 2016; Philip et al., 2020). Moreover, other factors such as land use changes and dam management can introduce non-stationarity in observations that is not linked to anthropogenic climate forcings. Finally, the different resolution of data before and after 1948 could also introduce non-stationarity, although our attribution results were found to be robust to an artificial reduction in the temporal resolution of the data (see Sect. 3.5).
Strong dynamically induced variability can introduce uncertainty in frequentist probabilistic extreme event attribution statements (Shepherd, 2016, 2021; Faranda et al., 2020). Probabilistic attribution statements are recognized to be strongest when the greatest source of non-stationarity is thermodynamical and when previous knowledge on the physical processes linking the observed change to anthropogenic forcings is high, as is the case, for instance, in relation to short-duration temperature and precipitation extremes (Otto, 2017, 2020). Further, the shift fit method assumes a linear relationship between anthropogenic forcings (often represented by global surface warming) and the response in the modelled distribution of the variable. More complex interactions are likely in our variable, as seasonal precipitation amounts in eastern Africa are mediated by sea surface temperatures in the Indian Ocean and circulation dynamics (Cai et al., 2018; Wainwright et al., 2019). Decadal variability in precipitation amounts is extensively documented in the region and linked to various factors including ENSO and the IOD (Wainwright et al., 2019, 2021a, b; Cai et al., 2018; Marthews et al., 2019; Nicholson, 2014, 2015, 2017, 2018; Rowell et al., 2015; Ummenhofer et al., 2009; Conway et al., 2005; Dunning et al., 2016). The anomalous precipitation in eastern Africa in 2019 was linked to a persistent extreme positive IOD in the same year (Wainwright et al., 2021a; Khaki and Awange, 2021), which was the strongest on record since 1950 (Nicholson et al., 2022). Previous positive IOD conditions were likely linked to the heavy 1961 and 1998 precipitation seasons in the basin (Wainwright et al., 2021a; Nicholson et al., 2022), which emerged as very rare events in our attribution study as well. The statistical methods applied in this study neglect such sources of decadal variability by assuming anthropogenic climate change is the only source of non-stationarity. According to Philip et al. (2020), decadal variability can be a problem for probabilistic attribution when the variability is larger than the signal of anthropogenic climate change. One possible solution would be to condition the return period estimates on the IOD Dipole Mode Index value observed in 2020 by including it as an additional covariate in the shift fit method, as recently done in Kimutai et al. (2023). Conditioning the analysis on a dynamical state moves towards the storyline approach to extreme event attribution (Shepherd, 2016, 2019, 2021; Otto, 2017; Otto et al., 2015). Previous studies have regressed out the influence of modes of climate variability (as in Philip et al., 2018b, to account for the influence of ENSO on precipitation in Ethiopia), but Cai et al. (2014, 2018) suggest that an increase in frequency and intensity of the positive IOD is projected with climate change in the region, meaning that regressing out its influence could remove a pathway of influence of anthropogenic climate change on the regional climate via a dynamical mediator. Nonetheless, there is currently no consensus on the detection and attribution to anthropogenic forcings of an observed increasing trend in the IOD (Gulev et al., 2021), so it is likely premature to assume we are already observing a climate change signal in a positive observed IOD trend.
Additional scientific challenges are recognized in relation to attributing extreme events and their impacts in the Global South, linked to the limited availability of reliable long-term observational and impact data, sometimes flawed representation of climate processes in models, and high natural variability of some of the variables being attributed, making it harder for a trend to emerge as signal from the noise (Otto et al., 2020a, b). For instance, despite a projected increase in average annual precipitation amounts over eastern Africa in most global and regional climate models participating in the Coupled Model Intercomparison Project Phases 5 and 6 (CMIP5 and CMIP6; Rowell et al., 2015; Akurut et al., 2014; Dunning et al., 2018) and the Coordinated Regional Climate Downscaling Experiment (CORDEX; Souverijns et al., 2016; Olaka et al., 2019), a drying trend was observed in eastern Africa between the mid 1980s and 2010, leading to what has been termed the "East African Precipitation Paradox" (Rowell et al., 2015; Souverijns et al., 2016; Wainwright et al., 2019; Palmer et al., 2023) and to investigations of whether this is linked to a misrepresentation of processes driving seasonal precipitation variability in coupled GCMs (e.g. Rowell et al., 2015; Seager et al., 2019). Recent studies have shown that climate model projections of increasing average precipitation in the region are mostly driven by representations of longer and heavier October, November, and December "short rains" in the future (Dunning et al., 2018; Cook et al., 2020), while the observed drying has been linked to a shorter duration of the March, April, and May "long rains" season, which has partly reversed since 2010 (Wainwright et al., 2019; Palmer et al., 2023). An improvement to the attribution carried out here would be to include simulations from different modelling setups, for instance with prescribed sea surface temperatures or dynamics, to control for some of these biases (Stone et al., 2019; Cook et al., 2020). Finally, the coarse resolution of GCMs does not allow us to fully represent the mesoscale processes that characterize the Lake Victoria basin, which are linked to the interaction of the atmosphere with the region's complex orography and the lake surface (Thiery et al., 2016, 2017; Van de Walle et al., 2020, 2021), meaning that higher-resolution convection-permitting models could be of added value (Van Lipzig et al., 2023).

Conclusions
In 2020, heavy rainfall caused Lake Victoria's shorelines to flood and its tributary rivers to spill over their banks, displacing thousands of people and threatening lives and livelihoods. Media and government reports linked the heavy precipitation and subsequent floods to anthropogenic climate change. In this study, we mapped the impact of the floods and investigated the influence of anthropogenic climate change on the event by combining probabilistic extreme event attribution methods with a water balance model of the lake.
Based on remote sensing analysis, we estimate that between April and July 2020 an area of 640 km² close to Lake Victoria flooded, affecting more than 29 000 people. Impacts were caused by lake shoreline and river flooding. For the attribution analysis, we define the 2020 event as the change in lake level over 180 d. In the 180 d leading up to May 2020, Lake Victoria's levels rose by 1.21 m, ranking as the third most extreme event after 1998 and 1962. The event was driven by anomalous lake precipitation and inflow, which contributed 70 % and 30 % of the anomalous lake level rise, respectively. Outflow was also above average, but was insufficient to balance the increased input into the lake.
Based on observational data, the flood event has an estimated return period of 63 years (CI 27-395 years) in the current climate, and in a pre-industrial climate lake levels would have risen 11 cm (0-23 cm) less than observed. This change in magnitude is, however, associated with a relatively large uncertainty, including the possibility of no forced change. The WBM forced with observational data reproduces the observed event reasonably well, although it underestimates its magnitude and anticipates its timing. Five out of six climate models reproduce lake level variations well, performing in a similar way to the observational WBM, whereas one model is rejected due to excessive biases. Based on a synthesis of observations and climate model simulations, the observed event is slightly more likely in the current climate than in a pre-industrial climate by a factor of 1.8 (CI 0.8-15.8), although uncertainty is relatively large and includes the possibility of no change. Similarly, we estimate that in the absence of anthropogenic climate change a 63-year event would have resulted in lake levels rising 7 cm (CI 0-14 cm) less than observed, which corresponds to approximately 6 % (0 %-13 %) of the total November-May rise in lake levels.
The multi-model attribution carried out showed agreement in the sign and magnitude of change over all GCMs that passed the model evaluation step, which in turn agreed with observations. For a precipitation-related variable, where attribution statements are generally weaker than for temperature-related variables, this is a conspicuous result. Accordingly, natural variability was found to be a greater determinant of uncertainty than inter-model disagreement. Further, while the ensemble of climate models used is small, the choice of using a bias-adjusted ensemble was confirmed by the good spatial and seasonal pattern the models showed and the relatively realistic lake level simulation results. The best estimates for the probability ratios and magnitude changes estimated by models and observations all point to a slight increase in the likelihood and magnitude of the event linked to anthropogenic forcings. Using a larger model ensemble and different statistical methods could strengthen the attribution statement, but irreducible uncertainty linked to high natural variability in the region must be recognized.
While limitations in the models, data, and statistical methods used must be taken into account when interpreting our results, this study provides a first step towards disentangling natural variability from a forced response due to anthropogenic climate change behind the high-impact 2020 Lake Victoria floods. The eastern Africa region is comparatively under-studied in attribution, and high natural variability means that both droughts and floods are common in the region. However, given the high vulnerability of local communities in the region, it is of key importance to better understand how anthropogenic climate change is affecting extreme events in eastern Africa.
To assess the importance of the missing days in PERSIANN-CDR data we (i) count the number of missing days per year and per month and (ii) compare the yearly accumulation of precipitation in the Lake Victoria basin with estimates coming from two other state-of-the-art observational datasets: Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS; Funk et al., 2015), obtained for the period 1981-2020, and Multi-Source Weighted-Ensemble Precipitation V2.2 (MSWEP; Beck et al., 2019), obtained for the period 1979-2016. All datasets are remapped to our study area using the same procedure.
PERSIANN-CDR data have 419 missing days, which are concentrated in the early decades of the dataset (Appendix Fig. C2a, b, d). The year with the most missing days is 1984, in which a third of all days are missing. Nonetheless, even in the early years, PERSIANN-CDR shows similar accumulated yearly precipitation estimates compared to CHIRPS and MSWEP (Appendix Fig. C2c). Moreover, PERSIANN-CDR generally provides higher estimates of precipitation compared to the other datasets. Correcting for the missing days in PERSIANN-CDR by interpolating between missing values or replacing missing days with their climatological average causes the water balance model to drift and overestimate lake levels, suggesting this correction overestimates true precipitation amounts. For this reason we do not correct for the missing days. We run the water balance model for the whole 1983-2020 period, as this gives very similar results to running the model from 1985. We instead analyse the precipitation anomaly starting from 1985 to avoid overestimating the positive temporal trend in yearly precipitation amounts.
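The first step of the check described above (counting missing days per year and per month) can be sketched as follows. This is a minimal illustration on a list of available dates; the function names and the toy input are hypothetical and do not reproduce how the PERSIANN-CDR files were actually read.

```python
from collections import Counter
from datetime import date, timedelta

def missing_days(available, start, end):
    """List every calendar day in [start, end] that is absent from `available`."""
    have = set(available)
    day, missing = start, []
    while day <= end:
        if day not in have:
            missing.append(day)
        day += timedelta(days=1)
    return missing

def count_by_year_and_month(missing):
    """Count missing days per calendar year and per month of the year."""
    per_year = Counter(d.year for d in missing)
    per_month = Counter(d.month for d in missing)
    return per_year, per_month

# Toy record: ten days in January 1984 with two gaps (illustrative only).
start, end = date(1984, 1, 1), date(1984, 1, 10)
gaps = {date(1984, 1, 3), date(1984, 1, 5)}
available = [start + timedelta(days=i) for i in range(10)
             if start + timedelta(days=i) not in gaps]
per_year, per_month = count_by_year_and_month(missing_days(available, start, end))
print(dict(per_year), dict(per_month))
```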

B2 Water balance modelling: curve number method
In the water balance model, the inflow term (Q_in) is calculated using the USDA Natural Resources Conservation Service curve number (CN) method to estimate runoff (USDA-SCS, 2004), as in Vanderkelen et al. (2018a). The reader is referred to Vanderkelen (2016) and Vanderkelen et al. (2018a) for more details. The CN method relates accumulated precipitation to runoff based on an empirical relationship. Total runoff is modelled as a function of accumulated rainfall and a parameter referred to as the curve number, which is specific to each land grid cell. The CN parameter is in the range 1 ≤ CN ≤ 100 and is a function of the hydrologic soil type, land use, hydrologic surface condition, and antecedent moisture conditions. The CN method can be summarized by the following equations (Descheemaeker et al., 2008; USDA-SCS, 2004):

$$Q = \begin{cases} \dfrac{(P - I_a)^2}{P - I_a + S}, & P > I_a \\ 0, & P \le I_a \end{cases} \qquad \text{(B1)}$$

$$I_a = \lambda S \qquad \text{(B2)}$$

$$S = \frac{25400}{\mathrm{CN}} - 254 \qquad \text{(B3)}$$

where Q is the total runoff (mm) produced by the total accumulated precipitation P (mm); I_a is the initial abstraction parameter (mm), indicating the minimum initial amount of water that must be in the soil before runoff can start; and S is the maximum water retention capacity of the soil (mm), which indicates the maximum amount of water the soil can hold. λ is a dimensionless parameter, commonly set equal to 0.2, which expresses the initial abstraction amount necessary as a fraction of the total maximum soil water retention capacity. As visible in Eq. (B3), S depends on the CN parameter of the soil. The magnitude of the CN determines the proportion of grid cell precipitation that is converted to runoff. The higher the CN, the greater the proportion of precipitation that turns into runoff (Eq. B1). For open waterbodies the CN is set to 100, meaning that all precipitation becomes runoff.
In Vanderkelen et al. (2018a) a CN was assigned to each grid cell in the study area under standard moisture conditions as a function of land cover and the hydrologic soil group (HSG) of the grid cell. Within the water balance model, the CN then varies temporally based on the antecedent moisture condition (AMC), whereby cumulative 5 d antecedent precipitation over the grid cell is calculated, and if this value is below a dry threshold or above a wet threshold (Table B1), the CN of the grid cell is modified based on Eqs. (B4) and (B5).
Each grid cell therefore has three possible CNs. The CN is decreased under dry conditions (CN_I < CN_II), meaning that maximum soil water retention increases and the proportion of precipitation that becomes runoff decreases. Under wet conditions the opposite is true: the CN is increased by applying the AMC (CN_III > CN_II), meaning that the proportion of precipitation that becomes runoff increases. To calculate the inflow term (Q_in in Eq. 2), the direct runoff calculated in all basin grid cells is summed for each day. This calculation disregards groundwater flow and any river routing, assuming that all direct runoff calculated in each grid cell immediately enters the lake. Despite these simplifying assumptions, the model was shown to work well for the Lake Victoria catchment in Vanderkelen et al. (2018a).

B3 Event definition: sensitivity analysis
We develop a univariate class-based definition of the 2020 flood event. The choice of definition has, on the one hand, to be representative of the event, for example by maximizing its hydro-meteorological extremity or by being as close as possible to its impacts (Philip et al., 2020), but it should, on the other hand, minimize serial correlation and unforced low-frequency interannual trends that violate the statistical assumptions made in the attribution methodology.
As outlined in Sect. 1.1, we focus on the rate of change in lake levels (ΔL/Δt) instead of on absolute lake levels to define the event, reducing the influence of decadal trends. Consequently, an appropriate length of the time window (Δt) has to be selected. A smaller window results in a daily time series that shows less serial correlation, while a larger window reduces the independence of subsequent observations and causes interannual trends to become visible in the daily time series. At the same time, the rate of increase in lake levels observed in 2020 is most extreme when the window is large. When the time window considered is 1 year, the 2020 event ranks second after 1962 (Fig. 7a), while the extremity of the event decreases quickly as the window shrinks to 60 or 30 d (Fig. 7a). An intermediate time window of 180 d is therefore chosen, and the annual block maxima of the ΔL/Δt time series are extracted. The 2020 event thus defined corresponds to a lake level increase of 1.21 m that occurred in the 180 d leading up to 17 May 2020, and is the third most extreme event since 1897, ranking after 1998 (1.39 m) and 1962 (1.30 m; Fig. 7a, b). Moreover, with this chosen interval, the different temporal resolution of observed lake levels before and after 1948 plays a small role and does not substantially affect return period estimates (see Sect. 3.5).
The 180 d window of each block can extend from the previous year into the calendar year to which the value is assigned. Most of the intervals in the block maxima time series occur in the 6 months between November or December and May or June, capturing lake level increases due to part of the OND short rains season and most of the MAM long rains season (Appendix Fig. C7). There are overlaps only in eight pairs of blocks in the 125-year time series. In these instances, a late time window of the previous year partially overlaps with an early time window of the subsequent year, generally for a 2 to 3 month period. Since this violates the assumption of independence of the annual block maxima time series, we test the sensitivity of our results to these overlapping blocks in Sect. 3.5 and find a similar attribution signal regardless of whether overlapping blocks are included or excluded from the analysis.
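The event definition above can be sketched as follows: compute the change in lake level over a trailing 180 d window, assign each value to the calendar year of the window's final day, keep the annual maximum, and flag consecutive blocks whose windows overlap. This is a minimal sketch assuming a gap-free daily series; the function names and the synthetic series are illustrative and are not the code or data used in the study.

```python
from datetime import date, timedelta

def rolling_change(levels, window=180):
    """levels: sorted list of (day, level_m) at daily resolution.
    Returns (end_day, level change over the preceding `window` days)."""
    return [(levels[i][0], levels[i][1] - levels[i - window][1])
            for i in range(window, len(levels))]

def annual_block_maxima(changes):
    """Largest change per calendar year, assigned to the year of the end day."""
    best = {}
    for end_day, delta in changes:
        if end_day.year not in best or delta > best[end_day.year][1]:
            best[end_day.year] = (end_day, delta)
    return best

def overlapping_pairs(best, window=180):
    """Pairs of successive blocks whose windows share at least one day."""
    years = sorted(best)
    pairs = []
    for prev, cur in zip(years, years[1:]):
        start_cur = best[cur][0] - timedelta(days=window)
        if start_cur < best[prev][0]:
            pairs.append((prev, cur))
    return pairs

# Synthetic series: flat, then a 120 d rise starting in late October 2000.
levels = [(date(2000, 1, 1) + timedelta(days=i), float(min(max(i - 300, 0), 120)))
          for i in range(500)]
best = annual_block_maxima(rolling_change(levels))
print(best, overlapping_pairs(best))
```

In the synthetic example the rise straddles the turn of the year, so the 2000 and 2001 blocks share part of their windows, mirroring the overlapping pairs described above.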
Finally, we note that the parent daily ΔL/Δt variable from which the annual block maxima time series is extracted does not fully meet the theoretical assumptions of extreme value theory (Coles, 2001). These assumptions are that (i) the daily variable ΔL/Δt is independent and identically distributed, except for the non-stationarity modelled by the shift fit, and (ii) the size of the sample from which each annual block maximum is drawn is sufficiently large such that the distribution of the maxima approaches its limit distribution (here, a GEV). In relation to the first assumption, the realizations of the variable ΔL/Δt cannot be considered independent since (i) the time window upon which each subsequent daily observation is calculated overlaps with the previous observation for 179 out of 180 d, and (ii) heavy precipitation days are likely to be clustered in time. In relation to the second assumption, we cannot readily assume that we are in the asymptotic tail of the distribution of maxima. Although block maxima are often drawn from a year of daily observations (Hammerling et al., 2019; Coles, 2001), in our case we do not have 365 independent daily observations, but we instead have 365 rolling 180 d windows, meaning that only two windows each year are fully non-overlapping. Further, variability in 6-month rates of change in lake levels depends on relatively low-frequency variability in seasonal precipitation. The block maxima usually detect a part of the signal from the OND short rains season and the whole of the MAM long rains season, resulting in a time window that ends around the month of May in most years. This suggests that we are not extracting each block maximum from a very large sample of independent observations but are instead analysing a time series that is closely linked to seasonal precipitation amounts. Nonetheless, similar objections can be made to a number of extreme event attribution studies that study relatively "slow" climate variables, such as multi-month droughts (e.g. Philip et al., 2018b; Kew et al., 2021), and these limitations are recognized by the community but do not impede the possibility of carrying out such studies and providing useful information (Philip et al., 2020; van Oldenborgh et al., 2020).

Figure 1. Lake Victoria levels (1896-2022) with high and low peaks labelled. The time series is reconstructed based on monthly in situ measurements from the UK Centre for Ecology and Hydrology (UKCEH, 1896-1948), daily in situ measurements from the WMO Hydrometeorological Survey (1948-1992), and satellite-derived 10-daily measurements from the Database for Hydrological Time Series of Inland Waters (DAHITI) (in m a.s.l.) converted to in situ.

Figure 2. (a) Observed average annual precipitation from PERSIANN-CDR for the period 1985-2020. The Lake Victoria basin outline is shown in red. Absolute precipitation anomaly in the years (b) 2019-2020, (c) 2019, and (d) 2020. Relative precipitation anomaly in the years (e) 2019-2020, (f) 2019, and (g) 2020. All anomalies are calculated with respect to the period 1985-2020.

Figure 3. (a) Monthly accumulated precipitation over Lake Victoria and its basin for the period 2019-2020, shown relative to the climatology (calculated based on the period 1985-2020). Periods of positive anomaly are shown in blue, and periods of negative anomaly are shown in red. (b) Accumulated precipitation in OND rainy season, JF dry season, and MAM rainy season, with the long-term seasonal average for the period 1985-2020 shown as a dashed grey line.

Figure 4. Annual accumulated precipitation anomaly with respect to the period 1985-2020 in the Lake Victoria basin for (a) 1 year, (b) a 2-year rolling window, and (c) a 3-year rolling window. The record-breaking year is marked with an asterisk.

Figure 6. Flood-affected populated grid cells (red) in the Nzoia-Yala area (box 1 in Fig. 5), estimated by combining the flooded area between April and June 2020 (blue) retrieved using the HASARD algorithm of LIST (Luxembourg) and population data provided by WorldPop.

Figure 7. (a) Rank of the 2020 event in the 1897-2021 time series of annual block maxima of the rate of change in lake levels (ΔL/Δt) based on the size of the time window (Δt). Red indicates a higher rank (more extreme), while blue indicates a lower rank (less extreme). The rank of the 2020 event with the chosen event definition (Δt = 180 d) is highlighted by the red box. (b) Annual block maxima time series (ΔL/Δt)_max with Δt = 180 d for the period 1897-2021 and 10-year rolling mean of the time series.

Figure 8. Comparison of observed lake levels and lake levels modelled with the observational simulation of the WBM. Model bias is shown in grey (note different scales of the axes).

Figure 9. Bias in how the water balance model represents ΔL/Δt for Δt = 180 d. (a) Time series of ΔL/Δt in observations and WBM. (b) Bias in ΔL/Δt, smoothed with a 3 d rolling window. (c) Climatology of ΔL/Δt in observations and WBM for overlapping period. (d) Comparing the distribution of the variable ΔL/Δt in observations and in the WBM. (e) Joint distribution of variable ΔL/Δt in observations and WBM and ordinary least-squares regression line of best fit through the data.

Figure 10. (a) Climatology of water balance terms modelled over the period 1983-2020, expressed in lake level equivalent (l.l.e.), with the uncertainty bands spanning 1 standard deviation. Water balance terms in (b) 2019 and (c) 2020 compared to climatology. Evaporation is not shown in (b) and (c) as the annual cycle is fixed by modelling design for all years.

Figure 11. GEV shift fit to annual block maxima time series ΔL/Δt based on observed lake levels for the period 1897-2020. (a) Linear model of the location parameter µ as a function of the GMST covariate based on the estimated parameters µ_0 and µ_1. The vertical red lines show the best estimate and 95 % confidence interval of the location parameter values in 1900 (pre-industrial climate) and 2020 (current climate). (b) GEV shift fit in current (red) and pre-industrial (blue) climates, based on the shift in the location parameter, with uncertainty intervals calculated by bootstrapping distribution parameters. The year 2020 is included in the fit and is labelled as a horizontal pink line in (b).

Figure 12. Synthesis of (a) PR and (b) change in magnitude estimates from observations and models between a current factual climate and a counterfactual or pre-industrial climate, following the methodology explained in Philip et al. (2020). Coloured bars indicate the 95 % CI, with the best estimate shown as a black line. Uncertainty denotes natural variability and takes model representativity into account but neglects intrinsic epistemic model uncertainty. The red bar is an average of model results, computed through an unweighted synthesis methodology. The purple bar shows the average of observations and models.

Figure C2. Assessment of missing days in PERSIANN-CDR data and comparison with other observational data sources: (a) distribution of the missing days in PERSIANN-CDR each year (419 total missing days in the period 1983-2014), (b) total number of missing days each year, (c) total number of missing days grouped per month, and (d) comparison of yearly accumulated precipitation amounts in the Lake Victoria basin in PERSIANN-CDR, CHIRPS and MSWEP V2.2 (CHIRPS and MSWEP have no missing data in this period).

Figure C3. Daily outflow time series for the period 1948-2021 compiled from different sources and compared with the Agreed Curve based on observational lake levels. The semi-observational outflow time series is created by extending the composite time series used in Vanderkelen et al. (2018a). For the period 1948-1950, outflow is calculated with the Agreed Curve. From 1950 to 5 March 2006, the outflow time series from Vanderkelen et al. (2018a) is used. In the period 1950-1997 this is based on monthly in situ measurements, for 1998-1999 outflow is calculated with the Agreed Curve, for 2000-2004 outflow is digitized from a graph in Lake Victoria Basin Commission (2006), and from 2004 to 5 March 2006 daily measurements are used. For the period 5 March 2006-17 March 2020 outflow is calculated using the Agreed Curve. From 18 March 2020 to 2021 daily outflow measurements at Jinja are available.

Figure C4. Linear regression of (a) yearly accumulated precipitation and (b) seasonal precipitation accumulations in the Lake Victoria basin against time to estimate temporal trend for the period 1985-2020 (values are given in units of mm).

Figure C5. Correlation of precipitation in the Lake Victoria basin in the period 1985-2020 with the Indian Ocean Dipole Mode Index. Pearson correlation coefficients are shown for precipitation accumulated during (a) the OND short rains and (b), (c) the whole year, against the Indian Ocean Dipole Mode Index averaged (a), (b) in the months October-December, and (c) across the entire calendar year.

Figure C7. The 180 d time windows selected from the ΔL/Δt time series by the annual block maxima methodology for each year in the period 1897-2021. These indicate the period in which the largest 180 d rate of change in lake levels occurs in each calendar year. The value is assigned to the year in which the final day in the 180 d period is located (year n), but the period can start in the previous calendar year (year n − 1). Blue bars do not overlap with blocks of other calendar years; purple portions of the bars indicate a partial overlap with the block of the previous or subsequent calendar year. There are overlaps in the years 1899-1900, 1902-1903, 1961-1962, 1982-1983, 1997-1998, 2006-2007, 2011-2012, 2019-2020. The difference in the temporal resolution of lake level measurements before 1948 (monthly) and after this year (daily to 10 daily) is visible as an increase in variability of the blocks.

Figure C8. Water balance model bias in the period January 2018-December 2021: (a) observed and modelled lake levels, (b) lake level bias (i.e. modelled minus observed curves in a) smoothed with a centred 60 d moving average low-pass filter, and (c) daily rate of change in smoothed lake level bias (i.e. an approximation of the first derivative of the bias curve from b).

Figure C9. Rank of the 2020 event defined as a rate of change in lake levels (ΔL/Δt) based on the size of the time window, from the observational WBM simulation for the period 1983-2020. Red indicates a higher rank (more extreme), while blue indicates a lower rank (less extreme).

Figure C10. (a) Monthly anomaly in each water balance term from January 2019 to December 2020. (b) The 6-monthly accumulated water balance terms in metres of lake level equivalent and resultant residual in the water balance. (c) Yearly accumulated water balance terms. Accumulated terms are calculated with a rolling window applied to a monthly time series to obtain the cumulative sum over the previous (b) 6 months or (c) year. The residual is the difference between positive terms (precipitation and inflow) and negative terms (evaporation and outflow).

Figure C11. Cumulative monthly anomalies in water balance terms from January 2019 to December 2020 (a, cumulative version of Fig. C10a) and resultant cumulative anomaly in the water balance residual term (b).

Figure C12. Spatial pattern of mean annual precipitation accumulation in Lake Victoria basin in PERSIANN-CDR (a) and GCM historical and SSP3-RCP7.0 simulations (b-g) for the period 1985-2020.

Figure C14. Representation of seasonality of daily (a) and monthly precipitation (b) over Lake Victoria and its basin in observations (PERSIANN-CDR) and bias-adjusted GCMs (historical and SSP3-RCP7.0) evaluated in this study for the period 1985-2020.