A non-stationary extreme-value approach for climate projection ensembles: application to snow loads in the French Alps

. Anticipating risks related to climate extremes often relies on the quantiﬁcation of large return levels (values exceeded with small probability) from climate projection ensembles. Current approaches based on multi-model ensembles (MMEs) usually estimate return levels separately for each climate simulation of the MME. In contrast, using MME obtained with different combinations of general circulation model (GCM) and regional climate model (RCM), our approach estimates return levels together from the past observations and all GCM–RCM pairs, considering both historical and future periods. The proposed methodology seeks to provide estimates of projected return levels accounting for the variability of individual GCM–RCM trajectories, with a robust quantiﬁcation of uncertainties. To this aim, we introduce a ﬂexible non-stationary generalized extreme value (GEV) distribution that includes (i) piecewise linear functions to model the changes in the three GEV parameters and (ii) adjustment coefﬁcients for the location and scale parameters to adjust the GEV distributions of the GCM–RCM pairs with respect to the GEV distribution of the past observations. Our application focuses on snow load at 1500 m elevation for the 23 massifs of the French Alps. Annual maxima are available for 20 adjusted GCM–RCM pairs from the EURO-CORDEX experiment under the scenario Representative Concentration Pathway (RCP) 8.5. Our results show with a model-as-truth experiment that at least two linear pieces should be considered for the piecewise linear functions. We also show, with a split-sample experiment, that eight massifs should consider adjustment coefﬁcients. These two experiments help us select the GEV parameterizations for each massif. Finally, using these selected parameterizations, we ﬁnd that the 50-year return level of snow load is projected to decrease in all massifs by − 2 . 9 kN m − 2 ( − 50 %) on average between 1986–2005 and 2080–2099 at 1500 m elevation and RCP8.5. This paper extends the recent idea to constrain climate projection ensembles using past observations to climate extremes.


Introduction
The use of climate model simulations is the main scientific paradigm to anticipate extreme climate events. In particular, multi-model general circulation model (GCM) and regional climate model (RCM) ensembles are widely used to quantify the changes in climate extremes and their uncertainties (IPCC, 2021). GCMs represent key processes of the climate system relevant at the global scale and provide input for RCMs used to downscale and refine the climate projections at the local to regional scale.
Climate extremes are often assessed within the statistical framework of extreme value theory (EVT) by focusing on either annual maxima or values exceeding a high threshold (Coles, 2001). EVT makes it possible to estimate return levels, i.e., extreme quantiles that occur on average once every T years, where T is the corresponding return period. Return levels play a key role in the design of structures (dams, pro- Table 1. Temporal non-stationary GEV-based approaches for GCM ensembles and GCM-RCM ensembles. An asterisk ( * ) indicates that the ensemble is an initial condition ensemble, i.e., each ensemble member consists of the same GCM-RCM pair with different initializations.  Hanel and Buishand (2011) × Linear Precipitation Kharin et al. (2013) × Linear Temperature and precipitation Brown et al. (2014) Linear Temperature and rainfall Um et al. (2017) × Nonlinear Precipitation Tramblay and Somot (2018) × Linear Precipitation Together Kharin and Zwiers (2004) × * Linear Temperature and precipitation Wang et al. (2004) × * Linear Significant wave height Aalbers et al. (2018) × * Linear Precipitation Fix et al. (2018) × * Linear Precipitation Wehner (2020) × Linear Temperature and precipitation Our approach Piecewise linear Snow load tections, roofs) to withstand the effects of natural hazards (floods, avalanches, wildfires, snow loads); see, e.g., Rao and Hamed (2000), Eckert et al. (2008), Evin et al. (2018), and Le Roux et al. (2020).
Most approaches using EVT to study climate extremes from multi-model ensembles (MMEs) rely on stationary generalized extreme value (GEV) distributions estimated separately on each climate simulation of the MME, i.e., with each ensemble member (Kharin et al., 2007;Beniston et al., 2007). Specifically, for each ensemble member, annual maxima are assumed stationary for two time periods of 20 or 30 years: one in the historical period representing the late 20th century climate and one in the future period. For instance, Fowler et al. (2007) opted for two 30-year time periods: 1961-1990 and 2071-2100, a 30-year time window corresponding to the usual duration that is used to describe the statistical properties of a climate according to World Meteorological Organization (WMO) standards. Next, stationary 20-or 30-year return levels are estimated for each time period with a GEV distribution. Finally, average changes, i.e., differences in return levels between time periods averaged on all ensemble members, are usually reported (Kharin et al., 2013;O'Gorman, 2014). However, such approaches based on stationary GEV distributions have several drawbacks. First, the assumption of stationarity for 20 or 30 consecutive annual maxima can be debatable, and the possibility of a trend within the 20-or 30year time periods is often not checked (Kharin and Zwiers, 2004). In addition, the choice to rely only on 20 or 30 maxima implies that the estimated GEV parameters have large uncertainties due to the small number of values used. In this case, large return levels, e.g., the 50-year (or even larger) return periods that are usually considered to design structures (see, e.g., Table 1 of Cabrera et al., 2012), can be highly uncertain.
Temporal non-stationary GEV approaches address these limitations by taking into account all the available annual maxima for each ensemble member, i.e., all the historical and future annual maxima are fitted with a single statistical model (Kharin and Zwiers, 2004). Such approaches combine a stationary random component (a fixed extreme value distribution) with non-stationary deterministic functions that map each temporal covariate (such as the years or the global mean temperatures) to the changing parameters of the distribution. Another advantage of temporal non-stationary approaches is that they allow return levels to be estimated conditionally on each temporal covariate (Kharin et al., 2013).
A majority of temporal non-stationary approaches for MMEs rely on the GEV distributions estimated separately with each climate simulation of the MME (Table 1), with some exceptions (Caires et al., 2006;Kyselý et al., 2010;Roth et al., 2014;Winter et al., 2017), and report return levels (conditionally on a given covariate) averaged on all ensemble members. We believe that such approaches are sub-optimal because they estimate one non-stationary GEV distribution with each climate simulation of the MME, i.e., with fewer than roughly 200 maxima values, which often implies a simple parameterization (linear) for the non-stationary functions ( Table 1).
The present study introduces an alternative approach that relies on a temporal non-stationary GEV model fitted to all ensemble members. This approach enables us to quantify uncertainties using standard tools from non-stationary extreme value analysis. Such an approach has mainly been proposed for initial condition ensembles (Table 1), i.e., ensemble members that consist of replicates from the same GCM-RCM pair (or the same GCM for GCM ensembles) simulated with different initial conditions. For initial condition ensembles, this alternative approach estimates a single non-stationary distri-bution on all ensemble members by assuming that they are independent and identically distributed (iid). However, this alternative approach is inadequate for GCM-RCM ensembles with several GCMs because the iid assumption, i.e., that all GCM-RCM pairs follow the same non-stationary distribution, is unlikely to hold in all the cases. Our study fills this gap with a novel non-stationary extreme value approach inspired by the recent trend of statistical methods that constrain climate projections using past observations (Brunner et al., 2020). We propose to fit a non-stationary GEV distribution to past observations and all GCM-RCM pairs without necessarily assuming that all GCM-RCM pairs follow the same distribution. To this end, we introduce parameters (so-called adjustment coefficients) for the location and scale parameters of the GEV distribution that can account for systematic differences between the different climate trajectories. Different parameterizations of these adjustment coefficients are tested in order to describe the variability between the climate trajectories, the best parameterization being selected using splitsample tests on past observations. Besides, non-stationary GEV-based approaches for climate projection ensembles usually consider linear functions for the non-stationary functions, with the exception of the study by Um et al. (2017) that applies nonlinear functions (Table 1). In this study, we extend these approaches by considering piecewise linear functions for the non-stationary functions.
We illustrate the proposed methodology with an application to snow load data, which corresponds to the pressure exerted by accumulated snow on the ground (proportional to the snow water equivalent). The probabilistic assessment of ground snow load is of major interest for the structural design of buildings (Croce et al., 2018), for water resource management (Marty et al., 2017), or for the prevention of large-scale environmental or infrastructure damage caused by snow storms (e.g., damage to forests, transportation networks and electricity networks). Since annual means of snow loads are expected to decrease under future climate change, it could be expected that extreme snow load would also decrease. However, in cold regions (high-elevation regions for instance) extreme snowfall is expected to increase with climate change (O'Gorman, 2014), and this increase in extreme snowfall can possibly lead to an increase in extreme snow load. This application verifies whether snow load extremes are expected to decrease in the French Alps, a quantification of these decreases being of prime interest to the study of compound extremes, e.g., extreme snow load combined with extreme wind, or to adapt structures standards, e.g., to decrease the constraints used to design new structures (Le Roux et al., 2020).

Data
Our application focuses on snow loads at 1500 m elevation in the 23 massifs of the French Alps, i.e., between Lake Geneva to the north and the Mediterranean Sea to the south (Fig. 1). This region, home to the largest ski resorts in the world, is prone to snow-related hazards, such as avalanches (Favier et al., 2016;Dkengne Sielenou et al., 2021), which are heavily impacted by ongoing warming Castebrunet et al., 2014). This study estimates potential changes in snow load (i.e., the pressure exerted by the snow) hazard for a high-emission scenario (RCP8.5) as a case study, although it could also be applied to other scenarios and variables. Snow load can be defined as the gravitational acceleration (g = 9.81 m s −2 ) times the snow water equivalent (in kg m −2 ) and is often expressed in units of kN m − 2. Snow water equivalent corresponds to the mass of snow per unit surface area, which also corresponds to the observed depth of accumulated snow (in m) multiplied by the snow density (in kg m −3 ). Following the block maxima approach to estimate the hazard of snow load (Sect. 3.1), we compute annual maxima of daily snow load at 1500 m centered on the winter season, e.g., an annual maximum for the year 1959 is the maximum from 1 August 1958 to 31 July 1959 (Fig. 1).
The S2M reanalysis (Durand et al., 2009;Vernay et al., 2019Vernay et al., , 2022 combines large scale reanalyses and forecasts with in situ meteorological observations to provide daily values of snow load from 1959 to 2019. The S2M reanalysis has been evaluated with both in situ temperature and precipitation observations (Durand et al., 2009) and various snow depth observations Quéno et al., 2016;Revuelto et al., 2018;Vionnet et al., 2019;Vernay et al., 2022). The S2M reanalysis focuses on the elevation dependency of meteorological conditions. Indeed, this reanalysis is not produced on a regular grid but provides data for each massif every 300 m of elevation between 600 and 3600 m.
ADAMONT (Verfaillie et al., 2017) is a bias-correction and downscaling method that aims to adjust daily climate projections from a regional climate model against a regional reanalysis of hourly meteorological conditions using quantile mapping. This method was used to adjust the EURO-CORDEX dataset (Jacob et al., 2014) against the S2M reanalysis to provide daily values of snow load that spans historical  and future (2006-2100) time periods. Specifically, the EURO-CORDEX dataset consists of RCMs forced over Europe by GCMs from the CMIP5 ensemble (Taylor et al., 2012) for the historical scenario and several representative concentration pathway (RCP) scenarios (Moss et al., 2010). We focus on the RCP8.5 emission scenario and consider a total of 20 GCM-RCM pairs, with 6 GCMs and 11 RCMs total (see Supplement, Table S1). Finally, adjusted EURO-CORDEX meteorological data are used as input to the snow cover model Crocus (Vionnet et al., 2012) every 300 m of elevation for each massif. This provides estimates of the time evolution of the snow cover (Verfaillie et al., 2018), enabling us to compute the maximum annual value of snow load at 1500 m elevation. For simplicity, we often refer to the S2M reanalysis as our observation reference. We note that we discard the two most southern massifs because many projected annual maxima are equal to zero.
The anomaly of global mean surface temperature (GMST) with respect to the pre-industrial period (1850-1900) is chosen as the temporal covariate for our statistical methodology. In practice, we smooth this anomaly with cubic splines to obtain a covariate that does not depend on the internal variability of GMST (Fig. 2). For each GCM-RCM pair we rely on the GMST corresponding GCM as a covariate, while we rely on GMST from HadCRUT5 (Morice et al., 2021) as a covariate for the observations. For simplicity, we refer to +1 • C of smoothed anomaly of GMST as +1 • of global warming.

Generalized extreme value distribution
Following the block maxima approach of extreme value theory (Coles, 2001), we model annual maxima with the GEV distribution. Indeed, similarly to the central limit theorem that motivates to model means obtained from different samples using a normal distribution, the Fisher-Tippett- Figure 2. Raw output (dotted lines) and smoothed output (plain lines) for the anomaly of global mean annual temperature with respect to industrial levels . For the six GCMs, we show the anomaly of global mean surface temperature using historical emissions until 2005 and projected emissions (RCP8.5). Years correspond to periods centered on each winter (August-July).
Gnedenko theorem (Fisher and Tippett, 1928;Gnedenko, 1943) encourages to model maxima using the GEV distribution. In practice, if Y represents an annual maximum, a natural choice to model the distribution of Y is Y ∼ GEV(µ, σ, ξ ), which implies that where the three parameters are the location µ, the scale σ > 0 and the shape ξ . Three subfamilies of distribution (reversed Weibull, Gumbel and Fréchet) can be derived depending on the sign of the shape parameter (ξ < 0, ξ = 0 and ξ > 0), respectively.
In theory, the GEV distribution is adequate when maxima are computed over blocks of infinite size. In practice, the GEV distribution is usually applied to annual maximal values and has been shown to provide reliable estimates of return levels in many hydrometeorological applications (Coles, 2001;Katz et al., 2002;Cooley, 2012;Papalexiou and Koutsoyiannis, 2013). The T -year return level, which is defined as a daily value y p exceeded each year with probability p = 1 T , corresponds to the 1 − p quantile of the GEV distribution In this study, we set p = 1 50 as it corresponds to the 50-year return period that is widely used for the designed working life of buildings (Cabrera et al., 2012), specifically for the building standards regarding snow load (Croce et al., 2019).

Non-stationary distribution
Let Y obs t denote an observed annual maximum for the year t between 1959 and 2019 and T obs t represent the smoothed anomaly of global mean surface temperature (GMST) from HadCRUT5 for the same year t (Sect. 2). We rely on a nonstationary distribution where each GEV parameter is a piecewise linear function of T , which is the smoothed anomaly of GMST. A log link function for the scale parameter is introduced to ease the numerical optimization.
where θ is the vector of parameters {µ i , σ i , ξ i , i = 0, . . ., L} for the piecewise-linear functions µ(.), σ (.), ξ (.), 1 ≤ L ≤ 4 corresponds to the number of linear pieces, , and T min and T max are the minimum and maximum smoothed anomaly of GMST for the period 1951-2100 (Fig. 2). In other words, κ 2 , . . . , κ L are fixed and equally spaced between T min and T max and correspond to the L − 1 anomalies of smoothed GMST where the line breaks, i.e., where the slope of the piecewise linear functions changes. Table 2. The five parameterizations of the adjustment coefficients µ k andσ k considered for the location and scale parameters, respectively. When there is only one coefficient for all GCM-RCM pairs k,μ k =μ andσ k =σ for any pair k. When there is one coefficient per GCM (per RCM),μ k =μ g andσ k =σ g (μ k =μ r and σ k =σ r ), where g and r are subscripts referring to the GCM and RCM of the pair k, respectively. For a GCM-RCM pair k between 1 and 20, let Y k t represent an annual maximum for the year t between 1951 and 2100 and T k t represent the smoothed anomaly of GMST (Sect. 2) for the corresponding GCM.

Y k
where denotes the set of parameters θ and of additional parametersμ k andσ k and where ξ (.) is given in Eq.
(2). The parametersμ k andσ k correspond to different adjustment coefficients defined in Table 2. For the K = 20 GCM-RCM pairs, these adjustment coefficients are considered for the location and scale parameters and aim at adjusting the distribution of GCM-RCM pairs. Following Brown et al. (2014), we assume that these adjustment coefficients are constant, i.e., the same for historical and future climates. We consider five parameterizations: zero adjustment coefficients, one adjustment coefficient for all GCM-RCM pairs, one adjustment coefficient for each GCM, one adjustment coefficient for each RCM and one adjustment coefficient for each GCM-RCM pair. Figure 3 illustrates this concept for a fictive ensemble composed of four different GCM-RCM pairs with two different GCMs and two different RCMs. The right column shows how these adjustment coefficients improve the agreement between the different climate simulations with respect to the observations by removing these first-order discrepancies. Obviously, the parameterization with one additional adjustment coefficient for each GCM-RCM pair leads to a better agreement, but this is at the cost of a much higher number of parameters.
Finally, the size of the entire vector of parameters is 3 + 3 × L + S, corresponding to three parameters for the intercepts (µ 0 , σ 0 , ξ 0 ), 3×L parameters for the piecewise linear functions describing the temporal evolution of the three GEV parameters (see Eq. 2) and S parameters corresponding to the adjustment coefficients (see Table 2).
We did not consider adjustment coefficients on the shape parameter because it sometimes leads to prediction failures. This situation can happen when ξ (T ) < 0, which means that the predictive distribution has an upper bound, and when some future annual maxima lies above this upper bound.

Maximum likelihood estimation
For each massif, a temporal non-stationary GEV distribution, parameterized by a vector of coefficients , is estimated using the past observations and all GCM-RCM pairs. Let y = (y obs 1959 , . . ., y obs 2019 , y 1 1951 , . . ., y 1 2100 , . . ., y 20 1951 , . . ., y 20 2100 ) represent a vector with all annual maxima of a given mas-sif, i.e., annual maxima from the observations and from the K = 20 GCM-RCM pairs (Sect. 2). The maximum likelihood method provides the most likely parameters with respect to y. The maximum likelihood estimator is obtained from the past observations and all GCM-RCM pairs by maximizing the following likelihood p(y| ): (4)

Evaluation experiments
Our first evaluation experiment is a model-as-truth experiment, i.e., a perfect model experiment, which evaluates long-term predictive performances using future projections (Abramowitz et al., 2019). The observations from the S2M reanalysis (Sect. 2) are discarded for this experiment. Instead, a simulation from a GCM-RCM pair is chosen as pseudo-observations. The calibration set contains the "historical" data (1959-2019) of the GCM-RCM pair chosen as pseudo-observations and the 19 remaining GCM-RCM pairs . The predictive performance is evaluated on an evaluation set that contains the future data (2020-2100) of the GCM-RCM pair chosen as pseudo-observations. In detail, each GCM-RCM pair is successively regarded as being pseudo-observations. Thus, a model-as-truth experiment can be roughly regarded as a leave-one-out cross-validation with respect to GCM-RCM pairs. We note that for GCM-RCM pairs with the GCM HadGEM2-ES starts in 1982, while the pairs with the RCM RCA4 starts in 1971. Therefore, we successively regard the 12 GCM-RCM pairs (out of 20) that start before 1959 as pseudo-observations, i.e., those that have annual maxima for the period 1959-2019.
Our second evaluation experiment is a split-sample experiment, i.e., a calibration-validation experiment, which enables us to estimate the short-term predictive performance of each parameterization. Specifically, for the calibration of the non-stationary GEV distribution, we rely on the oldest observations from the S2M reanalysis (Sect. 2) and all the GCM-RCM pairs. We validate the predictive performance on the most recent observations. For instance, if we choose to keep 80 % of the observations for the calibration , then the remaining 20 % of the observations are heldout for the evaluation (2008-2019).
In these two evaluation experiments for GCM-RCM ensembles, we calculate the mean logarithmic score (LS) on the evaluation set (the lower the better) to assess the out-of-sample skill of a non-stationary distribution parameterized with the parameter setˆ obtained with the calibration set. The logarithmic score is a proper score that can be used to evaluate the predictive performance of the fitted model (Gneiting et al., 2007). For N out-of-sample observations (or pseudo-observations for the model-as-truth experiment) y obs year 1 , . . ., y obs year N , we

Workflow
First, for a set of past and projected annual maxima, we select one parameterization of the GEV distribution (number of linear pieces, parameterization of the adjustment coefficients) using a two-step selection method: (i) we select the number of linear pieces with a model-as-truth experiment using zero adjustment coefficients for the GEV parameters, and (ii) we then select the parameterization of the adjustment coefficients with three split-sample experiments where the calibration set is composed of 60 %, 70 % and 80 % of the observations, using the number of linear pieces selected in the model-as-truth experiment. Following this, we study trends in the 50-year return level of snow load. For each massif we rely on the parameterization of the GEV distribution selected using the two-step selection method. We report RL50, the 50year return level that corresponds to Eq. (2), i.e., to the 50year return level of the observations and their adjusted projections with respect to GCM-RCM pairs. In other words, if the selected parameterization has adjustment coefficients, we do not add these coefficients to compute RL50, since adding these coefficients would provide the 50-year return level of the GCM-RCM pairs. The 90 % confidence interval is computed using a semi-parametric bootstrap resampling method adapted to non-stationary extreme value distributions (Appendix A). For every anomaly of global mean temperature T , we have that the 50-year return level RL50(T ) is

Selection of one parameterization of the GEV distribution for each massif
In Fig. 4a, for each massif we illustrate the selected parameterization of the GEV distribution (number of linear pieces, parameterization of the adjustment coefficients). Among the 23 massifs, Fig. 4b and c highlights the preferred parameterizations after the application of the two-step selection method.
In the first step, we select the number of linear pieces that minimizes the mean logarithmic score of a model-as-truth experiment using zero adjustment coefficients for the GEV parameters. The mean logarithmic score is averaged on the hold-out pseudo-observations (2020-2100) for each of the 12 GCM-RCM pairs (which are set as pseudo-observations; see Sect. 4.1). We find that the parameterization with three linear pieces minimizes the mean logarithmic score for 80 % of the massifs, see Fig. 4b. The parameterization with two linear pieces is selected for one massif, and the one with four linear pieces is selected for two massifs. Thus, at least two linear pieces are selected for the piecewise linear functions. In the second step, we select the parameterization of the adjustment coefficients ( Table 2) that minimizes the mean logarithmic score for a split-sample experiment using the number of linear pieces selected in the model-as-truth experiment. The mean logarithmic score is averaged on the evaluation observations for three split-sample experiments, where the calibration set corresponds to 60 %, 70 % and 80 % of the observations. Indeed, we observe that the split-sample experiment is quite sensitive to the size of the calibration set. Thus, we choose to average the mean logarithmic score on three split-sample experiments to obtain more robust results. We find that the parameterization with zero adjustment coefficients minimizes the mean logarithmic score for two-thirds of the massifs; see Fig. 4c. Otherwise, the parameterization with one adjustment coefficient for all GCM-RCM pairs is selected for two massifs, the parameterization with one adjustment coefficient for each GCM is selected for two massifs, the parameterization with one adjustment coefficient for each RCM is selected for one massif and the parameterization with one adjustment coefficient for each GCM-RCM pair is selected for three massifs. Thus, for two-thirds of the massifs, adjustment coefficients do not lead to a better predictive performance on the validation periods. This is presumably due to the fact that GCM-RCM pairs have already been statistically adjusted.
A detailed analysis of the mean logarithmic scores of each parameterization for each massif is provided the Supplement.

Trends in the 50-year return level of snow load
In this section, we rely on the parameterization of the GEV distribution selected in Sect. 4.1 for each massif. In Figure 5, we illustrate changes in the 50-year return level between +1 and +4 • C of global warming for four massifs where the selected parameterization is composed of three linear pieces with one adjustment coefficient for all GCM-RCM pairs (Fig. 5a), one coefficient for each GCM (Fig. 5b), one coefficient for each RCM (Fig. 5c) or one coefficient for each GCM-RCM pair (Fig. 5d). In addition, we also perform individual fitting to each GCM-RCM pair, the corresponding return levels being shown with thin gray lines. Note that these estimates are shown for illustrative purposes only, and they The 50-year return levels computed for each GCM-RCM pair (using a non-stationary GEV distribution with the selected number of linear pieces for each GCM-RCM pair) and for the observation (using a non-stationary GEV distribution with one linear piece and a constant shape parameter) are displayed with thin gray lines and thick dark lines, respectively. The probability density functions at +1, +2 and +3 • C exemplify how adjustment coefficients can adjust the distribution.
do not contribute to the final return level estimates indicated with warm colors.
All 50-year return levels (for the non-stationary GEV distribution fitted on the observations, on each GCM-RCM pair, and on the observations and all GCM-RCM pairs) are decreasing with the anomaly of global mean temperature. We observe that RL50 with adjustment coefficients (shown in a warm color) is closer to the 50-year return level of the observation (in dark gray) than RL50 without adjustment coefficients (in cyan). This figure also shows how adjustment coefficients adjust the distribution toward the distribution of the observations by illustrating the probability density functions (with and without adjustment) at +1 • of global warm-ing. Nevertheless, we note that adjusted distributions do not perfectly match the distributions of the observation, which means that the adjusted RL50 does not match the 50-year return levels of the observation. This is probably because we do not consider adjustment coefficients on the shape parameter. For instance, in Fig. 5c, we observe that the shape parameter is negative for the distribution of the observation (because the density has an upper bound), while the shape parameter is positive for the adjusted distribution in orange. We choose to not consider adjustment for the shape parameter because it enables us to constrain predictive distributions on the future period and to avoid prediction failures (Sect. 5.2). Besides, the 90 % confidence interval of RL50 is computed us-E. Le Roux et al.: A non-stationary extreme value approach for climate projection ensembles ing a semi-parametric bootstrap resampling method adapted to non-stationary extreme distributions (Appendix A). We note that confidence intervals are widening at the nodes of the piecewise linear functions, i.e., at the anomaly of global temperature where the slope of the GEV parameters changes (κ i in Eq. 2). This is presumably due to the fact that the variability of the three GEV parameters is more important at the nodes than between them. Figure 6 illustrates RL50 for the 23 massifs of the French Alps at 1500 m elevation for +1, +2, +3 and +4 • C of global warming, i.e., of smoothed anomaly of global mean surface temperature. The return levels are larger in the northwest of the French Alps, and this pattern persists with global warming. Over the whole area of the French Alps, the average RL50 equals 5.7 kN m −2 at +1 • C of global warming and 3.3 kN m −2 at +4 • C. Figure 7 details the relative change of RL50 for +2, +3 and +4 • C of global warming at 1500 m elevation with respect to +1 • C, which corresponds roughly to the cur-rent level of global warming above industrial levels (see Fig. 2). Over the French Alps, the average change of RL50 is equal to −0.6 kN m −2 (−10 %), −1.5 kN m −2 (−27 %) and −2.5 kN m −2 (−43 %) for +2, +3 and +4 • C of global warming, respectively. These relative changes are different for other elevations, a smaller relative decrease being obtained at 2100 m of elevation and a larger relative decrease at 900 m of elevation (see Supplement B). This result is consistent with the literature (Fig. 2.3 of Hock et al., 2022). At 1500 m, the relative decrease is lower in the central eastern side of the French Alps. For instance, for +4 • C of global warming, the relative decrease roughly ranges between −33 % and −38 % in the central eastern side, while it ranges between −40 % and −54 % in the rest of the French Alps.
For each massif, it is also possible to compute the average 50-year return level for several time periods : 1986-2005, 2031-2050 and 2080-2099. For instance, for the time period 1986-2005, the average return level equals the average of the return level found for the years 1986, 1987, . . . , 2005. In order to compute the return level of a given year, e.g., 1986, we rely on the relationship between the anomaly of global mean surface temperature (GMST) and the years (Fig. 2). Specifically, we rely on the anomaly of GMST averaged on the six GCMs to compute this relationship. Following this method, we find that on average the 50-year return level is projected to decrease by −0.8 kN m −2 (−14 %) between 1986-2005 and 2031-2050 and by −2.9 kN m −2 (−50 %) between 1986-2005 and 2080-2099 under the scenario RCP8.5. Note that this method could also provide the rate of change of other RCPs for various lead times using their corresponding global temperature values.

Comparison of our results with the projected trends at the scale of the European Alps
In  (1986-2005, 2031-2050, 2080-2099) can be obtained as explained in Sect. 4.2. Likewise, the mean annual maxima can be expressed with a similar methodology as the expectation of the non-stationary GEV distribution averaged for each year of the time periods. We find a decrease in mean annual maxima of snow load of −30 % and −69 % for the future periods 2031-2050 and 2080-2099, respectively, compared to the reference period 1986-2005.  (Table 3). We observe that our mean annual maxima of snow load has a decreasing rate comparable to the decreasing rate of the mean value of snow load. These comparable rates may stem from the fact that (i) both approaches rely (directly or indirectly) on the EURO-CORDEX data or that (ii) the annual maxima of snow load results from an accumulation during the winter (December to May), which implies that we can expect that the mean value will roughly decrease at the same rate as the mean annual maxima.

Methodological choices, assumptions and limitations
For the non-stationarity of the GEV parameters, we choose piecewise linear functions because they can approximate more complex functions with only a few parameters. This makes our methodology widely applicable. One limitation is that the nodes of the piecewise linear functions are fixed. However, we are confident that these functions are well estimated, owing to the large number of maxima: each of the K = 20 GCM-RCM pairs provides more than 100 maxima. Otherwise, we rely on the anomaly of global mean temperature as a covariate (Sect. 2), as do the majority of references cited in Table 1. Indeed, this anomaly is often thought as a good proxy to measure the level of climate change (Fix et al., 2018), which helps strengthen the global response to Table 3. Projected trends in snow water equivalent (SWE) and snow load under the scenario RCP8.5 using the EURO-CORDEX experiment.
In the first four rows of the table, we specify that the result is approximated because the trend was read from the Fig. 2 , 2018). We choose to focus on the scenario RCP8.5 to have the broadest spectrum of potential changes for the 50-year return level of snow load. Also, to obtain Eq. (4) we assume that all annual maxima are conditionally independent given the vector of parameters , which is a classical hypothesis. Following the principle of parsimony, we assume that the adjustment coefficients are constant, i.e., the same for historical and future climates. Besides, as mentioned in Sect. 3.2, we did not consider adjustment coefficients for the shape parameter because it sometimes leads to prediction failure, i.e., the predictive distribution gives a null probability to some future annual maxima. This can happen when ξ (T ) < 0, which means that the predictive distribution has an upper bound, and when some future annual maxima lies above this upper bound. This illustrates the trade-off between (i) matching the distribution on the past period with respect to the available observations and (ii) having assumptions that help to constrain the predictive distribution on the future period.
For the two-step selection method, we first rely on a model-as-truth experiment to select the number of linear pieces. It assesses the optimal number of linear pieces to predict annual maxima of the pseudo-observations for the evaluation set (2020-2100), i.e., to find a good trade-off between underfitting and overfitting for the calibration set. In this first step, adjustment coefficients are not considered, such that this experiment does not depend on a specific parameterization.
Following this, the best parameterization of the adjustment coefficients is selected with a split-sample experiment. It assesses whether applying adjustment coefficients helps to predict observations of the evaluation set, i.e., whether it is reasonable to assume that the observations do not follow the same distribution as the GCM-RCM pairs. The evaluation score is average for three split-sample experiments where the evaluation set corresponds to the last 40 %, 30 % and 20 % of the observations (Sect. 4.1). Thus, evaluation sets of the three split-sample experiments contain 24, 17 12 annual maxima, respectively, which is a limited amount of information to use for the selection of the best parameterization of the adjustment coefficients. As shown in Fig. 4, it tends to favor the most parsimonious parameterization with no adjustment coefficient. Observed maxima have also a limited effect on the estimated parameters because one observed maxima has the same weight as a maxima from a climate model in the likelihood (Eq. 4). As models are fitted to 61 observed maxima and 20 × 150 maxima from the different climate models, the selected non-stationary models mostly represent the distribution of the maxima simulated by the climate models. However, it can also be argued that 61 years of past observations has a limited predictive power for long-term horizons where the different trajectories shown by the climate projections can possibly show a great variety of evolution after the observed period. As a comparison, the recent study by Ribes et al. (2021) relies on 170 years of past observations to constrain future climate projections at the global scale. In addition, the proposed methodology has been shown to perform well on temperature projections, but the application to other surface variables (e.g., precipitation, snowfall) needs to be demonstrated.
The 90 % confidence intervals of return levels (Fig. 5) account for both the sampling uncertainty (Appendix A) and the climate model uncertainty (distributions are fitted together from the past observations and all GCM-RCM pairs). In contrast, approaches that estimate return levels separately for each ensemble member usually do not account for the sampling uncertainty, i.e., the sampling uncertainty of return levels estimated on each ensemble, even if this uncertainty can be large because return levels are estimated with only one ensemble member. One limitation of our approach is that, contrary to the climatological expectations, the width of confidence intervals does not increase with global warming (Fig. 5). This is presumably a consequence of assuming constant adjustment coefficients.
The goodness of fit of the selected models has been tested with the application of the Anderson-Darling test (see Appendix A) to each GCM-RCM pair separately and the observations for each massif. The test is rejected for 20% of the 483 cases at a significance level of 5 % (23 massifs × 21 time series). This relatively high number seems to be mainly explained by the small values reached at the end of the century for many GCM-RCM runs. Indeed, the same tests applied at an elevation of 2700 m show a much smaller percentage of rejections (7 %) and larger p values. The inadequacy of the selected GEV models to represent these small values can be related to the high proportion of zero snow load values at the end of the century. In these cases, as annual maxima represent maxima of a limited number of positive values, the asymptotic nature of the extreme value theory might be not respected. One alternative could be to consider larger block sizes (maxima over several years). However, the smaller sample sizes would lead to more uncertain parameter estimates. The application of extreme value models to bounded variables thus remains a substantial challenge, especially in a context of climate change where this issue only affects a small part of the dataset.

Related works
First, our methodology based on adjustment coefficients can be seen as an extension of Brown et al. (2014), which estimates non-stationary GEV distribution simultaneously with both observations and a single GCM-RCM pair and introduces constant bias terms for each GEV parameter. There are also some links to a debiasing method proposed for annual maxima from GCM-RCM projections (Fontolan et al., 2019). For the location parameter, we consider additive adjustment coefficients that can be seen as bias terms, while the adjustment coefficients of the scale parameter that are multiplicative (due to the log link function) can be viewed as bias-correction factors (Hosseinzadehtalaei et al., 2021). In this paper, we choose the name "adjustment coefficients" because we introduce them to improve the statistical adjustments. Our idea to add adjustment coefficients for each GCM, RCM or GCM-RCM pair into the non-stationary extreme value distributions (Table 2) comes from the ANOVA framework, which can be applied to partition the uncertainty of GCM-RCM projections by identifying the main effects of GCMs and RCMs or GCM-RCM interactions (Hawkins and Sutton, 2009;Evin et al., 2019Evin et al., , 2021. Thus, our approach based on piecewise linear functions for the non-stationarity of the GEV parameters can be viewed as using linear splines. In the literature, there are many extreme value theory approaches using splines. For instance, linear splines have been applied to model the temporal nonstationarity (Wilcox et al., 2018), while cubic splines are often considered to model spatial extremes (Chavez-Demoulin and Davison, 2005;Gaume et al., 2013).

Conclusions and outlook
Following statistical methods that constrain climate projections using past observations (Brunner et al., 2020;Ribes et al., 2021), we propose a novel approach for GCM-RCM ensembles that aims at fitting a single non-stationary generalized extreme value (GEV) distribution to past observations and to the ensemble of GCM-RCM projections. Specifically, we rely on a non-stationary GEV distribution with (i) piecewise linear functions to model the changes in the three GEV parameters and (ii) adjustment coefficients for the location and scale parameters in order to consider the GEV model as adequate for the climate projections and the past observations up to systematic shifts for these two GEV parameters. This wide set of GEV models aims to provide a more flexible framework. In particular, piecewise linear functions can represent many possible future changes of the GEV parameters and include linear trends as special cases.
In order to select the best parameterization of the nonstationary GEV model (number of linear pieces, parameterization of the adjustment coefficients), we design a two-step selection procedure based on two evaluation experiments for GCM-RCM ensembles: a model-as-truth experiment and a split-sample experiment. The model-as-truth experiment is first applied to select the number of nodes that are required to adequately represent the evolution of the GEV parameters. The split-sample experiment evaluates the added value provided by the adjustment coefficients for the different possible parameterizations.
In this article, as a case study the proposed approach is applied to snow load in the French Alps at 1500 m elevation using 20 GCM-RCM pairs that are statistically adjusted from the EURO-CORDEX experiment under the scenario RCP8.5. More generally, the proposed approach could also be applied to other scenarios, climate variables, and climate projection ensembles. In contrast with most applications of non-stationary GEV models in the literature (which consider linear trends), the piecewise linear functions proposed in our approach are well suited to non-monotonic trends.
Many extensions of this work could be considered. First, if adjustment coefficients are not included, our parameterization of the GEV model considers the same non-stationary GEV distribution for the different GCM-RCM pairs. Even in the case where adjustment coefficients are selected, the distributions corresponding to the GCM-RCM pairs are still constrained to have the same changes with global warming because adjustment coefficients are constant. In future work, we could imagine adjustment coefficients that vary with global warming being used to better account for different changes of distributions among the GCM-RCM pairs. A second potential extension of this work could be to improve the parameterization of the GEV distribution by adding weights for each GCM-RCM pair. In our methodology, GCM-RCM pairs are currently considered as equally plausible even though it is known that for each application some of them can have a better agreement with the past observations. Following the intuition of weighting schemes for climate ensemble (Knutti et al., 2017), we could design a parameterization of the GEV distribution that assigns more weight, i.e., more confidence, to climate models that agree more with the observations. Finally, further work is needed to obtain a better agreement between the non-stationary GEV model representing the ensemble of maxima from climate projections and past observed maxima. Indeed, observed maxima are mainly used to identify and correct strong disagreements between the observed and simulated maxima using adjustment coefficients, and the fitted non-stationary GEV model is not really constrained to represent observed maxima. A Bayesian approach representing the predictive distribution of climate projections conditional on historical observations (Robin and Ribes, 2020;Ribes et al., 2021) could be a possible approach to better constrain a GEV model. However, it requires a long series of past observations to identify the relationships between the forced climate responses to greenhouse gases obtained from the climate models and from the observations.

Appendix A: Uncertainty assessment and goodness-of-fit test
We estimate the uncertainties resulting from in-sample variability with a semi-parametric bootstrap resampling method adapted to non-stationary extreme distributions (Efron and Tibshirani, 1993;Kharin and Zwiers, 2004). This method relies on a transformation f GEV→standard Gumbel to the standard Gumbel distribution. Indeed, if Y x ∼ GEV(µ(x), σ (x), ξ (x)), σ (x) ) ∼ Gumbel(0, 1). Let y = (y 1 , . . ., y S ) denote a vector of annual maxima, with S the size of the vector. The transformed variates are computed as m = f GEV→standard Gumbel (y m ), using for µ(x), σ (x), ξ (x). We generate B = 1000 bootstrap samples with a fourstep procedure. First, we compute the vector = ( 1 , . . ., S ). Then, for each bootstrap sample i, from these transformed variates we draw with replacement a sample of size S denoted as˜ } of B GEV parameters that represents the in-sample variability.
In addition to the sampling uncertainty, we also assess the goodness of fit of the fitted GEV models with the Anderson-Darling statistical test (Abidin et al., 2012). If the nonstationary GEV model is adequate, this test assumes that the transformed variates are drawn from a standard Gumbel distribution. Let F emp and F gum denote the cumulative distribution functions of the empirical and standard Gumbel distributions, respectively. The Anderson-Darling test is based on the following distance: where w(x) assigns more weight on the tail of the standard Gumbel distribution (see Abidin et al., 2012, for more details).
Author contributions. ELR, GE and NE designed the research. ELR performed the analysis and drafted the first version of the manuscript. All authors discussed the results and edited the manuscript.
Competing interests. The contact author has declared that neither they nor their co-authors have any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Financial support. This research has been supported by the Institut National de Recherche en Sciences et Technologies pour l'Environnement et l'Agriculture, IRSTEA, which became INRAE in 2020 (PhD grant).
Review statement. This paper was edited by Valerio Lucarini and reviewed by Tamas Bodai, Antonio Speranza, and one anonymous referee.