Long-term variance of heavy precipitation across central Europe using a large ensemble of regional climate model simulations

Widespread flooding events are among the major natural hazards in central Europe. Such events are usually related to intensive, long-lasting precipitation over larger areas. Despite some prominent floods during the last three decades (e.g., 1997, 1999, 2002, and 2013), extreme floods are rare and associated with estimated long return periods of more than 100 years. To assess the associated risks of such extreme events, reliable statistics of precipitation and discharge are required. Comprehensive observations, however, are mainly available for the last 50–60 years or less. This shortcoming can be reduced using stochastic data sets. One possibility towards this aim is to consider climate model data or extended reanalyses. This study presents and discusses a validation of different century-long data sets, decadal hindcasts, and also predictions for the upcoming decade combined into a new large ensemble. Global reanalyses for the 20th century with a horizontal resolution of more than 100 km have been dynamically downscaled with a regional climate model (Consortium for Small-scale Modeling – CLimate Mode; COSMO-CLM) towards a higher resolution of 25 km. The new data sets are first filtered using a dry-day adjustment. Evaluation focuses on intensive widespread precipitation events and related temporal variabilities and trends. The presented ensemble data are within the range of observations for both statistical distributions and time series. The temporal evolution during the past 60 years is captured. The results reveal some long-term variability with phases of increased and decreased precipitation rates. The overall trend varies between the investigation areas but is mostly significant. The predictions for the upcoming decade show ongoing tendencies with increased areal precipitation. The presented regional climate model (RCM) ensemble not only allows for more robust statistics in general but is also suitable for a better estimation of extreme values.


Introduction
Ongoing climate change affects not only the global scale but also the regional climate. Regarding air temperature, there is a more or less clear trend in the recent past, which reveals a clear anthropogenic signal. However, various climate simulations show distinct differences for precipitation trends, especially for heavy precipitation (e.g., Moberg et al., 2006; Zolina et al., 2008; Toreti et al., 2010). A review of observed variability and trends in extreme climate events states that it is difficult to find significant relations between the greenhouse-gas-enhanced climate change and increases or decreases in extreme precipitation events (Field et al., 2012). This is attributed to their rare occurrence, the general high spatial variability of precipitation, and a lack of long-term high-quality observations.
Magnitude and sign of heavy precipitation trends strongly depend on various factors such as the regarded area or the considered time period (e.g., Easterling et al., 2000). Global tendencies towards more intense precipitation throughout the 20th century have been reported, and differences between summer and winter seasons also account for precipitation trends. For example, Moberg and Jones (2005) found an increase in winter precipitation across central and western Europe between 1901 and 1999, while Pal et al. (2004) found a decrease in summer precipitation for the period 1951-2000. Dittus et al. (2016) found an increasing trend between 1951 and 2005 in extreme total precipitation amounts for Europe in global climate model (GCM) simulations (Coupled Model Intercomparison Project phase 5; CMIP5). Similar trends were found in global reanalyses (e.g., ERA-20C; Poli et al., 2016) but not in observations. In contrast, Primo et al. (2019) found positive trends for two ground-based observational stations in Germany using extreme precipitation indices.
Model resolution is another crucial factor. The use of high-resolution regional climate models (RCMs) instead of global data sets revealed a more detailed and orographically related spatial structure of the precipitation fields and trends (e.g., Feldmann et al., 2013). An increase of both areal mean precipitation and extremes in central Europe on the order of 5 %-10 % was found in RCM simulations by Feldmann et al. (2013), which is projected to continue with almost the same magnitude for the next decade. Differences in precipitation trends also stem from varying definitions of extreme events such as certain thresholds, percentile-based indices, or return periods (e.g., Maraun et al., 2010). While most of these studies show trends in daily precipitation, just a few deal with subdaily trends. Barbero et al. (2017), for instance, compared trends in subdaily and daily extremes. Although significant increasing trends were found for both time ranges, trends in daily extremes are better detected than in subdaily extremes.
Spatially extended intensive rainfall events are frequently related to widespread flooding along the main river networks of central Europe causing major damage on the order of several billion euros (EUR) per event (e.g., Uhlemann et al., 2010; Kienzler et al., 2015; Schröter et al., 2015; MunichRe, 2017). A prominent example of such an extreme and devastating event is the flood in 2002 along the Elbe and Danube rivers (Ulbrich et al., 2003a, b). Such outstanding events are by definition extremely rare, which makes the risk estimation difficult or almost impossible due to the limited time period with available area-wide observations (e.g., Pauling and Paeth, 2007; Hirabayashi et al., 2013). However, trend analyses of such extreme events and the related risks during the past and for the future are of great importance for insurance purposes or flood protection (e.g., Merz et al., 2014; Schröter et al., 2015; Ehmele and Kunz, 2019).
A possible way of dealing with the unsatisfactory data availability is through century-long simulations using climate models (e.g., Stucki et al., 2016) or stochastic approaches (e.g., Peleg et al., 2017; Singer et al., 2018; Ehmele and Kunz, 2019). The currently used GCMs were found to be in good agreement with the available but limited observations (Fischer and Knutti, 2016). Brönnimann et al. (2013) and Brönnimann (2017) analyzed historical extreme events using century-long reanalysis data sets and concluded that the quality of the reanalyses strongly depends on the number and type of the assimilated observations. The investigated historical events were reproduced, but the magnitudes were underestimated. A possible reason is the decreasing number and quality of observations early in the century and therefore a lack of assimilation data. The suitability of reanalysis data to investigate extreme precipitation for England and Wales was investigated by Rhodes et al. (2015). While time series of daily precipitation totals are well represented in both data sets, timing errors of heavy precipitation events were identified as one of the major problems. Stucki et al. (2012) investigated historical flooding events in Switzerland and indicated that the reanalyses underestimate precipitation in Switzerland, which may result from the insufficient representation of the alpine topography. The timing and the exact location of heavy precipitation were also found to be inaccurate.
As shown by van der Wiel et al. (2019) or Martel et al. (2020), large ensembles can have an added value for flood risk estimation and for the calculation of return periods of heavy precipitation. van der Wiel et al. (2019) found a clear benefit in using an ensemble approach for the estimation of changes in hydrological extremes including compound events compared to traditional approaches. Martel et al. (2020) found similar results, namely a reduction in the projected return period of 100-year annual maximum precipitation with the different ensembles, albeit having different model structures and resolutions. Furthermore, it was emphasized that a higher resolution is advantageous to predict climate change signals over complex terrain. Other studies also highlighted the improvements of using high-resolution RCMs for the investigation of climate extremes (e.g., Feser et al., 2011; Feldmann et al., 2008, 2013; Schewe et al., 2019), especially over complex terrain (e.g., Torma et al., 2015).
The studies mentioned above document partly contrasting results and demonstrate the challenges arising when dealing with extreme precipitation and related phenomena. In this study, a set of different realizations with one RCM is used and combined into the new LAERTES-EU (LArge Ensemble of Regional climaTe modEl Simulations for EUrope), which can be used for more profound statistical analyses. The basis is the global 20th Century Reanalysis (20CR) data set (Compo et al., 2011), which was dynamically downscaled for Europe. LAERTES-EU consists of a small number of 20th century reanalysis data sets and a large ensemble of decadal hindcast simulations mainly for the second half of the century. Although all simulations were performed with the same RCM version and setup, LAERTES-EU is a combination of different external forcings, boundary conditions, and/or assimilation. Predictions for the upcoming decade round off our analysis. The investigative focus lies on daily values of intensive areal precipitation which can be associated with major flood events in central Europe. As demonstrated, for example, by Schröter et al. (2015), severe flood events along the major river networks in central Europe are related to long-lasting and widespread precipitation events of mainly stratiform origin with embedded convective precipitation. Typically, intensities do not reach the most extreme rates of the distribution but are characterized by high spatial mean values.
F. Ehmele et al.: Heavy precipitation in central Europe
LAERTES-EU is validated in terms of coincidence with observations regarding temporal variability, statistical distributions, and possible long-term trends. The following research questions will be addressed. A better interpretation of RCM data and a more profound understanding of extreme areal precipitation may have several applications such as risk assessments. Although they are relevant, we do not handle the potential mechanisms behind temporal variance and trends as well as spatial and seasonal differences as this goes beyond the scope of this study. This paper is structured as follows. The data sets which were used in this study are introduced in Sect. 2. Section 3 sums up the methods used for the analysis and the validation. In Sect. 4, LAERTES-EU is validated with observations for a reference period. The investigation of temporal variabilities and trends is given in Sect. 5. Finally, Sect. 6 gives a summary and lists our conclusions.

Data sets
Two different types of data sets are applied in this study: gridded precipitation data based on observations and partly century-long climate model simulations (LAERTES-EU). The observational data sets are primarily available for the second half of the 20th century and serve as reference data for the validation of the ensemble. Furthermore, we compare LAERTES-EU with the forcing global model and also with the global 20CR data set (Compo et al., 2011), which were used as initial data for some of the simulations.

Observations
The European observational data set (E-OBS) version 17 including daily precipitation (Haylock et al., 2008; van den Besselaar et al., 2011) is a gridded data set with a horizontal resolution of 0.22° (≈ 25 km) covering the years 1950 to 2017. This version shows some improvements over older versions, since updated algorithms and new stations have been included in some areas (e.g., for Poland). The E-OBS algorithm interpolates observations from weather stations to a regular grid using geostatistical methods (e.g., Journel and Huijbregts, 1978; Goovaerts, 2000). Note that E-OBS is a land-only data set and ocean grid points are set to a missing value. Haylock et al. (2008) stated that rainfall totals in E-OBS are reduced by up to almost one-third compared to the raw station data at the corresponding grid cells. Regarding extremes, the deviation of E-OBS is even more pronounced (Hofstra et al., 2009). Nevertheless, both studies stated that the spatial mean precipitation in E-OBS is very close to other observations.
Although E-OBS has some limitations, we use it as the main reference for this study, as there is no other comparable high-resolution daily precipitation data set available that covers all of Europe over a long time period. Other products like satellite data cover only a very limited time frame and have limitations of their own, so they are not helpful here. There are single ground-based observations with very long time series, but as the focus of this study is on intensive areal precipitation, these data are of limited usefulness for validation.
In addition to E-OBS, we compare the RCM simulations with the central European high-resolution gridded daily data sets (HYdrological RASter data sets; HYRAS) provided by the German Weather Service (DWD; Rauthe et al., 2013). HYRAS is a gridded precipitation data set with a horizontal resolution of up to 1 km for the time period 1951-2006 and covers Germany and the surrounding river catchments. The HYRAS algorithm also uses ground-based measurements and interpolates the point observations to the regular grid. For this study, the HYRAS data were first aggregated to the E-OBS/RCM 25 km grid. HYRAS hereafter refers to this aggregated 25 km data set.

Regional climate model simulations
LAERTES-EU combines a large number of regional dynamical downscaling simulations for Europe performed with a single RCM. The RCM used is the non-hydrostatic model of the Consortium for Small-scale Modeling (COSMO) in CLimate Mode (CLM) version 5 (CCLM5; Rockel et al., 2008), which has a spatial resolution of 0.22° (≈ 25 km). The model covers the European domain of the Coordinated Downscaling Experiment (EURO-CORDEX). Overall, the simulations use the same domain, model version, and setup, which was adapted from EURO-CORDEX. According to Feldmann et al. (2008), a dry-day correction is important as climate models tend to overestimate the number of wet days with low intensities below 0.1 mm, known as the drizzle effect (Berg et al., 2012). In order to reduce this typical bias, a dry-day adjustment was first applied to LAERTES-EU. The E-OBS data were used for this correction, as they have the same spatial extension and resolution as the CCLM simulations. All simulations were performed within the BMBF (Federal Ministry of Education and Research of Germany) MiKlip II project (Marotzke et al., 2016) to create and test a decadal prediction system including a regional downscaling component for Europe.
For all downscaling simulations, the boundary conditions were derived from the Max Planck Institute for Meteorology coupled Earth System Model (MPI-ESM). This global model consists of an atmospheric component (ECHAM6) (Stevens et al., 2013), an ocean component (MPI-OM) (Jungclaus et al., 2013), and a land-surface model (JSBACH) (Hagemann et al., 2013).
LAERTES-EU is divided into four different data blocks (Table 1) depending on the setup of the forcing MPI-ESM ensemble simulations. The differences between the four data blocks stem from the setup, external forcing, and initialization of the MPI-ESM simulations. Data blocks 1 and 2 of the RCM ensemble (compare Table 1) obtained the boundary values from the MPI-ESM-LR simulations using a T63 resolution and 47 vertical layers. Data blocks 3 and 4 used the MPI-ESM-HR version (Müller et al., 2018) as their driving model. In this version, the horizontal resolution is T127 and 95 vertical layers are applied. Three types of forcing ensembles can be distinguished:
i. MPI-ESM assimilates reanalysis data for long-term simulations (data block 1);
ii. long-term historical-type simulations, according to the CMIP5 specifications (data block 3; Taylor et al., 2012); and
iii. initialized decadal (10-year) hind- and forecast simulations (data blocks 2 and 4).
In data block 1, the first type (I) is applied. Here, the 20CR data (Compo et al., 2011) are assimilated into the MPI-ESM-LR (Müller et al., 2014). The 20CR data set has a spatial resolution of approximately 2° (T62) and was generated using the Global Forecast System (GFS; Kanamitsu et al., 1991; Moorthi et al., 2001) of the National Centers for Environmental Prediction (NCEP). It used a 56-member ensemble Kalman filter approach to assimilate surface pressure, monthly sea surface temperature, and sea-ice observations. Three of the 20CR members are assimilated into MPI-ESM to provide long-term (110 years each) climate reconstruction simulations over the period 1900-2009 (Müller et al., 2014). Afterwards, a downscaling with CCLM uses these global simulations as boundary conditions (e.g., Primo et al., 2019).
Data block 3 consists of the second type (II), where five so-called historical simulations of MPI-ESM-HR with CMIP5 observed natural and anthropogenic external climate forcing (Taylor et al., 2012) are used as boundary conditions for CCLM. The ensemble was generated by starting the MPI-ESM from arbitrary dates in a pre-industrial control simulation (Müller et al., 2014). Three of the five CCLM members cover the period 1900-2005 (106 years each). The two additional simulations cover the period 1960-2005 (46 years each).
Data blocks 2 and 4 consist of initialized decadal simulations (type III). The starting conditions are derived from an observed state (Müller et al., 2012; Marotzke et al., 2016). For each starting year, an ensemble of decadal simulations is generated, and the initialization point is then shifted by 1 year (e.g., 1961-1970, 1962-1971, and so on). Due to the overlap, a specific calendar year may be covered by several decadal hindcasts with different starting years. These decadal hind- and forecasts thus represent the current state of the major modes of climate variability compared to the so-called uninitialized historical simulations (data block 3). The downscaling procedure, the skill, and the added value are described in Mieruch et al. (2014) and Reyers et al. (2019).
In data block 2, the starting conditions of the three decadal hindcast members with MPI-ESM-LR are derived from the assimilation experiments in data block 1. The starting years of the CCLM downscaling range from 1910 to 2009. This means the last simulated year is 2019.
Data block 4 consists of two parts. Both of them use the MPI-ESM-HR version. The so-called preop-ensemble has five members. The external climate forcing is derived from CMIP5. The starting years range from 1960 to 2016 (last simulated year is 2026). The so-called dcppA-hindcast ensemble has 10 members and uses the external forcing for CMIP6 (Eyring et al., 2016). The global simulations are a contribution to the Decadal Climate Prediction Project of CMIP6 (DCPP; Boer et al., 2016). The starting years are 1960 to 2018 (last simulated year is 2028).
In total, LAERTES-EU consists of 1183 simulation runs (sample size) with approximately 12 500 simulated years. The number of ensemble members for a specific year varies from six at the beginning of the century to a maximum of 188 members between 1970 and 2000 (see Fig. S1 in the Supplement). The simulations in all four data blocks are affected by the observed external climate forcing, but they differ with respect to the representation of the observed climate variability; whereas data block 1 uses assimilated 20CR data, data blocks 2 and 4 contain initialized hindcasts, which to some degree follow the observed low-frequency variability, and data block 3 only uses the external forcing information. Nonetheless, the four groups of downscaling simulations can be grouped into a large ensemble, since the regional simulations were all performed with the same setup of the RCM. Despite the same initial conditions and model setup, the temporal evolution of the day-to-day weather is (statistically) independent between the members after a few weeks. This is an advantage, since the data set is homogeneous over time but also covers uncertainties in the observations including unknown and not-yet-observed events. The validity of this combination approach is tested within Sect. 4.
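The ensemble size quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that only uses the member counts, start years, and run lengths stated in the text; the function name and the tuple layout are illustrative choices, not part of LAERTES-EU.

```python
def count_runs_and_years():
    """Tally simulation runs and simulated years per data block."""
    # Long continuous runs: (label, number of members, years per run)
    long_runs = [
        ("block 1, 20CR assimilation", 3, 110),
        ("block 3, historical (from 1900)", 3, 106),
        ("block 3, historical (from 1960)", 2, 46),
    ]
    # Decadal runs: (label, members per start year, first and last start year)
    decadal = [
        ("block 2, MPI-ESM-LR hindcasts", 3, 1910, 2009),
        ("block 4, preop", 5, 1960, 2016),
        ("block 4, dcppA", 10, 1960, 2018),
    ]
    runs = sum(members for _, members, _ in long_runs)
    years = sum(members * length for _, members, length in long_runs)
    for _, members, first, last in decadal:
        n_starts = last - first + 1          # one initialization per year
        runs += members * n_starts
        years += members * n_starts * 10     # each decadal run spans 10 years
    return runs, years


runs, years = count_runs_and_years()
print(runs, years)
```

The tally gives 1183 runs and 12 490 simulated years, which matches the "approximately 12 500" stated above. The same bookkeeping explains the member counts per calendar year: 3 + 3 = 6 long runs at the start of the century, and 3 + 5 + 30 + 50 + 100 = 188 once all ten overlapping start years of each decadal ensemble are active.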

Methods
The ability of LAERTES-EU to simulate realistic precipitation amounts and distribution is an important requirement. Moreover, temporal variability and possible trends should also be well represented for trustworthy data sets. The methods were applied to different investigation areas and time periods. Equations and additional information can be found in Appendices A-C. As the focus of this study is intensive areal precipitation, we concentrate on high percentiles of spatially aggregated daily rainfall totals, namely the 99th and 99.9th percentiles. The percentiles are based on wet days only. First, a spatial aggregation of daily precipitation values was applied. Afterwards, the percentiles of these areal precipitation values were calculated for each year separately. In all data sets, ocean grid cells were set to a missing value and therefore neglected.
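The two-step procedure (spatial aggregation first, yearly percentiles second) might be sketched as follows. This is an illustration under assumptions: the wet-day threshold of 0.1 mm, its application to the areal mean, the linear-interpolation percentile, and all function names are hypothetical and not taken from the study.

```python
import math

WET_DAY_MM = 0.1  # assumed wet-day threshold, applied to the areal mean


def spatial_mean(field):
    """Mean over valid grid cells; None marks ocean or missing cells."""
    valid = [v for v in field if v is not None]
    return sum(valid) / len(valid)


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def yearly_wet_day_percentile(daily_fields_by_year, q):
    """Map year -> q-th percentile of wet-day areal precipitation."""
    result = {}
    for year, fields in daily_fields_by_year.items():
        areal = [spatial_mean(f) for f in fields]       # step 1: aggregate
        wet = [p for p in areal if p >= WET_DAY_MM]     # wet days only
        result[year] = percentile(wet, q)               # step 2: percentile
    return result
```

Each entry of `daily_fields_by_year` is a list of daily fields, and each field is a flat list of grid-cell values for the investigation area.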

Validation methods
LAERTES-EU is analyzed and validated using various methods. The intensity spectrum gives the statistical probability of each precipitation amount by taking into account all grid points and all time steps within the investigation area and without any aggregation. For this purpose, the range of occurring values is divided into evenly spaced histogram classes, which are then normalized with the total sample size. The resulting intensity-probability curve (IPC) is a good indicator of whether the model is capable of simulating realistic precipitation intensity distributions.
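An IPC as described here is a normalized histogram over all grid-cell values. A minimal sketch, assuming a fixed class width and number of classes chosen by the user (both are hypothetical parameters, as is the function name):

```python
def intensity_probability_curve(values, bin_width, n_bins):
    """Relative frequency of daily totals in evenly spaced classes.

    values: flat iterable of daily totals from all grid cells and all
    time steps (None marks missing values such as ocean cells).
    Returns (left class edges, probabilities).
    """
    counts = [0] * n_bins
    total = 0
    for v in values:
        if v is None:
            continue
        total += 1                      # normalize by the full sample size
        idx = int(v // bin_width)
        if 0 <= idx < n_bins:           # values beyond the last class are
            counts[idx] += 1            # counted in the total only
    edges = [i * bin_width for i in range(n_bins)]
    probs = [c / total for c in counts]
    return edges, probs
```

Plotting `probs` against `edges` on a logarithmic probability axis gives the usual IPC view, in which differences at the upper tail become visible.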
As an extension to the IPCs, the linear error in probability space L (cf. Eqs. A1-A3) is analyzed (e.g., Ward and Folland, 1991; Potts et al., 1996). For this purpose, empirical cumulative distribution functions (ECDFs) are calculated for each simulation run and for the observations. The data basis is the same as for the IPCs. The value C_r (Eq. A1) is defined as the difference between the ECDF of a model run r and that of the observation (difference of probabilities) up to a specific precipitation intensity. It is therefore a measure for the over- or underestimation of the model. Using C_r, the linear error in probability space (L_r; Eq. A2) is the mean of the absolute values |C_r| over the entire precipitation range as defined by Déqué (2012) or Wahl et al. (2017). The better both distribution functions coincide, the lower the value of L_r. According to Eq. (A2), L_r is never negative. The ensemble mean is given by L (Eq. A3).
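The verbal definition above translates into a short computation. The sketch below evaluates both ECDFs at a user-supplied list of intensity thresholds; the exact discretization in Eqs. (A1)-(A3) is not reproduced here, so the threshold grid and the function names are assumptions.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F(x) = fraction of sample values <= x."""
    s = sorted(sample)
    n = len(s)
    return lambda x: bisect_right(s, x) / n


def linear_error_in_probability_space(model, obs, thresholds):
    """Mean absolute ECDF difference between one model run and observations."""
    f_mod, f_obs = ecdf(model), ecdf(obs)
    c_r = [f_mod(x) - f_obs(x) for x in thresholds]   # signed difference
    return sum(abs(c) for c in c_r) / len(c_r)        # L_r >= 0


def ensemble_mean_error(runs, obs, thresholds):
    """Average L_r over all runs of the ensemble."""
    errors = [linear_error_in_probability_space(r, obs, thresholds) for r in runs]
    return sum(errors) / len(errors)
```

A run whose distribution is shifted towards higher intensities yields negative differences at every threshold below its maximum, so the sign of the individual differences still indicates over- or underestimation even though L_r itself does not.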
The internal variability of LAERTES-EU on different time intervals is compared to that of the observations. Given that the focus of this study is on intensive widespread precipitation, this analysis is performed using spatial mean precipitation amounts averaged over the investigation areas. First, the time series of daily spatial means are aggregated over different intervals, namely monthly, seasonal, and yearly precipitation sums, as well as 5-, 10-, or 30-year running means. In a second step, the standard deviation σ of a gamma distribution is calculated for each of these interval series (see Appendix A; Eq. A4), for every single member of LAERTES-EU, and for the observations. Finally, the ensemble mean of the four data blocks and of the complete ensemble is computed. This method enables the analysis of how well the internal variability on different timescales is captured by LAERTES-EU.
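The aggregation step can be illustrated for the running means of yearly sums. Note the simplification: the study derives σ from a fitted gamma distribution (Eq. A4), whereas this sketch uses the plain sample standard deviation as a stand-in, and the function names are hypothetical.

```python
import statistics


def running_mean(series, window):
    """Running mean over `window` consecutive values (empty if too short)."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def variability_by_interval(yearly_totals, windows=(1, 5, 10, 30)):
    """Standard deviation of the running-mean series for each window length.

    Returns None where the series is too short, e.g. a 30-year running
    mean cannot be formed from a 10-year decadal simulation.
    """
    result = {}
    for w in windows:
        smoothed = running_mean(yearly_totals, w)
        result[w] = statistics.stdev(smoothed) if len(smoothed) > 1 else None
    return result
```

Applying this to each ensemble member and averaging the results per data block gives one value per interval, which can then be compared with the same quantity computed from the observations.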
The quantile-quantile (Q-Q) plot compares the simulated distribution with the observed one using different percentiles of daily spatial mean precipitation. The Q-Q distributions are used to calculate the coefficient of determination R², with R being the Pearson correlation coefficient (Eq. A5).
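A minimal sketch of this comparison follows, assuming percentiles are obtained by linear interpolation between ranks and that R is computed across the chosen percentile pairs; the percentile list and the helper names are illustrative.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def qq_r_squared(model, obs, percentiles):
    """R^2 between simulated and observed quantiles (the Q-Q pairs)."""
    q_mod = [percentile(model, p) for p in percentiles]
    q_obs = [percentile(obs, p) for p in percentiles]
    return pearson_r(q_mod, q_obs) ** 2
```

Because R² measures linear association only, a model with a constant multiplicative bias still reaches R² = 1; the distance of the Q-Q pairs from the diagonal has to be inspected separately.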
The added value of the ensemble size is analyzed by using the signal-to-noise ratio (S2N) (Eq. A6). For this purpose, we determine a Gumbel distribution (cf. Appendix A) for different sample sizes and the corresponding 90 % confidence interval. The S2N is then the ratio of the return value of the Gumbel distribution to the width of the 90 % confidence interval (Früh et al., 2010).
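The dependence of S2N on sample size can be explored with a small experiment. The sketch below is not the procedure of Eq. (A6): it assumes a method-of-moments Gumbel fit and a bootstrap estimate of the 90 % confidence interval, and all function names and default values are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def gumbel_fit_moments(sample):
    """Method-of-moments estimate of Gumbel location mu and scale beta."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def return_value(mu, beta, return_period):
    """Value exceeded on average once per `return_period` blocks."""
    p = 1.0 - 1.0 / return_period
    return mu - beta * math.log(-math.log(p))


def signal_to_noise(sample, return_period=100, n_boot=200, seed=0):
    """Return value divided by the width of its bootstrap 90 % interval."""
    rng = random.Random(seed)
    estimate = return_value(*gumbel_fit_moments(sample), return_period)
    boots = sorted(
        return_value(*gumbel_fit_moments(rng.choices(sample, k=len(sample))),
                     return_period)
        for _ in range(n_boot)
    )
    lower = boots[int(0.05 * n_boot)]        # approx. 5th percentile
    upper = boots[int(0.95 * n_boot) - 1]    # approx. 95th percentile
    return estimate / (upper - lower)
```

Since the interval width shrinks roughly with the square root of the sample size while the return value stays about the same, S2N grows as more ensemble members are pooled, which is the added value the analysis aims to quantify.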

Decadal variability and trend analysis
For the analysis of the temporal evolution of heavy precipitation, we use time series of different percentiles of spatial mean precipitation and quantities introduced and recommended by the Expert Team on Climate Change Detection and Indices (ETCCDI; Karl et al., 1999; Peterson, 2005). Currently, 27 indices for temperature and precipitation are defined by the ETCCDI. These indices can be used from local to global scales. Additionally, they combine extremes with a mean climatological state. In this study, we use two indices (R95pTOT and R99pTOT; Eqs. B1 and B2), which indicate the amount of precipitation above the 95th or 99th percentile, respectively.
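A minimal sketch of these two indices follows. It assumes the common ETCCDI convention of a 1 mm wet-day threshold and a percentile taken from the wet days of a reference period; whether Eqs. (B1) and (B2) use exactly these settings is not stated here, and the function names are hypothetical.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def r_p_tot(daily_precip, reference_precip, q=95.0, wet_day_mm=1.0):
    """Total precipitation on wet days exceeding the q-th reference percentile.

    daily_precip: daily totals of the period to evaluate (e.g. one year).
    reference_precip: daily totals of the reference period.
    q=95 gives R95pTOT, q=99 gives R99pTOT.
    """
    ref_wet = [p for p in reference_precip if p >= wet_day_mm]
    threshold = percentile(ref_wet, q)
    return sum(p for p in daily_precip if p >= wet_day_mm and p > threshold)
```

Calling `r_p_tot(year_values, reference_values, q=99.0)` for each year in turn produces the yearly time series that enters the trend analysis.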
In terms of trend analysis, a Mann-Kendall test (Mann, 1945; Kendall, 1955) is performed with related significance investigations (Appendix C). To account for possible oscillations, the complete time series is split into subseries with a minimum length of 10 years and up to 130 years (trend matrix). The Mann-Kendall test is applied to each of these subseries.
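The trend matrix can be sketched as a Mann-Kendall test applied to every subseries of at least the minimum length. The version below uses the standard normal approximation without a correction for ties or autocorrelation, which may differ from the significance treatment in Appendix C; the function names are illustrative.

```python
import math


def mann_kendall(series):
    """Return the Mann-Kendall statistic S and the standardized value Z."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)          # sign of each pairwise change
    var_s = n * (n - 1) * (2 * n + 5) / 18.0      # variance without ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)            # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


def trend_matrix(series, min_len=10):
    """Z value for every subseries [start, end) with at least min_len values."""
    matrix = {}
    n = len(series)
    for start in range(n - min_len + 1):
        for end in range(start + min_len, n + 1):
            matrix[(start, end)] = mann_kendall(series[start:end])[1]
    return matrix
```

With this approximation, |Z| > 1.96 corresponds to a trend that is significant at the 5 % level in a two-sided test. Plotting the matrix with start year and end year as axes shows whether the sign of the trend depends on the chosen period.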

Investigation areas and time periods
The focus of this study is central Europe, comprising Germany, Switzerland, the Netherlands, Belgium, Luxembourg, and parts of France, Poland, Austria, the Czech Republic, and Italy. Following Christensen and Christensen (2007), these countries are mostly coincident with two of the areas defined in the PRUDENCE (prediction of regional scenarios and uncertainties for defining European climate change risks and effects) project, namely the PRUDENCE regions (PRs) Mid-Europe (ME) and the Alps (AL; Fig. 1). Although these boxes contain both land and ocean, the latter was set to a missing value and neglected. During validation, ME and AL were reduced to the HYRAS grid cells lying within the corresponding box, hereafter referred to as ME* and AL*.
The data sets are investigated for different time periods (TPs): TP1 covers the past from 1900 to 2017 and includes a subperiod (TP1b) from 1951 to 2006, for which both observational data sets (E-OBS and HYRAS) are available. TP2 is used for the predictions from 2018 to 2028. Note that the simulations were performed within the MiKlip project back in 2018 (using observations until 2017), which is the reason why the prediction period starts in 2018.
For climatological aspects, we use the time period of 1961-1990, hereafter referred to as climTP. Several studies (e.g., Cahill et al., 2015; Folland et al., 2018) showed that the climate change signal for global mean temperature has increased significantly since the early 1980s. Therefore, using the time period of 1981-2010 as reference would possibly introduce a strong change signal into the analysis. Using 1961-1990 reduces the influence of these effects, as this period shows more stable conditions to a certain degree. This also permits more room for the interpretation of the future predictions.

Validation of the RCM ensemble
In the following, the above-described methods are applied in order to validate LAERTES-EU concerning its agreement with observations. With this aim, data for the investigation period TP1b are used, and the boxes ME and AL (cf. Fig. 1) are limited to the HYRAS area (ME* and AL*).

Statistical distributions and frequencies
The IPCs give the range of simulated (observed) precipitation intensities at any grid point within the investigation area and its corresponding probability (Fig. 2). For both investigation areas, the IPCs reveal a distinct added value of the RCM compared to the global model. Due to the coarse resolution, intensities greater than approximately 100 mm d⁻¹ are not found in the GCMs, which underestimate by a large degree the probability of the high intensities. The same applies for the global reanalysis 20CR. On the other hand, the RCM tends to overestimate the probability for precipitation intensities above a threshold of approximately 50 mm d⁻¹ but covers the same range of values as the observations. The wider range of intensities at the upper tail of the distribution may include possibly not-yet-observed events.
For ME*, the IPCs of the RCM are close to HYRAS, but there is a systematic difference between HYRAS and E-OBS (Fig. 2a). As already mentioned by Haylock et al. (2008), E-OBS has a certain negative bias up to −30 % when using grid-point-based quantities. The given deviation of HYRAS and E-OBS is within this range. Similar results can be found for AL* (Fig. 2b). The differences between the RCM simulations and the observations at a given probability are slightly less than those for ME*. For both areas, the range of simulated values is much higher (up to 400 mm d⁻¹). Naturally, higher intensities are more likely in the mountainous AL* region.
Figure 2. Intensity-probability curves (IPCs) of daily rainfall totals of the RCM simulations (dry-day adjusted), observations (E-OBS and HYRAS), GCM simulations (forcing MPI-ESM data at two resolutions, LR and HR), and global reanalysis data (20CR) for (a) Mid-Europe (ME*) and (b) the Alps (AL*), both limited to the HYRAS area during the investigation period TP1b (1951-2006). For the IPCs, every grid cell value at every time step was taken into account without any aggregation.
In contrast to the grid-point-based IPCs, Fig. 3 shows the mean standard deviation of a gamma distribution (cf. Sect. 3.1 and Appendix A) for the time series of spatial mean precipitation amounts aggregated over different time intervals. For both areas, there is an expected continuous decrease of internal variability towards longer periods for all data sets/data blocks. For ME*, LAERTES-EU is in good agreement with both observations at least up to a yearly perspective. For longer time periods, data block 1 shows a slightly different behavior compared to the other data blocks and observations. Nevertheless, data blocks 2-4 and the ensemble mean continue to match with the observations up to the 10-year running mean. Note that it is not possible to estimate the 30-year running mean for the decadal simulations of data blocks 2 and 4 given the data availability. For data block 3, only an external climate forcing was used, meaning these so-called historicals are free runs in terms of daily weather evolution. Therefore, it is not expected that the multi-decadal variability is in phase with the observed circulation after a certain time, which can be a reason for slightly higher differences of data block 3 compared to the observations at the longest timescale. Furthermore, note that the results of Fig. 3 do not indicate a perfect match of LAERTES-EU in terms of absolute values, but rather that the internal variability (spread) of spatial mean precipitation totals is well captured. For the mountainous AL* region, the internal variability is higher and all data blocks have a higher standard deviation at all time intervals. This means that the spread of simulated precipitation amounts is increased compared to that of the observation. A possible reason for this difference can emerge from sparse measurements in that region considered for both E-OBS and HYRAS, especially for long-term observations. The more or less constant difference between LAERTES-EU and the observations can be an indicator of a possibly systematic bias in this region.
The Q-Q plots of daily spatial mean precipitation fields for both investigation areas are shown in Fig. S2. Generally speaking, the distribution of the RCM is similar to those of the observations, at least to E-OBS, with small deviations from the optimum (diagonal line) for most of the spectrum and differences of around 10 % for the upper part of the distribution. In comparison to HYRAS, the maximum deviation is higher at around 20 %. For AL*, the differences between the RCM and HYRAS are larger than for ME* (Fig. S2). Even though HYRAS was aggregated to the E-OBS/RCM grid, the more pronounced differences especially for the extremes might be a result of the higher resolution of the HYRAS data, which, in particular, is of greater relevance in the mountainous region of AL*.
The findings of Fig. S2 are confirmed by the determination coefficients R² (Table 2). For both E-OBS and HYRAS, the coefficient is very high with R² > 0.98. There is a slightly higher R² for E-OBS than for HYRAS, which is an artificial effect of the data resolution. The region AL* shows marginally higher skill than ME* for E-OBS and slightly lower values for HYRAS. Table 2 also reveals higher correlations of the CCLM simulations driven by the high-resolution MPI-ESM-HR data compared to those driven by the lower-resolved MPI-ESM-LR data. Even though this seems to be systematic, the differences are marginal. Table 2 also contains the mean linear error in probability space L for the different data blocks. Again, the differences between the data blocks are marginal with all cases being close to L = 0, which indicates good agreement of LAERTES-EU with observations. In contrast to R², L has lower values for the simulations driven by MPI-ESM-LR. For all data blocks, L is considerably higher for the mountainous AL* region. Note that both quantities being close to
[Figure caption fragment] TP1b; 1951-2006. The four data blocks of LAERTES-EU are considered separately; RCM mean stands for the complete ensemble mean (gray). The results for E-OBS and HYRAS are given in black and magenta. Note that it is not possible to estimate the 30-year values for the decadals of data blocks 2 and 4.

Time series
Besides overall statistics, other properties of LAERTES-EU such as the temporal variability should cover the range of observations as well. Therefore, we analyze the time series of yearly values of different percentiles of the spatial mean precipitation for the investigation areas. In Fig. 4, the time series of the 99th percentile for ME* is shown. Both observational data sets have a high year-to-year variability with similar shape. The ensemble mean value of LAERTES-EU is higher, with a relative deviation of 1 %-10 % (TP1b average is 7 %). The spread of both observational data sets is covered by the ensemble spread (minimum to maximum values) of LAERTES-EU except for a few extreme peaks (e.g., 1985). In AL*, the E-OBS mean is about 5 % higher than HYRAS but both time series again have a similar shape (Fig. S3). The ensemble mean is again higher, with relative deviations of 12 %-23 % (16 % on average) to E-OBS and 18 %-29 % (21 % on average) to HYRAS. The ensemble spread also covers the observed variability. Regarding more extreme values, namely the 99.9th percentile, similar results can be found (Figs. S4 and S5). Again, E-OBS and HYRAS show a similar behavior for both areas with mean value differences of less than 1 %. The ensemble mean shows a mostly positive bias with deviations of less than 10 % (6 % on average during TP1b) compared to E-OBS for ME* and 6 %-18 % (average of 10 %) for AL*. Furthermore, there is a distinctly higher spread and variability of the 99.9th percentile for both the observations and LAERTES-EU. Except for a few peaks, LAERTES-EU covers the spread of the observations.

Added value of the sample size
In order to demonstrate the added value of the presented LAERTES-EU, we use the S2N (Eq. A6) for different sample sizes and return periods (cf. Appendix A). Sample size, in this case, means the number of simulation runs. Note that the simulations vary in length (number of years) with a minimum length of 10 years and a maximum of 110 years. In order to reduce the influence of the sample length on the results, the single simulation runs of LAERTES-EU were randomly concatenated using a 100-fold permutation. Observations have a sample size of 1. Again, S2N is calculated for daily spatial mean precipitation amounts during TP1b only using the HYRAS area.
For both ME* and AL*, S2N steadily increases with sample size for all calculated return values, indicating a statistically more robust estimate of the return values (Fig. 5). Furthermore, the S2N is lower for higher return periods, which is a result of the increasing uncertainty of the best estimate due to fewer or even no data points for very high return periods. However, S2N also increases with sample size for the very high return periods. The robustness of a 2-year return value estimate of a sample of size 1 is about the same as that of the 1000-year estimate for a sample of size 20. This means that even for extremes that have not been observed yet, some robust statistical analysis can be carried out.

Long-term variability and trends
The temporal evolution and variability of extreme precipitation throughout the past time period (TP1; 1900-2017) and also for the predictions (TP2; 2018-2028) are evaluated in this section. Besides time series of percentiles, we use climate change indices and statistical distributions. In this section, all land grid cells within the investigation areas ME and AL are used for calculating the daily areal mean precipitation amounts. Figure 6 shows the evolution of the distribution of areal mean precipitation throughout TP1 and TP2 by treating each decade independently. For the core of the distributions, namely medians, interquartile ranges, and upper whiskers, only small variance can be found between the different decades, which means that there is almost no change for the majority of the precipitation amounts. Nevertheless, a marked positive trend for the uppermost extremes of the distributions appears with maximum values around 18 mm d⁻¹ at the beginning of the 20th century and about 24 mm d⁻¹ in the 21st century. The distribution for the upcoming decade (2020-2028) shows only small differences to that of the present decade (since 2010), with an almost equal median and interquartile range but slightly higher maximum values (Fig. 6, green boxplot). Note that the decade of 2010-2019 contains the years 2018 and 2019 from the predictions, and that the last "decade" 2020-2028 is shorter with 9 years.

Precipitation distributions
The boxplot for AL is shown in Fig. S6 and illustrates that not only do the high percentiles reveal a decrease in the middle of the century, but the entire distribution is also shifted towards lower values. Nevertheless, there is no clear tendency for the maximum values. For the upcoming decade, the distribution is similar to that of the present decade in the case of the median and the upper part of the distribution (Fig. S6, green boxplot). The interquartile range is reduced due to an increased lower boundary of the boxplot.
Figure 6. Boxplot of the distribution of daily spatial mean precipitation values (including dry days) for ME. Each decade was considered separately. The centerline of a box marks the median; the lower and upper ends of the box mark the 25th and 75th percentiles (interquartile range); the whiskers represent approximately the 99.9th percentile; the prediction part is marked in green.

Overview
The overall trend during TP1 and TP2 using a linear regression for both areas and percentiles is given in Table 3. While the ensemble mean shows a significant positive trend for ME for both percentiles, a small but significant negative trend can be found for the 99th percentile of AL, while there is almost no change in the 99.9th percentile of AL. In all cases, the ensemble spread increases due to both a decrease of the minimum values and an increase of the maximum values both being highly significant. The change of the maximums is stronger than the reduction of the minimums and more pronounced in AL than in ME.
Analogous to Table 3, we analyze the trend for TP1b only (Table S1 in the Supplement). The tendencies are the same for all cases but less pronounced, except for the mean 99.9th percentile of AL, where the negative trend during TP1b is slightly stronger than for the whole time series. Figure 7 shows the temporal evolution of the 99th percentile during the 20th and the beginning of the 21st century for the whole LAERTES-EU. As given in Table 3, the lower boundary changes are small, while there is a visible positive trend of the ensemble mean and the upper boundary of the ensemble spread. Note that the larger spread from the 1960s onwards might be artificial due to the considerably larger number of members of data block 4. Nevertheless, there is a clear consistency in the time series for ME.
Table 3. Overall trend of daily spatial mean precipitation during TP1 and TP2 (1900-2028) using a linear regression of the yearly series of the 99th and 99.9th percentile (Pct; wet days only) for ME and AL. Given are absolute values and the relative changes (RCs) compared to the climatological mean (climTP; 1961-1990) for the ensemble minimum (min), the ensemble mean, and the ensemble maximum (max) percentile values within LAERTES-EU, and the related significance (p value; α = 0.05).
Some differences emerge for AL (Fig. S7). First, there is a distinct decrease of the ensemble mean between 1960 and 1970, which might be related to the rising number of members. As the ensemble matches well with the observations, we presume an overestimation of precipitation in the first half of the 20th century in that region, which could be a result of missing data for the applied dry-day correction. Due to the more complex terrain, the structure of the precipitation fields is more complex and therefore more sensitive to different types of effects such as the dry-day correction.
The results for the 99.9th percentile are similar for both areas (Figs. S8 and S9). The positive trend for ME is even more pronounced, while the drop in the 1960s for AL is less visible, and therefore the time series is more constant.
For ME, the evolution of the number of days exceeding the climatological mean percentile reveals a strong positive and significant trend for both the 99th (Fig. 8, top) and 99.9th percentile (Fig. S10). The exact values of the climTP mean, the linear regression, the relative change, and the significance can be found in Table 4 (top numbers). For AL, the year-to-year variability is higher and the overall trend is slightly negative (Figs. 8, bottom, and S11) and at least significant for the 99th percentile. Again, we analyze the trend for TP1b separately (Table 4, bottom numbers). The tendencies for TP1b are the same but less pronounced except for the days exceeding the 99th percentile in AL, where there is a stronger trend signal in TP1b compared to the whole time series, which is also significant to a high degree.
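The exceedance count discussed above can be illustrated with a minimal sketch. It assumes that the "climatological mean percentile" is the mean of the yearly percentile values over the reference years; this interpretation, the function names, and the dictionary layout are illustrative assumptions and are not taken from the study. No wet-day filter is applied here.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def exceedance_days(daily_by_year, ref_years, q=99.0):
    """Count, for every year, the days above the climatological mean percentile.

    daily_by_year: dict mapping year -> list of daily areal mean precipitation.
    ref_years: years of the reference period (assumed layout, e.g. 1961-1990).
    """
    # Assumption: threshold = mean of the yearly q-th percentiles in ref_years.
    yearly = [percentile(daily_by_year[y], q) for y in ref_years]
    threshold = sum(yearly) / len(yearly)
    return {y: sum(1 for rr in days if rr > threshold)
            for y, days in daily_by_year.items()}


# Toy data (not real precipitation): two "years" of 101 values each.
toy = {2000: [float(v) for v in range(101)],
       2001: [float(v + 1) for v in range(101)]}
counts = exceedance_days(toy, ref_years=[2000], q=99.0)
```

A linear regression of such yearly counts against time would then give the kind of trend listed in Table 4.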

Past trends and periodic oscillations
For a more detailed analysis of trends, the Mann-Kendall test described in Sect. 3.2 is applied to the time series of daily spatial mean precipitation percentiles. Figure 9a shows the relative number of LAERTES-EU members that show a positive or negative trend of the 99th percentile for ME. Only cases in which more than 60 % of the complete ensemble members reveal the same tendency are then considered for further investigation. For these cases, the ensemble mean trend is calculated (Fig. 9b) and the relative amount of significant members is displayed (Fig. 9c). All cases in which the ensemble reveals ambiguous tendencies are neglected (gray areas).
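The selection logic described above (all subperiods of at least 10 years, keeping only those where more than 60 % of the members agree in sign) can be sketched as follows. This is a simplified illustration: it assumes all members cover the same years, and it uses the sign of a Theil-Sen slope as a stand-in for the sign of the Mann-Kendall statistic. The function names are hypothetical.

```python
import statistics


def theil_sen_slope(series):
    """Median of all pairwise slopes of an equally spaced series."""
    n = len(series)
    return statistics.median(
        (series[j] - series[i]) / (j - i)
        for i in range(n - 1) for j in range(i + 1, n)
    )


def trend_matrix(members, min_len=10, agreement=0.6):
    """Map (start, end) index pairs to (sign, mean slope) or None if ambiguous.

    members: list of yearly series of identical length (simplifying assumption).
    """
    n_years = len(members[0])
    result = {}
    for start in range(n_years - min_len + 1):
        for end in range(start + min_len - 1, n_years):
            slopes = [theil_sen_slope(m[start:end + 1]) for m in members]
            frac_pos = sum(s > 0 for s in slopes) / len(slopes)
            frac_neg = sum(s < 0 for s in slopes) / len(slopes)
            if frac_pos > agreement:
                result[(start, end)] = ("+", statistics.mean(slopes))
            elif frac_neg > agreement:
                result[(start, end)] = ("-", statistics.mean(slopes))
            else:
                result[(start, end)] = None  # ambiguous, shown in gray
    return result


# Toy ensemble: three rising members and one falling member over 12 years.
up = [float(t) for t in range(12)]
down = [-float(t) for t in range(12)]
clear = trend_matrix([up, up, up, down])
ambiguous = trend_matrix([up, up, down, down])
```

In the toy case, 75 % of the members agree on a positive sign, so every window is retained; with a 50/50 split every window is discarded.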
To a high degree, the single members show the same behavior, especially for the longer time series where positive trends are dominant. On a decadal timescale (diagonal line in Fig. 9), some oscillations appear with phases of increasing and decreasing precipitation. This signal might be smoothed, as it is not expected that the decadal simulations of data blocks 2 and 4 cover the natural variability at this timescale in detail. Furthermore, these simulations are not expected to be in phase with the long-lasting simulations of data blocks 1 and 3. The trends on this timescale reach rates of up to 0.1 mm a⁻¹ (1 mm per decade). The overall trend is weaker with a rate of 0-0.02 mm a⁻¹ (0-2 mm per century). Positive trends are more often significant than negative ones, while only a small part of the ensemble shows significant trends. Similar results can be found for AL (Fig. S12). The trends on the decadal timescale reach higher rates but the oscillation is less pronounced than in ME. Again, most of the positive trends are significant, while just a few members with negative trends are significant. For the 99.9th percentile of ME, large parts of LAERTES-EU show positive trends (Fig. S13). On the decadal timescale, a clear sequence of positive and negative trends is visible. Both the increases and decreases are more pronounced than for the 99th percentile but only a few members are significant. For AL, even more parts of the ensemble have the same tendency of heavy precipitation and a higher number of members have a significant trend (Fig. S14). These trends clearly exceed rates of ±0.1 mm a⁻¹. In contrast to the results above, the 99.9th percentile for AL seems to have a multidecadal oscillation, while the overall trend of the complete time series is negative.
Further to this absolute change, the number of days exceeding the climatological 99th percentile shows an increase of 4.9 % for ME and 8.4 % for AL, and 6.7 % (ME) and 22.4 % (AL) in the case of the 99.9 % compared to the mean of 2007-2017. This also manifests in the relative anomaly (Figs. 8 and S10-S11; green bars).
Nevertheless, a more detailed trend analysis illustrated in Fig. 9 and also Figs. S12-S14 reveals that LAERTES-EU shows no clear tendency for the 99th percentile during TP2. Just in a few cases, more than 60 % of the members have a similar, mainly positive trend signal, which, however, is not significant. In the case of the 99.9th percentile, 60 %-70 % of the members show a strong positive trend of more than 0.1 mm a⁻¹ with 20 %-40 % of them being significant. Although the tendency for TP2 is ambiguous and less significant, it shows continuity to the present decade.

Climate change indices
The results described in the previous sections also manifest in the considered ETCCDI climate change indices (Table 5). R95pTOT shows a positive trend for ME (Fig. 10a) with a relative change of about 18 % and a strong negative trend of approximately −15 % for AL (Fig. S15). Remarkably, there is a high positive deviation in the first half of the 20th century compared to the climTP amount for AL, which might be artificial due to the mentioned problems of the dry-day correction. R99pTOT shows a positive change for ME (Fig. 10b) and a slightly negative trend for AL (Fig. S16). The overestimation for AL in the early century is less pronounced for this index. Considering only TP1b, the tendencies are the same in all cases. The positive trends for ME are less pronounced, while the negative trends for AL are stronger. The estimated trends are highly significant, except for the R99pTOT of AL for the whole time series.
[Table caption fragment] climTP; 1961-1990 of ETCCDI quantities for Mid-Europe (ME) and the Alps (AL), linear regression (LR) and relative change (RC) compared to climTP for different TPs, and related significance (p value; α = 0.05). Both indices are based on wet days only of daily spatial mean precipitation (land only).
Compared to the present decade, the predictions show a continuation of the positive trend for ME with an increase of 2 % for R95pTOT and 5 % for R99pTOT. For AL, both indices also show a positive trend with an increase of 7 % for R95pTOT and 8 % for R99pTOT, which is a complete reversal of the overall negative trend.

Summary and conclusions
We have presented the novel LAERTES-EU ensemble combining various regional climate model simulations done with COSMO-CLM to analyze long-term variability and trends of flood-related intensive areal precipitation across central Europe. The whole RCM ensemble was divided into four data blocks depending on forcing data, assimilation schemes, or the initialization of the driving global model MPI-ESM. The setup of the COSMO model remained the same for all simulations. In total, the presented LAERTES-EU consists of over 1100 simulation runs with approximately 12 500 simulated years on a 25 km horizontal resolution.
The investigation focused on the PRUDENCE regions Mid-Europe (ME) and the Alps (AL). Regarding intensive areal precipitation, we concentrated on high percentiles, namely the 99th and 99.9th, of spatially averaged daily precipitation amounts. Note that LAERTES-EU was not expected to reproduce historical precipitation events on a daily basis in detail, but rather to perform more accurately regarding long-term variations and statistical distributions from a larger-scale perspective. Furthermore, the given resolution restricts the consideration of convective processes, so we concentrated on larger-scale phenomena.
With respect to our initial research questions, the following main conclusions can be drawn from the presented results; they are discussed in more detail afterwards:
1. LAERTES-EU is capable of representing the range of extreme areal precipitation similar to the used observational data sets and also fits into the range of previous studies (e.g., Früh et al., 2010). The four data blocks are consistent and have similar precipitation distributions. The ensemble also covers the observed temporal evolution.
2. The benefits of the large ensemble size manifest in a strong increase of the signal-to-noise ratio beyond the typically used ensemble sizes and in high statistical significance of estimated trends for the ensemble mean. Furthermore, the distribution of precipitation totals is represented in a more concise way, taking the limitations of the considered observations into account.
3. Long-term trends reveal spatial differences in sign and strength. These tendencies are partly significant. Despite a quite large ensemble spread, the ensemble mean shows more explicit results. Distinct oscillations can also be found on shorter timescales (e.g., decades).
4. The predictions for the upcoming decade show a continuation of past tendencies in terms of both intensity and occurrence frequency for ME without any discontinuity to the previous time period. On the other hand, LAERTES-EU shows no clear signal for AL.
Regarding the validation (1), grid-point-based IPCs, areal mean precipitation distributions (internal variability σ and linear error in probability space L), and Q-Q distributions have been analyzed. In all cases, the IPCs of the simulations show an overestimation of precipitation on the order of 10 %-20 % compared to E-OBS. Haylock et al. (2008) found that E-OBS can have a certain negative bias of up to 30 % compared to single ground-based punctual observations. Taking this into account, the IPCs are almost coincident. Furthermore, the IPCs of LAERTES-EU show only small deviation compared to the HYRAS data set (aggregated to the model grid). The IPCs and also the Q-Q distributions of all four data blocks are coincident, which was a prerequisite for the combination to one large ensemble. The Q-Q distributions of spatially aggregated mean precipitation reveal fewer differences between LAERTES-EU and E-OBS, but an underestimation of simulated rainfall compared to HYRAS by about 10 %. The linear error in probability space L shows a good agreement of LAERTES-EU with observations in terms of the distribution of daily areal mean precipitation totals. For different aggregation intervals from daily values up to 10-year running means, the internal variability (standard deviation σ ) of LAERTES-EU matches to a high degree with that of both observations. Note that both quantities L and σ do not indicate whether the simulated absolute precipitation values coincide with the observations but rather show the agreement of statistical properties.
Regarding (2), LAERTES-EU reveals a clear added value due to the large sample size. Estimates of long return periods are more robust compared to smaller ensembles which is of importance, for instance, for risk and insurance applications. Furthermore, trends at least in the ensemble mean are highly significant. The IPCs also show a benefit of RCM data compared to the coarser global model (MPI-ESM) or the 20CR global reanalysis. Regarding extremes, LAERTES-EU includes a broader range of precipitation totals with even higher values, which are not covered by observations due to their limited temporal availability. Although the presented results reveal a broad range of realizations within LAERTES-EU, the statistics of the ensemble mean clearly benefit from the large ensemble size with a better signal-to-noise ratio.
Besides a proper representation of precipitation, long-term trends and temporal variations were of special interest. Regarding (3), the presented results show a reasonable agreement of LAERTES-EU concerning the temporal evolution of the considered percentiles of spatially aggregated daily precipitation totals for the different investigation areas. The ensemble spread (minimum to maximum) covers the observed variability except for a few peaks. The ensemble mean shows a small positive bias compared to both observational data sets. Throughout the complete time period of TP1, positive and significant trends can be found for ME in both percentiles (99th and 99.9th) and also in the number of days exceeding the climatological mean. For AL, there is no clear trend signal in the ensemble mean but an increase in the maximum values. In contrast, the number of days exceeding the climatological mean percentiles is decreasing in this area. Comparing the trends of TP1 to the shorter TP1b, the tendencies are the same but less pronounced in TP1b. On a decadal timescale, some oscillations can be found with periods of increasing precipitation and periods of decreasing values. Results similar to those for the time series of percentiles can be found using climate change indices (ETCCDI).
Regarding (4), the predictions for the next decade (2018-2028; TP2) reveal ongoing tendencies of heavy precipitation indices. A special case is AL where the slightly negative trends in the past (TP1) turn to positive ones. Both the continuity for ME and the reversal for AL appear in all time series, namely the number of days of threshold exceedance, ETCCDI variables, and investigated percentiles. While there is a clear signal and high significance for the ensemble mean, the trends were ambiguous and less significant when the ensemble members were considered separately. However, we conclude that this tendency is likely, as it is a continuation of the results of the present decade. Similar results for parts of LAERTES-EU were found by Reyers et al. (2019).
Precipitation remains a challenging task for both reanalyses and climate model simulations of the past and the future, with partly contrasting results shown by several previous studies. Furthermore, long-term comprehensive observations are not available, which makes a validation difficult due to the high spatial variability of precipitation. This also affects analyses of trends or climate variability. What is known is a theoretical increase of the water vapor capacity according to the Clausius-Clapeyron (CC) equation of about 6 %-7 % per degree of temperature increase (e.g., Trenberth et al., 2003; Berg et al., 2009), which assumes a near-constant relative humidity. The CC rate is generally thought to be a proxy for future precipitation projections (Westra et al., 2013). A recent discussion about the validity of the CC rate as an estimate for future projections of heavy precipitation can be found in Zhang et al. (2017). They pointed out that besides the thermodynamic responses, changes in heavy precipitation may also be influenced by dynamical effects. Furthermore, Pfahl et al. (2017) and Kröner et al. (2017) showed that precipitation trends can be regionally influenced by contributions from both lapse-rate and circulation effects.
The ensemble mean of LAERTES-EU shows an increase of about 1.9 °C for ME and 2.3 °C for AL for the yearly mean 2 m temperature of spatial means during TP1 (1900-2017). Including the predictions (TP2), the increase is about 2.4 °C for ME and 2.8 °C for AL. For instance, Simmons et al. (2017) found an increase over European land masses of approximately 2 °C in the mean compared to pre-industrial conditions. Moberg et al. (2006) found an increase of about 1 °C for temperature extremes. Thus, LAERTES-EU is within the range of observed changes. The increase in temperature over the entire time period is equivalent to a CC scaling of about 15 %-20 %. The changes found for the high precipitation percentiles for ME amount to up to 50 % of the theoretical CC value. However, the negative tendencies for AL do not fit into this theoretical estimate.
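The quoted CC scaling of about 15 %-20 % can be checked with simple arithmetic. The sketch below assumes that the scaling is obtained by multiplying the CC rate (6 %-7 % per degree) by the warming including TP2 (2.4 °C for ME, 2.8 °C for AL). How the value was actually derived is not stated in the text, so both the linear and the compound variants are shown as assumptions.

```python
def cc_scaling(delta_t, rate_per_degree, compound=False):
    """Fractional increase in water vapor capacity for a warming of delta_t.

    rate_per_degree: CC rate as a fraction (0.06 to 0.07 per degree).
    compound=False multiplies rate and warming (linear approximation);
    compound=True applies the rate multiplicatively per degree.
    """
    if compound:
        return (1.0 + rate_per_degree) ** delta_t - 1.0
    return rate_per_degree * delta_t


# Lower bound: ME warming with the lower CC rate.
low_linear = cc_scaling(2.4, 0.06)              # 0.144, i.e. about 14 %
# Upper bound: AL warming with the higher CC rate.
high_linear = cc_scaling(2.8, 0.07)             # 0.196, i.e. about 20 %

low_compound = cc_scaling(2.4, 0.06, compound=True)
high_compound = cc_scaling(2.8, 0.07, compound=True)
```

Both variants give a range that is consistent with the rounded 15 %-20 % stated above.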
The presented LAERTES-EU data set can be used in various fields of application. In particular, the simulations are used as input for hydrological modeling and further applications such as flood risk assessments. In this case, the presented ensemble can be used as a stochastic weather generator treating the single simulations independently. This leads to the production of a quasi-stochastic hydrological discharge data set. Due to the large ensemble size, estimates of high return periods become more robust. However, it has to be mentioned that the composition of the four data blocks into one ensemble restricts the temporal homogeneity. Moreover, the validation showed a positive bias of the ensemble mean which, together with the overestimation of low intensities, requires a bias correction to avoid unrealistic discharges. This application as well as the bias correction of LAERTES-EU will be addressed in a subsequent study.
In this study, we have focused on all-year variance, oscillations, or trends. Future investigations can address a seasonally differentiated analysis of trends and oscillations as well as a more detailed investigation of the spatial distribution of these findings and potential mechanisms behind the observed variability. Previous studies indicated that there is a strong relation between precipitation in Europe and the North Atlantic Oscillation (NAO), especially during wintertime (e.g., Hurrell, 1995; Rîmbu et al., 2002; Haylock and Goodess, 2004; Nissen et al., 2010; Pinto and Raible, 2012). Moreover, Casanueva et al. (2014) found a connection between extreme precipitation and the Atlantic Multidecadal Oscillation (AMO) during the whole year.

Appendix A
The linear error in probability space L uses the difference of probabilities C defined as
$$C_r(x) = \mathrm{ecdf}_{\mathrm{mod},r}(x) - \mathrm{ecdf}_{\mathrm{obs}}(x), \tag{A1}$$
where $\mathrm{ecdf}_{\mathrm{mod},r}$ is the empirical cumulative distribution function of the model run r, and $\mathrm{ecdf}_{\mathrm{obs}}$ that of the observation up to precipitation intensity x. The linear error in probability space $L_r$ for a model run r is then defined as (Déqué, 2012; Wahl et al., 2017)
$$L_r = \frac{1}{n} \sum_{i=1}^{n} \left| C_r(x_i) \right|. \tag{A2}$$
$L_r$ describes the mean value of $|C_r|$ over the entire range of precipitation intensities x grouped into n classes. Using absolute values avoids a compensation of positive and negative values. The better both distributions coincide, the lower the value of $L_r$. The ensemble mean of $L_r$ is given by
$$L = \frac{1}{M} \sum_{r=1}^{M} L_r, \tag{A3}$$
with M being the total number of simulation runs. The model performance on different frequency intervals is further validated using the standard deviation of a gamma distribution σ (Wilks, 2006), which is given by
$$\sigma = \sqrt{\alpha}\,\beta. \tag{A4}$$
In this formulation, α is the shape parameter of the gamma distribution, and β its scale parameter. The quantile-quantile analysis uses the Pearson correlation coefficient (Wilks, 2006) given by
$$R = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}, \tag{A5}$$
with the data series x and y of length N. The range of R is R ∈ [−1; +1] with a perfect anti-correlation at R = −1 and a perfect correlation at R = +1.
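The linear error in probability space and the gamma standard deviation described above can be sketched in a few lines. The choice of n equally spaced intensity classes between the joint minimum and maximum is an assumption made here for illustration, as the class definition is not given in the text; the function names are hypothetical as well.

```python
import math


def ecdf(sample, x):
    """Empirical cumulative distribution function of sample evaluated at x."""
    return sum(1 for v in sample if v <= x) / len(sample)


def linear_error(model, obs, n_classes=50):
    """Mean absolute ecdf difference between one model run and observations."""
    lo = min(min(model), min(obs))
    hi = max(max(model), max(obs))
    if hi == lo:
        return 0.0
    step = (hi - lo) / n_classes
    # Assumption: classes are equally spaced; upper class edges are used.
    edges = [lo + step * (i + 1) for i in range(n_classes)]
    return sum(abs(ecdf(model, x) - ecdf(obs, x)) for x in edges) / n_classes


def ensemble_mean_error(runs, obs, n_classes=50):
    """Average of the per-run linear errors over all M runs."""
    return sum(linear_error(r, obs, n_classes) for r in runs) / len(runs)


def gamma_std(shape, scale):
    """Standard deviation of a gamma distribution with given shape and scale."""
    return math.sqrt(shape) * scale


obs = [0.0, 1.0, 2.0]
shifted = [10.0, 11.0, 12.0]
err_same = linear_error(obs, obs, n_classes=4)
err_shifted = linear_error(shifted, obs, n_classes=4)
err_mean = ensemble_mean_error([obs, shifted], obs, n_classes=4)
```

Identical samples give an error of zero, while the fully shifted toy sample differs in three of the four classes.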
The S2N ratio in this study is defined as
$$\mathrm{S2N}(T) = \frac{\mathrm{RV}(T)}{\mathrm{CI}_{90}(T)}, \tag{A6}$$
with the return level RV of the Gumbel distribution at return period T divided by its 90 % confidence interval at T (Früh et al., 2010). Small values of S2N indicate a more uncertain estimate; high values indicate a more robust one. The Gumbel distribution (Wilks, 2006) is an extreme value type-I distribution and often used for return period estimation. Its cumulative distribution function (cdf) is given by
$$F(x) = \exp\left\{ -\exp\left[ -\frac{x - \alpha}{\beta} \right] \right\}, \tag{A7}$$
with the free parameters $\beta = \sigma \sqrt{6}\,\pi^{-1}$ and $\alpha = \bar{x} - \gamma\beta$, where σ is the standard deviation of the sample x assuming a normal distribution, $\bar{x}$ is the sample mean, and γ = 0.57721 is Euler's constant. For x, usually a series of yearly maximum values is used. The relationship between the cdf and the return period T is given by (Wilks, 2006)
$$T = \frac{1}{1 - F(x)}. \tag{A8}$$
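A minimal sketch of the S2N computation follows. The Gumbel parameters use the moment relations given above. The 90 % confidence interval is estimated with a simple bootstrap, which is an assumption made for this illustration since the text does not state how the interval was obtained; all function names are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.57721


def gumbel_params(sample):
    """Moment estimates: beta = sigma*sqrt(6)/pi, alpha = mean - gamma*beta."""
    beta = statistics.stdev(sample) * math.sqrt(6.0) / math.pi
    alpha = statistics.mean(sample) - EULER_GAMMA * beta
    return alpha, beta


def gumbel_cdf(x, alpha, beta):
    """Gumbel cumulative distribution function."""
    return math.exp(-math.exp(-(x - alpha) / beta))


def return_level(T, alpha, beta):
    """Invert T = 1 / (1 - F(x)) for the value x at return period T."""
    return alpha - beta * math.log(-math.log(1.0 - 1.0 / T))


def s2n(sample, T, n_boot=500, seed=0):
    """Return level divided by the width of its bootstrap 90 % interval."""
    rng = random.Random(seed)
    best = return_level(T, *gumbel_params(sample))
    boots = []
    for _ in range(n_boot):
        resample = [rng.choice(sample) for _ in sample]
        boots.append(return_level(T, *gumbel_params(resample)))
    boots.sort()
    lower = boots[int(0.05 * (n_boot - 1))]
    upper = boots[int(0.95 * (n_boot - 1))]
    return best / (upper - lower)


# Toy yearly maxima (illustrative numbers, not data from the study).
maxima = [10.0, 12.0, 15.0, 11.0, 20.0, 14.0, 13.0, 17.0]
alpha, beta = gumbel_params(maxima)
rv100 = return_level(100, alpha, beta)
ratio = s2n(maxima, 100)
```

With more values in the sample, the bootstrap interval narrows and the ratio grows, which mirrors the sample-size dependence discussed for Fig. 5.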

Appendix B: ETCCDI quantities
Two out of the 27 indices introduced and recommended by the Expert Team on Climate Change Detection and Indices (ETCCDI; Karl et al., 1999; Peterson, 2005) are used in this study. R95pTOT describes the annual total precipitation sum of all values above the climatological 95th percentile of wet days (RR > 1 mm) during the reference period. The R95pTOT of the year k is defined as
$$\mathrm{R95pTOT}_k = \sum_{w=1}^{W} \mathrm{RR}_{wk} \quad \text{where} \quad \mathrm{RR}_{wk} > \mathrm{RR}_{p95}, \tag{B1}$$
where $\mathrm{RR}_{wk}$ is the daily precipitation amount on a wet day w during year k, $\mathrm{RR}_{p95}$ is the climatological 95th percentile, and W is the total number of wet days in year k. Analogously, the R99pTOT is defined by replacing the 95th with the 99th percentile:
$$\mathrm{R99pTOT}_k = \sum_{w=1}^{W} \mathrm{RR}_{wk} \quad \text{where} \quad \mathrm{RR}_{wk} > \mathrm{RR}_{p99}. \tag{B2}$$
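The index definition above translates into a short routine. This is a sketch under stated assumptions: the percentile uses linear interpolation, the wet-day criterion is RR > 1 mm as written in the text, and the function names and the dictionary layout are chosen for illustration only.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def rptot(daily_by_year, ref_years, q=95.0, wet_threshold=1.0):
    """Annual precipitation sum from wet days above the reference percentile.

    daily_by_year: dict mapping year -> list of daily precipitation in mm.
    ref_years: years used for the climatological percentile of wet days.
    """
    ref_wet = [rr for y in ref_years for rr in daily_by_year[y]
               if rr > wet_threshold]
    threshold = percentile(ref_wet, q)
    return {y: sum(rr for rr in days if rr > wet_threshold and rr > threshold)
            for y, days in daily_by_year.items()}


# Toy data: 21 wet days (2..22 mm) plus two dry days in the reference year.
toy = {
    1961: [0.0, 0.5] + [float(v) for v in range(2, 23)],
    1962: [30.0, 1.0, 21.0, 25.0],
}
r95 = rptot(toy, ref_years=[1961], q=95.0)
```

Calling the same function with `q=99.0` gives the R99pTOT analogue.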

Appendix C: Trends and significance
A Mann-Kendall test (Mann, 1945; Kendall, 1955) is performed for the detection of trends and their significance. To account for possible oscillations within long time series, we first split the complete time series into subseries with a minimum length of 10 years and up to over 100 years (trend matrix). The Mann-Kendall test uses a standardized test statistic $S_\tau$ following a standard Gaussian distribution (SGD). $S_\tau$ is given by
$$S_\tau = \frac{\tau}{\sigma_\tau}. \tag{C1}$$
Here, τ is known as the Kendall τ and $\sigma_\tau^2$ is the variance of the SGD. A detected trend is significant if $S_\tau$ lies within the upper and lower quantiles z of the SGD at a given significance level α with $S_\tau \in [z_{\alpha/2}\,\sigma_\tau;\ z_{1-\alpha/2}\,\sigma_\tau]$, respectively (Yue et al., 2002). Yue et al. (2002) pointed out some weaknesses of the Mann-Kendall test in the case of inherent autocorrelation. To avoid a distortion of the statistic by autocorrelation, Yue et al. (2002) presented the trend-free pre-whitening (TFPW) method. The first step is the estimation of a linear trend between two time steps (t = i and t = j) using the Theil-Sen approach (TSA; Theil, 1950; Sen, 1968). The slope b of this linear regression is given by
$$b = \operatorname{median}\left( \frac{x_j - x_i}{j - i} \right) \quad \forall\, i < j. \tag{C2}$$
In a second step, the original time series x is detrended by subtracting b at each time step t:
$$x'_t = x_t - b\,t. \tag{C3}$$
Afterwards, the lag-1 autocorrelation coefficient $r_1$ is removed from the trend-free series $x'$:
$$x''_t = x'_t - r_1\, x'_{t-1}, \tag{C4}$$
where $r_1$ is given by
$$r_1 = \frac{\frac{1}{N-1} \sum_{t=1}^{N-1} \left( x'_t - \bar{x}' \right) \left( x'_{t+1} - \bar{x}' \right)}{\frac{1}{N} \sum_{t=1}^{N} \left( x'_t - \bar{x}' \right)^2}. \tag{C5}$$
The modified TFPW time series $x^*$ results by re-adding the TSA slope b:
$$x^*_t = x''_t + b\,t. \tag{C6}$$
This modified time series conserves the trend but is free of autocorrelation. The Mann-Kendall test is performed on the TFPW time series $x^*$. According to Yue et al. (2002), TFPW has to be considered in cases with non-zero TSA slope and significant lag-1 autocorrelation. The significance of a trend or autocorrelation is tested on the 90 % (α = 0.1), 95 % (α = 0.05), and 99 % (α = 0.01) significance levels.
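The TFPW steps described above can be sketched as follows. Several simplifications are assumptions of this illustration and not necessarily the procedure of the study: pre-whitening is applied unconditionally (without first testing the lag-1 autocorrelation for significance), the test statistic is the common textbook Z form with continuity correction and without tie correction, and all function names are hypothetical.

```python
import math
import statistics


def theil_sen_slope(x):
    """Median of all pairwise slopes (Theil-Sen approach)."""
    n = len(x)
    return statistics.median(
        (x[j] - x[i]) / (j - i) for i in range(n - 1) for j in range(i + 1, n)
    )


def lag1_autocorr(x):
    """Lag-1 autocorrelation coefficient; 0.0 for a constant series."""
    n = len(x)
    mean = sum(x) / n
    den = sum((v - mean) ** 2 for v in x) / n
    if den == 0:
        return 0.0
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1)) / (n - 1)
    return num / den


def tfpw(x):
    """Detrend, remove the lag-1 component, and re-add the trend."""
    b = theil_sen_slope(x)
    detrended = [x[t] - b * t for t in range(len(x))]
    r1 = lag1_autocorr(detrended)
    # The whitened series starts at t = 1 because it needs a predecessor.
    white = [detrended[t] - r1 * detrended[t - 1] for t in range(1, len(x))]
    modified = [white[k] + b * (k + 1) for k in range(len(white))]
    return modified, b, r1


def mann_kendall_z(x):
    """Standardized Mann-Kendall statistic (no tie correction)."""
    n = len(x)
    s = sum((x[j] > x[i]) - (x[j] < x[i])
            for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def is_significant(z, alpha=0.05):
    """Two-sided test against the standard normal quantile."""
    return abs(z) > statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)


series = [2 * t + 1 for t in range(10)]  # toy series with exact slope 2
modified, slope, r1 = tfpw(series)
z = mann_kendall_z(modified)
```

In this two-sided form, a trend counts as significant when the absolute value of the statistic exceeds the critical quantile.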
Author contributions. FE, LAK, HF, and JGP designed the study. HF performed (parts of) the RCM simulations. LAK applied the dry-day correction. FE did the analysis and plots, and wrote the initial draft. All authors contributed with discussions and revisions.
Competing interests. The authors declare that they have no conflict of interest.

Special issue statement.
This article is part of the special issue "Large Ensemble Climate Model Simulations: Exploring Natural Variability, Change Signals and Impacts". It is not associated with a conference.
Acknowledgements. The authors thank the National Centers for Environmental Prediction (NCEP) for providing the 20CR data. We acknowledge the E-OBS data set from the EU-FP6 project ENSEMBLES (http://ensembles-eu.metoffice.com, last access: 20 May 2020) and the data providers in the ECA & D project (http://www.ecad.eu, last access: 20 May 2020). We also thank the German Weather Service (DWD) for providing HYRAS. In addition, we thank the Max Planck Institute for Meteorology (MPI-M) and DWD for the global model simulations and the German Climate Computing Center (DKRZ, Hamburg) for computing and storage resources. We also thank Martin Kadlec (AON) for discussions. We thank the reviewers for their valuable comments that helped to improve this study, and the handling editor for guidance throughout the entire process.
Financial support. We thank AON for funding the project "Hydrometeorological extreme events under recent climate conditions". We also thank the BMBF MiKlip project II (FKZ: 01 LP 1518 A/D) and ClimXtreme Project (FKZ 01 LP 1901 A) for partial funding. Joaquim G. Pinto thanks the AXA Research Fund for support (https://axa-research.org/en/project/joaquim-pinto, last access: 20 May 2020). We thank the open-access publishing fund of the Karlsruhe Institute of Technology (KIT).
The article processing charges for this open-access publication were covered by a Research Centre of the Helmholtz Association.
Review statement. This paper was edited by Ralf Ludwig and reviewed by Raul R. Wood and two anonymous referees.