Long-term Variances of Heavy Precipitation across Central Europe using a Large Ensemble of Regional Climate Model Simulations

Widespread flooding events are among the major natural hazards in Central Europe. Such events are usually related to intensive, long-lasting precipitation. Despite some prominent floods during the last three decades (e.g. 1997, 1999, 2002, and 2013), extreme floods are rare and associated with estimated long return periods of more than 100 years. To assess the associated risks of such extreme events, reliable statistics of precipitation and discharge are required. Comprehensive observations, however, are mainly available for the last 50–60 years or less. This shortcoming can be reduced using stochastic data sets. One possibility towards this aim is to consider climate model data or extended reanalyses. This study presents and discusses a validation of different century-long data sets, a large ensemble of decadal hindcasts, and also projections for the upcoming decade. Global reanalyses for the 20th century with a horizontal resolution of more than 100 km have been dynamically downscaled with a regional climate model (COSMO–CLM) towards a higher resolution of 25 km. The new data sets are first filtered using a dry–day adjustment. The simulations show a good agreement with observations for both statistical distributions and time series. Differences mainly appear in areas with sparse observation data. The temporal evolution during the past 60 years is well captured. The results reveal some long-term variability with phases of increased and decreased heavy precipitation. The overall trend varies between the investigation areas but is significant. The projections for the upcoming decade show ongoing tendencies with increased precipitation for upper percentiles. The presented RCM ensemble not only allows for more robust statistics in general, in particular it is suitable for a better estimation of extreme values.


Introduction
Ongoing climate change affects not only the global scale but also impacts the regional climate. Regarding air temperature, there is a more or less clear trend in the recent past, which reveals a clear anthropogenic signal. However, various climate simulations show distinct differences for precipitation trends, especially for heavy precipitation (e.g. Moberg et al., 2006; Zolina et al., 2008; Toreti et al., 2010). A review of observed variability and trends in extreme climate events states that it is difficult to find significant relations between the greenhouse gas-enhanced climate change and increases or decreases in extreme precipitation events (Field et al., 2012). This is attributed to their rare occurrence, the generally high spatial variability of precipitation, and a lack of long-term high-quality observations. Magnitude and sign of heavy precipitation trends strongly depend on various factors such as the regarded area or the considered time period (e.g. Easterling et al., 2000). Global tendencies towards more intense precipitation throughout the 20th century were revealed, for example, by Donat et al. (2016). Differing regimes in the summer and winter seasons also factor into precipitation trends. For example, Moberg and Jones (2005) found an increase in winter precipitation across central and western Europe between 1901 and 1999, while Pal et al. (2004) found a decrease in summer precipitation for the period 1951-2000. Dittus et al. (2016) found an increasing trend between 1951 and 2005 in extreme total precipitation amounts for Europe in GCM simulations (CMIP5). Similar trends were found in global reanalyses (e.g. ERA-20C, Poli et al., 2016), but not in observations. In contrast, Primo et al. (2019) found positive trends for two ground-based observational stations in Germany using extreme precipitation indices.
Model resolution is another crucial factor. The use of high-resolution regional climate models (RCM) instead of global data sets revealed a more detailed and orographically related spatial structure of the precipitation fields and trends (e.g. Feldmann et al., 2013). An increase of both areal mean precipitation and extremes in central Europe on the order of 5-10 % was found in RCM simulations by Feldmann et al. (2013), which will continue with almost the same magnitude for the next decade. Differences in precipitation trends also stem from varying definitions of extreme events such as certain thresholds, percentile-based indices, or return periods (e.g. Maraun et al., 2010). While most of these studies show trends in daily precipitation, just a few deal with sub-daily trends. Barbero et al. (2017), for instance, compared trends in sub-daily and daily extremes. Although significant increasing trends were found for both time ranges, trends in daily extremes are better detected than in sub-daily extremes.
Spatially extended intensive rainfall events are frequently related to widespread flooding along the main river networks of central Europe, causing major damage on the order of several billion euro (EUR) per event (e.g. Uhlemann et al., 2010; Kienzler et al., 2015; Schröter et al., 2015; MunichRe, 2017). A prominent example of such an extreme and devastating event is the flood in 2002 along the rivers Elbe and Danube (Ulbrich et al., 2003a, b). Such outstanding events are by definition extremely rare, which makes the risk estimation difficult or almost impossible due to the limited time period with available area-wide observations (e.g. Pauling and Paeth, 2007; Hirabayashi et al., 2013). However, trend analyses of such extreme events and the related risks during the past and for the future are of great importance for insurance purposes or flood protection (e.g. Merz et al., 2014; Schröter et al., 2015; Ehmele and Kunz, 2019).
A possible way of dealing with the unsatisfactory data availability is the use of century-long simulations with climate models (e.g. Stucki et al., 2016) or stochastic approaches (e.g. Peleg et al., 2017; Singer et al., 2018; Ehmele and Kunz, 2019). The currently used GCMs were found to be in good agreement with the available but limited observations (Fischer and Knutti, 2016). Brönnimann et al. (2013) or Brönnimann (2017) analyzed historical extreme events using century-long reanalysis data sets and concluded that the quality of the reanalyses strongly depends on the number and type of the assimilated observations. The investigated historical events were reproduced, but the magnitudes were underestimated. A possible reason is the decreasing number and quality of observations in the early century and therefore a lack of assimilation data. The suitability of reanalysis data to investigate extreme precipitation for England and Wales was investigated by Rhodes et al. (2015). While time series of daily precipitation totals are well represented in both data sets, timing errors of heavy precipitation events were identified as one of the major problems. Stucki et al. (2012) investigated historical flooding events in Switzerland and indicate that the reanalyses underestimate precipitation in Switzerland, which may result from the insufficient representation of the alpine topography. The timing and the exact location of heavy precipitation were also found to be inaccurate.
As shown by van der Wiel et al. (2019) or Martel et al. (2020), large ensembles can have an added value for flood risk estimation and for the calculation of return periods of heavy precipitation. van der Wiel et al. (2019) found a clear benefit in using an ensemble approach for the estimation of changes in hydrological extremes, including compound events, compared to traditional approaches. Martel et al. (2020) found similar results, namely a reduction in the projected return period of period. The investigation of temporal variabilities and trends is given in Sect. 5. Finally, Sect. 6 gives a summary and lists our conclusions.

Two different types of data sets are applied in this study: gridded precipitation data based on observations and partly century-long climate model simulations (LAERTES-EU). The observational data sets are primarily available for the second half of the 20th century and serve as reference data for the validation of the ensemble. Furthermore, we compare LAERTES-EU with the forcing global model and also with the global reanalysis data set 20CR (Compo et al., 2011), which was used as initial data for some of the simulations.

Observations
The European observational data set E-OBS version v17 including daily precipitation (Haylock et al., 2008; van den Besselaar et al., 2011) is a gridded data set with a horizontal resolution of 0.22° (≈ 25 km) covering the years 1950 to 2017. This version shows some improvements over older versions, since updated algorithms and new stations have been included in some areas (e.g. for Poland). The E-OBS algorithm interpolates observations from weather stations to a regular grid using geostatistical methods (e.g. Journel and Huijbregts, 1978; Goovaerts, 2000). Note that E-OBS is a land-only data set and ocean grid points are set to a missing value. Haylock et al. (2008) stated that rainfall totals in E-OBS are reduced by up to almost one third compared to the raw station data at the corresponding grid cells. Regarding extremes, the deviation of E-OBS is even more pronounced (Hofstra et al., 2009). Nevertheless, both studies stated that the spatial mean precipitation in E-OBS is very close to other observations.

Although E-OBS has some limitations, we use it as the main reference for this study, as there is no other comparable high-resolution daily precipitation data set available that covers all of Europe for a long time period. Other products such as satellite data cover only a very limited time frame and have limitations of their own. There are single ground-based observations with very long time series, but as the focus of this study is on intensive areal precipitation, these data are of limited use for validation.

In addition to E-OBS, we compare the RCM simulations with the high-resolution HYRAS data set provided by the German Weather Service (DWD; Rauthe et al., 2013). HYRAS is a gridded precipitation data set with a horizontal resolution of up to 1 km for the time period 1951-2006 and covers Germany and the surrounding river catchments. The HYRAS algorithm also uses ground-based measurements and interpolates the point observations to the regular grid. For this study, the HYRAS data was first aggregated to the E-OBS/RCM 25 km grid. HYRAS hereafter means this aggregated 25 km data set.

Regional climate model simulations
LAERTES-EU combines a large number of regional dynamical downscaling simulations for Europe performed with a single RCM. The RCM used is the non-hydrostatic model of the Consortium for Small-scale Modelling (COSMO) in climate mode, model version 5 (CCLM5; Rockel et al., 2008), which has a spatial resolution of 0.22° (≈ 25 km). The model covers the EURO-CORDEX domain. Overall, the simulations use the same domain, model version and set-up, which was adapted from EURO-CORDEX. According to Feldmann et al. (2008), a dry-day correction is important as climate models tend to overestimate the number of wet days with low intensities below 0.1 mm, known as the drizzle effect (Berg et al., 2012). In order to reduce this typical bias, a dry-day adjustment was first applied to LAERTES-EU.
The E-OBS data were used for this correction, as they have the same spatial extension and resolution as the CCLM simulations.
All simulations were performed within the BMBF (Federal Ministry of Education and Research of Germany) project MiKlip II (Marotzke et al., 2016) to create and test a decadal prediction system including a regional downscaling component for Europe.
For all downscaling simulations the boundary conditions were derived from the Max-Planck Institute of Meteorology coupled Earth System Model (MPI-ESM). This global model consists of the atmospheric component ECHAM6 (Stevens et al., 2013), the ocean component MPI-OM (Jungclaus et al., 2013), and the land-surface model JSBACH (Hagemann et al., 2013).
LAERTES-EU is divided into four different data blocks (Table 1) (Müller et al., 2018) as their driving model. In this version, the horizontal resolution is T127 and 95 vertical layers are applied.
Three types of forcing ensembles can be distinguished:
In data block 1, the first type (I) is applied. Here the 20th Century Reanalysis data (20CR; Compo et al., 2011) are assimilated into the MPI-ESM-LR (Müller et al., 2014). 20CR has a spatial resolution of approximately 2° (T62) and was generated using the Global Forecast System (GFS; Kanamitsu et al., 1991; Moorthi et al., 2001) of the National Centers for Environmental Prediction (NCEP). It used a 56-member Ensemble Kalman Filter approach to assimilate surface pressure, monthly sea surface temperature and sea-ice observations. Three of the 20CR members are assimilated into MPI-ESM to provide long-term (110 years each) climate reconstruction simulations over the period starting in 1900 (Müller et al., 2014). Afterwards, a downscaling with CCLM uses these global simulations as boundary conditions (e.g. Primo et al., 2019).

Data block 3 consists of the second type (II), where five so-called historical simulations of MPI-ESM-HR with CMIP5 observed natural and anthropogenic external climate forcing (Taylor et al., 2012) are used as boundary conditions for CCLM.
The ensemble was generated by starting the MPI-ESM from arbitrary dates in a pre-industrial control simulation (Müller et (Müller et al., 2012; Marotzke et al., 2016). For each starting year, an ensemble of decadal simulations is generated and then the initialization point is shifted by one year (e.g. 1961-1970, 1962-1971, and so on). Due to the overlap, a specific calendar year may be covered by several decadal hindcasts with different starting years. These decadal hindcasts and forecasts thus represent the current state of the major modes of climate variability compared to the so-called un-initialized historical

In total, LAERTES-EU consists of 1183 simulation runs (sample size) with approximately 12,500 simulated years. The number of ensemble members for a specific year varies from six at the beginning of the century to a maximum of 188 members between 1970 and 2000 (see Fig. S1 in the supplemental material). The simulations in all four data blocks are affected by the observed external climate forcing, but they differ with respect to the representation of the observed climate variability: data block 1 uses assimilated 20CR reanalysis data, data blocks 2 and 4 contain initialized hindcasts, which to some degree follow the observed low-frequency variability, and data block 3 only uses the external forcing information. Nonetheless, the four groups of downscaling simulations can be grouped into a large ensemble, since the regional simulations were all performed with the same setup of the RCM. Despite the same initial conditions and model setup, the temporal evolution of the day-to-day weather is (statistically) independent between the members after a few weeks. This is an advantage, since the data set is homogeneous over time but also covers uncertainties in the observations, including unknown and not yet observed events.

The validity of this combination approach is tested within Sect. 4.

Methods
The capability of LAERTES-EU to simulate realistic precipitation amounts and distributions is an important requirement. Moreover, temporal variability and possible trends should also be well represented for trustworthy data sets. The methods were applied to different investigation areas and time periods. Equations and additional information can be found in Appendix A-C. As the focus of this study is intensive areal precipitation, we concentrate on high percentiles of spatially aggregated daily rainfall totals, namely 99 % and 99.9 %. The percentiles are based on wet days only. First, a spatial aggregation of daily precipitation values was applied. Afterwards, the percentiles of this areal precipitation were calculated for each year separately. In all data sets, ocean grid cells were set to a missing value and therefore neglected.
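The two-step procedure described above (spatial aggregation first, then a yearly percentile of wet days) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the data layout (one flat list of grid-cell values per day, with `None` for ocean cells), the linear-interpolation percentile, and the wet-day threshold of 0.1 mm applied to the areal mean are all hypothetical choices, not taken from the study.

```python
import math

def areal_mean(field):
    """Mean over all non-missing cells; ocean cells are marked as None."""
    vals = [v for v in field if v is not None]
    return sum(vals) / len(vals)

def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    s = sorted(values)
    if not s:
        raise ValueError("no values to compute a percentile from")
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def yearly_wet_day_percentile(daily_fields_by_year, q, wet_threshold=0.1):
    """Step 1: aggregate each daily field to an areal mean.
    Step 2: per year, take the q-th percentile of the wet days only.
    wet_threshold (mm) is an assumed value, not the study's definition."""
    result = {}
    for year, fields in daily_fields_by_year.items():
        means = [areal_mean(f) for f in fields]
        wet = [m for m in means if m >= wet_threshold]
        result[year] = percentile(wet, q)
    return result
```

In such a layout, `yearly_wet_day_percentile(data, 99)` would return one value per year, which is the kind of yearly series analyzed later as a time series.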

LAERTES-EU is analyzed and validated using various methods. The intensity spectrum gives the statistical probability of each precipitation amount by taking into account all grid points and all time steps within the investigation area, without any aggregation. To this end, the range of values that occurred is divided into evenly spaced histogram classes, which are then normalized by the total sample size. The resulting intensity-probability curve (IPC) is a good indicator of whether the model is capable of simulating realistic precipitation intensity distributions.
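As a minimal sketch of how such an IPC could be computed, the following bins all grid-point values into evenly spaced classes and normalizes by the sample size. The function name, the bin width, and the upper bound `vmax` are hypothetical parameters chosen for illustration.

```python
import math

def intensity_probability_curve(values, bin_width, vmax):
    """Relative frequency per evenly spaced intensity class.

    values: precipitation at all grid points and time steps, flattened;
            missing values (e.g. ocean cells) are given as None.
    Values at or above vmax are counted in the last class.
    """
    n_bins = int(math.ceil(vmax / bin_width))
    counts = [0] * n_bins
    n = 0
    for v in values:
        if v is None:
            continue
        n += 1
        idx = min(int(v / bin_width), n_bins - 1)
        counts[idx] += 1
    if n == 0:
        raise ValueError("no valid values")
    # Normalize with the total sample size so that the classes sum to 1.
    return [c / n for c in counts]
```

Plotting the returned probabilities against the class centers on a logarithmic probability axis would give a curve comparable in kind to those discussed in the validation.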

As an extension to the IPCs, the linear error in probability space L (cf. Eq. A1-A3 in Appendix A) is analyzed (e.g. Ward and Folland, 1991; Potts et al., 1996). For this purpose, empirical cumulative distribution functions (ECDF) are calculated for each simulation run and for the observations. The data basis is the same as for the IPCs. The value ∆C_r (Eq. A1) is defined as the difference between the ECDF of a model run r and that of the observation (difference of probabilities) up to a specific precipitation intensity. It is therefore a measure of the over- or underestimation of the model. Using ∆C_r, the linear error in probability space (L_r; Eq. A2) is the mean of the absolute values |∆C_r| over the entire precipitation range as defined by Déqué (2012) or Wahl et al. (2017). The better both distribution functions coincide, the lower the value of L_r. According to Eq. A2, L_r is never negative. The ensemble mean is given by L (Eq. A3).
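The verbal definition above can be illustrated with a short sketch. Since Eq. A1-A3 are given in the appendix and not reproduced here, the discretization (evaluating both ECDFs on a common list of intensities and averaging the absolute differences) is an assumption, as are all function names.

```python
import bisect

def ecdf(sample):
    """Return the empirical cumulative distribution function of a sample."""
    s = sorted(sample)
    n = len(s)
    def cdf(x):
        # Fraction of sample values less than or equal to x.
        return bisect.bisect_right(s, x) / n
    return cdf

def linear_error_in_probability_space(model, obs, intensities):
    """Return (delta_c, l_r) for one model run.

    delta_c: ECDF(model) - ECDF(obs) at each intensity.
    l_r:     mean of |delta_c| over the evaluated intensity range.
    """
    f_model, f_obs = ecdf(model), ecdf(obs)
    delta_c = [f_model(x) - f_obs(x) for x in intensities]
    l_r = sum(abs(d) for d in delta_c) / len(delta_c)
    return delta_c, l_r

def ensemble_mean_error(model_runs, obs, intensities):
    """Mean of l_r over all runs of an ensemble."""
    errors = [linear_error_in_probability_space(run, obs, intensities)[1]
              for run in model_runs]
    return sum(errors) / len(errors)
```

In this convention, a negative difference at a given intensity means that the model places less probability below that intensity than the observations, i.e. the model tends towards higher amounts.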
The internal variability of LAERTES-EU on different time intervals is compared to that of the observations. Given that the focus of this study is on intensive widespread precipitation, this analysis is performed using spatial mean precipitation amounts averaged over the investigation areas. First, the time series of daily spatial means are aggregated over different intervals, namely monthly, seasonal, and yearly precipitation sums as well as 5, 10, or 30-year running means. In a second step, the standard deviation of a gamma distribution σ_Γ is calculated for each of these interval series (see Appendix A; Eq. A4), for every single member of LAERTES-EU, and for the observations. Finally, the ensemble mean of the four data blocks and of the complete ensemble is built. This method enables the analysis of how well the internal variability on different time scales is captured by LAERTES-EU.
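A possible sketch of the two steps (aggregation, then the standard deviation of a fitted gamma distribution) is given below. Eq. A4 is not reproduced in this section, so the fitting method is an assumption: the shape parameter is estimated with the Thom (1958) approximation to the maximum-likelihood solution, and fixed-length blocks stand in for calendar months, seasons, or years. All names are hypothetical.

```python
import math

def aggregate_sums(daily, block):
    """Sum a daily series over consecutive, non-overlapping blocks of `block` days."""
    return [sum(daily[i:i + block])
            for i in range(0, len(daily) - block + 1, block)]

def running_mean(series, window):
    """Running mean over `window` consecutive values (e.g. 5, 10 or 30 years)."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def gamma_std(sample):
    """Standard deviation of a gamma distribution fitted to `sample`.

    Requires strictly positive values that are not all identical.
    Shape via Thom's approximation (an assumed fitting choice).
    """
    n = len(sample)
    mean = sum(sample) / n
    a = math.log(mean) - sum(math.log(x) for x in sample) / n
    shape = (1.0 + math.sqrt(1.0 + 4.0 * a / 3.0)) / (4.0 * a)
    scale = mean / shape
    # For a gamma distribution, the variance is shape * scale**2.
    return math.sqrt(shape) * scale
```

Applying `gamma_std` to each aggregated series of each ensemble member and then averaging over the members would give one value per time interval and data block.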
The quantile-quantile (Q-Q) plot compares the simulated distribution with the observed one using different percentiles of daily spatial mean precipitation. The Q-Q distributions are used to calculate the coefficient of determination R², with R being the Pearson correlation coefficient (Eq. A5).
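The Q-Q comparison and the resulting R² can be sketched as follows. The choice of percentiles, the interpolation rule, and the function names are illustrative assumptions.

```python
import math

def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def qq_r_squared(model, obs, percentiles):
    """Coefficient of determination between simulated and observed quantiles."""
    q_model = [percentile(model, q) for q in percentiles]
    q_obs = [percentile(obs, q) for q in percentiles]
    r = pearson(q_model, q_obs)
    return r * r
```

Note that R² measures only the linear relation between the two sets of quantiles; a model with a constant multiplicative bias would still reach R² = 1, which is why the Q-Q plot itself is inspected against the diagonal as well.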
The added value of the ensemble size is analyzed by using the signal-to-noise ratio S2N (Eq. A6). To this end, we fit a Gumbel distribution (cf. Appendix A) for different sample sizes and determine the corresponding 90 % confidence interval. The S2N is then the return value of the Gumbel distribution divided by the 90 % confidence interval (Früh et al., 2010).
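A rough sketch of such a signal-to-noise estimate is shown below. Several ingredients are assumptions made for illustration only: the Gumbel parameters are estimated with the method of moments, the 90 % confidence interval is obtained from a simple bootstrap, and the input is taken to be a series of block maxima (for example annual maxima). Eq. A6 and the appendix may define these steps differently.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329

def gumbel_fit_moments(sample):
    """Method-of-moments estimate of Gumbel location and scale."""
    scale = statistics.stdev(sample) * math.sqrt(6.0) / math.pi
    loc = statistics.fmean(sample) - EULER_GAMMA * scale
    return loc, scale

def return_value(loc, scale, period):
    """Value exceeded on average once per `period` blocks (period > 1)."""
    return loc - scale * math.log(-math.log(1.0 - 1.0 / period))

def signal_to_noise(sample, period, n_boot=500, seed=0):
    """Return value divided by the width of its bootstrap 90 % interval."""
    rng = random.Random(seed)
    best = return_value(*gumbel_fit_moments(sample), period)
    boot = sorted(
        return_value(*gumbel_fit_moments(rng.choices(sample, k=len(sample))), period)
        for _ in range(n_boot)
    )
    lower = boot[int(0.05 * (n_boot - 1))]
    upper = boot[int(0.95 * (n_boot - 1))]
    return best / (upper - lower)
```

With a larger sample, the bootstrap interval narrows while the return value itself stays roughly the same, so the ratio grows; this is the behavior the S2N is meant to quantify.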

Decadal variability and trend analysis
For the analysis of the temporal evolution of heavy precipitation, we use time series of different percentiles of spatial mean precipitation and quantities introduced and recommended by the Expert Team on Climate Change Detection and Indices (ETCCDI; Karl et al., 1999; Peterson, 2005). Currently, 27 indices for temperature and precipitation are defined by the ETCCDI.
These indices can be used from local to global scales. Additionally, they combine extremes with a mean climatological state. In this study, we use the two indices R95pTOT and R99pTOT (Eq. B1-B2 in Appendix B), which indicate the amount of precipitation above the 95 % or 99 % percentile, respectively.
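A minimal sketch of such an index is given below. It follows the general ETCCDI idea of summing precipitation on days above a percentile threshold derived from the wet days of a reference period; the wet-day threshold of 1 mm, the interpolation rule, and the function names are assumptions and may differ from Eq. B1-B2.

```python
import math

def wet_day_percentile(reference_daily, q, wet_threshold=1.0):
    """q-th percentile of wet days (>= wet_threshold, in mm) in a reference period."""
    wet = sorted(p for p in reference_daily if p >= wet_threshold)
    if not wet:
        raise ValueError("no wet days in the reference period")
    pos = (len(wet) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return wet[lo] + (wet[hi] - wet[lo]) * (pos - lo)

def rxxp_tot(daily_year, threshold):
    """Total precipitation of one year from days exceeding the threshold."""
    return sum(p for p in daily_year if p > threshold)
```

For example, `rxxp_tot(year_values, wet_day_percentile(reference_values, 95))` would give a R95pTOT-like value for one year, and using `q=99` would give the R99pTOT analogue.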
In terms of trend analysis, a Mann-Kendall test (Mann, 1945; Kendall, 1955) is performed with related significance investigations (Appendix C). Regarding possible oscillations, the complete time series is split into sub-series with a minimum length of 10 years and up to 130 years (trend matrix). The Mann-Kendall test is applied to each of these sub-series.
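The test and the trend matrix can be sketched as follows. This is a simplified version under stated assumptions: no correction for ties or autocorrelation is applied, the p-value uses the normal approximation, and the trend matrix is stored as a dictionary keyed by (start index, end index). The significance investigations in Appendix C may be more elaborate.

```python
import math

def mann_kendall(series):
    """Return (S, Z, two-sided p) of the Mann-Kendall trend test.

    Simplified sketch: the variance of S ignores ties.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided, normal approximation
    return s, z, p

def trend_matrix(series, min_len=10):
    """Z statistic for every sub-series with at least min_len values.

    Keys are (start, end) with end exclusive, as in Python slices.
    """
    n = len(series)
    return {
        (start, end): mann_kendall(series[start:end])[1]
        for start in range(n - min_len + 1)
        for end in range(start + min_len, n + 1)
    }
```

Displaying the values with the start year on one axis and the end year (or sub-series length) on the other gives a triangular matrix in which alternating signs would hint at periodic oscillations.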

Investigation areas and time periods
The focus of this study is central Europe, comprising the countries Germany, Switzerland, the Netherlands, Belgium, Luxembourg, and parts of France, Poland, Austria, the Czech Republic, and Italy. Following Christensen and Christensen (2007), these countries are mostly coincident with two of the areas defined in the PRUDENCE project (prediction of regional scenarios and uncertainties for defining European climate change risks and effects), namely the PRUDENCE regions (PR) Mid-Europe (ME) and the Alps (AL; Fig. 1). Although these boxes contain both land and ocean, the latter was set to a missing value and neglected. During validation, ME and AL were reduced to the HYRAS grid cells lying within the corresponding box, hereafter referred to as ME* and AL*.

In the following, the methods described above are applied in order to validate LAERTES-EU concerning its representativeness with respect to observations. With this aim, data for the investigation period TP1b is used and the boxes ME and AL (cf. Fig. 1) are limited to the HYRAS area (ME* and AL*).

Statistical distributions and frequencies
The IPCs give the range of simulated (observed) precipitation intensities at any grid point within the investigation area and its corresponding probability (Fig. 2). For both investigation areas, the IPCs reveal a distinct added value of the RCM compared to the global model. Due to the coarse resolution, intensities greater than approximately 100 mm d⁻¹ are not found in the

For ME*, the IPCs of the RCM are close to HYRAS, but there is a systematic difference between HYRAS and E-OBS (Fig. 2a). As already mentioned by Haylock et al. (2008), E-OBS has a certain negative bias of up to -30 % when using grid point based quantities. The given deviation between HYRAS and E-OBS lies within this range. Similar results can be found for AL* (Fig. 2b). The differences between the RCM simulations and the observations at a given probability are slightly smaller than for ME*. For both areas the range of simulated values is much higher with up to 400 mm d⁻¹. Naturally, higher intensities are more likely in the mountainous AL* region.
In contrast to the grid point based IPCs, Fig. 3 shows the mean standard deviation of a gamma distribution (cf. Sect. 3.1 and Appendix A) for the time series of spatial mean precipitation amounts aggregated over different time intervals. For both areas, there is an expected continuous decrease of internal variability towards longer periods for all data sets/data blocks. For ME*, LAERTES-EU is in good agreement with both observations at least up to a yearly perspective. For longer time periods, data block 1 shows a slightly different behavior compared to the other data blocks and observations. Nevertheless, data blocks 2-4 and the ensemble mean continue to match the observations up to the 10-year running mean. Note that it is not possible to estimate the 30-year running mean for the decadal simulations of data blocks 2 and 4 given the data availability. For data block 3, only an external climate forcing was used, meaning these so-called historicals are free runs in terms of daily weather evolution. Therefore, it is not expected that the multi-decadal variability is in phase with the observed circulation after a certain time, which can be a reason for slightly higher differences of data block 3 compared to the observations at the longest time scale. Furthermore, note that the results of Fig. 3 do not indicate a perfect match of LAERTES-EU in terms of absolute values, but rather that the internal variability (spread) of spatial mean precipitation totals is well captured. For the mountainous AL* region, the internal variability is higher and all data blocks have a higher standard deviation at all time intervals. This means that the spread of simulated precipitation amounts is increased compared to that of the observations. A possible reason for this difference is the sparse measurements in that region considered for both E-OBS and HYRAS, especially for long-term observations. The more or less constant difference between LAERTES-EU and the observations can be an indicator of a possibly systematic bias in this region.
The Q-Q plots of daily spatial mean precipitation fields for both investigation areas are shown in Fig. S2 in the supplemental material. Generally speaking, the distribution of the RCM is similar to those of the observations, at least to E-OBS, with small deviations from the optimum (diagonal line) for most of the spectrum and differences of around 10 % for the upper part of the distribution. In comparison to HYRAS, the maximum deviation is higher with around 20 %. For AL*, the differences between the RCM and HYRAS are larger than for ME* (Fig. S2). Even though HYRAS was aggregated to the E-OBS/RCM grid, the more pronounced differences especially for the extremes might be a result of the higher resolution of the HYRAS data, which, in particular, is of greater relevance in the mountainous region of AL*.
The findings of Fig. S2 are confirmed by the determination coefficients R² (Table 2). For both E-OBS and HYRAS, the coefficient is very high with R² > 0.98. There is a slightly higher R² for E-OBS than for HYRAS, which is an artificial effect of the data resolution. The region AL* shows a marginally higher skill compared to ME* in E-OBS and slightly lower values in HYRAS. Table 2 also reveals higher correlations of the CCLM simulations driven by the high-resolution MPI-ESM-HR data compared to those driven by the lower resolved MPI-ESM-LR data. Even though this seems to be systematic, the differences are marginal.
Table 2 also contains the mean linear error in probability space L for the different data blocks. Again, the differences between the data blocks are marginal, with all cases being close to L = 0, which stands for a good agreement of LAERTES-EU with observations. In contrast to R², L has lower values for the simulations driven by MPI-ESM-LR. For all data blocks, L is considerably higher for the mountainous AL* region. Note that both quantities being close to their optimum values does not indicate a perfect model. It rather means that the overall statistics regarding the entire range of intensities coincide to a high degree with the observations.

Time series
Besides overall statistics, other properties of LAERTES-EU like the temporal variability should cover the range of observations as well. Therefore, we analyze the time series of yearly values of different percentiles of the spatial mean precipitation for the investigation areas. In Fig. 4, the time series of the 99 % percentile for ME* is shown. Both observational data sets have a high year-to-year variability with similar shape. The ensemble mean value of LAERTES-EU is higher, with a relative deviation of 1-10 % (TP1b average is 7 %). The spread of both observational data sets is covered by the ensemble spread (minimum to maximum values) of LAERTES-EU except for a few extreme peaks (e.g. 1985). In AL*, the E-OBS mean is about 5 % higher than HYRAS but both time series again have a similar shape (Fig. S3). The ensemble mean again is higher with relative deviations of 12-23 % (16 % on average) to E-OBS and 18-29 % (21 % on average) to HYRAS. The ensemble spread also covers the observed variability.
Regarding more extreme values, namely the 99.9 % percentile, similar results can be found (Fig. S4 and S5). Again, E-OBS and HYRAS show a similar behavior for both areas with mean value differences of less than 1 %. The ensemble mean shows a mostly positive bias with deviations of less than 10 % (6 % on average during TP1b) compared to E-OBS for ME* and 6-18 % (average of 10 %) for AL*. Furthermore, there is a distinctly higher spread and variability of the 99.9 % percentile for both the observations and LAERTES-EU. Except for a few peaks, LAERTES-EU covers the spread of the observations.

Added value of the sample size
In order to demonstrate the added value of the presented LAERTES-EU, we use the signal-to-noise ratio (S2N, Eq. A6) for different sample sizes and return periods (cf. Appendix A). Sample size, in this case, means the number of simulation runs.
Note that the simulations vary in length (number of years) with a minimum length of 10 years and a maximum of 110 years. In order to reduce the influence of the sample length on the results, the single simulation runs of LAERTES-EU were randomly concatenated using a hundredfold permutation. Observations have a sample size of 1. Again, S2N is calculated for daily spatial mean precipitation amounts during TP1b only using the HYRAS area.

For both ME* and AL*, S2N steadily increases with sample size for all calculated return values, meaning a statistically more robust estimate of the return values (Fig. 5). Furthermore, the S2N is lower for higher return periods, which is a result of the increasing uncertainty of the best estimate due to fewer or even no data points for very high return periods. However, S2N also increases with sample size for the very high return periods. The robustness of a 2-year return value estimate of a sample of size 1 is about the same as the 1000-year estimate for a sample of size 20. This means that even for extremes which have not been observed yet, some robust statistical analysis can be carried out.

Long-term variability and trends
The temporal evolution and variability of extreme precipitation throughout the past time period TP1 (1900-2017) and also for the predictions (TP2; 2018-2028) are evaluated in this section. Besides time series of percentiles, we use climate change indices and statistical distributions. In this section, all land grid cells within the investigation areas ME and AL are used for calculating the daily areal mean precipitation amounts. The shown S2N is the mean of this permutation.

Precipitation distributions
The boxplot for AL is shown in Fig. S6 and illustrates that not only the high percentiles reveal a decrease in the middle of the century, but the entire distribution is shifted towards lower values. Nevertheless, there is no clear tendency for the maximum values. For the upcoming decade the distribution is similar to that of the present decade in terms of the median and the upper part of the distribution (Fig. S6, green boxplot). The interquartile range is reduced due to an increased lower boundary of the boxplot.
Figure 6. Boxplot of the distribution of daily spatial mean precipitation values (including dry days) for ME. Each decade was considered separately. The centerline of a box marks the median; the lower and upper end of the box mark the 25 % and 75 % percentile (interquartile range); the whiskers represent approximately the 99.9 % percentile; the prediction part is marked in green.

Overview
The overall trend during TP1 and TP2 using a linear regression for both areas and percentiles is given in Table 3. While the ensemble mean shows a significant positive trend for ME for both percentiles, a small but significant negative trend can be found for the 99 % percentile of AL, and there is almost no change in the 99.9 % percentile of AL. In all cases, the ensemble spread increases due to both a decrease of the minimum values and an increase of the maximum values, both being highly significant. The change of the maxima is stronger than the reduction of the minima and more pronounced in AL than in ME.
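The kind of trend estimate summarized in Table 3 can be sketched with an ordinary least-squares fit of the yearly series. How the absolute and relative changes are defined exactly is not spelled out here, so the following is an assumption: the absolute change is taken as the regression slope times the length of the period, and the relative change expresses it as a percentage of the mean over a climatological reference period (1961-1990 by default). Function names are hypothetical.

```python
def linear_trend(years, values):
    """Least-squares slope (per year) and intercept of values against years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def relative_change(years, values, ref_start=1961, ref_end=1990):
    """Total fitted change over the series in % of the reference-period mean."""
    slope, _ = linear_trend(years, values)
    total_change = slope * (years[-1] - years[0])
    ref = [v for y, v in zip(years, values) if ref_start <= y <= ref_end]
    clim_mean = sum(ref) / len(ref)
    return 100.0 * total_change / clim_mean
```

Applied separately to the yearly ensemble mean, minimum, and maximum of a percentile series, such a fit would yield the three kinds of trends contrasted above.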
Analogous to Table 3, we analyze the trend for TP1b only (Table S1 in the supplemental material). The tendencies are the same for all cases but less pronounced, except for the mean 99.9 % percentile of AL, where the negative trend during TP1b is slightly stronger than for the whole time series.
Figure 7 shows the temporal evolution of the 99 % percentile during the 20th and the beginning of the 21st century for the whole LAERTES-EU. As given in Table 3, the lower boundary changes are small, while there is a visible positive trend of the ensemble mean and the upper boundary of the ensemble spread. Note that the larger spread from the 1960s onwards might be artificial due to the considerably larger number of members of data block 4. Nevertheless, there is a clear consistency in the time series for ME.

Table 3. Overall trend of daily spatial mean precipitation during TP1 and TP2 (1900-2028) using a linear regression of the yearly series of the 99 % and 99.9 % percentile (pct; wet days only) for ME and AL. Given are absolute values and the relative changes (RC) compared to the climatological mean (climTP; 1961-1990).

Some differences emerge for AL (Fig. S7). First, there is a distinct decrease of the ensemble mean between 1960 and 1970, which might result from the rising number of members. As the ensemble matches well with the observations, we presume an overestimation of precipitation in the first half of the 20th century in that region, which could be a result of missing data for the applied dry-day correction. Due to the more complex terrain, the structure of the precipitation fields is more complex and therefore more sensitive to effects such as the dry-day correction.

The results for the 99.9 % percentile are similar for both areas (Fig. S8 and S9). The positive trend for ME is even more pronounced, while the drop in the 1960s for AL is less visible and therefore, the time series is more constant.
For ME, the evolution of the number of days exceeding the climatological mean percentile reveals a strong positive and significant trend for both the 99 % (Fig. 8, top) and 99.9 % percentile (Fig. S10). The exact values of the climTP mean, the linear regression, the relative change, and the significance can be found in Table 4 (top numbers). For AL, the year-to-year variability is higher and the overall trend is slightly negative (Fig. 8, bottom, and S11) and significant at least for the 99 % percentile. Again, we analyze the trend for TP1b separately (Table 4, bottom numbers). The tendencies for TP1b are the same but less pronounced, except for the days exceeding the 99 % percentile in AL, where the trend signal in TP1b is stronger than in the whole time series and also highly significant.
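Counting threshold exceedances per year, as used for the series described above, can be sketched as follows. This is a minimal illustration; the data layout (a dictionary of daily values per year) and the function names are assumptions made for the example.

```python
def exceedance_days(daily_by_year, threshold):
    """Count, for each year, the days whose value exceeds the threshold."""
    return {year: sum(1 for v in values if v > threshold)
            for year, values in daily_by_year.items()}


def relative_anomaly(counts, ref_years):
    """Yearly count relative to the reference-period mean count, in %."""
    ref_mean = sum(counts[y] for y in ref_years) / len(ref_years)
    return {y: 100.0 * (c - ref_mean) / ref_mean for y, c in counts.items()}
```

Here `threshold` would be the climatological mean percentile of the reference period, and the resulting yearly counts can then be passed to any trend estimator.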

Past trends and periodic oscillations
For a more detailed analysis of trends, the Mann-Kendall test described in Sect. 3.2 is applied to the time series of daily spatial mean precipitation percentiles. Figure 9a shows the relative number of LAERTES-EU members that show a positive or negative trend of the 99 % percentile for ME. Only cases in which more than 60 % of the complete ensemble members reveal the same tendency are then considered for further investigations. For these cases, the ensemble mean trend is calculated (Fig. 9b) and the relative amount of significant members is displayed (Fig. 9c). All cases in which the ensemble reveals ambiguous tendencies [...] respectively. The overall trend is weaker, with a rate of 0-0.02 mm a⁻¹ or 0-2 mm per century. Positive trends are more often significant than negative ones, while only a small part of the ensemble shows significant trends. Similar results can be found for AL (Fig. S12). The trends on the decadal time scale reach higher rates, but the oscillation is less pronounced than in ME. Again, most of the positive trends are significant, while just a few members with negative trends are significant.

For the 99.9 % percentile of ME, large parts of LAERTES-EU show positive trends (Fig. S13). On the decadal time scale, a clear sequence of positive and negative trends is visible. Both the increases and decreases are more pronounced than for the 99 % percentile, but only a few members are significant. For AL, even more parts of the ensemble have the same tendency of heavy precipitation and a higher number of members have a significant trend (Fig. S14). These trends clearly exceed rates of ± 0.1 mm a⁻¹. In contrast to the results above, the 99.9 % percentile for AL seems to have a multidecadal oscillation, while the overall trend of the complete time series is negative.
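The per-member trend test behind these results can be sketched as follows, including the trend-free pre-whitening of Yue et al. (2002) that is described in Appendix C. This is a minimal sketch under stated assumptions: no tie or continuity correction is applied, the variance of Kendall's τ is taken from the standard large-sample formula, and all function names are illustrative rather than those of the study.

```python
from math import erf, sqrt
from statistics import mean, median


def kendall_s(x):
    """Number of increasing minus number of decreasing pairs."""
    n = len(x)
    return sum((x[j] > x[i]) - (x[j] < x[i])
               for i in range(n - 1) for j in range(i + 1, n))


def mann_kendall(x):
    """Return Kendall's tau, the standardized statistic and a two-sided p-value."""
    n = len(x)
    tau = kendall_s(x) / (n * (n - 1) / 2)
    sigma_tau = sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))
    z = tau / sigma_tau
    p = 1.0 - erf(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return tau, z, p


def theil_sen_slope(x):
    """Median of all pairwise slopes (time step taken as the list index)."""
    n = len(x)
    return median((x[j] - x[i]) / (j - i)
                  for i in range(n - 1) for j in range(i + 1, n))


def lag1_autocorr(x):
    """Lag-1 autocorrelation coefficient of a series."""
    n, m = len(x), mean(x)
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1)) / (n - 1)
    den = sum((v - m) ** 2 for v in x) / n
    return num / den


def tfpw(x):
    """Trend-free pre-whitening: detrend, remove lag-1 term, re-add trend."""
    b = theil_sen_slope(x)
    detrended = [v - b * t for t, v in enumerate(x)]
    r1 = lag1_autocorr(detrended)
    whitened = [detrended[t] - r1 * detrended[t - 1]
                for t in range(1, len(x))]
    return [w + b * t for t, w in enumerate(whitened, start=1)]
```

For a trend matrix, `mann_kendall` would be applied to every sub-series of at least 10 years, optionally after `tfpw`; a member's trend counts as significant at level α when the returned p-value is below α. Note that `lag1_autocorr` is undefined for a perfectly linear series, because the detrended series then has zero variance.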
Further to this absolute change, the number of days exceeding the climatological 99 % percentile shows an increase of 4.9 % for ME and 8.4 % for AL, and 6.7 % (ME) and 22.4 % (AL) in case of the 99.9 % compared to the mean of 2007-2017. This also manifests in the relative anomaly (Fig. 8, and S10-S11; green bars).
Nevertheless, a more detailed trend analysis illustrated in Fig. 9 and Figs. S12-S14 reveals that LAERTES-EU shows no clear tendency for the 99 % percentile during TP2. Only in a few cases do more than 60 % of the members have a similar, mainly positive trend signal, which, however, is not significant. In case of the 99.9 % percentile, 60-70 % of the members show a strong positive trend of more than 0.1 mm a⁻¹, with 20-40 % of them being significant. Although the tendency for TP2 is ambiguous and less significant, it shows continuity to the present decade.
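The ensemble-agreement criterion used in this analysis can be sketched as follows. The sketch assumes that the trend slope and a significance flag are already available for every member in a given sub-period; whether the share of significant members refers to the agreeing members or to the whole ensemble is an assumption here, and the function name is illustrative.

```python
def ensemble_agreement(slopes, significant, min_share=0.6):
    """Summarize member trends for one sub-period.

    Returns None if neither sign is shared by more than min_share of the members.
    """
    n = len(slopes)
    n_pos = sum(1 for s in slopes if s > 0)
    n_neg = sum(1 for s in slopes if s < 0)
    if n_pos > min_share * n:
        sign = 1
    elif n_neg > min_share * n:
        sign = -1
    else:
        return None  # ambiguous tendency, not considered further
    agreeing = [(s, sig) for s, sig in zip(slopes, significant) if s * sign > 0]
    return {
        "sign": sign,
        "share": len(agreeing) / n,
        "mean_trend": sum(slopes) / n,  # ensemble mean over all members
        "share_significant": sum(1 for _, sig in agreeing if sig) / len(agreeing),
    }
```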

Climate change indices
The results described in the previous sections also manifest in the considered ETCCDI climate change indices (Table 5).
R95pTOT shows a positive trend for ME (Fig. 10a) with a relative change of about 18 % and a strong negative trend of approximately -15 % for AL (Fig. S15). Remarkably, there is a high positive deviation in the first half of the 20th century compared to the climTP amount for AL which might be artificial due to the mentioned problems of the dry-day correction.
R99pTOT shows a positive change for ME (Fig. 10b) and a slightly negative trend for AL (Fig. S16). The overestimation for AL in the early century is less pronounced for this index. Considering only TP1b, the tendencies are the same in all cases.
The positive trends for ME are less pronounced, while the negative trends for AL are stronger. The estimated trends are highly significant except for the R99pTOT of AL for the whole time series.
Compared to the present decade, the predictions show a continuation of the positive trend for ME with an increase of 2 % for R95pTOT and 5 % for R99pTOT. For AL, both indices also show a positive trend, with an increase of 7 % for R95pTOT and 8 % for R99pTOT, which is a complete reversal of the overall trend.
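How the two indices are obtained from daily data can be sketched as follows. This is a minimal illustration that assumes the wet-day criterion RR > 1 mm given in the appendix and a linearly interpolated empirical percentile; the function names and the interpolation rule are assumptions made for the example.

```python
def percentile(values, q):
    """Empirical percentile (q in 0..100) with linear interpolation."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def climatological_threshold(reference_daily, q=95.0, wet_day_min=1.0):
    """Percentile of wet-day precipitation within the reference period."""
    wet = [rr for rr in reference_daily if rr > wet_day_min]
    return percentile(wet, q)


def rxxp_tot(year_daily, threshold, wet_day_min=1.0):
    """Annual sum of wet-day precipitation above the climatological threshold."""
    return sum(rr for rr in year_daily if rr > wet_day_min and rr > threshold)
```

Calling `climatological_threshold` with `q=95.0` or `q=99.0` and passing the result to `rxxp_tot` for each year gives yearly R95pTOT or R99pTOT series, respectively.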

Summary and Conclusions
We have presented the novel ensemble LAERTES-EU, which combines various regional climate model simulations performed with COSMO-CLM to analyze long-term variability and trends of flood-related intensive areal precipitation across central Europe. The whole RCM ensemble was divided into four data blocks depending on forcing data, assimilation schemes, or the initialization of the driving global model MPI-ESM. The setup of the COSMO model remained the same for all simulations.
In total, the presented LAERTES-EU consists of over 1100 simulation runs with approximately 12,500 simulated years at a horizontal resolution of 25 km.
The investigation focused on the PRUDENCE regions Mid-Europe (ME) and the Alps (AL). Regarding intensive areal precipitation, we concentrated on high percentiles, namely 99 % and 99.9 % of spatially averaged daily precipitation amounts. Note that LAERTES-EU was not expected to reproduce historical precipitation events in detail on a daily basis, but rather to perform more accurately regarding long-term variations and statistical distributions from a larger-scale perspective. Furthermore, the given resolution restricts the consideration of convective processes, so we concentrated on larger-scale phenomena.
With respect to our initial research questions, the following main conclusions can be drawn from the presented results; they are discussed in more detail afterwards:
(1) LAERTES-EU is capable of representing the range of extreme areal precipitation similar to the used observational data sets and also fits into the range of previous studies (e.g. Früh et al., 2010). The four data blocks are consistent and have similar precipitation distributions. The ensemble also covers the observed temporal evolution.
(2) The benefit of the large ensemble size manifests in a strong increase of the signal-to-noise ratio beyond the typically used ensemble sizes and in high statistical significances of estimated trends for the ensemble mean. Furthermore, the distribution of precipitation totals is represented in a more concise way, taking the limitations of the considered observations into account.
(3) Long-term trends reveal spatial differences in sign and strength. These tendencies are partly significant. Despite a quite large ensemble spread, the ensemble mean shows more explicit results. Distinct oscillations can also be found on shorter time scales (e.g. decades).
(4) The predictions for the upcoming decade show a continuation of past tendencies in terms of both intensity and occurrence frequency for ME without any discontinuity to the previous time period. On the other hand, LAERTES-EU shows no clear signal for AL.
Regarding the validation (1), grid point based intensity-probability curves (IPCs), areal mean precipitation distributions (internal variability σ_Γ and linear error in probability space L), and Q-Q distributions have been analyzed. In all cases, the IPCs of the simulations show an overestimation of precipitation on the order of 10-20 % compared to E-OBS. Haylock et al. (2008) found that E-OBS can have a negative bias of up to 30 % compared to individual ground-based point observations. Taking this into account, the IPCs are almost coincident. Furthermore, the IPCs of LAERTES-EU show only small deviations compared to the HYRAS data set (aggregated to the model grid). The IPCs and also the Q-Q distributions of all four data blocks are coincident, which was a prerequisite for combining them into one large ensemble. The Q-Q distributions of spatially aggregated mean precipitation reveal smaller differences between LAERTES-EU and E-OBS, but an underestimation of simulated rainfall compared to HYRAS by about 10 %. The linear error in probability space L shows a good agreement of LAERTES-EU with observations in terms of the distribution of daily areal mean precipitation totals. For different aggregation intervals from daily values up to 10-year running means, the internal variability (standard deviation σ_Γ) of LAERTES-EU closely matches that of both observational data sets. Note that the quantities L and σ_Γ do not indicate whether the simulated absolute precipitation values coincide with the observations, but rather show the agreement of statistical properties.
Regarding (2), LAERTES-EU reveals a clear added value due to the large sample size. Estimates of long return periods are more robust compared to smaller ensembles which is of importance, for instance, for risk and insurance applications.
Furthermore, trends at least in the ensemble mean are highly significant. The IPCs also show a benefit of RCM data compared [...]. For AL, there is no clear trend signal in the ensemble mean but an increase in the maximum values. In contrast, the number of days exceeding the climatological mean percentiles is decreasing in this area. Comparing the trends of TP1 to the shorter TP1b, the tendencies are the same but less pronounced in TP1b. On a decadal time scale, some oscillations can be found, with periods of increasing precipitation alternating with periods of decreasing values. Similar results as for the time series of percentiles can be found using climate change indices (ETCCDI).
A special case is AL, where the slightly negative trends in the past (TP1) turn to positive ones. Both the continuity for ME and the reversal for AL appear in all time series, namely the number of days of threshold exceedance, ETCCDI variables, and investigated percentiles. While there is a clear signal and high significance for the ensemble mean, the trends are ambiguous and less significant when the ensemble members are considered separately. However, we conclude that these tendencies are likely, as they continue the results of the present decade. Similar results for parts of LAERTES-EU were found by Reyers et al. (2019).

Precipitation remains challenging for both reanalyses and climate model simulations of the past and the future, with partly contrasting results shown by several previous studies. Furthermore, long-term comprehensive observations are not available, which makes a validation difficult due to the high spatial variability of precipitation. This also affects analyses of trends or climate variability. What is known is a theoretical increase of the water vapor capacity according to the Clausius-Clapeyron (CC) equation of about 6-7 % per degree of temperature increase (e.g. Trenberth et al., 2003; Berg et al., 2009), which assumes a nearly constant relative humidity. The CC rate is generally thought to be a proxy for future precipitation projections (Westra et al., 2013). A recent discussion about the validity of the CC rate as an estimate for future projections of heavy precipitation can be found in Zhang et al. (2017). They pointed out that besides the thermodynamic responses, changes in heavy precipitation may also be influenced by dynamical effects. Furthermore, Pfahl et al. (2017) and Kröner et al. (2017) showed that precipitation trends can be regionally influenced by contributions from both lapse-rate and circulation effects.
The ensemble mean of LAERTES-EU shows an increase of about 1.9 °C for ME and 2.3 °C for AL for the yearly mean 2 m temperature of spatial means during the 20th century (TP1; 1900-2017). Including the predictions (TP2), the increase is about 2.4 °C for ME and 2.8 °C for AL. For instance, Simmons et al. (2017) found an increase over European land masses of approximately 2 °C in the mean compared to pre-industrial conditions. Moberg et al. (2006) found an increase of about 1 °C for temperature extremes. Thus, LAERTES-EU is within the range of observed changes. The increase in temperature over the entire time period is equivalent to a CC scaling of about 15-20 %. The extracted changes of the high precipitation percentiles for ME amount to up to 50 % of the theoretical CC value. However, the negative tendencies for AL do not fit into this theoretical estimate.
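The CC estimate quoted above can be retraced with a short calculation. The sketch assumes a simple first-order (linear) scaling of the CC rate with the temperature increase; the function name is illustrative.

```python
def cc_expected_change(delta_t_k, rate_percent_per_k):
    """First-order CC scaling: expected relative change in % for a warming delta_t_k."""
    return delta_t_k * rate_percent_per_k


# Temperature increases including the predictions (TP2), combined with the
# lower and upper CC rates of 6 and 7 % per degree.
low = cc_expected_change(2.4, 6.0)   # ME, lower CC rate
high = cc_expected_change(2.8, 7.0)  # AL, upper CC rate
print(f"CC scaling range: {low:.1f} % to {high:.1f} %")
```

With these inputs the range is roughly 14 % to 20 %, which is consistent with the value of about 15-20 % stated in the text.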

The presented LAERTES-EU data set can be used for various fields of application. In particular, the simulations are used as input for hydrological modeling and further applications such as flood risk assessments. In this case, the presented ensemble can be used as a stochastic weather generator that treats the single simulations independently. This leads to the production of a quasi-stochastic hydrological discharge data set. Due to the large ensemble size, estimates of high return periods become more robust. However, it has to be mentioned that the composition of the four data blocks into one ensemble restricts the temporal homogeneity. Moreover, the validation showed a positive bias of the ensemble mean which, together with the overestimation of low intensities, requires a bias correction to avoid unrealistic discharges. This application as well as the bias correction of LAERTES-EU will be addressed in a subsequent study.
In this study, we have focused on all-year variances, oscillations, or trends. Future investigations can address a seasonally differentiated analysis of trends and oscillations as well as a more detailed investigation of the spatial distribution of these findings and potential mechanisms behind the observed variability. Previous studies indicated that there is a strong relation between precipitation in Europe and the North Atlantic Oscillation (NAO), especially during wintertime (e.g., Hurrell, 1995; Rîmbu et al., 2002; Haylock and Goodess, 2004; Nissen et al., 2010; Pinto and Raible, 2012). Moreover, Casanueva et al. (2014) found a connection between extreme precipitation and the Atlantic Multidecadal Oscillation (AMO) during the whole year.

The linear error in probability space $L$ uses the difference of probabilities $\Delta C$ defined as:

$$\Delta C_r(x) = \mathrm{ecdf}_{\mathrm{mod},r}(x) - \mathrm{ecdf}_{\mathrm{obs}}(x),$$

where $\mathrm{ecdf}_{\mathrm{mod},r}$ is the empirical cumulative distribution function of the model run $r$, and $\mathrm{ecdf}_{\mathrm{obs}}$ that of the observation up to precipitation intensity $x$. The linear error in probability space $L_r$ for a model run $r$ is then defined as (Déqué, 2012; Wahl et al., 2017):

$$L_r = \frac{1}{n} \sum_{i=1}^{n} \left| \Delta C_r(x_i) \right| .$$

$L_r$ describes the mean value of $|\Delta C_r|$ over the entire range of precipitation intensities $x$ grouped into $n$ classes. Using absolute values avoids a compensation of positive and negative values. The better both distributions coincide, the lower the value of $L_r$.
The ensemble mean of $L_r$ is given by:

$$\overline{L} = \frac{1}{M} \sum_{r=1}^{M} L_r ,$$

with $M$ being the total number of simulation runs.
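The definitions above can be illustrated with a minimal sketch. It assumes that the intensity classes are given as a list of upper class boundaries at which both empirical distribution functions are evaluated; the function names and this choice of classes are assumptions made for the example.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical cumulative distribution function of a sample."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect_right(data, x) / n


def linear_error(model, obs, classes):
    """Mean absolute difference of the two ecdfs over the intensity classes."""
    f_mod, f_obs = ecdf(model), ecdf(obs)
    return sum(abs(f_mod(x) - f_obs(x)) for x in classes) / len(classes)


def ensemble_mean_error(runs, obs, classes):
    """Average of the linear error over all simulation runs."""
    return sum(linear_error(run, obs, classes) for run in runs) / len(runs)
```

Identical distributions give a value of 0, and the value grows as the two distributions drift apart.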
The model performance on different frequency intervals is further validated using the standard deviation of a gamma distribution $\sigma_\Gamma$ (Wilks, 2006), which is given by:

$$\sigma_\Gamma = \sqrt{\alpha}\,\beta .$$

In this formulation, $\alpha$ is the shape parameter of the gamma distribution, and $\beta$ its scale parameter.

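The quantile-quantile analysis uses the Pearson correlation coefficient (Wilks, 2006) given by:

$$R = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}},$$

with the data series $x$ and $y$ of length $N$ and their mean values $\bar{x}$ and $\bar{y}$. The range of $R$ is $R \in [-1; +1]$, with a perfect anti-correlation at $R = -1$ and a perfect correlation at $R = +1$.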
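A quantile-quantile comparison based on this coefficient can be sketched as follows. The sketch assumes that both samples are reduced to quantiles at a common set of probabilities before correlating them; the choice of probabilities, the interpolation rule, and the function names are assumptions made for the example.

```python
from math import sqrt


def percentile(values, q):
    """Empirical percentile (q in 0..100) with linear interpolation."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def pearson_r(x, y):
    """Pearson correlation coefficient of two series of equal length."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    den = sqrt(sum((a - x_bar) ** 2 for a in x)) * sqrt(sum((b - y_bar) ** 2 for b in y))
    return num / den


def qq_correlation(model, obs, probs):
    """Correlate the quantiles of two samples at common probabilities (in %)."""
    q_mod = [percentile(model, p) for p in probs]
    q_obs = [percentile(obs, p) for p in probs]
    return pearson_r(q_mod, q_obs)
```

Note that a high correlation only indicates a similar shape of the two distributions; a constant multiplicative bias between model and observation still yields a value close to 1.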
The signal-to-noise ratio S2N in this study is defined as:

$$S2N(T) = \frac{RV(T)}{CI_{90}(T)},$$

with the return level $RV$ of the Gumbel distribution at return period $T$ divided by its 90 % confidence interval $CI_{90}$ at $T$ (Früh et al., 2010). Small values of S2N indicate a more uncertain estimate, high values a more robust one. The Gumbel distribution (Wilks, 2006) is an extreme value type-I distribution and often used for return period estimation. Its cumulative distribution function (cdf) is given by:

$$F(x) = \exp\left\{ -\exp\left[ -\frac{x - \alpha}{\beta} \right] \right\},$$

with the free parameters $\beta = \sigma \sqrt{6} \cdot \pi^{-1}$ and $\alpha = \bar{x} - \gamma\beta$, where $\sigma$ is the standard deviation of the sample $x$ assuming a normal distribution, $\bar{x}$ its mean, and $\gamma = 0.57721$ Euler's constant. For $x$, usually a series of yearly maximum values is used. The relationship between the cdf and the return period $T$ is given by (Wilks, 2006):

$$T = \frac{1}{1 - F(x)} .$$

[...] Karl et al., 1999; Peterson, 2005) are used in this study. R95pTOT describes the annual total precipitation sum of all values above the climatological 95 % percentile of wet days (RR > 1 mm) during the reference period. The R95pTOT of the year $k$ is defined as:

$$\mathrm{R95pTOT}_k = \sum_{w=1}^{W} RR_{wk}, \quad \text{where } RR_{wk} > RR_{p95},$$

where $RR_{wk}$ is the daily precipitation amount on a wet day $w$ during year $k$, $RR_{p95}$ is the climatological 95 % percentile, and $W$ the total number of wet days in year $k$. Analogously, R99pTOT is defined by replacing the 95 % with the 99 % percentile:

$$\mathrm{R99pTOT}_k = \sum_{w=1}^{W} RR_{wk}, \quad \text{where } RR_{wk} > RR_{p99}.$$

Appendix C: Trends and Significance

A Mann-Kendall test (Mann, 1945; Kendall, 1955) is performed for the detection of trends and their significance. To account for possible oscillations within long time series, we first split the complete time series into sub-series with a minimum length of 10 years and up to over 100 years (trend matrix). The Mann-Kendall test uses a standardized test statistic $S_\tau$ following a standard Gaussian distribution (SGD). $S_\tau$ is given by:

$$S_\tau = \frac{\tau}{\sigma_\tau} .$$

Here, $\tau$ is known as Kendall's $\tau$ and $\sigma_\tau^2$ is the variance of $\tau$.
A detected trend is significant if $S_\tau$ lies outside the range spanned by the lower and upper quantiles $z$ of the SGD at a given significance level $\alpha$, i.e. if $\tau \notin [z_{\alpha/2}\,\sigma_\tau;\ z_{1-\alpha/2}\,\sigma_\tau]$, which is equivalent to $S_\tau \notin [z_{\alpha/2};\ z_{1-\alpha/2}]$ (Yue et al., 2002).

Yue et al. (2002) pointed out some weaknesses of the Mann-Kendall test in case of inherent autocorrelation. To avoid a distortion of the statistic by autocorrelation, Yue et al. (2002) presented the Trend-Free Pre-Whitening (TFPW) method. The first step is the estimation of a linear trend between two time steps $t = i$ and $t = j$ using the Theil-Sen Approach (TSA; Theil, 1950; Sen, 1968). The slope $b$ of this linear regression is given by:

$$b = \mathrm{median}\left( \frac{x_j - x_i}{j - i} \right) \quad \forall\, i < j .$$

In a second step, the original time series $x$ is detrended by subtracting the trend $b\,t$ at each time step $t$:

$$x'_t = x_t - b\,t .$$

Afterwards, the lag-1 autocorrelation component is removed from the trend-free series $x'$:

$$y'_t = x'_t - r_1\, x'_{t-1},$$

where the lag-1 autocorrelation coefficient $r_1$ of a series of length $n$ is given by:

$$r_1 = \frac{\frac{1}{n-1} \sum_{t=1}^{n-1} \left( x'_t - \overline{x'} \right)\left( x'_{t+1} - \overline{x'} \right)}{\frac{1}{n} \sum_{t=1}^{n} \left( x'_t - \overline{x'} \right)^2} .$$

The modified TFPW time series $x^*$ results from re-adding the TSA trend:

$$x^*_t = y'_t + b\,t .$$

This modified time series conserves the trend, but is free of autocorrelation. The Mann-Kendall test is performed on the TFPW time series $x^*$. According to Yue et al. (2002), TFPW has to be considered in cases with non-zero TSA slope and significant lag-1 autocorrelation. The significance of a trend or autocorrelation is tested on the 90 % ($\alpha = 0.1$), 95 % ($\alpha = 0.05$), and 99 % ($\alpha = 0.01$) significance level.

Eyring, V., Bony, S., Meehl, G. A., Senior, C. A., Stevens, B., Stouffer, R. J., and Taylor, K. E.: Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization, Geosci. Model Dev., 9, 1937-1958., Schädler, G., Panitz, H.-J., Keuler, K., Jacob, D., and Lorenz, P.: Evaluation of the precipitation for South-