Present and future European heat wave magnitudes: climatologies, trends, and their associated uncertainties in GCM-RCM model chains

. This study investigates present and future European heat wave magnitudes, represented by the Heat Wave Magnitude Index-daily (HWMId), for regional climate models (RCMs) and their driving global climate models (GCMs) over Europe. A subset of the large EURO-CORDEX ensemble is employed to study sources of uncertainties related to the choice of GCMs, RCMs, and their combinations. We initially compare the evaluation runs of the RCMs driven by ERA-interim reanalysis to E-OBS (observation-based 5 estimates), ﬁnding that the RCMs can capture most of the observed spatial and temporal features of HWMId. With their higher resolution compared to GCMs, RCMs can reveal spatial features of HWMId associated with small-scale processes (e.g., orographic effects); moreover, RCMs represent large scale features of HWMId in satisfactorily (e.g., by reproducing the general pattern revealed by E-OBS with high values at western coastal regions and low values at the eastern part). Our results indicate a clear added value of the RCMs compared to their driving GCMs. Forced with the emission scenario RCP8.5, all the 10 GCM and RCM simulations consistently project a rise in HWMId at an exponential rate. However, the climate change signals projected by the GCMs are generally attenuated when downscaled by the RCMs, with the spatial pattern also altered. The uncertainty in a simulated future change of heat wave magnitudes following global warming can be attributed almost equally to the difference in model physics (as represented by different RCMs) and to the driving data associated with different


Introduction
Heat waves can have adverse effects on health and have been reported to lead to excessive mortality rates among people in many regions of the world (Guo et al., 2017). In Europe, the heat waves in summer 2003 (Robine et al., 2008) and 2010 (Barriopedro et al., 2011) are prominent examples. Even in high-latitude areas, such as Scandinavia, heat waves can lead to excess mortality as reported by Åström et al. (2019) for the long and warm Scandinavian summer 2018 (Wilcke et al., 2020). 25 In addition to health problems atmospheric heat waves are often related to water shortages, a decline in agricultural production, and increased risk of forest fires or dieback, all of which can have severe impacts both on natural ecosystems and human society (IPCC, 2014).
The Intergovernmental Panel on Climate Change (IPCC) concluded that the frequency and duration of heat wave lengths have increased in the observed past on a global scale (IPCC, 2018;Seneviratne et al., 2021). Part of the observed changes in 30 frequency and intensity of daily temperature extremes on the global scale since the mid-20 th century has been attributed to increasing anthropogenic forcing of the climate system. Furthermore, projections for the future under scenarios of increasing greenhouse gas forcing indicate a continued increase in both intensity and duration of heat waves (Molina et al., 2020). To alleviate future problems, adaptive measures are needed, even under scenarios with strong mitigation (IPCC, 2022). Climate change adaptation, in turn, requires relevant information about changes both in geographical and temporal extent as well as in 35 intensity and duration of heat waves.
Climate model projections constitute the most prominent information about future climate. International coordinated Climate Model Intercomparison Projects (CMIP), such as CMIP5 (Taylor et al., 2012) and CMIP6 (Eyring et al., 2016), provide large ensembles of global climate model (GCM) projections. GCMs, however, are most often run at coarse horizontal resolutions, implying that they have a crude representation of relevant local and regional processes and often come with strong biases at the 40 regional scale (Luo et al., 2020). As a remedy, to improve the quality of the simulated climate and add value compared to GCMs, empirical statistical downscaling employing a wide range of approaches (Benestad et al., 2008(Benestad et al., , 2018Hertig et al., 2019;Soares et al., 2019) and dynamical downscaling using regional climate models (RCMs) (Torma et al., 2015;Rummukainen, 2016;Strandberg and Lind, 2021) are used, each with its relative strengths. This study focuses on dynamical downscaling. Operated at a higher resolution than their driving GCMs, RCMs can more realistically represent orography, land-sea contrasts, and 45 atmospheric processes such as mid-latitude cyclones. Large ensembles of RCM climate change projections have been produced for different continents under the auspices of CORDEX (the CoOrdinated Regional climate Downscaling EXperiment, Giorgi et al., 2009). Specifically, for Europe, joint efforts in EURO-CORDEX (Jacob et al., 2020) have resulted in more than 100 RCM projections at 0.11°grid spacing under a range of different future scenarios that are now being used for building climate services Rennie et al., 2021). 50 Even if RCMs add value compared to GCMs, they do not come without biases of their own. Notably, when given input from coarse-resolution GCMs that have biases in their representation of the large-scale circulation and sea-surface conditions, RCMs also show biases (e.g., Jacob et al., 2007;Vautard et al., 2020). As RCMs often describe physical processes differently, not only biases in the historical climate, but also future climate change signals can differ from those of the underlying GCM (Coppola et al., 2021). As an example, most EURO-CORDEX RCMs have a more rudimentary treatment of aerosols and their interaction with radiative fluxes and clouds than the CMIP5 GCMs they are forced with. Consequently, discrepancies in future trends for GCM-RCM chains as reported by (Sørland et al., 2018) have been suggested to be related to differences in aerosols and their impact on downwelling shortwave radiation (Jerez et al., 2021).
A univocal and optimal definition of heat wave can be for debate depending on impacts of interest (Perkins and Alexander, 2013;Horton et al., 2016). Most existing heat wave-related indicators describe only a single characteristic of heat waves. The

60
Heat Wave Magnitude Index-daily (HWMId, Russo et al., 2015), a dimensionless magnitude that was designed to take into account both heat wave duration and intensity, represents an integrative approach in classifying heat waves. Indeed, it has been successfully used in a growing number of heat wave studies (e.g., Russo et al., 2016;Zampieri et al., 2016;Ceccherini et al., 2017;Dosio et al., 2018;Molina et al., 2020). Being fairly established, therefore, the HWMId is utilized here for representing heat wave magnitudes.

65
The aim of this study is fourfold. First, we examine GCM-RCM combinations for Europe in a subset of the EURO-CORDEX The HWMId is described as the maximum magnitude of heat waves occurring in a year, where a heat wave is defined as a period of at least three consecutive days with maximum temperature (T max ) above a percentile-based daily threshold for a 75 reference period. Specifically, for a given day-of-year d (from 1 to 366), the threshold is the 90 th percentile of the set of data where ∪ denotes the union of sets; Y ref represents the years within the reference period; W −15,15 d is the 31-day window centered at day d; and, T max,y,i is the daily T max of day i in year y.

80
For each day in an identified heat wave the daily magnitude, M d , is calculated following: for the calculation of the climate change signals in simulated heat wave magnitudes (∆HWMId), termed "recent past" (1981-105 2020), "nearest decades" (2021-2060), and "end of the century" (2061-2100). The uncertainty across simulations is roughly described here by the spread of values (maximum − minimum) considering the limited size of the simulation matrix. The spread along the RCM dimension represents the uncertainty associated with the RCM model physics while the spread along the GCM dimension corresponds to the uncertainty associated with the driving GCM simulations.
In addition to the GCM-RCM combinations that compose the matrix, the EURO-CORDEX initiative also provides a set of 110 evaluation runs, for which the participating RCMs are forced with boundary conditions from the ERA-Interim reanalysis (Dee et al., 2011). This type of simulation, which is often referred to as perfect-boundary conditions run, allows for an in-depth comparison with the observed climate including also its temporal evolution. In the first part of the study, we compared these evaluation runs to observations, as well as to ERA-Interim. The observational data used in the study are from the E-OBS daily gridded data set version 20.0e of the European Climate Assessment & Data (Cornes et al., 2018, ECA&D, www.ecad.eu), 115 which covers Europe at a 0.1°regular grid spacing for the period from 1950 to July 2019. When RCMs are driven by ERA-Interim the reference period for HWMId is the 20-year period 1989-2008 as limited by the short evaluation runs, and when driven by GCMs it is the 30-year period 1981-2010 following Russo et al. (2015). For each dataset, HWMId was calculated upon the original grid points. The area of investigation is bounded by 10°W-30°E and 35°N-70°N and is land-only.
Mean bias error (MBE), root mean square error (RMSE), and Pearson correlation coefficient (r) were adopted as perfor-120 mance indicators for a model in simulating HWMId compared to E-OBS. They quantify the degree of an overall overestimation/underestimation, the degree of closeness in values, and the association in variations, respectively. When these indicators were applied for spatial patterns, they were calculated after HWMId values of all the datasets accounted for other than ERA-Interim were conservatively remapped to the ERA-Interim grids. RMSE and r were also used to determine the similarity in spatial patterns between simulations.

125
To better understand the underlying processes of the simulated ∆HWMId, a preliminary effort was made by exploring the simulated climate change signals in dry conditions and their connection with HWMId. We investigated atmospheric and soil dry conditions, as represented by dry days (with precipitation < 1 mm) and precipitation − evaporation (P − E), respectively.
The simulated climate change signals in annual mean T max were also examined, showing the direct impact of warming.

Evaluation of reanalysis-driven RCMs in simulating historical heat waves
Among various aspects of heat waves, we want to answer the following questions: i) Does the spatial pattern of the climatological mean reveal observed local information? and ii) Does the regional mean show the same signal/response of large-scale climate variations as observations? level of HWMId values (7-9) over eastern Europe. As the RCMs were driven by the same reanalysis, the differences among the RCM simulations should be related to differences in model physics which is an important source of uncertainty in simulating extreme heat waves. Ignoring the spatial pattern and focusing on the percentage of the land area exceeding certain HWMId values ( Fig. 1b), we find a close agreement between the RCM simulations, although a slight overestimation was found at the high-end tail of the HWMId distribution (HWMId≥9 or 10) compared to E-OBS and ERA-Interim. The overestimation for 145 RACMO22E is apparent for most of HWMId values (7-10). According to RMSE and r, REMO2015 has the best performance among the selected four RCMs in representing the spatial pattern of the climatological mean HWMId (Table 2). We also note that the multi-RCM mean has a generally smaller RMSE and a higher r than most of the individual RCMs, probably due to a compensation of deficiencies of each RCM in representing different processes.
Averaged in space, the RCM evaluation runs reproduce generally, but not perfectly, the temporal evolution of the E-OBS 150 HWMId as shown in Fig. 2a. Years with high HWMId values (1994, 2003, 2006, and 2007) are captured by all the RCMs.
However, the RCMs fail in reproducing the ranking of these years by occasionally overestimating or underestimating the HIRHAM5 outperforms the others in reproducing the temporal evolution of the regional mean (Table 3). Like the case of  spatial measures, the multi-RCM mean has also a generally smaller RMSE and a higher r of temporal measures than most of the individual RCMs.

Evaluation of heat waves in GCM-driven RCMs under the recent past climate
A similar comparison, as for the ERA-Interim-driven runs discussed above, was conducted for the GCM-driven runs but focused only on the spatial patterns. The influence of the shift of driving data from ERA-interim to GCM simulations can be 165 seen by comparing Table 4 with depend strongly on the model (e.g., RCA4 has a relatively weak spatial r).  RCMs improves the representation of the observed spatial pattern by reproducing the west-east gradient, as well as revealing small-scale processes such as orographic effects which are not well represented by the GCMs due to their coarse resolution.
In general, the RCMs (other than a few cases of RACMO22E and RCA4) show smaller RMSE and higher r compared to their driving GCMs (Table 5). Similar to the evaluation runs (Table 2), the multi-RCM (row) means driven by a GCM simulation show better performance compared to most of the individual RCMs. This is also true for the GCM dimension; i.e., for each 185 RCM, the ensemble mean across the three driving GCMs outperforms the individual ensemble members.
Furthermore, we also looked into the differences along the GCMs in representing HWMId and how the RCMs respond to them when driven by these GCMs. We want to point out that there is no accordance on which GCM that performs best among the three in reproducing the spatial pattern of HWMId in E-OBS when considering different performance indicators; e.g., EC-EARTH has the smallest MBE/RMSE, whereas HadGEM2-ES has slightly higher r (Table 5). Meanwhile, the "best" GCM 190 according to a performance indicator may not necessarily secure a best downscaling among the different driving GCMs for an RCM. We also observed a large spread in the regional average across the GCMs, which is reduced by all RCMs. Moreover, for  each RCM driven by the three GCMs, the downscaling behaves consistently despite the large difference in the spatial pattern of HWMId between the driving GCMs. This can be reflected by the fact that, for each RCM, we observed lower RMSE and higher r (of spatial measures) between the RCM simulations than between its driving GCMs (Table S1). Likewise, when driven by one 195 GCM, the simulations of different RCMs also most often tend to be more similar to each other than to the driving GCM (i.e., higher RMSE and lower r within the "GCM" column compared to other columns in Table S2). Thus, it is interesting to explore the influences of driving data versus model physics on the uncertainty for the RCMs in simulating heat wave magnitudes under the recent past climate.
Along the RCM dimension of the matrix (i.e., RCMs with the same driving GCM), the ensemble spread of HWMId values 200 is on average close to one fifth of the ensemble mean on a continental basis (Fig. 4). A slightly lower spread/mean ratio is observed along the GCM dimension (Fig. 5), indicating that the uncertainties associated with driving data are of similar magnitude as those associated with model physics. However, different features can be observed in the spatial pattern of the ensemble spread along the two dimensions. As presented in Fig. 4, the ensemble spread along the RCM dimension shows the orographic effects on heat wave magnitudes across the RCMs as one of the major sources of uncertainty that is associated with model physics. Aggregating on the GCM dimension (i.e., calculating mean and spread for each RCM with different GCMs), the ensemble means (first row of Fig. 5) reveal a similar spatial pattern, whereas the spreads (second row of Fig. 5) show considerable differences in the spatial pattern. A quantitative analysis regarding the similarity based on RMSE and r is presented in Table S3. Higher r values between RCMs than between any RCM and its driving GCMs, in combination spatial pattern in each of the simulations. The HWMId increases strongly at the end of the century, with approximately more than a factor of three compared the nearest decades. The HWMId patterns do generally stay similar within the two observed time periods, according to the spatial according to the spatial r (Fig. 6b) for the two periods. Similar to the results of the 220 recent past climate (Sect. 3.2), a large spread in ∆HWMId is seen across the GCMs (5.8-12.8 and 21.6-48.4 for the nearest decades and the end of the century, respectively; Fig. 6). The RCMs decrease the spread (4.9-8.7 and 15.0-31.9 for the nearest decades and the end of the century, respectively; Fig. 6), reflecting less change than two out of the three driving models (i.e., HadGEM2-ES, NorESM1-M). The spatial pattern of ∆HWMId among the RCM simulations is similar both along the RCM dimension and the GCM dimension: indicating a stronger rise in northern and southern Europe but comparatively moderate for 225 the central domain. This partly differs from the GCM simulations, which tend to show a more pronounced south-north gradient in the increase in HWMId. It also differs from the climatological mean HWMId under the recent past climate, which displays a generally west-east gradient (Fig. 1). The ensemble spread, along either the RCM dimension (Fig. 7) or the GCM dimension ( Fig. 8), has a magnitude most often exceeding half of the ensemble mean on a continental basis and to some extent follows the spatial distribution of the ensemble mean (with spatial r from 0.50 to 0.92; not shown). This suggests that the driving In the end of the century (2061-2100), the entire domain is projected to reach HWMId values higher than those experienced until now under the high emissions scenario RCP8.5. The rise occurs across the whole probability range at an approximately 240 exponential rate; i.e., increases between every two neighbor periods are approximately the same as mapped to the logarithmic scale applied to Fig. 9. The rise is even stronger for the tail of HWMId distribution. As indicated in Fig. 6 the RCMs show a tendency to smoothen the change signal in the driving GCM simulations, particularly when driven by HadGEM2-ES and NorESM1-M. The only exception is RCA4 downscaling EC-EARTH. Again, we note that the RCMs generally are more alike than their driving GCMs, though RCA4 shows a pattern closest to the GCMs. Finally, we observe that the spread between the 245 RCMs, and hence the uncertainty, increase a lot over time.
Probability distributions are also investigated for the region-wide annual HWMId values in the defined three periods, as shown in Fig. 10. As revealed by E-OBS and ERA-Interim, the region-wide HWMId for the recent past climate is represented by a positively-skewed distribution, or rather a quasi-log-normal distribution (as demonstrated by the quasi-normal shape of the distribution in Fig. 10b with a logarithmic-scale x-axis). As discussed for the ERA-Interim driven simulations (Fig. 2), also the 250 GCM-driven RCMs can to some extent capture the shape of the distribution as well as the median, but with a wider value range for each simulation. Most of the simulations keep the shape of distribution as the values increase in the future (especially for the nearest decades), whereas some (e.g., HadGEM2-ES, RCA4 driven by EC-EARTH, and RACMO22E driven by NorESM1-M) display a distribution even with a negative skewness for the end of the century (mostly visible with logarithmic axis presented in Fig. 10b), which indicates that high HWMId values would become more common under the high emissions scenario RCP8.5.

255
In addition to increasing levels of HWMId, also the spread defined by the ranges increase in the future. With a logarithmic-

Added value of RCMs compared to GCMs
With more small-scale processes being resolved, added value is expected from dynamical downscaling with an RCM compared to its driving GCM on the regional scale. Indeed, a large number of previous studies (e.g., Torma et al., 2015;Rummukainen, 2016;Strandberg and Lind, 2021) have reported such added value by RCMs in many respects. Thus, it is of great interest if RCMs create added value also for heat wave magnitudes. To realistically represent HWMId, a model must be able to capture 265 not only the mean magnitude but also the intra-annual temporal evolution of daily T max . Before discussing any potential added value we assess to which degree the studied RCMs can represent the observed HWMId under the historical climate when driven by reanalysis data (i.e., perfect boundary conditions). We showed in Sect. 3.1 that the ERA-interim-driven RCM runs generally reproduce the spatial and temporal patterns of the observed annual heat wave magnitudes over Europe. However, the RCMs clearly add their own signatures to the results leading to a larger variability in the spatial distribution of HWMId than reflected 270 in E-OBS and ERA-Interim (Fig. 1). Even larger variability was found in the HWMId spatial distribution of the years 1994, 2003, 2006, and 2007 (Fig. S4-S7) than for the climatological mean, although these events, which were reported by the news headlines (Russo et al., 2015), are generally captured (Fig. 2a) by the RCMs when driven by ERA-interim. Another interesting observation is that RCA4, which performs worse than the other RCMs (  RCMs, and which events are triggered inside the RCM domain and therefore possibly missed or reproduced differently by the different RCMs. One example of a large-scale-driven event could be the year 2003, where the HWMId is picked up clearly by all the RCMs in its high intensity (Fig. 2a).
The three GCM simulations accounted for, roughly reproduce the observed patterns for the recent past climate, but miss 280 out on details in structure and amplitude (Fig. 3). The RCMs downscaling these GCMs do well in adding more detailed geographical patterns and, furthermore, pull the results closer to the observations. Again, the RCMs add their own pattern, e.g., the lower HWMId values over Eastern Europe in REMO2015. Signatures of the RCMs can also be demonstrated by the high similarity in HWMId spatial pattern between the simulations for each RCM despite a large difference between the driving GCMs (Table S1). Certainly, the RCMs also inherit signals from their driving GCMs. This is clearest for RACMO22E among 285 the four RCMs, as the simulations with RCAMO22E show the highest r (except when driven by NorESM1-M) and the lowest RMSE against its driving GCMs (Table S2). Despite the inherited signals from their driving GCMs, the simulations of different RCMs when driven by the same GCM generally share higher similarities than with the driving GCM (Table S2) Some studies (e.g., Schiermeier, 2010;Kerr, 2011) show that uncertainty may increase when downscaling GCMs with RCMs 290 as biases from the GCMs are conveyed to the RCMs and RCMs additionally add their own biases, referred to as the 'cascade of uncertainty' (e.g., Wilby and Dessai, 2010). However, many other studies (e.g., Torma et al., 2015;Di Luca et al., 2016;Rummukainen, 2016;Sørland et al., 2018;Strandberg and Lind, 2021) indicate that RCMs can also add value upon the driving GCM simulations. This study demonstrates the added value for heat wave magnitudes that have so far not been studied as extensively as for other aspects of climate and climate change, reflected in adding more detailed geographical patterns and 295 pulling the results closer to the observations ( Fig. 3; Table 4  Moreover, RCMs may also more realistically represent some atmospheric processes relative to the GCMs (e.g., Prein et al.,300 2016). Our analysis of the ensemble spread along the GCM dimension, reflecting uncertainty associated with driving data, reveals that the RCMs alter the spatial HWMId pattern from their driving GCM simulations, and that the alteration is different between the RCMs (Fig. 5 and Table S3). This, on the other hand, suggests that the uncertainties of GCMs in simulating heat wave magnitudes would be transformed by RCMs in a complex manner, rather than simply inherited, due to the nonlinear nature of model dynamics and physics, thus questioning the concept of 'cascade of uncertainty'.

305
To reveal the specific factors/processes behind the added value of RCMs in simulating heat wave magnitudes, however, further analysis is required. Table 4 and Fig. 3 clearly show that RCMs capture the HWMId better than the GCMs for the recent past climate. Some spatial features of HWMId showing orographic traces, thereby considered to be related to orographic effects, can be seen as one aspect of the added value of RCMs as GCMs cannot represent such features due to too coarse spatial resolution. An interesting finding of this study is that the orographic effects are however represented differently across the 310 RCMs (i.e., the large ensemble spread along the RCM dimension; Fig. 4), suggesting that the representation of orographic effects, possibly related to dynamical/thermodynamical interaction with parameterizations, is meanwhile one of the major sources of uncertainty. This supports the statement by Sørland et al. (2018) that increasing the spatial resolution should not be the single factor contributing to the added value of RCMs.

Possible processes behind the simulated climate change signals 315
The RCMs, as well as the driving GCMs, project a rise in HWMId values at an exponential rate under RCP8.5 on the European continent, as we can observe the linear shape of time series when plotting on a logarithmic scale (Fig. S8). As a result, heat waves more severe than the most severe one that has been recorded until now are projected to occur almost every year at the end of the century if we follow the high-end emission pathway (RCP8.5). According to the definition of HMWId, the approximately exponential rise can be expected because the projected warming will on one hand increase the daily magnitude (Eq. 2) and on 320 the other hand extend the duration simultaneously. Apart from the agreement on the future severity of heat wave magnitudes under RCP8.5, the RCMs modify the future climate change signals projected by the driving GCMs, tending to moderate the rise in HWMId values and also deliver some different features in the spatial pattern. The underlying drivers of heat waves may be related to land-atmosphere interactions as well as atmospheric processes (Horton et al., 2016;Liu et al., 2020). For example, plant physiological CO 2 response may have a positive effect on T max (Schwingshackl et al., 2019) and thereby on HWMId 325 values. Understanding how RCMs differ from GCMs in representing processes that modify the climate change signals can have implications for how to utilize model projections in studies on climate change mitigation and adaptation.
Here, rather than directly looking into the associated processes, we investigate the corresponding climate change signals in annual mean T max (Fig. S9), dry days (Fig. S10), and P − E (Fig. S11). As the HWMId is directly calculated from daily T max , the increase in annual T max provides a warming background that the surge in HWMId is associated with. An increase in the 330 annual number of dry days may indicate a higher tendency for longer warm spells and hence a rise in HWMId. Compared to the number of dry days, P − E (or effective precipitation if we do not consider the variation in runoff) is more strongly related to dry soil conditions; the correlation between effective precipitation and HWMId may reveal the effect of land-atmosphere feedbacks, as the deficit in soil moisture may reduce latent heat flux allowing temperatures to rise further (Zhang et al., 2020).
The RCMs dampen the increase, and to some extent modify the spatial pattern, in the annual mean of daily T max in 335 HadGEM2-ES and NorESM1-M ( Fig. S9 and Table S5). This could potentially explain the more moderate rise in HWMId simulated by the RCMs compared to their driving GCMs. EC-EARTH and its downscaling by the RCMs show a similar level of increase as well as a similar spatial pattern in the annual mean of daily T max (Fig. S9 and Table S5), and we recognize that they also project a similar rise in HWMId (Fig. 6). According to Schwingshackl et al. (2019), these results may be linked to the positive effect of plant physiological CO 2 response on T max , since the GCMs except for EC-EARTH consider this response but all the RCMs do not. That RCMs are better at representing near-surface processes that are important for reducing the drying feedback is also considered a reason (Sørland et al., 2018). For the annual number of dry days (Fig. S10), on regional average, the RCMs also reduce the increase from their driving GCMs, but with a few exceptions (e.g., HIRHAM5 downloading EC-EARTH and HadGEM2-ES). For annual P − E (Fig. S11), on a continental basis, the RCMs, except for HIRHAM5, show a smaller decrease (or larger increase) than the driving GCMs. In general, these results are consistent with Coppola et al. (2021) 345 who discussed similar indices.
We further examined the spatial correlation between the change in HWMId and that in each of the three indices accounted for, to find any trace whether, and if so how, the drying processes (of either atmospheric or soil) regulate the spatial pattern of ∆HWMId. For the detailed spatial pattern of ∆HWMId, it cannot be explained by the change in the annual mean of daily T max alone, even though HWMId is calculated based only on daily T max . This is especially true for GCM simulations as they 350 have a poor spatial correlation between ∆HWMId and ∆T max (Fig. S9). All the GCM and RCM simulations agree on the high ∆HWMId in southern Europe (Fig. 3), which may be amplified by the local drying trend projected ( Fig. S10 and S11), while in northern Europe where rapid warming is projected (Fig. S9), high ∆HWMId is seen in the RCM simulations only (Fig. 3). The r values in Fig. S9-S11 read that the general warming, compared to drying, plays a small role in regulating the spatial pattern of ∆HWMId in GCM simulations, different from the case of the RCMs. This echoes the varying importance of 355 drying in influencing ∆HWMId in northern Europe for GCM and RCM simulations; i.e., the wetting trend in northern Europe, projected by both the GCM and RCM simulations ( Fig. S10 and S11), seems to dampen the local ∆HWMId within the GCM simulations, but has little impact on the local ∆HWMId within the RCM simulations. It is interesting and deserves further study which factors cause the varying importance of drying in influencing heat wave magnitudes and how differently GCMs and RCMs represent these factors.

360
As a preliminary effort, concerning only the spatial pattern, the above analysis is however far from building clear causal links. Moreover, we are not yet clear about what is leading to the weaker drying trend in the RCM simulations. Atmospheric blocking, with adiabatic warming of sinking air and anomalous clear-sky radiative forcing, is an important driver of heat waves (e.g., Bieli et al., 2015;Schaller et al., 2018). According to Masato et al. (2013), the three CMIP5 GCMs assessed show a decrease in summertime North Atlantic blockings and an increase in blockings over eastern Europe or Russia indicating 365 an eastward shift of the blocking activity. This implies that the underlying processes of ∆HWMId are possibly beyond the atmospheric dynamics. It is of interest whether, and if so how, the RCMs modify the climate change signals of atmospheric blocking from their driving GCMs, and whether this modification is related to the differences in ∆HWMId patterns between GCMs and RCMs as presented here, which is worth further study, though representing atmospheric blocking is considered a challenge (e.g., Masato et al., 2013;Davini and D'Andrea, 2016;Jury et al., 2019). Regarding the representation of the effect 370 of land-atmosphere interactions, investigation detailed into different sets of parameterizations in GCMs and RCMs as well as some additional sensitivity experiments may be necessary for a better understanding.

Other matters
A full simulation matrix without gaps facilitates a fair comparison after aggregating along either the GCM or the RCM dimension. This requisite limits the size of the RCM simulation matrix available for analysis. The limited size of the GCM-RCM 375 simulation matrix could be a shortcoming influencing the robustness of the results in the study, especially for the uncertainty analysis where the uncertainty is described by the spread (maximum − minimum) across three or four ensemble members. In fact, we conducted the same analyses upon another GCM-RCM simulation matrix (GCMs: EC-EARTH, HadGEM2-ES, and MPI-ESM-LR; RCMs: CCLM4-8-17, HIRHAM, RACMO22E, and RCA4) and derived similar results (not shown), which can alternatively support the conclusions herein.

380
The HWMId can also be applied to other temperature variables, but with different processes/impacts involved. For example, the HWMId applied to daily minimum temperature serve a measurement of heat wave magnitude taking into account also the nighttime cooling effect (Russo et al., 2015). As another example, Apparent Heat Wave Index (AHWI, Russo et al., 2017) is the HWMId applied to daily apparent temperature, which considers also the impact of air humidity on human beings. Such variants of HWMId are being considered for future studies.

Summary and conclusions
By deploying the HWMId index, the study addresses how four different RCMs downscaling i) reanalysis and ii) three different GCMs, represent European heat wave magnitudes.
Initially, the performance of the RCMs in reproducing historical heat wave magnitudes is evaluated by comparing the ERA-Interim-driven evaluation runs of the RCMs with E-OBS. It shows that the RCMs generally capture most spatial and temporal 390 features of the observed HWMId when considering climatological mean and regional mean, respectively. Our results also prove the added value of RCMs over GCMs in representing the observed heat wave magnitudes. Compared to the driving GCMs, the RCMs generally have lower RMSE and higher r against the observational data for the climatological mean of HWMId values under the recent past climate. The RCMs generally improve the spatial pattern of HWMId across the European continent compared to the driving GCMs. In addition, the RCMs reveal some small-scale features (e.g., relating to orographic 395 effects) that the GCMs fail likely due to their coarse resolutions. The closest agreement with observations is seen for the RCM ensemble mean.
A rise in HWMId at an exponential rate is projected consistently by all the GCM and RCM simulations accounted for in the study, probably because the warming boosts both the intensity and duration of heatwaves. However, the RCMs modify some feature of the climate change signals in the driving GCM simulations. A somewhat more moderate rise across the European 400 continent is projected by the RCMs, as a corresponding result of the reduced warming. The RCMs also differ from the driving GCMs in the spatial pattern of the climate change signals of HWMId.
We also analyzed the uncertainties of the GCM-RCM simulation matrix in simulating heat wave magnitudes. The results show that the uncertainty associated with choice of RCM is of similar importance as with driving data. The ensemble spread-/mean ratio is approximately one fifth for the present-climate HWMId and over half for the climate change signals. A major 405 source of the uncertainty associated with the RCMs appears to be associated with the representation of orographic effects.
The RCMs reduce the large ensemble spread across the GCM simulations though, especially for the climate change signals in HWMId. Moreover, no consistent spatial pattern is observed in the ensemble spreads along the GCM dimension for different RCMs. Consequently, the results indicate that the uncertainties of GCMs in simulating heat wave magnitudes would not be simply inherited by RCMs but are transformed in a complex manner due to the nonlinear nature of model dynamics and 410 physics.
Code availability. All the analyses were done using Python packages.
Author contributions. CL set up the analysis framework with the scientific contributions of EK and RW. CL produced the figures and tables.
CL wrote the publication with important contributions from EK, RW and DC.