I think that the revised manuscript is in good shape and should be ready for publication once the few remarks described below have been considered.
It still remains unclear to me how the model ensemble was chosen. According to Vautard et al. (2020), MPI would also have simulations for all 4 RCMs selected. Why, then, is the MPI model, for example, not included? If only a subset of the available models is selected for a study, a thorough justification is needed to guarantee that the authors have not chosen only models that support certain results while excluding others that would contradict them. I thus strongly encourage the authors to clearly explain and justify the selection of models in the manuscript.
I appreciate the efforts that the authors made to relate the observed changes in HWMId to drying trends, but I think that the discussion of this remains too short. Currently, it is essentially summarized in a single sentence: “The r values in Fig. S9–S11 read that the general warming, compared to drying, plays a small role in regulating the spatial pattern of HWMId in GCM simulations, different from the case of the RCMs.”
I would thus ask the authors to expand the discussion about the influence of dryness on HWMId. In particular, I think that the varying importance of drying in the south and north (strong influence in southern Europe but rather unimportant in northern Europe) deserves some more discussion.
Furthermore, the manuscript should be thoroughly scrutinized regarding grammar, orthography, and sentence structure. Although I acknowledge that the text has improved and is well understandable in general, it lacks clarity in some instances, which I think can mostly be resolved by improving the language.
Specific remarks (all line numbers refer to the manuscript version with tracked changes):
Line 5: Add "observation-based estimates" and/or shortly explain what E-OBS is.
Line 9: It is unclear what “west-east gradient” refers to, as this has not been introduced before, and it is also unclear to which dataset “reproducing” refers.
Line 44: I would rather use “regional climate models” instead of “a regional climate model” to highlight that there are several RCMs available (and not just one).
Line 67: “the EURO-CORDEX collection”: This sentence implies that GCM-RCM combinations of the whole collection are used, but afterwards only a subset is applied. I would suggest making clear from the beginning that only a subset is used (see also my general comment above)
Line 75: Add “maximum temperature (Tmax)”, otherwise it remains unclear what Tmax is.
Line 84: Add “the calculation of Tmax,ref,25p and Tmax,ref,75p…”
Line 86: “and somehow makes calculation more stable.” In my eyes, this statement does not create trust in the results. What exactly does it mean? Or can it be removed? I think that it should also be noted here whether any tests have been performed to check that the results are in fact similar when using the two different approaches to calculate Tmax,ref.
Lines 97-99: see my general remark about justification of the selected model ensemble.
Lines 99-100: This sentence should be revised, as it contains duplicate statements.
Lines 115-117: The text should shortly explain why the different reference periods are necessary. And I think it would be good to also highlight that the reference period for ERA-Interim-driven runs is 20 years, while it is 30 years for the GCM-driven runs.
Lines 124-125: What does “As a background of warming” mean?
Line 194: Replace “in no way” by “not well”, otherwise it sounds exaggerated.
Line 233: Here, it should be highlighted that RCP8.5 was used.
Line 234: I can still see some differences in the maps, and thus I would suggest replacing this sentence by something like: "The HWMId patterns do generally stay similar within the two observed time periods, according to the spatial r."
Lines 240-241: I would suggest changing “compared to the driving HadGEM2-ES and NorESM1-M” to “than two out of the three driving models (i.e., HadGEM2-ES, NorESM1-M)”
Line 257: I would again highlight that this is only true for RCP8.5, thus add “under the high emissions scenario RCP8.5” at the end of the sentence
Line 272: Again, I would change: “become more common” to “would become more common under the high emissions scenario RCP8.5”
Line 287: I would add “RCM” before “runs”
Lines 317-319: I think "information" is not the right term, as much more than just "information" is added. Maybe “details” or “additional processes”? Or maybe you have a better idea?
Lines 319-320: I think this statement needs a reference.
Line 324: I don't think that one study is enough to "reject" this hypothesis in general. Maybe better use "questioning"?
Lines 328-330: But isn't that the best-known and most prominent added value of RCMs? I would not see this as a new finding.
Line 337: “The exponential increase is patent”: This sounds rather exaggerated to me; a more neutral formulation would be better.
Line 343: Replace “this scenario” by “RCP8.5”
Line 409: Which are the GCMs and RCMs that have additionally been analysed? These should be indicated in any case, as otherwise the statement cannot be tested or reproduced.
Lines 413-415: Check sentence structure
Line 421: I cannot fully support this statement. If one considers, e.g., the summer of 2003, it is obvious that RCMs do not fully replicate the observed HWMId patterns (Figure S5).
Line 429: Add “of heatwaves” at the end of the sentence.
Table 1: I think it would be good to indicate the resolution in degree latitude and degree longitude as well. I personally cannot directly translate T159L62 or N96L38 into a grid resolution in lat/lon.
Figure 3: red rectangle -> blue rectangle (in caption)
Figure 4: 2020 -> 2010 (in caption)
Present and future European heat wave magnitudes: climatologies, trends, and their associated uncertainties in GCM-RCM model chains by Lin et al.
Lin et al. used dynamical downscaling to analyse heatwaves based on simulations carried out with regional climate models (RCMs) from the Euro-CORDEX programme. A particularly relevant topic of this paper is that the authors investigated if there is any added value in the representation of heat waves in the RCMs compared to the driving GCMs. It is an interesting topic definitely worth pursuing.
A general remark is that all researchers discussing the evaluation and use of GCM results on regional scales ought to read the paper by Deser et al. (2012; DOI:10.1038/nclimate1562), and citing it in a study like this should thus be required. The findings of Deser et al. suggest that the small number of GCMs selected here is insufficient for a proper analysis of future outlooks and model evaluation, due to pronounced chaotic regional variability on decadal scales.
The regional climate modelling community also still seems to exhibit a ‘silo thinking’ behaviour, and in order to try to make some progress in the general thinking about downscaling, I would urge that this paper by Lin et al. also includes work based on empirical-statistical downscaling (ESD). Many papers on RCMs ignore ESD, which becomes invisible and under-appreciated, and this unfortunately seems to create an attitude that RCMs suffice - hence many of the climate services in Europe do not consider ESD. I suspect most people working with RCMs don’t read the literature on ESD, but I think there are benefits from consolidating the two approaches - in particular when it comes to the evaluation of RCMs. There are also a few examples of ESD applied to heatwave statistics that merit a mention in the context of this paper (e.g. DOI:10.5194/ascmo-4-37-2018). Nevertheless, ignoring ESD is a weakness, although Lin et al. give a good summary of the limitations of RCMs. RCMs and ESD make use of different sets of assumptions and have different strengths and weaknesses independent of each other, and hence a combination of the two makes the results more robust.
Often the most severe effects of heatwaves are connected with night-time temperatures not cooling off. It is therefore also of interest to use a heatwave index based on daily minimum temperatures and not the daily maximum. The most pronounced temperature trends are also those of the nights.
It would be interesting to see the statistical distribution of yearly HWMId values - are they normally distributed? (E.g. is the central limit theorem valid for this statistic aggregated over Europe?) One way to evaluate the models is to compare their statistical distributions (e.g. Kolmogorov-Smirnov Test).
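To make the suggested distribution comparison concrete, here is a minimal sketch of the two-sample Kolmogorov-Smirnov statistic in plain Python. The function name `ks_statistic` and the two input lists are illustrative assumptions, not anything taken from the manuscript; in practice a library routine such as `scipy.stats.ks_2samp` would also return a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    n, m = len(a_sorted), len(b_sorted)
    largest_gap = 0.0
    # The empirical CDFs are step functions that only change at data points,
    # so it is enough to evaluate the gap at every observed value.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / n  # fraction of sample_a <= x
        cdf_b = bisect_right(b_sorted, x) / m  # fraction of sample_b <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical usage: yearly HWMId values from a model run and from E-OBS
# would be passed as two lists of floats, e.g. ks_statistic(model, eobs).
```

A statistic close to 0 indicates similar distributions, and a value close to 1 indicates almost no overlap. Whether the difference is significant depends on the sample sizes.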
I was a bit surprised by Fig. 1, which seems to indicate more heatwave activity in the Nordic countries and less further south on the continent. This also seems to be the case for EOBS and ERAINT - does that mean that perhaps HWMId doesn’t represent the typical heatwave reported by the news headlines? It’s defined in terms of local variability (IQR) and autocorrelation - and not on any threshold value, as far as I read this paper. At least, this warrants some comments.
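The role of the local IQR can be illustrated with a small sketch. It assumes the commonly used form of the HWMId daily magnitude, in which the excess of Tmax over the reference 25th percentile is divided by the reference interquartile range. That formula is my assumption about the index rather than something quoted from the manuscript, and the temperatures are made-up placeholders.

```python
def daily_magnitude(tmax, tref_25p, tref_75p):
    """Daily heatwave magnitude, assuming the IQR-normalised form of HWMId.

    Returns 0 when tmax does not exceed the reference 25th percentile.
    """
    if tmax <= tref_25p:
        return 0.0
    return (tmax - tref_25p) / (tref_75p - tref_25p)


# Made-up placeholder values: the same 4 K excess over the 25th percentile
# at two sites whose reference IQRs differ (2 K versus 5 K).
narrow_iqr_site = daily_magnitude(30.0, 26.0, 28.0)
wide_iqr_site = daily_magnitude(30.0, 26.0, 31.0)
```

Under this assumption the same absolute temperature excess gives a larger magnitude where the local IQR is narrow, which is one way a map of HWMId could differ from a map of absolute heat.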
Does the result that all RCMs show less agreement with E-OBS in RMSE and r than ERA-Interim does suggest that these RCMs don’t add value to that of the global model? Or could it be differences in heat fluxes, cloudiness and topography of the driving and nested models? Perhaps the model domain is so large that the RCMs generate their own dynamics within the interior of their lateral boundaries? Or have they involved spectral nudging to avoid that? See e.g. DOI:10.1007/s00382-022-06219-y (it’s also a useful paper to discuss in this context). These questions certainly merit some discussion. The results are nevertheless useful and interesting as they suggest that differences between the RCMs matter.
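For reference, the three agreement measures can be computed across grid cells as in the sketch below. It assumes that both fields are already on a common grid and flattened to equal-length lists, and it applies no area weighting. The function name `spatial_scores` is hypothetical, and the manuscript may compute these scores differently.

```python
import math


def spatial_scores(model, obs):
    """Return (MBE, RMSE, Pearson r) between two fields on the same grid."""
    n = len(model)
    diffs = [m - o for m, o in zip(model, obs)]
    mbe = sum(diffs) / n                             # mean bias error
    rmse = math.sqrt(sum(d * d for d in diffs) / n)  # root-mean-square error
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    r = cov / math.sqrt(var_m * var_o)               # spatial pattern correlation
    return mbe, rmse, r
```

A field with a uniform offset from the observations has r = 1 but a non-zero MBE and RMSE, so the three scores answer different questions and are best reported together.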
I’m not sure that I understand Table 4 and the use of MBE, RMSE and correlation for results derived from GCMs since we don’t expect the GCMs to be synonymous with the real world and hence no correlation with observed heatwaves. The only way to evaluate the downscaled results from GCMs is through statistical properties such as statistical distributions and parameters. But perhaps Table 4 shows the correlation in space rather than over time? If so, this ought to be explained more explicitly and clearly. Also, if the occurrence of heatwaves more or less follows a random process, then we’d expect the number of heatwaves over a given period to follow a Poisson distribution - this can be assumed to be true for both models and the real world. Then the number of observed heatwaves can be compared to a statistical distribution of the corresponding number of heatwaves based on the model ensemble by assuming a Poisson distribution (this works if the ensemble is considerably greater than 30 independent runs). Is this possible, or does the HWMId statistic suffice? Also, so-called ‘common Empirical Orthogonal Functions’ can be used to compare spatial structures and the covariance structures in different data sets - it’s an elegant maths-based approach that is surprisingly uncommon. However, this is more general and not specific for a small selection of extreme events. But regarding my comment on Fig. 1, I’m a bit unsure what HWMId really represents. Perhaps it may also be of relevance here to mention that one indicator of trends in extremes, including an increasing severity of heatwaves, can also involve an analysis of record-breaking events. There is some literature on this subject connected to climate change.
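The Poisson comparison suggested above could be sketched roughly as follows. The ensemble counts and the observed count are made-up placeholders, far fewer than the 30-plus independent runs the approach needs, and the function names are hypothetical. The rate is estimated as the ensemble-mean count, which is one simple choice among several.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k events for a Poisson process with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def poisson_upper_tail(k_observed, rate):
    """Probability of seeing k_observed or more events under the Poisson model."""
    return 1.0 - sum(poisson_pmf(k, rate) for k in range(k_observed))


# Placeholder heatwave counts per ensemble member over some fixed period
# (illustrative numbers only, not results from any model).
ensemble_counts = [3, 5, 4, 6, 2, 4, 5, 3]
rate_estimate = sum(ensemble_counts) / len(ensemble_counts)

# Placeholder observed count over the same period.
observed_count = 9
tail_probability = poisson_upper_tail(observed_count, rate_estimate)
```

A small tail probability would indicate that the observed number of heatwaves is unusual relative to the ensemble, under the assumption that the events are independent.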
The most rapid warming in northern Europe is during winter, but maximum daily temperatures are highest in summer, and it’s only summer that defines HWMId? (L348)
The point about ‘cascade of uncertainty’ is a myth and forgets that each step of analysis also introduces new information (or constraints) in addition to uncertainty. It’s only sensible with several model stages as long as we introduce more information than uncertainty for each step (see e.g. DOI:10.1038/NCLIMATE3393). In fact, downscaling can be considered as an act of adding new information to that already provided by GCMs: information about how local geography influences the local climate (as in this case) and information about how local climates depend on the ambient large-scale conditions and teleconnections that the GCMs skillfully reproduce.
In summary, the tiny sample of GCMs in this study severely limits the application of these results and there were some points which were unclear and needed elaboration, as pointed out above. One way to improve this is to extend the ensemble of GCMs to the whole of CMIP5 (CMIP6?), and then compare those three selected here in this study with the larger set of GCMs. There are also some issues that merit more discussion, as mentioned above. I also think it’s useful to discuss other definitions of heatwaves than HWMId, even if this paper focuses on just this fairly established indicator. Furthermore, it’s important to consider ways to connect these results with what can be delivered by ESD (e.g. much larger ensembles than Euro-CORDEX), and in general I suggest that papers on downscaling that ignore one of these strategies do not merit publication.
L52 “hace” is misspelt.
Fig. 2 caption: ‘Scott’s rule’ needs a reference.
L.188: Missing “there” in “shows a similar pattern to the ensemble mean (first row of Fig. 5) but exists considerable differences in the spread (second row Fig. 5) of the RCM ensembles”?