Review of revised manuscript “Evaluation of convection-permitting extreme precipitation simulations for the south of France” by Luu et al. (2021)
The revised manuscript represents a faithful response to most of the comments. The issue of comparing model and observational data at different spatial scales remains unresolved.
To recap, my previous criticism was that the authors were directly comparing results from the 12 km and 3 km resolution models with station data and 1 km resolution observations, before judging which model performs better. I argued that this is not a fair way to judge added value, as the 12 km model is designed to represent grid-box means at the 12 km scale, not values at the point (station data) or 1 km (gridded observations) scale. Closer agreement of the 3 km model with the aforementioned observations does not, therefore, necessarily mean that the 3 km model is “better” than the 12 km model; it more likely simply reflects that the observations’ resolution is closer to that of the 3 km model. My argument was that to identify whether the 3 km model “adds value” over the 12 km model, one must first upscale the model and observational data to the resolution of the coarsest dataset (in this case, the 12 km model).
The authors disagreed with my above criticism. Their arguments are summarized below in “Authors C1-4”. My responses follow in “Reviewer C1-4”.
*Authors C1. Model users regularly use coarse-resolution data (e.g. 5 to 50 km) for local climate studies. The 3 km model’s higher spatial variability and improved precipitation at small scales thus represent added value for these users. The authors focus only on the extent to which the 3 km model improves extreme precipitation at the local scale.
**Reviewer C1. It is true that some users directly use low-resolution climate data for point- or local-scale studies. This, however, does not mean that those users are correct to do so, and it is, in any case, of secondary importance to my criticism. If the stated aim of the research is “evaluation” and to “investigate the added value” (see title/abstract), then the fact remains that it is not appropriate to do this at a spatial scale that the 12 km model is not intended to represent. It is trivial that the 3 km model exhibits higher spatial variability (simply because it has more grid cells); added detail is not added value.
The further the model resolution is from the observations’ resolution, the less appropriate the comparison. Hence, even if the 3 km and 12 km models were both perfect at their own spatial scales, the 3 km model would necessarily be in better agreement with the point- and kilometre-scale observations than the 12 km model. This does not mean that the 3 km model “adds value”; it simply reflects the different scales the models are intended to represent.
In short, it is not possible to draw conclusions on added value if the two models are being compared at different spatial scales.
*Authors C2. Their goal is to “assess the overall improvement against observed station data”, not to disentangle the causes of 3 km model improvement, i.e. resolution or physics. Comparing the 12 km and 3 km models at the same resolution (i.e. 12 km) would only answer whether (or why) the fine-scale resolution (3 km) can improve the larger scale (12 km).
**Reviewer C2. I accept that disentangling the contributions of resolution and physics to any added value is not the aim of the study, so no problems there. I also agree that comparing the 3 km and 12 km models at the same (12 km) resolution will “only” answer whether the fine-scale resolution adds value at the 12 km scale. For the reasons outlined above, however, this (12 km) is the minimum scale at which the added value can be assessed.
*Authors C3. There is no “standard” way of evaluating model added value: the appropriate method depends on the scientific question.
**Reviewer C3. I agree, but the scientific question also has to be appropriate. Asking whether 3 km simulations add value over 12 km simulations for representing point- or kilometre-scale observations (without upscaling) is, in my view, not an appropriate scientific question.
*Authors C4. The authors provide an additional analysis in their response where they upscale the 3 km data to the 12 km grid (observations are not upscaled) and re-compare the seasonal maxima (3 h, 1 day) against observations (1 km and stations). Based on this comparison, the 3 km model is deemed to still outperform the 12 km model.
**Reviewer C4. I would like to know how the authors performed the upscaling for the results shown in Figure 4 of the response (this was unfortunately not mentioned). I ask this because the box-mean values for CPS (Figures 2/3, manuscript) and CPS-11 (Figure 4, response) are identical to within a rounding error of 0.1 mm, which seems highly implausible.
The correct way to do the upscaling would be to upscale all the 3-hourly (daily) data to the EUR-11 grid, and only *after* that compute the Rx3hour (Rx1day) values. In Figure 4 it looks like the upscaling has instead been performed on the final Rx3hour (Rx1day) results from the original CPS grid. What else could explain the identical box-mean values between CPS and CPS-11?
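To make the requested order of operations concrete, here is a minimal sketch (Python/xarray). The variable names, the synthetic data and the simple 4x4 block mean are illustrative stand-ins only (a conservative remapping onto the actual EUR-11 grid, e.g. with xESMF or CDO, would be the proper tool); the point is solely the sequencing: aggregate the 3-hourly fields first, then compute Rx3hour.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic 3-hourly precipitation on a fine (3 km-like) grid, purely for illustration
time = pd.date_range("2000-06-01", periods=8 * 92, freq="3h")  # one JJA season
lat = np.arange(43.0, 45.0, 0.03)
lon = np.arange(3.0, 5.0, 0.03)
pr_cps = xr.DataArray(
    np.random.default_rng(0).gamma(0.5, 2.0, (time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    name="pr",
)

# Step 1: upscale every 3-hourly field to the coarse grid
# (a 4x4 block mean stands in here for conservative regridding to EUR-11)
pr_cps_11 = pr_cps.coarsen(lat=4, lon=4, boundary="trim").mean()

# Step 2: only then take the seasonal maximum, i.e. Rx3hour on the coarse grid
rx3hour_11 = pr_cps_11.resample(time="QS-DEC").max()

# Reversing the two steps (Rx3hour on the 3 km grid, then averaging those maxima
# onto the coarse grid) gives systematically larger values, because the mean of
# local maxima is not the maximum of the grid-box-mean series.
```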
As a final point, I’d like to add that in my last review I listed a large number of CPM evaluation studies which all, before assessing added value, upscale their observations and higher-resolution simulations to the scale of the lowest-resolution model. I would be interested to know whether the authors can point to any published CPM evaluation paper in which the approach they propose has been taken, i.e. no precipitation upscaling prior to the evaluation of added value.
- Section 2.2 / scaling. In your description of how the binning works, please also state that you set a minimum of 300 observations per bin in order to avoid undersampling, as mentioned in your response. This is important for the reproducibility and interpretation of your results (a brief sketch of what I mean is given after this list).
- L301-303: I suggest adding some of this information to the caption of Figure 7, so that the figure can be understood on its own.
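To illustrate the kind of bin-count criterion I am asking to be documented, a short sketch follows. The bin width, percentile and variable names are placeholders and not the authors’ actual settings; only the “discard bins with fewer than 300 samples” rule reflects what was stated in the response.

```python
import numpy as np

MIN_SAMPLES = 300  # minimum number of events per bin, to avoid undersampling

def scaling_curve(temperature, precipitation, bin_width=1.0, percentile=99.0):
    """Percentile of precipitation per temperature bin, skipping undersampled bins.

    All defaults are illustrative placeholders, not the authors' settings.
    """
    edges = np.arange(temperature.min(), temperature.max() + bin_width, bin_width)
    centres, values = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = precipitation[(temperature >= lo) & (temperature < hi)]
        if in_bin.size < MIN_SAMPLES:  # undersampled bin: skip it
            continue
        centres.append(0.5 * (lo + hi))
        values.append(np.percentile(in_bin, percentile))
    return np.array(centres), np.array(values)
```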