General comments
In this paper the authors develop a framework to evaluate the process of selecting a subset of simulations from a large multi-model ensemble (CMIP5). This subject is very important because climate model data users often need to work with only a small subset of simulations, given limited resources for data processing. However, research groups often face many difficulties when selecting simulations because there is no widely accepted approach for doing so, and the appropriate choice depends strongly on the climate application. Overall, I found the paper to be insightful and well written, and it should ultimately be considered for publication in ESD after the following minor modifications.
The paper introduces three approaches to select a subset of N simulations from a larger ensemble. These approaches are compared within a common benchmark, namely the equally weighted ensemble average of the historical climatology over the selected simulations. The first selection method consists of randomly choosing an N-subset of simulations. This procedure can be repeated several times in order to cover the range of uncertainty associated with a random selection, for instance in terms of the error of the ensemble mean relative to the observed climatology. The second method is the "performance ranking ensemble", in which the N best simulations are selected (according to their individual error in reproducing the observed climatology). The third approach is the "optimal ensemble", by which simulations are selected by minimizing a cost function based on three terms: (1) the mean square error of the ensemble mean, (2) the mean square error of individual members, and (3) a measure of model dependence.
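For concreteness, the three strategies can be sketched as follows. This is a minimal toy illustration with invented data; the function names and, in particular, the exact form and sign conventions of the three-term cost are my own assumptions, not the authors' formulation.

```python
# Toy sketch of the three selection strategies (invented data and names,
# not the authors' implementation).
import itertools
import numpy as np

rng = np.random.default_rng(0)
M, P, N = 10, 50, 3                # models, grid points, subset size
sims = rng.normal(0, 1, (M, P))    # toy "model climatologies"
obs = rng.normal(0, 1, P)          # toy "observed climatology"

def cost(idx, w=(1.0, 1.0, 1.0)):
    """Assumed 3-term cost: (1) MSE of the ensemble mean, (2) mean MSE of
    individual members, (3) mean pairwise MSE between members as a
    dependence measure (entered with a negative sign so that larger
    inter-member differences, i.e. more independence, lower the cost)."""
    sub = sims[list(idx)]
    f1 = np.mean((sub.mean(axis=0) - obs) ** 2)
    f2 = np.mean((sub - obs) ** 2)
    f3 = np.mean([np.mean((sub[i] - sub[j]) ** 2)
                  for i, j in itertools.combinations(range(len(idx)), 2)])
    return w[0] * f1 + w[1] * f2 - w[2] * f3

# Strategy 1: random subset (repeated draws sample the uncertainty range).
random_idx = tuple(rng.choice(M, N, replace=False))

# Strategy 2: performance ranking (the N individually best members).
ranked_idx = tuple(np.argsort(np.mean((sims - obs) ** 2, axis=1))[:N])

# Strategy 3: optimal ensemble (exhaustive minimization of the cost).
optimal_idx = min(itertools.combinations(range(M), N), key=cost)
```

The exhaustive search over all subsets is only feasible for small ensembles; the paper presumably uses a more efficient search, but the structure of the comparison is the same.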
An important result in this study is that the ensemble mean of a performance ranking ensemble performs poorly compared with that of an optimal ensemble, and is only comparable with the mean of randomly selected ensembles. This is because selecting only the best simulations leaves common biases among them, which do not cancel through the averaging procedure. The optimal ensemble, on the other hand, minimizes the error of the ensemble mean, discards poorly performing models and maximizes the cancellation of errors among simulations. Hence, more independence can be expected between simulations of the same optimal ensemble, whereas several initial-condition members of a single good model can end up in the same performance ranking ensemble. The paper shows convincingly that the optimal ensemble is the best approach compared with both the random and ranking selections. The optimal ensemble is selected through a flexible cost function, in which the existing terms can be weighted and new terms can be added (e.g. maximizing the spread among future projections to reduce overconfidence).
I think this paper provides new insights into the impact of selecting a subset of simulations from a large ensemble. However, one weak point is the lack of information about how practitioners should use this tool in real life, and what an "optimal ensemble" actually means in that context. For instance, consider the example of regional climate modelling given in the manuscript. Assume a group can only afford to dynamically downscale 5 GCM simulations with their RCM and has defined its own cost function. Suppose they use the current tool to optimally select a 5-GCM ensemble to downscale with their RCM, and a few months after starting the RCM simulations they discover a bug in the experiment of GCM #5, which must be discarded. It is very likely that GCMs #1-4 will no longer form an optimal subset of size 4. A similar situation would arise if they realized they could afford one more simulation: they would need to select an additional GCM, and the new 6-GCM ensemble would not be optimal either. The concept of an optimal ensemble implies that the selection of one member depends on the other ensemble members. I think this is an important limitation in the applicability of the current method to real-life situations. Moreover, the fact that for each ensemble size there is a ranking of several ensembles is difficult to interpret. It seems to me that for slight differences in RMSE and in the cost function, many other different ensembles are possible. It should therefore be explained in more detail how practitioners should deal with the complexity of coexisting, similarly optimal ensembles.
Another point that should be improved is the last part of the introduction, which does not clearly explain the structure of the paper (as there are many subsections) or what it aims to achieve. A few explanations in the text are also unclear or lack detail. See specific comments below.
Specific comments
- P3L19-21 "Regional dynamical downscaling presents a slightly different problem to the one stated above, as the goal is to find a small subset that reproduces certain statistical characteristics of the full ensemble." I know what the authors mean, but this paragraph lacks context about regional climate models, whose goal is to obtain high-resolution climate simulations based on lateral boundary conditions taken from GCMs or reanalyses. See for instance:
+ Laprise, R. (2008) Regional Climate Modelling, Journal of Computational Physics, 227(7), 3641-3666. http://dx.doi.org/10.1016/j.jcp.2006.10.024
- P4L4-34 I think there are too many technical details in the last part of the introduction. As said previously, it should better explain the overall structure of the document. For instance, the three sub-sampling strategies are not explicitly mentioned here. Moreover, since the paper contains many subsections, giving the general plan in the introduction would help the reader.
- P5L24 How did the authors determine that 100 iterations were enough? Would the error bars change much if one used 200 or 1000 iterations instead?
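One simple way to check this would be to compare the spread of the ensemble-mean error across random draws for increasing numbers of iterations and verify that it has stabilized. A toy sketch (my own invented data and names, not the authors' code):

```python
# Convergence check for the number of random-selection iterations
# (toy data; a real check would use the actual model errors).
import numpy as np

rng = np.random.default_rng(42)
M, P, N = 38, 100, 5               # models, grid points, subset size
sims = rng.normal(0, 1, (M, P))
obs = rng.normal(0, 1, P)

def ensemble_mean_rmse(idx):
    """RMSE of the equally weighted ensemble mean of a subset."""
    return np.sqrt(np.mean((sims[list(idx)].mean(axis=0) - obs) ** 2))

def error_spread(n_iter):
    """5th-95th percentile range of ensemble-mean RMSE over random N-subsets."""
    rmses = [ensemble_mean_rmse(rng.choice(M, N, replace=False))
             for _ in range(n_iter)]
    return np.percentile(rmses, 95) - np.percentile(rmses, 5)

# If the spread is already stable at 100 iterations, 100 is enough.
spreads = {n: error_spread(n) for n in (100, 200, 1000)}
```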
- P6 What is the reason for the drop between 30 and 35 members for the performance ranking ensemble? Could it be due to the fact that some models have several members?
- P6L31 Is there any relationship between the minimum of the optimal ensemble curve (in Fig. 1; between 5 and 8 members for temperature and around 12 for precipitation) and the effective number of independent models in the ensemble?
- P10L17 Why did the authors choose f3 to be the pairwise MSE rather than the pairwise correlation of errors as shown in Figure 2?
- P10L20 "This is a way to address dependence in ensemble spread." It would be worth adding here that it also prevents selecting several members from the same model.
- P10L27 "the members of the optimal ensemble": I guess this refers to the 3-term ensemble, but it is not clear. Also, regarding "have a better average performance": it is not at all clear in Figure 1 that the 3-term optimal selection is better than the 1-term one; the RMSE of the 3-term selection even seems slightly higher (the triangles are a bit above the circles).
- P11L19 What do the authors exactly mean by "whether the approach is fitting short term variability" ?
- P12L1-2 The different metrics (trend and space+time) should be defined more explicitly.
- P12L14 "all available runs": does this mean 81 runs or one per institute? Please clarify here and elsewhere in the text.
- P13L31 "mean of all 81 model runs": as the authors use this example very often in the paper, I think it should be stated somewhere that averaging all models and realizations in a CMIP experiment is bad practice, because it arbitrarily gives more weight to the models represented by the largest number of members. It should also be made clearer why the authors use this benchmark rather than simply the multi-model ensemble mean based on one realization per model (i.e. 38 models).
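To make the weighting concern concrete, here is a toy arithmetic sketch (the numbers are invented for illustration, not taken from the paper): a model contributing many realizations pulls the all-runs mean toward itself, whereas one realization per model weights every model equally.

```python
# Toy illustration of implicit weighting when averaging all runs
# (invented values, not CMIP5 data).
import numpy as np

# Model A contributes 10 realizations, models B and C one each.
runs = {"A": [2.0 + 0.01 * i for i in range(10)], "B": [1.0], "C": [0.0]}

# Averaging all 12 runs weights model A ten times more heavily...
mean_all_runs = np.mean([v for vals in runs.values() for v in vals])

# ...whereas one realization per model weights each model equally.
mean_one_per_model = np.mean([vals[0] for vals in runs.values()])
```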
- P15L3-4 Similarly to the issue of a selection based on multiple variables and observational datasets (as pointed out in the previous reviews), many applications such as impact assessments or regional climate modelling would imply an ensemble selection based on a specific region. The fact that the optimal ensemble will depend on the region where it is calibrated should therefore be discussed as well.
- Figure 2: The label of the y-axis is misleading because a high error correlation rather implies model-model similarity.
- Figure 3: Regional downscaling should also appear in the "Application to the future" box, not only in the "historical data" blue box.

I received two sets of review comments in this round of review. Based on these comments, I have decided to return the manuscript to the authors for major revisions. We will seek another round of review from the original reviewers before making the final decision.

The editor