STITCHES: creating new scenarios of climate model output by stitching together pieces of existing simulations
- 1Lawrence Berkeley National Laboratory, Berkeley, CA
- 2Joint Global Change Research Institute, Pacific Northwest National Laboratory and University of Maryland, College Park, MD
Abstract. Climate model output emulation has long been attempted to support impact research, mainly to fill in gaps in the scenario space. Given the computational cost of running coupled Earth System Models (ESMs), an effective emulator would be used to create climatic impact-driver information under scenarios that could not be run by ESMs. Lately, the necessity of accounting for internal variability has also made the availability of initial condition ensembles important, further increasing the computational demand. However, at least so far, emulators have always been limited to simplified ESM output, either seasonal, annual or decadal averages, and/or basic quantities, like temperature and precipitation, often emulated independently of one another. With this work, we propose a more comprehensive solution to climate model output emulation. Our emulator, STITCHES, uses existing archives of ESM scenario experiments to construct new scenarios, or enrich existing initial condition ensembles, which is what other emulators do. Importantly, its output has the same characteristics as the ESM output it sets out to emulate: it is multivariate, spatially resolved and high frequency. STITCHES extends the idea of time-sampling – by which climate outcomes are stratified by the global warming level at which they occur, irrespective of the scenario and time associated with them – to the construction of a continuous Global Surface Air Temperature (GSAT) trajectory over the whole 21st century that replicates a target trajectory to be emulated.
STITCHES does so by stitching together decade-long windows within a model simulation when GSAT has similar characteristics to the target GSAT trajectory. In doing so, STITCHES creates a series of pointers to a sequence of decades within existing scenarios in the ESM archived output, and the emulator can thus recover any type of output, at any frequency and spatial scale available from the original ESM's experiment that produced each decade. We show that the stitching does not introduce artifacts, in the great majority of cases, even when the criteria for the identification of the decades to be stitched together are not strictly tailored to the specific ESM emulated. We show this is the case for annual GSAT, the variable that we expect to be smoother and less noisy than many variables commonly used for impact analysis. Our results also suggest that most other surface atmospheric variables commonly used for impact analysis would be similarly unaffected by the stitching procedure. We successfully test the method's performance over many CMIP6/ScenarioMIP-participating ESMs and experiments. Only a few exceptions surface, but these less-than-optimal outcomes are always associated with a scarcity of the archived simulations from which to gather the decade-long windows that form the emulated GSAT trajectory. In the great majority of cases, STITCHES performance remains satisfactory according to metrics that reward consistency in trends, interannual and inter-ensemble variance, and autocorrelation structure of the time series stitched together. The method can therefore be used to create new scenarios with GSAT pathways different from those of existing simulations, and to increase the size of existing initial condition ensembles.
There are aspects of our emulator that will immediately disqualify it for specific applications, for example when the climate information needed has characteristics that result from quantities accumulated over windows of time longer than those used as building blocks by STITCHES. But for many applications, we argue that a stitched product can satisfy the needs of impact researchers. Thus, we think it could open up the possibility of designing the next scenario experiments within CMIP7 according to new principles, relieved of the need to produce a number of similar trajectories that vary only in radiative forcing strength.
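The window-matching idea described in the abstract can be illustrated with a minimal sketch. This is not the STITCHES implementation: the function names, the 9-year default window, the `tol` threshold, the use of a least-squares slope as the rate of change, and the Euclidean distance in a (mean, window × slope) space are all assumptions made for illustration. The sketch only returns pointers (run label and start index) into a hypothetical archive of annual GSAT series; recovering other variables would then mean reading the pointed-to years from the archived output.

```python
import numpy as np

def window_features(gsat, window=9):
    """Cut an annual GSAT series into consecutive windows.

    Returns a list of (start_index, mean_T, slope_per_year) tuples.
    The window length is an illustrative assumption.
    """
    feats = []
    years = np.arange(window)
    for start in range(0, len(gsat) - window + 1, window):
        chunk = np.asarray(gsat[start:start + window], dtype=float)
        slope = np.polyfit(years, chunk, 1)[0]  # least-squares linear trend
        feats.append((start, chunk.mean(), slope))
    return feats

def stitch_pointers(target, archive, window=9, tol=0.1):
    """Match each target window to its nearest archive window.

    archive: dict mapping a run label to an annual GSAT array (hypothetical layout).
    Returns one (run_label, start_index) pointer per target window,
    or None where no archive window lies within `tol` (assumed threshold).
    """
    candidates = [(label, s, t, d)
                  for label, series in archive.items()
                  for s, t, d in window_features(series, window)]
    pointers = []
    for _, t_tgt, d_tgt in window_features(target, window):
        # Distance in (T, window * dT) space; the scaling is an assumption.
        dists = [np.hypot(t - t_tgt, window * (d - d_tgt))
                 for _, _, t, d in candidates]
        best = int(np.argmin(dists))
        if dists[best] <= tol:
            pointers.append((candidates[best][0], candidates[best][1]))
        else:
            pointers.append(None)
    return pointers
```

In this sketch a target window with no sufficiently close neighbour yields `None`, which mirrors the abstract's point that poor outcomes are tied to a scarcity of archived simulations.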
Claudia Tebaldi et al.
Status: final response (author comments only)
- RC1: 'Comment on esd-2022-14', Anonymous Referee #1, 24 May 2022
The STITCHES algorithm presents a unique time-sampling based approach that enables exploration of different, arbitrary climate scenarios. Its added benefit of not being limited to specific climate variables or spatial/temporal scales makes it a powerful tool in comparison to existing simple climate models/emulators. Overall, it is extremely relevant to the climate modelling and impact/integrated assessment societies and suitable for the Earth System Dynamics journal. Some comments are as follows:
High-level comments:
- The “outside the lower-end emission scenario bracket” application of STITCHES should be clarified: overshoot is discussed, but low-emission scenarios with a near-equilibrated climate by 2100 are not.
- Some discussion on the choice of tuning parameters (X and Z) for different temporal scales (annual vs monthly) should also be given. Since non-linear warming could manifest more strongly at monthly timescales (due to e.g. snow-albedo feedbacks), this could limit the values of X or Z to be used (or otherwise the fineness of temporal resolution). Given that the aim is to conserve decadal oscillatory patterns such as El Niño, the implications of having X>9, and the compromise this entails for fidelity of representation at finer temporal resolutions, should furthermore be explored (e.g. looking at performance on monthly timescales with different X values).
- Although discussion of application of STITCHES is given, readers would be curious for more discussion on future developments and improvements that could be made.
Below are more specific comments
Specific comments:
L4: the link between emulators and computational demand should be clarified
L19: This may be confusing to readers: the use of GSAT to create the pointers from which all other climate variables at different spatial and temporal scales will be stitched together should be clarified (i.e. pointer is not climate variable specific).
L113: This suggestion is a bit strong given that emulators already mentioned (Link et al. 2019, Beusch et al. 2020,2021) circumvent the need for initial condition ensembles by providing stochastically generated imitations of the expected internal variability. Furthermore, scenario exploration to look at climate under equilibrated or overshoot state is still extremely important and this should be clarified.
L115-L135: Very well explained background to the rationale!
L146: what about scenarios lower than the lowest emission scenarios or overshoot scenarios?
L197-L205: Z is dependent on X which is also a tuning parameter, this may introduce additional caveats in choosing X so as to avoid “jumps” between the seams. Have sensitivity tests been performed on this? Some explanation on how to jointly pick the optimal combination of X and Z should be provided.
L211: Is the ensemble size the sole thing considered when choosing which ESMs to display? Looking at ESMs of different genealogies would also be interesting especially for the (T, XdT) space (if not that is also O.K., just curious about why the above criteria).
Figure 1: it seems that for most models around -0.01 degC the rate of historical warming is higher than that at 0-0.01 degC, is there a reason for this? It also raises the question of the generalisability of this approach for time windows with major volcanic events (e.g. Mt Pinatubo, which has a distinct fingerprint in the GMT trajectory) and some elaboration on this may be required.
L227-L230: Great that this is elaborated upon here! Providing this elaboration earlier could benefit and provide more structure to the text however.
Figure 2: It seems that all ESMs in this figure have a mismatch in the GSAT trajectories after 2050 for SSP 2-4.5 (and also BCC-CSM2-MR and CMCC-ESM2 in Figure 4); some elaboration on this may be needed, e.g. transient vs equilibrated state. In general, how to stitch together cases where X*dT ~ 0 should be elaborated upon, as nearest neighbors could have either a positive or a negative trend.
L306: It would be interesting to see month specific trends (e.g. the decadal trend for Jan and Jul). It seems here it is only the decadal trend of the whole monthly time series, if not this should be clarified as well.
Figure 6: There seems to be systematic overestimation of monthly variance around central Africa (also for models in the appendix). Are there reasons for this (e.g. vegetation/land cover changes, where SSP 5-8.5 imposes quite high deforestation which may lead to spurious variabilities)?
L321: The argument that internal variability explains the mismatch in the Arctic is not so convincing. It could for instance be due to the AMOC or otherwise due to a non-linear increase in summer time temperatures during ice-free arctic summers.
L346: Figure 7, it may be difficult to visually gauge similarity in magnitude and oscillatory behaviour. Although this is made more obvious in Figure 8, it may be a good idea to apply a power spectral decomposition instead and show their results for a clearer overview. Very good idea to look at SOI within the analysis otherwise!
L400: Does the Z_cutoff value generalize to all values of X? The calculation of Z_cutoff is already a very useful exercise so this is a minor detail, just curious.
L438: The term envelope collapse should be clarified, as should how it relates to the Z value (i.e. how best to know at which Z envelope collapse has been approached?).
Table 5: Is there a relationship (e.g. linear) between E_r and Z_cutoff, or are they stable and then jump to above 10% after a certain cutoff?
Table E1: The E_1 and E_2 values for CanESM5 tend to be higher for 20 archive members and then drop lower at 25 archive members. More so for SSP 3-7.0 the E_1 values are 0 at 25 archive members for both 2010 and 2050. Is there a reason for this?
Conclusion and Discussion: the recommendation for looking at fewer scenarios and focusing on more initial condition ensembles may be quite strong: perhaps there should be elaboration on which scenarios are more useful to explore (i.e. ones where interpolation becomes difficult such as overshoot or equilibrated climate). The applicability of STITCHES across different temporal scales should also be clarified (i.e. limitations when applying it to annual vs monthly vs subdaily timescales).
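As an illustration of the power spectral comparison suggested in the comment on L346, a raw periodogram can be computed for a target and a stitched index series and the two spectra compared. The following is a minimal sketch under stated assumptions: the function name `periodogram` and the monthly sampling in the usage are hypothetical, no windowing or smoothing is applied, and this is not an analysis from the manuscript.

```python
import numpy as np

def periodogram(series, dt=1.0):
    """Raw periodogram of an evenly sampled series.

    dt is the sampling interval (e.g. 1.0 for one month); returned
    frequencies are in cycles per unit of dt. The zero-frequency bin
    is dropped because the mean is removed first.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=dt)
    return freqs[1:], power[1:]

# Usage with a synthetic 12-sample cycle standing in for an index series.
t = np.arange(120)
freqs, power = periodogram(np.sin(2 * np.pi * t / 12))
peak_period = 1.0 / freqs[np.argmax(power)]
```

Comparing where the spectral peaks fall, and how much power they carry, would give a more quantitative view than visual inspection of the time series.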
Editorial comments:
L35: support the climate information needs of the impact research community
L44: bias-correcting them. Alternatively just bias-correction could also work
L120: perhaps “scenario-independence” would be a term more consistent with the terms already introduced
L147: “the STITCHES algorithm”
Figure 1: Lovely plots, very informative! Font size needs to be increased however.
- AC1: 'Reply on RC1', Claudia Tebaldi, 11 Jul 2022
- RC2: 'Comment on esd-2022-14', Anonymous Referee #2, 27 May 2022
Review of “STITCHES: creating new scenarios of climate model output by stitching together pieces of existing simulations” by Tebaldi et al.
Climate model analyses have been limited to some extent by the scenarios used in projects such as CMIP6 and this study seeks to provide a framework for filling in some of the gaps left by the set of scenarios that exist. The authors perform a comprehensive evaluation of their framework primarily focussed on global mean temperatures and demonstrate its potential utility.
This study addresses an important issue and is a major contribution to the field. I only have minor comments for the authors to consider which I list below. I will admit that it took me a while to understand the methodology which isn’t to fault the explanation given here, but I would suggest that the authors carefully read through the manuscript with a view to making the framework more easily understood where possible.
Minor comments:
L62-64: I agree that the SSP-RCPs span a range of forcings that probably covers the real-world outcome over this century but I think this sentence sounds a bit over-confident and could be dialled back a touch as “exhaustive” seems too strong a descriptor.
L71: Could also cite (Hawkins and Sutton 2009) as the paper where the method used in Lehner et al. originates.
L98: The focus on “transient” warming levels is introduced rather abruptly and I suspect the significance of this point may not be obvious to some readers. Perhaps a sentence or two explaining this could help. Papers that may be of use for an explanation include (Manabe et al. 1991; King et al. 2020; Callahan et al. 2021).
L127: “dimension” should be “dimensions”
Figure 1: It might be worth reminding the reader either in the plot or caption that this is global mean temperature.
L227-228: Technically there is a lower bound of the level of global warming at the start of the simulations too presumably.
L259: “do” should be “does”
L387-388: This sentence needs to be rewritten.
L473-475: Remove “If” before “ENSO” and add “but” before “there exist”.
L501: “haven’t” should be “have not”
References
Callahan, C. W., C. Chen, M. Rugenstein, J. Bloch-Johnson, S. Yang, and E. J. Moyer, 2021: Robust decrease in El Niño/Southern Oscillation amplitude under long-term warming. Nat. Clim. Chang., 11, 752–757, https://doi.org/10.1038/s41558-021-01099-2.
Hawkins, E., and R. Sutton, 2009: The potential to narrow uncertainty in regional climate predictions. Bull. Am. Meteorol. Soc., 90, 1095–1107, https://doi.org/10.1175/2009BAMS2607.1.
King, A. D., T. P. Lane, B. J. Henley, and J. R. Brown, 2020: Global and regional impacts differ between transient and equilibrium warmer worlds. Nat. Clim. Chang., 10, 42–47, https://doi.org/10.1038/s41558-019-0658-7.
Manabe, S., R. J. Stouffer, M. J. Spelman, and K. Bryan, 1991: Transient Responses of a Coupled Ocean–Atmosphere Model to Gradual Changes of Atmospheric CO2. Part I. Annual Mean Response. J. Clim., 4, 785–818, https://doi.org/10.1175/1520-0442(1991)004<0785:TROACO>2.0.CO;2.
- AC2: 'Reply on RC2', Claudia Tebaldi, 11 Jul 2022
- RC3: 'Comment on esd-2022-14', Anonymous Referee #3, 10 Jun 2022
Review of “STITCHES: creating new scenarios of climate model output by stitching together pieces of existing simulations” by Tebaldi et al.
This paper presents a procedure to create surrogate trajectories of climate model ensembles. The authors provide tests on a set of CMIP6 simulations and discuss the sensitivity to two key parameters of the procedure.
I have no reason to doubt that the authors know what they do. My main concern with the paper is that I neither understand the general picture nor the details.
My first concern is on the format of the paper and its suitability for ESD. The abstract, introduction, and conclusions are written by and for IPCC insiders, as the authors use a lot of IPCC jargon, which is obscure to most human beings, including me. This style of writing seems to go against the interdisciplinary nature of ESD. Not only does the paper not report new understanding of the climate system, but the authors do not discuss whether (or how) their procedure might help do so. Another example is the use of the term “emulator” or “emulation”. Of course, this remark is not limited to this manuscript. I have yet to see a reasonably clear definition of what is called a “climate emulator”. For some authors, an emulator is a regression between some predictand variable and a predictor. Here, it is obviously something else, which looks akin to analog modelling. Making a proper bibliographic search could help relate the procedure described in the manuscript to existing work, which might not appear in the IPCC reports. The notion of “creating new scenarios” is not clear. The IPCC seems to use SSP scenarios, which are relevant for the economy. What the authors do is obviously something else. So, using this terminology might be confusing. The simple (acknowledged) fact that the emulation procedure cannot produce relevant GHG (or any forcing fluxes) should plead against the use of “creating scenarios”. My understanding is that the procedure creates surrogate trajectories that are constrained by GSAT values. Why should those trajectories be called “scenarios” in the IPCC sense?
My second concern is that the procedure description seems inappropriately vague. Ideally, I should be able to reproduce the procedure by reading the manuscript (provided I have access to the data). The first step (l. 148) suggests that *one* time series of GSAT is created for each model by dumping together all ensembles, scenarios, etc. ([…] “the time series is made […]”). I guess/hope that the authors do differently. The fourth step (l. 157) is not clear: what is a target scenario? The authors allude to “target scenarios” in several places, but do not define what those are. I believe that the authors could design a diagram that explains how the procedure works. In practice, I understand that one needs to know the target scenario (i.e., have GSAT data). Hence, I do not understand how the authors can reconstruct “unknown” scenarios (e.g., SSP2) from just SSP1 and SSP5, which suggests that intermediate scenarios can be deduced from two extreme SSP scenarios. This might be true, but I would like to understand this miracle (at least for me).
My third major concern is on the results or the performance tests. The authors seem to be happy with the results reported in Figs 1-8. Indeed, the “emulated” time series are close to “targets” (however the targets were designed). But is this desirable? The GSAT time series have no decadal or interdecadal variability (which might be due to the procedure itself). This is not discussed, but I would doubt that any procedure creating trajectories without long-term variability is so useful, or really accounts for climate variability (e.g., the so-called butterfly effect). For me, the SOI results are “good” by construction, since they are excerpts of existing simulations. How would this emulation procedure be able to emulate changes in ENSO variability, which would be a key issue for impact modelling? My feeling is that the simulated trajectories give overconfidence about (the lack of) climate internal variability. The conclusion that this procedure can replace numerical model simulations hence seems overconfident.
Minor issues
In the search of nearest neighbors in the (T, dT) space (step 5, l. 165), are there different weights on T or dT, in the distance definition?
In step 6 (l. 170), what is a “pointer”?
l. 211: “(see 1)”, what is “1”?
Figure 1: I can’t read the labels on a printed version of the manuscript.
I feel that there should be a separate section that describes the experimental set ups, tests, etc.
Figures 2-4: the captions should only keep descriptive statements, not comments that already appear in the text.
Equation (1) (l. 360): all symbols should be introduced. What is the bar? I think that \hat y should be the synthetic and y the truth, not the other way around, as suggested in l. 362. E2 is certainly not a ratio of variances (but a ratio of standard deviations). The denominator of E2 should be: <(y - \bar y)^2 > (no \hat).
In conclusion, my feeling is that the manuscript would be much more appropriate in GMD, which incidentally has a better impact factor than ESD. Of course, this decision is left to the authors and the editor.
- AC3: 'Reply on RC3', Claudia Tebaldi, 11 Jul 2022
Data sets
STITCHES Data Generated Snyder and Dorheim https://doi.org/10.5281/zenodo.6461693
Model code and software
STITCHES software Snyder and Dorheim https://github.com/JGCRI/stitches/releases/tag/v0.9.0