Relating Climate Sensitivity Indices to projection uncertainty

Abstract. Can we summarize uncertainties in global response to greenhouse gas forcing with a single number? Here we assess the degree to which traditional metrics are related to future warming indices using an ensemble of simple climate models together with results from CMIP5 and CMIP6. We consider Effective Climate Sensitivity (EffCS), Transient Climate Response at CO2 quadrupling (T140) and a proposed simple metric of temperature change 140 years after a quadrupling of carbon dioxide (A140). In a perfectly equilibrated model, future temperatures under Representative Concentration Pathway (RCP) 8.5


Introduction
Summarizing the response of the Earth System to anthropogenic forcers with metrics has long been practised as a way to illustrate uncertainty in Earth system response to greenhouse gases. For example, the concept of the Equilibrium Climate Sensitivity (ECS), the equilibrium global mean temperature increase which would be observed in response to a doubling of atmospheric carbon dioxide concentrations (Hansen et al., 1984), has existed for over 50 years (Charney et al., 1979), and a significant amount of literature has been devoted to constraining its value (Knutti et al., 2017).
The Earth system responds to a step change in forcing on timescales ranging from days to millennia (Knutti and Rugenstein, 2015), so an 'Effective Climate Sensitivity' (EffCS hereon) is often used as a proxy for decadal to centennial feedbacks. EffCS is generally calculated in a coupled atmosphere-ocean model from the output of the 'abrupt4xCO2' simulation, a standard experiment in which CO2 concentrations are quadrupled instantaneously from pre-industrial levels and the model is allowed to evolve (Gregory et al., 2004).
EffCS is calculated by assuming that a model is associated with a single feedback parameter (i.e. a rate of change of top-of-atmosphere radiative flux per unit surface temperature increase), allowing the equilibrium temperature response to a step-change forcing to be predicted by linear extrapolation (we refer to this approach henceforth as the Constant Feedback (CF) approximation, with EffCS referring to the estimate of ECS made using this approach). Another metric, the Transient Climate Response at CO2 doubling (TCR) or quadrupling (T140), is calculated from a '1pctCO2' idealized experiment in which CO2 concentrations are increased by 1 percent each year, starting from a pre-industrial state, resulting in linearly increasing forcing.
Although it was generally assumed that TCR would be a better predictor of transient warming under a high emissions scenario such as RCP8.5, a complication has arisen: EffCS seems to be better correlated than TCR with 21st century warming from present-day levels under a business-as-usual scenario (Grose et al., 2018). The reason for this is not yet well understood, given that the radiative pathway in RCP8.5 leading up to 2100 is relatively similar to that of the 1 percent annual increase experiment used to measure T140. Furthermore, neither EffCS nor TCR is well correlated with end-of-century temperatures in a mitigation scenario (Grose et al., 2018) such as RCP2.6 (Van Vuuren et al., 2011), which calls into question the relevance of such summary metrics in the discussion of mitigation and adaptation.
Similarly, a number of studies have shown that the EffCS approximation does not describe the true equilibrium behaviour of most models well (Knutti et al., 2017). When GCM abrupt-4xCO2 simulations are continued for thousands of years, many are found to deviate significantly from the linear trend line one would fit to a 150-year simulation (Andrews et al., 2015; Knutti et al., 2017; Senior and Mitchell, 2000; Rugenstein et al., 2016). The conceptual models representing the evolving feedbacks as a function of timescale vary slightly between studies, either modulating the efficacy of deep ocean heat uptake (Geoffroy et al., 2013; Winton et al., 2010; Held et al., 2010) or representing the climate system as a sum of warming patterns which emerge on different adjustment timescales (Armour et al., 2013; Rugenstein et al., 2016), each associated with its own feedback parameter. However, the analytical set of solutions for the temperature response to a step change in forcing is the same in either case: a superposition of decaying exponential modes with different timescales varying between a few years and a few centuries (Proistosescu and Huybers, 2017). It has been shown that these additional degrees of freedom, and the ambiguity over contributions from different timescales of response, imply that EffCS may not be strongly constrained by temperature change over the last century (Proistosescu and Huybers, 2017; Andrews et al., 2018), and that the Long Term Equilibrium (LTE) sensitivity may be greater than that implied by estimates which use the CF framework (Otto et al., 2013; Lewis, 2013).

This state of understanding leads to a number of emerging critical questions which we discuss in this paper. Can we explain the non-intuitive result that EffCS is a better predictor than T140 of end-of-century temperatures under RCP8.5? Which summary metrics of global sensitivity to greenhouse gas forcing are most useful for effective policy decisions? Finally, do the implicit structural assumptions underpinning the applicability of these metrics to the real world cause us to mis-categorize and potentially underestimate future warming risk?

1 A simple model example
We begin by considering an idealized ensemble of climate model simulations. We use a two-timescale thermal response model, conceptually representing the deep ocean (with a response timescale of a century or more) and the shallow ocean (with a response timescale of 10 to 50 years). Such a model, although simple, is capable of resolving evolving feedback amplitudes and can emulate the climatological responses of complex Earth System Models on two timescales. It makes the structural assumption that the Earth can be modelled as a discrete sum of linear decaying exponential responses to forcing, but this assumption has been found to describe GCM evolution on a century timescale well (Proistosescu and Huybers, 2017; Geoffroy et al., 2013), and the model is sufficiently complex to illustrate the limitations of defining system sensitivity through TCR or EffCS.
To efficiently describe the response of the system to a generic forcing, this study employs a linear Green's function approach, which describes the response as the convolution of the forcing with an impulse response (Ruelle, 1998) (in this case, characterized through the response to a step change in CO2 forcing). This approach can be used to approximate and simplify global climate dynamics (Ragone et al., 2016; Lucarini et al., 2017), and its computational efficiency allows Markov Chain Monte Carlo estimation of the physical parameters. Furthermore (and critically for this study), the pulse-response formulation can be used to self-consistently relate different metrics of climate sensitivity on a range of timescales (Lucarini et al., 2017).
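As a minimal sketch of the convolution idea, the response $T(t)$ of a linear system to a forcing history $F(t)$ can be written in terms of either an impulse response or a step response. The symbols $G$ (impulse response) and $S$ (step response) are notation assumed here for illustration and do not appear in the text.

```latex
T(t) = \int_0^t G(t - t')\, F(t')\, dt',
\qquad
S(t) = \int_0^t G(t')\, dt',
\qquad
T(t) = \int_0^t S(t - t')\, \frac{dF}{dt'}\, dt'
\quad \text{(for } F(0) = 0 \text{)}.
```

The last form follows from the first by integration by parts, using $S(0) = 0$. It shows why the response to a single step change in forcing is enough to characterize the response to any forcing history in a linear system.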

Model Formulation
The two-timescale impulse response model follows the thermal feedback-timescale implementation from the FAIR simple climate model (Smith et al., 2018; Millar et al., 2017), which follows Hasselmann et al. (1993):

$$\frac{dT_n}{dt} = \frac{q_n F - T_n}{d_n}; \quad T = \sum_n T_n; \quad n = 1, 2 \qquad (1)$$

where $T$ is global mean temperature and, for each timescale $n$, $T_n$ is the component of warming associated with that timescale, $q_n$ is the feedback parameter and $d_n$ is the response timescale. We consider the heat fluxes into the shallow and deep ocean to be functions of the same timescales:

$$R_n = r_n (F - T_n/q_n); \quad R = \sum_n R_n; \quad \sum_n r_n = 1; \quad n = 1, 2 \qquad (2)$$

where $r_n$ is an efficacy factor for heat absorbed by the deep ($n = 1$) or shallow ($n = 2$) ocean. The factors sum to unity given the boundary condition that $R(0) = F(0) = F_{4xCO2}$ at $t = 0$, leaving just one degree of freedom, $r_1$, the fraction of heat which is allocated to deep ocean storage.
The particular solutions for temperature and radiation response to a step change in forcing $F_{4xCO2}$ at time $t = 0$ can be expressed as a sum of exponential decay functions:

$$T_p(t) = F_{4xCO2} \sum_n q_n \left(1 - e^{-t/d_n}\right) \qquad (3)$$

$$R_p(t) = F_{4xCO2} \sum_n r_n e^{-t/d_n} \qquad (4)$$

where $T_p(t)$ is the annual global mean temperature and $R_p(t)$ is the net top-of-atmosphere radiative imbalance at time $t$, and $F_{4xCO2}$ is the instantaneous global mean radiative forcing associated with a quadrupling of CO2, taken here to be 3.7 W m$^{-2}$ (Myhre et al., 2013).
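The step response can be illustrated with a short numerical sketch. It assumes the sum-of-exponentials forms $T_p(t) = F \sum_n q_n (1 - e^{-t/d_n})$ and $R_p(t) = F \sum_n r_n e^{-t/d_n}$ implied by the definitions above. The parameter values and function names are hypothetical choices for illustration, not the calibrated values from this study.

```python
import numpy as np

# Hypothetical illustrative parameters (not the calibrated values from the study).
Q = np.array([0.30, 0.50])     # q_n in K per (W m^-2); n=1 slow, n=2 fast
D = np.array([250.0, 20.0])    # d_n: response timescales in years
R_FRAC = np.array([0.3, 0.7])  # r_n: heat-uptake fractions, must sum to 1
F_STEP = 3.7                   # W m^-2, the step-forcing value quoted in the text


def step_temperature(t, q=Q, d=D, forcing=F_STEP):
    """T_p(t): warming at time t (years) after a step change in forcing."""
    t = np.asarray(t, dtype=float)[..., None]
    return forcing * np.sum(q * (1.0 - np.exp(-t / d)), axis=-1)


def step_imbalance(t, r=R_FRAC, d=D, forcing=F_STEP):
    """R_p(t): net top-of-atmosphere imbalance at time t after the step."""
    t = np.asarray(t, dtype=float)[..., None]
    return forcing * np.sum(r * np.exp(-t / d), axis=-1)


years = np.arange(0, 151)
temperature = step_temperature(years)
imbalance = step_imbalance(years)
```

Under these assumptions the imbalance starts at the full forcing and decays to zero, while the temperature relaxes towards `F_STEP * Q.sum()`.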
We define a historical forcing timeseries as a function of CO2 concentrations $C(t)$ and a non-CO2 forcing timeseries $F_{nonCO2}(t)$ (both taken from Meinshausen et al., 2011), in which $f_r$ is a free parameter that scales the aerosol forcing (conceptually allowing for forcing uncertainty in the historical timeseries) and $F_{otherAnt}$ is the sum of all other anthropogenic and natural forcers (from Meinshausen et al., 2011). The thermal response is calculated from the numerical time derivative of the forcing timeseries $F(t)$, that is, the change in forcing in each annual time step. The forcing timeseries can thus be expressed as a series of step functions, and $T_p$ from Equation 3 can be used to calculate the integrated thermal response.
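The superposition of step responses described here can be sketched as a discrete convolution. This is an illustrative sketch rather than the study's implementation: the parameter values and function names are hypothetical, the forcing is assumed to start from zero, and lags are counted in whole years, so an increment has no effect in the year it occurs.

```python
import numpy as np

# Hypothetical illustrative parameters (not the calibrated values from the study).
Q = np.array([0.30, 0.50])   # q_n in K per (W m^-2); n=1 slow, n=2 fast
D = np.array([250.0, 20.0])  # d_n: response timescales in years


def unit_step_response(n_years, q=Q, d=D):
    """Warming per unit forcing, k years after a unit step (k = 0..n_years-1)."""
    lag = np.arange(n_years, dtype=float)[:, None]
    return np.sum(q * (1.0 - np.exp(-lag / d)), axis=1)


def temperature_from_forcing(forcing, q=Q, d=D):
    """Superpose step responses to the year-on-year forcing increments."""
    forcing = np.asarray(forcing, dtype=float)
    # Forcing change in each annual step; the series is assumed to start from zero.
    increments = np.diff(forcing, prepend=0.0)
    response = unit_step_response(forcing.size, q, d)
    # Discrete convolution, truncated to the length of the forcing series.
    return np.convolve(increments, response)[: forcing.size]


# Example: a forcing ramp rising linearly from 0 to 5 W m^-2 over 100 years.
ramp_temperature = temperature_from_forcing(np.linspace(0.0, 5.0, 100))
```

Because the model is linear, doubling the forcing series doubles the temperature response, and a constant forcing series reproduces the step response itself.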
Heat fluxes into the deep ($D(t)$) and shallow ($H(t)$) ocean components are represented by numerical integration of the slow ($n = 1$) and fast ($n = 2$) pulse response components of $R_p(t)$ in Equation 4.

Model Optimization
The model input timeseries for calibration are observed CO2 concentrations, along with radiative forcing estimates from Meinshausen et al. (2011) for non-CO2 forcing agents. We optimize the thermal model parameters for two timescales and the non-CO2 forcing factor (see Table 1).

A Markov Chain Monte Carlo (MCMC) optimization procedure produces an ensemble of parameter configurations such that the density of simulations in parameter space reflects their likelihood, as defined by a cost function built from a number of pre-defined likelihood metrics. MCMC algorithms employ a random walk in parameter space which ultimately produces a representative sample of the distribution.
The classical approach to this random walk is the Metropolis-Hastings algorithm (MacKay, 2003).

Flat priors are used for all parameters, with an additional prior on true equilibrium climate sensitivity using the likely value and upper bound on Equilibrium Climate Sensitivity from Goodman and Weare (2010) to specify the median and 90th percentile of a gamma distribution for equilibrium sensitivity (i.e. warming as t → ∞).
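The random walk can be sketched with a plain random-walk Metropolis sampler. This illustrates the general technique only and is not necessarily the sampler used in this study. The toy target (a flat prior on [0, 10] with a Gaussian likelihood centred on 3) and all names are hypothetical.

```python
import numpy as np


def metropolis_hastings(log_posterior, start, step_size, n_steps, seed=0):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    current = np.atleast_1d(np.asarray(start, dtype=float))
    current_logp = log_posterior(current)
    chain = np.empty((n_steps, current.size))
    for i in range(n_steps):
        proposal = current + step_size * rng.standard_normal(current.size)
        proposal_logp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio); compare in log space.
        if np.log(rng.uniform()) < proposal_logp - current_logp:
            current, current_logp = proposal, proposal_logp
        chain[i] = current
    return chain


def toy_log_posterior(theta):
    """Hypothetical target: flat prior on [0, 10], Gaussian likelihood centred on 3."""
    if np.any(theta < 0.0) or np.any(theta > 10.0):
        return -np.inf
    return -0.5 * np.sum(((theta - 3.0) / 0.5) ** 2)


chain = metropolis_hastings(toy_log_posterior, start=[5.0], step_size=0.5, n_steps=20000)
posterior_mean = chain[5000:].mean()  # discard the first 5000 steps as burn-in
```

In an application like the one described here, `toy_log_posterior` would be replaced by the log of the cost function comparing simulated and observed timeseries, plus the log prior.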
We demonstrate that this technique is able to capture the broad uncertainty associated with future projections of CMIP models by using pre-2020 temperatures in RCP8.5 to calibrate the simple model outlined above (Figure S4). In most cases, the future projection for each scenario falls within the distribution arising from the MCMC ensemble fit, with some specific exceptions: FIO-ESM, FGoals-G2, CCSM4 (which share some common heritage) and the GISS models. As such, the observationally fitted MCMC ensemble explores broadly comparable uncertainty to that seen in the bulk of the CMIP ensemble, with the caveat that the ensemble tends to under-sample cases where there is little or no long-term warming response to emissions. The posterior parameter distribution for the model can then be used to project the corresponding range of response in probabilistic projections of the future scenarios, or in idealized experiments which simulate a range of self-consistent values for various climate sensitivity metrics.

Idealized Simulations
Effective Climate Sensitivity is measured by implementing a step-change abrupt CO2 quadrupling, following Gregory et al. (2004). Resulting EffCS values (for a doubling of CO2) range from 2.4 to 4.6 K (5th and 95th percentiles), and values of TCR from 1.6 to 2.2 K (Figure 1(b,e)). This results in a range of 21st century warming under the two scenarios considered: warming in 2100 ranges from 1.4 to 2.4 K under RCP2.6 and from 3.8 to 5.1 K under RCP8.5 (5th and 95th percentiles, see Figure 1(a)).
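The regression-based estimate of EffCS can be sketched as follows. The sketch assumes the common procedure of regressing top-of-atmosphere imbalance on temperature over a 150-year abrupt simulation, extrapolating to zero imbalance, and halving the result to express it per CO2 doubling. Parameter values and function names are hypothetical.

```python
import numpy as np

# Hypothetical illustrative parameters (not the calibrated values from the study).
Q = np.array([0.30, 0.50])     # q_n in K per (W m^-2); n=1 slow, n=2 fast
D = np.array([250.0, 20.0])    # d_n: response timescales in years
R_FRAC = np.array([0.3, 0.7])  # r_n: heat-uptake fractions, sum to 1
F_STEP = 3.7                   # W m^-2, the step-forcing value quoted in the text


def abrupt_run(n_years=150):
    """Annual temperature and TOA imbalance after a step change in forcing."""
    t = np.arange(1, n_years + 1, dtype=float)[:, None]
    temperature = F_STEP * np.sum(Q * (1.0 - np.exp(-t / D)), axis=1)
    imbalance = F_STEP * np.sum(R_FRAC * np.exp(-t / D), axis=1)
    return temperature, imbalance


def effective_sensitivity(temperature, imbalance):
    """Extrapolate a linear fit of imbalance on temperature to zero imbalance.

    The result is halved to convert the quadrupling response to a doubling.
    """
    slope, intercept = np.polyfit(temperature, imbalance, 1)
    equilibrium_4x = -intercept / slope
    return equilibrium_4x / 2.0


temperature, imbalance = abrupt_run()
effcs = effective_sensitivity(temperature, imbalance)
true_equilibrium_2x = F_STEP * Q.sum() / 2.0
```

With these illustrative parameters the imbalance is a convex function of temperature, so the linear extrapolation falls short of the true equilibrium value. This is the behaviour that motivates distinguishing EffCS from the long-term equilibrium sensitivity.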
We then consider, in the context of this observationally constrained ensemble of simple models, which idealized metrics of system response are most informative for describing 21st century warming. We consider four metrics: EffCS, TCR and T140 (transient warming under an annually compounded 1 percent increase in CO2 concentrations at the time of CO2 doubling and quadrupling, corresponding to years 70 and 140 of the simulation), and A140, which we introduce as a possible metric for consideration, defined as the global mean warming above pre-emission levels in the abrupt4xCO2 simulation 140 years after the time of CO2 quadrupling (here and throughout estimated as the mean from years 131-150). Figure 2 illustrates how ensemble spread would be impacted for a set of different scenarios if each of these metrics were constrained to lie within a narrow range (nominally the 45-55th percentile range of values present in the entire observationally constrained ensemble).
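The transient metrics can be sketched for a single parameter set as follows. The sketch assumes that forcing scales with the logarithm of CO2 concentration, so the 1 percent per year experiment gives a linear forcing ramp. The 20-year averaging window for TCR (years 61-80) is an assumption mirroring the years 131-150 window stated for T140 and A140. Parameter values and names are hypothetical.

```python
import numpy as np

# Hypothetical illustrative parameters (not the calibrated values from the study).
Q = np.array([0.30, 0.50])   # q_n in K per (W m^-2); n=1 slow, n=2 fast
D = np.array([250.0, 20.0])  # d_n: response timescales in years
F_4X = 3.7                   # W m^-2, the step-forcing value quoted in the text


def respond(forcing):
    """Temperature response to an annual forcing series by step superposition."""
    forcing = np.asarray(forcing, dtype=float)
    lag = np.arange(forcing.size, dtype=float)[:, None]
    unit_step = np.sum(Q * (1.0 - np.exp(-lag / D)), axis=1)
    increments = np.diff(forcing, prepend=0.0)
    return np.convolve(increments, unit_step)[: forcing.size]


def window_mean(series, first_year, last_year):
    """Mean over simulation years first_year..last_year (1-based, inclusive)."""
    return series[first_year - 1 : last_year].mean()


years = np.arange(1, 151)
# 1pctCO2: CO2 grows as 1.01**year, so log-scaled forcing rises linearly.
ramp_forcing = F_4X * years * np.log(1.01) / np.log(4.0)
abrupt_forcing = np.full(years.size, F_4X)

ramp_temp = respond(ramp_forcing)
abrupt_temp = respond(abrupt_forcing)

tcr = window_mean(ramp_temp, 61, 80)        # around CO2 doubling (year 70)
t140 = window_mean(ramp_temp, 131, 150)     # around CO2 quadrupling (year 140)
a140 = window_mean(abrupt_temp, 131, 150)   # 140 years after abrupt quadrupling
```

Repeating this calculation for each member of a posterior parameter ensemble would give self-consistent distributions of the metrics that can be compared with scenario warming.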
In the high emissions RCP8.5 scenario, 2000-2100 warming is nearly perfectly described ($R^2 = 0.99$) by T140, the transient climate response after 140 years in a 1 percent CO2 simulation (Figure 1(c) and Figure 2(k)). The corresponding response after only 70 years, TCR, is a much poorer predictor ($R^2 = 0.31$).

These results are physically intuitive. The climate forcing and rate of change of forcing in RCP8.5 at the end of the 21st century are of similar magnitude to those in year 140 of the 1 percent CO2 simulation, and so it is unsurprising that T140 is an efficient predictor for RCP8.5. TCR is a poor predictor in the simple model ensemble largely because TCR itself is already highly constrained by historical warming (Figure 1(e)), and thus the ensemble is effectively conditioned on a value of TCR and it has little additional explanatory value in explaining the ensemble variance in the RCP projections (Figure 2(f,g)).
For the mitigation scenario RCP2.6, the most effective predictor of 2000-2100 warming is A140 ($R^2 = 0.91$). Both EffCS and T140 are weakly correlated ($R^2 = 0.62$ and 0.65 respectively), and TCR shows no significant correlation. To help understand these relationships, we can perform a regression analysis of the metrics as a function of model ensemble parameters (Figure 1(f)), which suggests that A140 and RCP2.6 warming from 2000 to 2100 are controlled by the difference between the slow and fast components of sensitivity. We can understand this in the context of the way the model is constrained by historical temperatures.
There is a weak trade-off between fast and slow components of climate sensitivity in the posterior parameter distribution of the ensemble (see Supplementary Figure S3), which broadly determines the fraction of equilibrium warming associated with current forcing levels that has already been experienced. If a greater fraction of today's observed warming is explained by the faster component of model response, there is less unrealized warming in a mitigation scenario later in the century. This causes large uncertainties in RCP2.6 evolution, even if EffCS, TCR or T140 are known (Figure 2(b,g,l)).
The constrained distribution for fast-timescale sensitivity is near-Gaussian and non-zero in all ensemble members, whereas slow-timescale sensitivity is more weakly constrained by the observations, ranging from near-zero to large (20 K/W m$^{-2}$) long-term equilibrium responses. The slow feedback component strongly controls A140 and RCP2.6 warming (Figure 1(d,f), Figure 2(q)).
RCP8.5 warming and T140, however, are associated with a near-linear increase in forcing throughout the simulation, which results in a near-linear temperature increase. The relative fraction of warming associated with fast- and slow-timescale feedbacks remains constant over time, and thus warming to date (effectively fixing TCR, subject to aerosol forcing uncertainty) better constrains relative error in future response in a non-mitigation scenario (Figure 2(f)).

In the CMIP ensembles, by contrast, EffCS is better correlated than TCR with 21st century warming under RCP8.5 (Figure S2 and Grose et al., 2018).
To understand this, we need to consider how the properties of the simple model ensemble differ from the CMIP archive.
Although the thermal response of the simple model is broadly able to represent the climatological response of CMIP models to forcing, the simple model simulations are free of internal variability and start from a fully equilibrated state. Neither of these assumptions holds for CMIP5 or CMIP6. Measurement of EffCS and TCR is complicated by internal variability (Knutti and Rugenstein, 2015), and many models still exhibit some temperature drift in the control simulation from which the '1pctCO2' and 'abrupt4xCO2' simulations are branched (Figure 3). This creates uncertainty from two sources. Firstly, it is not always apparent at what point during the control simulation the 1pctCO2 simulation has been branched, so there is uncertainty in how the anomaly should be measured. Secondly, there is the potential for an unknown contribution of control drift to be erroneously included in the temperature evolution of the 1pctCO2 and abrupt4xCO2 simulations.

To assess the contribution of control drift bias in sensitivity metrics, we implement idealized representations of non-equilibration in our simple model from Section 1. We create an idealized distribution of drift in the simple model ensemble, similar to that seen in the CMIP ensembles, by initializing the model 500 years before the experiment begins and defining an effective 'baseline' period from which anomalies are measured as the average temperature between years 400 and 500.
Climate internal variability is represented by a 2nd order autoregressive model, which is fitted to each CMIP model in turn. The ensemble-mean autoregressive parameters are used to create artificial 'noisy' simulations by linearly adding noise generated from the autoregressive model to the output of the simple model.
We consider the range of control drifts observed in the CMIP5 and CMIP6 ensembles (illustrated in Figure 3(L)), which range from -0.3 to +0.6 K per century in the CMIP5 and CMIP6 models considered in this study. An idealized distribution of drift in the simple model ensemble is created by initializing the model 500 years before the abrupt4xCO2 or 1pctCO2 simulation with a non-zero, constant forcing drawn from a flat distribution ranging from -1 to +1 W m$^{-2}$, which results in a distribution of control drift of -0.4 to +0.4 K per century (i.e. broadly comparable to the CMIP case). For each simulation we define the temperature baseline as the average global mean temperature in years 400-500.
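The drift experiment can be sketched as follows: a constant offset forcing is applied for a 500-year spin-up, the baseline is taken from years 400-500, and the abrupt step is then added on top. Parameter values and function names are hypothetical. With these illustrative parameters the resulting drift is smaller than the range quoted above, which depends on the calibrated ensemble.

```python
import numpy as np

# Hypothetical illustrative parameters (not the calibrated values from the study).
Q = np.array([0.30, 0.50])   # q_n in K per (W m^-2); n=1 slow, n=2 fast
D = np.array([250.0, 20.0])  # d_n: response timescales in years
F_STEP = 3.7                 # W m^-2, the step-forcing value quoted in the text


def unit_step_response(n_years):
    """Warming per unit forcing, k years after a unit step (k = 0..n_years-1)."""
    lag = np.arange(n_years, dtype=float)[:, None]
    return np.sum(Q * (1.0 - np.exp(-lag / D)), axis=1)


def respond(forcing):
    """Temperature response to an annual forcing series by step superposition."""
    increments = np.diff(np.asarray(forcing, dtype=float), prepend=0.0)
    return np.convolve(increments, unit_step_response(len(forcing)))[: len(forcing)]


def abrupt_run_with_drift(offset_forcing, spinup_years=500, run_years=150):
    """Abrupt-step anomaly measured against a drifting control baseline."""
    forcing = np.concatenate([
        np.full(spinup_years, offset_forcing),
        np.full(run_years, offset_forcing + F_STEP),
    ])
    temperature = respond(forcing)
    control = temperature[spinup_years - 100 : spinup_years]
    anomaly = temperature[spinup_years:] - control.mean()
    # Residual control drift in K per century over the baseline window.
    drift = 100.0 * np.polyfit(np.arange(100), control, 1)[0]
    return anomaly, drift


rng = np.random.default_rng(0)
offsets = rng.uniform(-1.0, 1.0, size=200)  # flat distribution of offset forcing
drifts = np.array([abrupt_run_with_drift(f)[1] for f in offsets])
```

A positive offset leaves the control still warming when the experiment branches, so the measured anomaly includes a contribution that is not a response to the imposed step.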
To represent the first-order effect of climate noise, we fit a 2nd order autoregressive model to the detrended global mean temperature timeseries in each available model in the CMIP5/6 ensemble. Taking CMIP mean values for the variance and autoregressive parameters, we generate noise for each realization of the simple model (though we note that, in practice, the noise characteristics vary by CMIP model).
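Fitting and generating second-order autoregressive noise can be sketched as below. The least-squares fit, the burn-in length and the parameter values in the example are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np


def ar2_noise(n_years, phi1, phi2, sigma, seed=0, burn_in=200):
    """Generate AR(2) noise: x[t] = phi1*x[t-1] + phi2*x[t-2] + white noise."""
    rng = np.random.default_rng(seed)
    shocks = rng.normal(0.0, sigma, size=n_years + burn_in)
    noise = np.zeros(n_years + burn_in)
    for t in range(2, noise.size):
        noise[t] = phi1 * noise[t - 1] + phi2 * noise[t - 2] + shocks[t]
    # Drop the burn-in so the retained series does not depend on the zero start.
    return noise[burn_in:]


def fit_ar2(series):
    """Least-squares estimate of (phi1, phi2, sigma) from a detrended series."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    lagged = np.column_stack([x[1:-1], x[:-2]])  # columns: x[t-1], x[t-2]
    target = x[2:]
    coeffs, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    residual = target - lagged @ coeffs
    return coeffs[0], coeffs[1], residual.std(ddof=2)


# Example with hypothetical parameters: generate noise, then recover them.
synthetic = ar2_noise(5000, phi1=0.5, phi2=0.2, sigma=0.1)
phi1_hat, phi2_hat, sigma_hat = fit_ar2(synthetic)
```

In the procedure described here, the fitted parameters would come from each detrended CMIP control or scenario timeseries, and the generated noise would be added linearly to the simple model output.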
The results are illustrated in Figure 4(a), where the simple model ensemble is initialized in a non-equilibrium state with additive Gaussian noise. With these additional sources of error, both EffCS and A140 are not strongly impacted when measured in the noisy/unequilibrated model variants (Figure 4(b,c)), but the T140 measurement is strongly degraded (Figure 4(d)).

Indeed, in this ensemble the biased measurements of EffCS or A140 are slightly better correlated with true T140 than the biased measurement of T140 itself. This provides a possible explanation for why T140 may be a poor predictor of RCP8.5 warming in CMIP.
In our simple framework, the reasons for the more accurate measurement of EffCS are primarily associated with the lack of equilibration. Simply adding noise from the autoregressive model has little effect on the accuracy of EffCS, T140 or A140 (where both T140 and A140 are estimated using the average of years 131 to 150 in the simulation, see Table 2).

Figure 4. As in Figure 1(b), but models are initialized in a non-equilibrium state such that the baseline period is subject to some control drift, and model output is also subject to interannual variability of a similar magnitude to models in the CMIP archive. (a) shows global mean temperature evolution for the control period (gray), abrupt4xCO2 simulation (blue) and 1pctCO2 simulation (green). (b,c) show the true value of (EffCS, A140) as calculated in the noise-free, equilibrated simulations, plotted as a function of the measured value of (EffCS, A140) in the noisy, non-equilibrated simulations. (d,f,g) show the true value of (T140, RCP2.6, RCP8.5 2000-2100 warming) plotted as a function of the measured values of T140, EffCS and A140 respectively.

Both A140 and EffCS are less sensitive to non-equilibrated initial states than T140. A140 experiences the same variance as T140 due to the uncertain climate drift, but the absolute value of A140 tends to be larger than T140, so there is less relative error in its estimation. The effect of the drift on EffCS is muted because the near-linear climate drift primarily biases the estimation of slow rather than fast feedbacks (see Supplementary Figure S1). Because EffCS is primarily a measure of fast-mode feedback strength (see Figure 1(f)), its value is less impacted if experiments are started from a non-equilibrium state.
There is some evidence that the lack of equilibration has an outsized effect on the estimation of TCR in the CMIP models.
In Figure 5, we attempt to unbias the estimate of TCR in two ways. Firstly, we estimate the baseline temperature by regressing the temperatures in the first 20 years of the 1 percent CO2 ramp experiment as a function of time (see Supplementary Figure S5). Anomalies in temperature (and TOA fluxes for ECS) are measured relative to the corrected baselines derived from the 1pctCO2 simulation, and estimated linear pre-industrial trends are subtracted from the 1pctCO2 and abrupt4xCO2 timeseries.
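The two corrections can be sketched as below. The sketch assumes the baseline is taken as the intercept at the start of the ramp from a linear fit to the first 20 years, and that control drift is removed as a straight line in time. The function names and the synthetic example values are hypothetical.

```python
import numpy as np


def regressed_baseline(ramp_temperature, n_years=20):
    """Estimate the pre-ramp baseline as the t=0 intercept of a linear fit."""
    years = np.arange(1, n_years + 1)
    early = np.asarray(ramp_temperature[:n_years], dtype=float)
    slope, intercept = np.polyfit(years, early, 1)
    return intercept


def remove_linear_drift(series, drift_per_year):
    """Subtract an estimated linear control trend from an experiment timeseries."""
    years = np.arange(1, len(series) + 1)
    return np.asarray(series, dtype=float) - drift_per_year * years


# Synthetic example: a 14.0 K baseline, warming at 0.02 K per year in the ramp.
years = np.arange(1, 141)
ramp = 14.0 + 0.02 * years
baseline = regressed_baseline(ramp)
anomaly = remove_linear_drift(ramp - baseline, drift_per_year=0.0)
```

Because only 20 years enter the fit, the intercept estimate from a real model would itself be affected by internal variability.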
This pre-processing of the temperature timeseries improves the correlation between TCR and 21st century warming under RCP8.5 from 0.86 to 0.89. It also improves the correlation between EffCS and 21st century warming slightly from 0.94 to 0.95 (and A140 from 0.89 to 0.91).
These 'corrected' values (listed in Table 3) are estimates only, given that we would expect the regression estimate based on a short 20-year period to be itself subject to internal variability noise, and that we are assuming the abrupt4xCO2 and 1pctCO2 simulations have the same baselines. However, the improvement in correlation with future warming seen over the case with the pre-industrial average baseline supports the hypothesis that control drift adds uncertainty to the estimation of all quantities (and particularly TCR). It is not a complete explanation, though: even after this adjustment, EffCS remains better correlated with RCP8.5 transient warming than TCR in the multi-model ensemble.

Conclusions
The question of which metric of climate sensitivity is most useful for summarizing uncertainty in future projections is conditional on a number of factors. Any single metric of sensitivity, even if known perfectly, cannot constrain Earth System response on all timescales and scenarios. We have shown here that one can produce a number of model variants which can exhibit the same value of EffCS or TCR, but with a range of responses, especially in a mitigation scenario such as RCP2.6.

In an idealized environment where models can be brought to a complete equilibrium control state, and ensemble sizes for '1pctCO2' simulations are large enough to avoid the effects of internal variability, the T140 metric would be the best idealized warming measure for century-scale warming under a high emissions scenario. However, the presence of even moderate control drift can act as a significant source of error in the measurement of T140, and so here we find that EffCS is likely to be a more accurate practical sensitivity metric in Earth System Model applications where full equilibration is difficult to achieve.

EffCS itself has limitations. It is relatively insensitive to slow-timescale feedbacks, which means that it is poorly correlated with century-scale warming under RCP2.6 (where a large fraction of warming occurs due to slow feedback response to historical emissions) and with warming on multi-century timescales under a high emissions scenario (where concentrations stabilize post-2100). We find that a simple but useful alternative is to use the mean warming from years 131-150 of the abrupt-4xCO2 simulation, which is comparably skilled to EffCS in predicting RCP8.5 warming in 2100 but more sensitive to century-timescale feedbacks, and is therefore better correlated with RCP2.6 end-of-century warming.
It is notable that the most common metrics of sensitivity, EffCS, T140 and TCR, provide very little guidance on peak warming expected under climate mitigation. The focus on these metrics has also given rise to the issue that slow feedbacks are poorly characterized, and only a limited set of CMIP-class models have been run for long enough to be informative about equilibrium response (Rugenstein et al., 2019).
It should be noted that these conclusions are derived from the consideration of a relatively simple two-timescale pulse response model. In this model, we can show that certain sensitivity metrics are insufficient to constrain future projections, and that non-equilibration may confound measurement. However, the constrained distributions for the metrics are subject to the structural assumptions of this model.

The diversity of simulated global mean dynamical response to greenhouse gas forcing over the coming centuries can be represented in simple models with a relatively small number of parameters (Smith et al., 2018; Meinshausen et al., 2011), but we cannot reduce uncertainty in climate projections on all timescales to a single degree of freedom. Summary metrics of climate response have value if the context of those metrics (and their range of applicability in relation to projection uncertainty) is well understood, but their limitations should be kept in mind.
Data availability. CMIP5 and CMIP6 data are available through a distributed data archive developed and operated by the Earth System Grid Federation (ESGF).

Code and data availability. Code for this study is available on Github at https://github.com/benmsanderson/matlab_pulse

Author contributions. The author performed all analysis and writing for this project.

Competing interests. The author declares no competing interests.