Bayesian estimation of Earth’s climate sensitivity and transient climate response from observational warming and heat content datasets

. Future climate change projections, impacts and mitigation targets are directly affected by how sensitive Earth’s 10 global mean surface temperature is to anthropogenic forcing, expressed via the effective climate sensitivity (ECS) and transient climate response (TCR). However, the ECS and TCR are poorly constrained, in part because historic observations and future climate projections consider the climate system under different response timescales with potentially different climate feedback strengths. Here, we evaluate ECS and TCR by using historic observations of surface warming, since the mid-19 th century, and ocean heat uptake, since the mid 20 th century, to constrain a model with independent climate feedback 15 components acting over multiple response timescales. Adopting a Bayesian approach, our prior uses a constrained distribution for the instantaneous Planck feedback combined with wide-ranging uniform distributions of the strengths of the fast feedbacks (acting over several days) and slow feedbacks (acting over decades). We extract posterior distributions by applying likelihood functions derived from different combinations of observational datasets. The resulting TCR distributions are similar when using different historic datasets: from a TCR of 1.5 (1.3 to 1.7 at 5-95% range) °C, up to 1.7 (1.4 to 2.0) °C. 20 However, the posterior probability distribution for ECS on a 100-year response timescale varies depending on which combinations of temperature and heat content anomaly datasets are used: from ECS of 2.2 (1.5 to 4.5) °C, for datasets with less historic warming, up to 2.8 (1.8 to 6.1) °C, for datasets with more historic warming. Our results demonstrate how differences between historic climate reconstructions imply significant differences in expected future global warming.


Introduction 25
A key goal in climate science is to evaluate how sensitive global mean temperature anomaly is to radiative forcing from greenhouse gasses and aerosols (e.g. Knutti et al., 2017;IPCC, 2013). This sensitivity of climate may be explored by considering how a global surface temperature anomaly affects Earth's radiation balance. The effective climate feedback, "## (Wm -2 K -1 ), expresses the how surface warming increases outgoing radiation at the top of the atmosphere. "## at some time is calculated from the total radiative forcing, &'&() (Wm -2 ), the net top-of-atmosphere energy imbalance, (Wm -2 ), 30 where both &'&() and Δ are defined as zero at some preindustrial state. The Effective Climate Sensitivity at some time , 35 ECS( ) in K, is then defined as the radiative forcing for a doubling of CO2, 4×674 , divided by "## ( ), ECS and "## may be evaluated from estimates of historic radiative forcing and observational constraints on Δ and , eqns. 40 (1, 2); noting that Earth's energy imbalance, , can be observationally constrained as a time-average through reconstructing the heat content changes in the Earth system dominated by the ocean (e.g. Cheng et al., 2017;Levitus et al., 2012).
Many previous studies evaluating ECS from historical observational data and radiative forcing estimates, eq. (2), have either calculated a single constant climate sensitivity (see Annan, 2015;Anan and Hargreaves, 2020;Bodman and Jones, 2016;45 Lewis and Curry, 2014;Sherwood et al., 2020;Skeie et al, 2018;Otto et al., 2013), or have evaluated ECS for specific historic periods (e.g. Tokarska et al., 2020), acknowledging that the value for the specific historical period may not apply for all timescales into the future.
There is also the possibility that, at any given time or timescale, the climate feedback may be different for different sources of radiative forcing, such as well mixed greenhouse gasses and volcanic aerosols (e.g. Marvel et al., 2015).

55
The aim here is to perform Bayesian probabilistic evaluations of both ECS and transient climate response (TCR in K), using observational constraints on global surface temperature and ocean heat content anomalies to constrain a model framework that includes time-varying climate feedbacks, eqns. (1, 2). Our estimates are independent of simulated warming responses in complex climate models. 60 We utilise a numerical model where multiple climate feedbacks each respond to radiative forcing over different timescales (Goodwin, 2018), allowing "## to vary over time (eqns. 1,2). Generating a prior model ensemble with varying fast and slow climate feedback strengths, we extract four posterior ensembles using a Bayesian comparison to observational https://doi.org/10.5194/esd-2020-79 Preprint. Discussion started: 20 October 2020 c Author(s) 2020. CC BY 4.0 License.
reconstructions. Each posterior ensemble applies a different combination of historic reconstructions of global surface temperature anomaly (either HadCRUT4: Morice et al., 2012; or Cowtan&Way version 2.0 'HadCRUT4 infilled by kriging': 65 Cowtan & Way, 2014) and reconstruction of ocean heat content anomaly (either NODC: Levitus et al., 2012;or Cheng et al: Cheng et al., 2017). All our posterior ensembles are extracted using the additional constraints from HadSST3.1 (Kennedy et al., 2011) andGlobal Carbon Budget (le Quéré et al., 2018) for sea surface temperature and ocean carbon uptake anomalies respectively.

Model of surface warming from time-varying climate feedback 70
Quisque Equation (1) considers surface warming via a single effective climate feedback response to total radiative forcing, where the effective climate feedback represents an aggregated response to multiple climate feedbacks to multiple sources of radiative forcing. Here, surface warming is modelled as an extended energy balance response to sources of radiative forcing by climate feedbacks operating over different response timescales (Goodwin, 2018), The combinations of climate feedbacks processes considered here are: (1) #(I& , the combined fast feedbacks operating over response timescales approximately linked to the residence timescale of water vapour in the atmosphere (van der Ent and Tuinenberg, 2017), including clouds, water vapour-lapse rate, snow and 80 sea-ice surface albedo; and (2) I)'J , the combined slow feedbacks operating over a multi-decadal timescale that may, for example, be linked to a surface warming pattern adjustment (e.g. Andrews et al., 2015).
When radiative forcing from source is not increasing in magnitude between times and + , | ? ( + )| ≤ | ? ( )|, the th combination of climate feedback processes evolves according to, This model of climate feedbacks responding to imposed radiative forcing over multiple response timescales, eqns. (3), (4) and (5), produces a time-evolving effective climate feedback, (1), and time-evolving Effective Climate Sensitivity, (2), in 115 response to a prescribed forcing scenario. Here, the transient climate response, TCR, is calculated as the 11-year average warming centred at the year of CO2 doubling for a scenario with a 1 per cent per year rise in CO2 and no other forcing (hereafter: 1pctCO2 scenario).

Generation of the prior and posterior ensembles 120
We generate probabilistic prior and posterior model ensembles with varied model input parameters using Bayes' theorem.
The joint posterior probability that the climate system parameters have a specific set of values ′ given background information and observations of the climate system { }, ( = ′|{ }, ), is expressed using Bayes' theorem, where: (1) ( = ′| ) is the joint prior probability that = ′ for climate system parameter values (Supplementary Table S1;  Table S2).

135
Here, we use large ensemble simulations of the Warming Acidification and Sea level Projector (WASP) model (Goodwin, 2016), adopting the updated version of Goodwin (2018) with explicitly time-evolving climate feedbacks (eqns. 3, 4 and 5).
This version of WASP does not contain a single parameter for ECS or "## at some time , eqns. (1, 2). Instead, the values of ECS and "## emerge over time in the model in response to the forcing scenario from a combination of multiple prescribed climate system parameters (eqns. 3, 4, 5). The WASP model contains a 5-box representation of ocean heat and 140 carbon uptake, with an ocean circulation that is varied between ensemble members but remains constant in time within each ensemble member (Supplementary Table S1).
We form a prior model ensemble where a total of 25 model input parameters independently varied between simulations (Supplementary Table S1), to represent the prior climate system parameter distribution , eq. (6). Five of the input 145 parameters within describe how climate feedback responds to an imposed radiative forcing ( @)(ABC , #(I& "\]?) , I)'J "\]?) , w(I& and x)'J ) with a 6 th input parameter (the radiative forcing coefficient for CO2) converting this climate feedback to Effective Climate Sensitivity (Supplementary Table S1, eq. 2).
A further thirteen of the 25 model input parameters varied within relate to uncertainty in historic radiative forcing 150 (Supplementary Table S1). The WASP model is historically forced until 2014 (following the ssp585 scenario thereafter:  Table S1) to approximate historic uncertainty (Myhre et al., 2013;Etminan et al., 2016;Smith et al., 2018;155 Gregory et al., 2016). https://doi.org/10.5194/esd-2020-79 Preprint. Discussion started: 20 October 2020 c Author(s) 2020. CC BY 4.0 License. Table S1) are used to represent historic uncertainty in: the radiative forcing sensitivity to greenhouse gas concentrations (Myhre et al., 2013;Etminan et al., 2016); the direct radiative forcing sensitivity to anthropogenic aerosol emissions for six separate aerosol types (Myhre et al., 2013), and the radiative forcing sensitivity to 160 volcanic aerosol optical depth (Gregory et al., 2016). However, a skew-normal input distribution is used to represent historic uncertainty in the indirect radiative forcing from anthropogenic aerosols (Supplementary Table S1), since there is a long tail of possibly strong-negative radiative forcing from this effect (IPCC, 2013).

Normal input distributions (Supplementary
We generate prior ensembles containing 6.825 × 10 | ensemble members, with the 25 input parameters independently varied 165 such that the relative frequency distributions of each input parameter are set to the assumed prior probability distribution, Table S1 Table S2).

170
There are = 12 observational constraints within { } (Supplementary Table S2). The probability of obtaining the th observational constraint given = ′ and is calculated assuming Gaussian uncertainty in the observable (e.g. Annan and Hargreaves, 2020), where C and C are the observational mean and standard deviation uncertainty of observable (Supplementary Table 2), and C is the simulated value of the observable for = ′ and . To calculate the overall probability of obtaining all observational constraints within { } given = ′ and , we multiply the probabilities for all { } C , Four different ensembles are generated using different combinations of surface temperature (HadCRUT4 and Cowtan&Way:  (8), is greater than a number randomly drawn between 0 and some number greater than the maximum value of ({ }| = ′, ) achieved in that prior ensemble. We adopt a normal prior distribution for @)(ABC , informed by Earth's global mean surface temperature (Jones and Harpham,190 2013) and radiation budget (Trenberth et al., 2014) (Fig. 2, solid black line). We adopt uniform prior distributions of w(I& "\]?) and x)'J "\]?) (Fig. 2, solid blue and red lines), thus assuming that any value within the boundaries is equally likely before we consider the observations, { } (eq. 6). Our boundaries for the uniform distributions of w(I& "\]?) and x)'J "\]?) are set wide enough such that the posterior distributions are not significantly affected by the boundaries (Fig. 2, red and blue), but narrow enough such that the problem is computationally tractable. 195

Results
In From the initial 6.825 × 10 | simulations in each prior ensemble, a total of 23804 simulations are accepted into the  Table S2).
The posterior distributions reveal both the Planck feedback and fast feedback strengths are insensitive to the combination of temperature and heat content datasets used within the likelihood function: @)(ABC = 3.2 ± 0.1 Wm -2 K -1 for all ensembles (mean ± standard deviation), while w(I& "\]?) = −1.2 ± 0.6 Wm -2 K -1 for the HadCRUT4 + NODC ensemble and w(I& "\]?) = 205 −1.2 ± 0.5 Wm -2 K -1 for the other three ensembles. However, the different combinations of warming and heat content datasets used in the likelihood function result in different distributions for the slow feedback strengths. "\]?) values contribute more positive climate feedback, the interpretation considered here is that the increased recent warming in Cowtan&Way relative to HadCRUT4 (Fig. 1) implies that surface warming will receive more amplification, or 215 less damping, from slow climate feedbacks into the future (Fig. 2, compare red dashed and dotted lines).

The Effective Climate Sensitivity and Transient Climate Response
The ECS is analysed by forcing the four posterior ensembles with an instantaneous step-function quadrupling of atmospheric CO2 (hereafter: 4xCO2 scenario) and applying eq. (2). The value of ECS changes over time (Figs. 3,4) as the fast and slow climate feedbacks evolve in response to the imposed radiative forcing (eqns. 3, 4, 5). 220 For each combination of datasets used, the ECS is best constrained from the historic observational reconstructions on 20- year timescale (Figs. 3, 4a, Table 1). These 20-year response timescale ECS estimates are also similar between different datasets: varying from 2.0 °C (1.6 to 2.5 °C at 90% range) for the HadCRUT4 and NODC datasets to 2.2 °C (1.8 to 2.7 °C) for the Cowtan&Way and Cheng et al. dataset combination. 225 The distributions see a general increase in ECS out to 50-year and 100-year timescales, with greater uncertainty (Figs. 3,4) due to the uncertainty in how slow climate feedback will evolve (Fig. 2). The TCR is analysed by forcing our posterior ensembles with a scenario with a 1pctCO2 scenario and recoding the surface warming for each ensemble member for the 11year average centred on the year in which CO2 reaches twice its initial value (Fig. 5). 230

Variation in the posterior model ensembles
The observational records provide constraints on the parameters of the posterior ensembles that manifest not only as posterior distributions for these parameters but also as relationships between them, as well as between model parameters and key model outputs of interest (such as ECS(t)). While the correlation structure of the 25 parameters' joint posterior 235 distribution is generally quite complex, some key structures emerge that indicate how ECS and TCR uncertainties might be reduced.

Correlations of model parameters and outputs
We assemble the four observationally-consistent ensembles into a single meta-ensemble, where each model realization is reference temperature (1960-1979 vs. 1980-1999) strongly compensate ( = −0.92); ocean heat content from 700-2000m is well-correlated with total ocean heat content ( = 0.64). The ECS on a 50-year timescale, ECS50, is well correlated with ECS20 ( = 0.50) and especially with ECS100 ( = 0.93); we therefore focus on ECS20 and ECS100 hereafter. ECS100 is correlated with I)'J "\]?) ( = −0.56), and ocean heat and carbon content are correlated with the upper ocean tracer exchange timescales and the warming ratios r1 and r2 ( ∈ (0.52,0.63)). 19th-century sea surface temperature anomaly is also strongly 265 correlated with r1 (and hence strongly anticorrelated with r2). Other correlations between model outputs or between model outputs and parameters are significant due to the large meta-ensemble size but are weaker than the above (| | < 0.48) so are not discussed here.

Principle components 270
Correlations between model parameters' posteriors imply that the dimensionality of the parameter space can be reduced and that the observational constraints collapse the posterior solution into a parameter space with fewer degrees of freedom.
Principal Component Analysis, PCA (Jolliffe, 1986; n.b. we do not describe the method here as it is well-described in many textbooks such as Jolliffe,1986) is a straightforward, ubiquitous means to identify these degrees of freedom, and is justifiable here in the absence of strongly nonlinear model equations and given the Gaussian or near-Gaussian likelihoods and priors. 275 We perform a PCA on the model parameters' joint posterior; the results are presented in Figure 6. In the scree plot ( fig. 6a) there is an obvious break point at the fifth principal component (PC), indicating the first four PCs are interpretable and the remaining are unstructured variations (Cattell, 1966). These PCs are shown in fig. 6b-e, with loadings of parameters grouped into feedbacks, oceanic parameters, aerosol sensitivities, and greenhouse gas sensitivities (we include the full PCs in the Supplemental Information (Supplementary Figures S1-S4) for completeness). The first three of these PCs are dominated by 280 feedbacks, radiative forcing from aerosols and radiative forcing from GHGs; the fourth PC is dominated by compensation between the ventilation timescales of different ocean fractions. (PC5 is similarly ocean-dominated; see Figure S5.) https://doi.org/10.5194/esd-2020-79 Preprint. Discussion started: 20 October 2020 c Author(s) 2020. CC BY 4.0 License.
Altogether these PCA results suggest that the observational constraints used herein collapse the 25 model parameters around a four-dimensional subspace, and that these four dimensions reflect the balance between the effects of climate feedbacks, greenhouse gases, and aerosols on atmospheric and oceanic warming, as well as the structure of the large-scale ocean 285 circulation.
Note also there are numerous ways to quantify the number of interpretable or meaningful PCs resulting from a PCA (Jackson, 1993); the first four PCs we focus on here only explain 30.1% of the total variance in the dataset, but the decisive break in the scree plot ( fig. 6a) indicates strong evidence that these PCs are qualitatively different than the remaining PCs 5-290 25. We interpret the remaining variance in the data as reflective of the large amount of parametric uncertainty left in these models beyond what the observations herein can constrain, attesting to the importance of large ensemble simulations as employed here for quantifying uncertainty in ECS and TCR.

Stepwise regression 295
It is also of interest to what extent the model outputs are directly predictable from or explicable by the individual model parameters and/or PCs. Given the roughly Gaussian and linear model equations, multilinear regression is a suitable approach to identifying these links; in particular stepwise regression (Draper & Smith, 1981) follows an automatic procedure of including and removing explanatory variables from the model fit to identify an optimal combination. We perform stepwise regression to predict the model outputs from the model parameters and/or the first , (de)selecting explanatory variables using 300 the Bayesian Information Criterion (Schwartz, 1978) and also including interactions between model parameters (i.e. their products).
We find ECS100 to be significantly a function of all of PC1-4 and their interactions, with an 4 = 0.24. Including only PC1-3 only reduces R 2 by 0.0019, i.e. 4 = 0.24 still. While this is not an especially good fit, it is 98.1% of the variance in the 305 model parameters explained by these PCs (24.2%), i.e. almost all of the model parameter variance these PCs explain directly translates to explained variance in ECS100. In combination with the PCA results, this suggests the observations used here collapse the model parameters around four degrees of freedom, and that ECS100 is proportional to these first three degrees of freedom and their interactions, with the remaining variance in ECS100 due to the remaining variance in the model parameters.
This implies that the observational constraints used here directly constrain ECS100 in our modeling approach, with very little 310 information lost through constraining model parameters. In contrast, other model outputs (e.g. ECS20 or the whole-ocean carbon content) are poorly predicted by these PCs ( 4 = 0.10 or less).
We also performed stepwise regression of model outputs against the 25 model parameters. We found ECS100 to be a 315 significant function of only seven model parameters ( 4 = 0.40; the three feedbacks , the whole-ocean to sea-surface https://doi.org/10.5194/esd-2020-79 Preprint. Discussion started: 20 October 2020 c Author(s) 2020. CC BY 4.0 License.
warming ratio r2, the aerosol forcing coefficients "Q" and •"O-76 , and the CO2 forcing coefficient 674 ). ECS20 by contrast was not a significant function of "Q" or r2 but was a significant function of the slow feedback timescale I)'J (as well as the s and 674 ; 4 = 0.40). This implies both that ECS is not strongly dependent on the other parameters in the model used here, and also that there is a large amount of variation in ECS that can be reduced by better constraining these 320 parameters. In contrast, other model outputs are sensitive to more model parameters and also more predictable; for instance the whole-ocean carbon content increase from 1982-2018 (Δ ) is a significant function of 20 model parameters ( 4 = 0.97).
For this latter case, these various dependencies can be simplified to some extent into a single parameter, the fraction of the ocean which is ventilated over a given timescale R"A&,& , which is a function of the different ocean boxes' volumes and exchange timescales (see Supplementary Information); on the intermediate 50-year timescale R"A&,™š alone explains 75% of 325 the variance in Δ . This is likely due to the simple carbon cycle in the model, whereas a more complex representation of the carbon cycle model would have numerous dependencies not considered here that would decrease the relative importance of R"A&,& in determining Δ . Surprisingly, R"A&,& is not a notably good predictor of other properties of interest (ECS, OHC) and when included in the aforementioned analyses in this section does not meaningfully affect the results or their interpretation.

Discussion
Many studies have combined reconstructions of surface temperature and ocean heat uptake with estimates of radiative forcing to calculate the effective climate feedback during the historic period (e.g. Annan, 2015;Anan and Hargreaves, 2020;Bodman and Jones, 2016;Lewis and Curry, 2014;Skeie et al, 2018;Otto et al., 2013;Tokarska et al., 2020). However, climate feedback strengths evolve over time in complex climate models (e.g. Andrews et al., 2015), indicating that effective 335 climate sensitivity values obtained from historic observations may not apply into the future.
This study applies the historic observational record to constrain how effective climate sensitivity evolves on different response timescales (Figs. 3,4), utilising a model of independent climate feedback terms that respond to forcing over instantaneous (Planck), fast (several days) and slow (multi-decadal) timescales (eqns. 2,3,4). A Bayesian approach is 340 adopted, where uniform prior probability distributions are applied for the fast and slow climate feedbacks (Fig. 2, Supplementary Table S1). Different temperature and ocean heat content observational datasets (Supplementary Table S2 , eqns. 6,7,8) are applied to extract posterior probability distributions (Figs. 2). We then use these posterior probability distributions to evaluate effective climate sensitivity (ECS) and transient climate response (TCR) from 4xCO2 and 1pctCO2 forcing scenarios respectively. 345 Our estimates of ECS on a 20-year timescale is directly comparable to estimates of climate sensitivity made from historical constraints (e.g. Otto et al., 2013;Lewis and Curry, 2014), without explicitly considering the impact of additional slow climate feedbacks that may not have had time to equilibrate in the present day. Our estimates of ECS on a 100-year timescale is directly comparable to the climate sensitivity estimates evaluated in complex climate model simulations from simulations lasting order 100 years, for example using the Gregory et al. (2004) method, and directly comparable to estimates of climate sensitivity from the palaeo-record where any longer, 1000-year, Earth system sensitivity feedbacks have been eliminated (e.g. Rohling et al., 2012;2018).

355
We find that the differences in temperature and heat content datasets have a significant impact on the implied probability distributions of ECS (Figs. 1,3,4; Table 1). However, the implied probability distributions for TCR are sensitive to differences in temperature datasets but not to the differences in ocean heat content datasets (Figs. 1,5; Table 1).
The Cowtan&Way (Cowtan and Way, 2014) temperature reconstruction implies a larger ECS and TCR than HadCRUT4 360 (Figs. 3,4,5; Table 1), because the additional recent warming in Cowtan&Way (Fig. 1a) implies more positive slow climate feedbacks (Fig. 2). The Cheng et al. (2017) ocean heat content reconstruction implies larger ECS than the NODC reconstruction (Fig. 3,4; Table 1), because the additional ocean heat content uptake (Fig. 1b) in Cheng et al. (2017), with identical historic warming, must be balanced by a larger ECS (eq. 2). However, the different heat content datasets make almost no impact on TCR ( Fig. 5; Table 1), possibly because larger historic heat content also implies larger heat uptake on a 365 1pctCO2 scenario and this balances any warming impact of a larger ECS.
Our method constrains ECS over multiple response timescales ( Fig. 3; Table 1). Our constraints on ECS over a 100-year response timescale (ECS100: Table 1) is are directly comparable to previous reviews of climate sensitivity in the literature in AR5 (IPCC, 2013) and Sherwood et al. (2020). The IPCC (2013) AR5 estimate of ECS has a 66% range of 1.5 to 4.5 K 370 (IPCC, 2013), while the recent Sherwood et al. (2020) Bayesian review has a narrower baseline 66% range of 2.6 to 3.6 K.
The Sherwood et al. (2020) range removes both the lower portion of the IPCC ECS range (from 1.5 to 2.5 K) and the upper portion (from 3.7 to 4.5 K) at 66% confidence, suggesting a similar best estimate but with reduced uncertainty than IPCC (2013).

375
Our posterior 66% ranges for ECS100 constrained by historical observations show a more complicated link to the IPCC (2013) range that depends on the combination of statistical interpretations of historic temperature and heat content observations used (Table 1). When applying the HadCRUT4 temperature and NODC heat content datasets, our method provides a 66% range on ECS100 of 1.7 to 3.2 K, in agreement with the lower portion of the IPCC (2013) Figure 3). Therefore, we find that historical observations of temperature and ocean heat content support the IPCC (2013)