Emergent constraints on transient climate response (TCR) and equilibrium climate sensitivity (ECS) from historical warming in CMIP5 and CMIP6 models

Climate sensitivity to CO2 remains the key uncertainty in projections of future climate change. Transient climate response (TCR) is the metric of temperature sensitivity that is most relevant to warming in the next few decades and contributes the biggest uncertainty to estimates of the carbon budgets consistent with the Paris targets. Equilibrium climate sensitivity (ECS) is vital for understanding longer-term climate change and stabilisation targets. In the IPCC 5th Assessment Report (AR5), the stated “likely” ranges (16 %–84 % confidence) of TCR (1.0–2.5 K) and ECS (1.5–4.5 K) were broadly consistent with the ensemble of CMIP5 Earth system models (ESMs) available at the time. However, many of the latest CMIP6 ESMs have larger climate sensitivities, with 5 of 34 models having TCR values above 2.5 K and an ensemble mean TCR of 2.0 ± 0.4 K. Even starker, 12 of 34 models have an ECS value above 4.5 K. On the face of it, these latest ESM results suggest that the IPCC likely ranges may need revising upwards, which would cast further doubt on the feasibility of the Paris targets. Here we show that rather than increasing the uncertainty in climate sensitivity, the CMIP6 models help to constrain the likely range of TCR to 1.3–2.1 K, with a central estimate of 1.68 K. We reach this conclusion through an emergent constraint approach which relates the value of TCR linearly to the global warming from 1975 onwards. This is a period when the signal-to-noise ratio of the net radiative forcing increases strongly, so that uncertainties in aerosol forcing become progressively less problematic. We find a consistent emergent constraint on TCR when we apply the same method to CMIP5 models. Our constraints on TCR are in good agreement with other recent studies which analysed CMIP ensembles. The relationship between ECS and the post-1975 warming trend is less direct and also non-linear.
However, we are able to derive a likely range of ECS of 1.9–3.4 K from the CMIP6 models by assuming an underlying emergent relationship based on a two-box energy balance model. Despite some methodological differences, this is consistent with a previously published ECS constraint derived from warming trends in CMIP5 models to 2005. Our results seem to be part of a growing consensus amongst studies that have applied the emergent constraint approach to different model ensembles and to different aspects of the record of global warming.
Published by Copernicus Publications on behalf of the European Geosciences Union.
F. J. M. M. Nijsse et al.: An emergent constraint on climate sensitivity from simulated historical warming


Introduction
The key uncertainty in projections of future climate change continues to be the sensitivity of global mean temperature to changes in the Earth's energy budget, called "radiative forcing". This sensitivity is usually characterised in terms of the global mean warming that would occur if the atmospheric carbon dioxide concentration were doubled, a change for which the radiative forcing is reasonably well-known.
Two related quantities are used to characterise the climate sensitivity of Earth system models (ESMs). Equilibrium climate sensitivity (ECS) is an estimate of the eventual steady-state global warming at double CO2. Transient climate response (TCR) is the mean global warming predicted to occur around the time of CO2 doubling in ESM runs in which the atmospheric CO2 concentration is prescribed to increase at 1 % per year. In ESMs, TCR values are lower than ECS values because of deep-ocean heat uptake, which leads to a lag in the response of global temperature to the increasing CO2 concentration (Hansen et al., 1985). The ratio of TCR to ECS tends to decrease with increasing ECS and depends on spatial pattern effects (Armour, 2017).
Despite decades of advances in climate science, the Earth's ECS and TCR remain uncertain. The "likely" range of ECS (66 % confidence range) has been quoted as 1.5 to 4.5 K in all five Assessment Reports (ARs) of the Intergovernmental Panel on Climate Change (IPCC) since 1990, except for the fourth AR, which temporarily raised the lower bound of the likely range to 2 K. Similarly, the likely range of TCR is given as 1 to 2.5 K in the IPCC AR5, based on multiple lines of evidence.
There have been numerous attempts to constrain ECS using the record of historical warming or palaeoclimate data and, more recently, using emergent constraints, which relate observed climate trends, variations or other variables to ECS across an ensemble of models (e.g. Cox et al., 2018a). However, debate still rages about the likely range of ECS (Brown et al., 2018; Bretherton and Caldwell, 2020; Cox et al., 2018b; Gregory et al., 2019), in part because observed global warming is a rather indirect measure of global warming at equilibrium. On the other hand, TCR is more closely related to the rate of warming and therefore ought to be more amenable to constraint by the record of global warming (Bengtsson and Schwartz, 2013; Gregory and Forster, 2008; Jiménez-de-la Cuesta and Mauritsen, 2019; Tokarska et al., 2020). Nevertheless, the accepted likely range of TCR has also resisted change, for reasons we will discuss in this paper. At the time of the AR5, the CMIP5 ESMs produced central estimates (mean ± SD) of ECS (3.3 ± 0.7 K) and TCR (1.8 ± 0.3 K) that were broadly consistent with these IPCC likely ranges. However, there has been a general drift upwards towards higher climate sensitivities in the new CMIP6 ESMs, such that more than one-third of the new CMIP6 models now have ECS values over 4.5 K (Forster et al., 2020) and five have TCR values over 2.5 K (Table 1). If the real climate system is similarly sensitive, the Paris climate targets will be much harder to achieve (Tanaka and O'Neill, 2018).
Therefore some key science- and policy-relevant questions arise:
a. Are such high climate sensitivities consistent with the observational record?
b. If so, do the CMIP6 models demand an upward revision to the IPCC likely ranges for climate sensitivity?
We address these questions in this paper by evaluating the historical simulations of global warming from the CMIP6 models. In particular, we explore an emergent constraint on TCR based on global warming from 1975 onwards (Jiménez-de-la Cuesta and Mauritsen, 2019; Tokarska et al., 2020) but using the CMIP6 models and observational data up to 2019.
Emergent constraints are increasingly used to assess future change by exploiting statistical relationships in multimodel ensembles between an observable and a variable describing future climate (Cox et al., 2018a;Hall et al., 2019). In the work presented here, we use the latest CMIP6 multimodel ensemble to define an emergent relationship between historical warming (expressed in terms of global mean surface temperature, GMST, the observable) and TCR (the variable related to future climate). In line with published recommendations (Hall et al., 2019;Klein and Hall, 2015), we check the robustness of the resulting emergent constraint against the CMIP5 ensemble, using exactly the same methodology as for CMIP6. We also follow the suggestion of Hall et al. (2019) in striving to base the emergent constraint on sound physical reasoning.
From physical principles, we expect values of TCR to be very well-correlated with simulated global warming across a model ensemble. By definition, TCR is a measure of warming from a simulation that is driven by an exponential 1 % per year increase in CO2. Historical global warming has been driven by a qualitatively similar, albeit somewhat less rapid, forcing. In reality, the atmospheric CO2 concentration has increased at about 0.5 % per year since 2000 (Dlugokencky and Tans, 2019), augmented by additional positive radiative forcing from other well-mixed greenhouse gases and partially offset by the cooling effects of anthropogenic aerosols.
The radiative effects of the rise in greenhouse gas concentrations are relatively well-known and are broadly similar in different ESMs. By contrast, the radiative forcing due to changes in anthropogenic aerosols, especially the indirect effects via changes in cloud brightness and lifetime, is poorly constrained (e.g. Bellouin et al., 2019).
These uncertainties in aerosol forcing have hindered attempts to constrain TCR or ECS from the rate of warming, especially during the pre-1980 period, when the burning of sulfurous coal led to increases in CO2 and increases in sulfate aerosols that went up almost together (Andreae et al., 2005). As a result, it has been difficult, based purely on the observational record of global warming, to distinguish between a model with high climate sensitivity and strong aerosol cooling and a model with low climate sensitivity and weak aerosol cooling (Kiehl, 2007).
Table 1. List of CMIP6 models used in this study and their effective radiative forcing at CO2 doubling $F_{2\times}$, the climate feedback parameter λ, equilibrium climate sensitivity (ECS) and transient climate response (TCR). Mean values are reported for models with multiple realisations. The values of $F_{2\times}$, ECS and λ are computed using the Gregory method (Gregory, 2004). Models above the horizontal line were used in the extended simulations to 2019. Models below the line did not have SSP simulations available at the time of writing. Consistently derived values for CMIP5 are displayed in Table S1 in the Supplement. The ensemble-mean values of TCR and ECS are shown in bold font as these are most relevant to this study.
In order to minimise the effects of uncertainties in aerosol forcing, we need periods in which aerosol radiative forcing changes relatively little compared to the change in radiative forcing due to CO2 and other well-mixed greenhouse gases. Fortunately, this applies to the decades after 1975, when total aerosol load from global SO2 and NH3 emissions were sim… (Stevens et al., 2017). For this reason, we focus on global warming since 1975. However, we also test the robustness of our conclusions to different start dates (see Fig. 5c), including the start year of 1970 as used by Jiménez-de-la Cuesta and Mauritsen (2019) (hereafter JM19).
To establish an emergent constraint on ECS, we investigate the appropriate functional form of the relationship between observed warming and climate sensitivity. Due to the slow response of the ocean, this relationship is not expected to be linear; using a set of assumptions, JM19 proposed an analytical form based on a two-layer box model. By computing the two-layer model parameters directly for each model, we investigate the appropriateness of this analytical function and use it to derive an emergent constraint.
The remainder of this paper is organised as follows: in Sect. 2 we describe our methodological choices; Sect. 3 contains the emergent constraints on TCR and ECS, and Sect. 4 contains the discussion and conclusions. More technical details concerning the regression methods are given in the Appendix.

Choice of period over which to calculate warming trends
To constrain climate sensitivity using observed warming, we seek a period for which the forcing is relatively similar across models. In order to identify such a period, we compute the effective radiative forcing (ERF) F for each model run following Forster et al. (2013):
$$F = N + \lambda T, \qquad (1)$$
where N is the difference in net top-of-the-atmosphere radiative flux and T is the difference in near-surface temperature, both computed as global annual mean anomalies relative to the initial state, and λ is the model's climate feedback parameter. We calculate the signal-to-noise ratio of F at each time as the model mean F divided by the standard deviation of F across the model ensemble. Figure 1 shows how the signal-to-noise ratio of the ERF varies from 1880 to 2010. It is notable that the signal-to-noise ratio increases rapidly from around 1975, as relatively well-known greenhouse gas forcing continues to increase but the uncertain aerosol forcing begins to saturate. We have therefore focused our analysis on post-1975 warming, but we also performed a sensitivity analysis by varying the start year between 1960 and 2005.
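The forcing diagnostic and its signal-to-noise ratio can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that each model's N and T anomalies are already global annual means on a common set of years, stored as NumPy arrays of shape (models, years), and that a per-model feedback parameter λ (for example from a Gregory regression) is available. The function names and the synthetic forcing in the demo are hypothetical.

```python
import numpy as np


def effective_radiative_forcing(n_toa, t_surf, lam):
    """ERF per model and year, F = N + lambda * T.

    n_toa, t_surf: arrays of shape (n_models, n_years), global annual mean
    anomalies relative to the initial state (W m-2 and K).
    lam: array of shape (n_models,), feedback parameter of each model (W m-2 K-1).
    """
    return n_toa + np.asarray(lam)[:, None] * t_surf


def signal_to_noise(forcing):
    """Ensemble-mean ERF divided by its across-model standard deviation."""
    return forcing.mean(axis=0) / forcing.std(axis=0, ddof=1)


if __name__ == "__main__":
    # Hypothetical ensemble (ERF built directly): a common, growing greenhouse
    # term plus a model-dependent aerosol term that saturates after a few decades.
    years = np.arange(1880, 2011)
    ghg = 0.02 * (years - 1870)
    aerosol_scale = np.array([0.3, 0.6, 0.9, 1.2])
    aerosol = -aerosol_scale[:, None] * np.tanh((years - 1870) / 60.0)
    snr = signal_to_noise(ghg + aerosol)
    print(snr[::20].round(2))
```

In the synthetic demo the aerosol spread stops growing while the common greenhouse term keeps rising, which is the mechanism invoked above for the post-1975 increase in signal-to-noise ratio.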

Selection of CMIP6 model runs
We use all currently available CMIP6 models that have control (piControl), historical, a shared socioeconomic pathway simulation (SSP1-2.6, SSP2-4.5, SSP3-7.0 or SSP5-8.5) and a 1 % per year CO2 increase (1pctCO2) experiment. We extend the historical simulations from 2014 to 2019 using the shared socioeconomic pathways (SSPs) scenario runs. Additional warming over this 5-year period varies very little across the SSPs, so by default we use SSP2-4.5 as this has the largest number of participating models at the time of writing.

Calculation of model sensitivity
From the 1pctCO2 experiment TCR is determined as the average temperature difference from the corresponding piControl run between 60 and 80 years after the start of the simulation (IPCC, 2013a). ECS is computed using the Gregory method (Gregory, 2004) on the first 150 years of the abrupt-4xCO2 simulations. The values of ECS and TCR that we derived are given in Table 1.
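Both sensitivity metrics can be computed from annual global means with NumPy. The sketch below is illustrative, not the authors' implementation: it assumes annual-mean arrays that start at year 1 of each experiment, takes simulation years 61–80 as the TCR window, and assumes, as is common for the Gregory method, that the abrupt-4xCO2 forcing is twice the doubling forcing. The Gregory regression fits N = F_4x − λT, so the intercept gives the forcing and the negative slope gives λ. Function names are hypothetical.

```python
import numpy as np


def tcr_from_1pct(t_1pct, t_picontrol):
    """TCR: mean warming over simulation years 61-80 of the 1pctCO2 run."""
    diff = np.asarray(t_1pct, dtype=float) - np.asarray(t_picontrol, dtype=float)
    return diff[60:80].mean()


def gregory_ecs(n_4x, t_4x, n_years=150):
    """Gregory regression of N on T for abrupt-4xCO2.

    N = F_4x - lam * T, so the intercept is F_4x and the slope is -lam.
    Assumes F_2x = F_4x / 2. Returns (F_2x, lam, ECS = F_2x / lam).
    """
    slope, intercept = np.polyfit(t_4x[:n_years], n_4x[:n_years], 1)
    lam = -slope
    f2x = intercept / 2.0
    return f2x, lam, f2x / lam
```

With real output, `t_picontrol` would be the piControl segment parallel to the 1pctCO2 run, so that control drift is removed before averaging.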

Calculation of warming trend
Historical warming (our observable) is found from the historical and SSP simulations using the global annual mean surface air temperature (GMSAT) smoothed with an equally weighted running mean. Some of these models have multiple runs starting from different initial conditions, forcing time series or parameter settings. We use all available runs.
We use smoothed GMSAT to calculate warming. This is to limit the random effect of internal variability on the forced change we wish to constrain. We choose a centred 11-year running mean to remove shorter interannual and mid-term variability from sources such as the El Niño-Southern Oscillation (ENSO) and to reduce the effect of longer-period modes of natural variability. We have tested the robustness of the constraint on TCR to the length of the running mean. It remains relatively invariant past a length of 8 years, suggesting most of the internal variability in GMSAT resides in shorter periods.
Warming ΔT is calculated as the difference in GMSAT between two periods, typically the years 1975-1985 and 2009-2019 (or equivalently, the difference in smoothed temperature between 2014 and 1980). We have chosen the end year to be 2019 to maximise the chances of discriminating between high- and low-sensitivity models. As the forcing from CO2 increases with time, the warming in more sensitive models is more likely to diverge from that in less sensitive ones as we extend the period over which we calculate the trend. Extending to 2019 also allows us to include the most recent observational data and to minimise possible effects of the warming slowdown between 2000 and 2012. This slowdown has been attributed to a combination of internal variability and decreased forcing, amongst other things (Medhaug et al., 2017). We assess the impact of the slowdown by comparing emergent constraints derived from time series truncated at different end years.
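The smoothing and differencing steps can be written as below. This is a minimal sketch assuming an annual GMSAT series with one value per calendar year; it supports only odd window lengths so that the running mean is exactly centred, and the function names are hypothetical. With the default 11-year window, the difference between the smoothed values at 2014 and 1980 equals the difference between the raw means over 2009-2019 and 1975-1985.

```python
import numpy as np


def centred_running_mean(x, window=11):
    """Centred, equally weighted running mean; years without a full window are NaN."""
    if window % 2 != 1:
        raise ValueError("this sketch only supports odd window lengths")
    x = np.asarray(x, dtype=float)
    half = window // 2
    out = np.full_like(x, np.nan)
    out[half:len(x) - half] = np.convolve(x, np.ones(window) / window, mode="valid")
    return out


def warming(years, gmsat, start=1980, end=2014, window=11):
    """Delta T: smoothed GMSAT at `end` minus smoothed GMSAT at `start`."""
    years = np.asarray(years)
    smooth = centred_running_mean(gmsat, window)
    return smooth[years == end][0] - smooth[years == start][0]
```

Computing ΔT for every realisation of every model with the same function keeps the observable identical between models and observations.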

Transient climate response
Once the length of the running mean and the start and end years for the calculation of ΔT (our observable) are fixed, we can fit an emergent relationship between the observable and our values of TCR via linear regression. Linear regression is performed using a hierarchical Bayesian model, which can take into account all the different simulations per model: models with more simulations have a better-constrained post-1975 warming. This results in a set of 127 simulations from 26 different models. The regression method is further described in Appendix A. The choice of linear regression is justified by considering a two-layer energy balance model (Winton et al., 2010; Geoffroy et al., 2013a):
$$C\,\frac{dT}{dt} = F - \lambda T - \varepsilon\gamma\,(T - T_0), \qquad C_0\,\frac{dT_0}{dt} = \gamma\,(T - T_0). \qquad (2)$$
Here T is the top-layer temperature anomaly, $T_0$ the deep-ocean temperature anomaly, λ the climate feedback parameter, ε the ocean heat uptake efficacy (reflecting a pattern effect) and γ the ocean heat uptake parameter (Winton et al., 2010). The parameters C and $C_0$ are the heat capacities of the upper ocean and deep ocean, respectively. We will refer to this model as EBM-ε, or EBM-1 if ε is set to 1.
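To make the two-layer model concrete, the sketch below integrates C dT/dt = F − λT − εγ(T − T0) and C0 dT0/dt = γ(T − T0) with a forward-Euler scheme under a 1 % per year CO2 ramp, whose forcing is assumed to grow logarithmically with CO2, i.e. as F2× · t · ln(1.01)/ln 2, and reads off TCR as the mean of years 61–80. All parameter values, including the heat capacities, are hypothetical illustrative numbers rather than fits to any CMIP model, and the integration scheme is chosen for simplicity, not accuracy.

```python
import numpy as np


def forcing_1pct(f2x, n_years=140):
    """Annual forcing for a 1 %/yr CO2 ramp, assuming forcing is logarithmic in CO2."""
    t = np.arange(n_years) + 0.5  # mid-year, in years since the start
    return f2x * t * np.log(1.01) / np.log(2.0)


def run_two_layer(forcing, lam, gamma, eps=1.0, c=8.0, c0=100.0, dt=0.1):
    """Forward-Euler integration of the two-layer energy balance model.

    C  dT/dt  = F - lam*T - eps*gamma*(T - T0)
    C0 dT0/dt = gamma*(T - T0)
    forcing: one value per year (W m-2); c, c0 in W yr m-2 K-1 (illustrative).
    dt must divide one year. Returns annual means of T and T0.
    """
    steps = int(round(1.0 / dt))
    t_up = t_deep = 0.0
    out_up, out_deep = [], []
    for f in forcing:
        sum_up = sum_deep = 0.0
        for _ in range(steps):
            d_up = (f - lam * t_up - eps * gamma * (t_up - t_deep)) / c
            d_deep = gamma * (t_up - t_deep) / c0
            t_up += dt * d_up
            t_deep += dt * d_deep
            sum_up += t_up
            sum_deep += t_deep
        out_up.append(sum_up / steps)
        out_deep.append(sum_deep / steps)
    return np.array(out_up), np.array(out_deep)


if __name__ == "__main__":
    # Hypothetical parameters, not fitted to any CMIP6 model.
    up, _ = run_two_layer(forcing_1pct(3.7), lam=1.2, gamma=0.7, eps=1.3)
    print("TCR (mean of years 61-80):", round(up[60:80].mean(), 2), "K")
```

Setting gamma to zero decouples the deep layer, and the upper layer then relaxes to F/λ, which is a convenient sanity check of the integrator.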
We follow the approximations in Williamson et al. (2018) and JM19 in assuming no change in deep-ocean temperature ($T_0 = 0$) and assuming the upper ocean to be in equilibrium ($dT/dt = 0$). These assumptions are reasonable for timescales longer than a decade but shorter than a century (see JM19), and they reduce Eq. (2) to $T = F/(\lambda + \varepsilon\gamma)$. Applying this both at CO2 doubling and to the change between two periods gives the following relationship:
$$\mathrm{TCR} = \frac{F_{2\times}}{\lambda + \varepsilon\gamma} = s\,\Delta T. \qquad (3)$$
Here s is a forcing parameter, defined as $F_{2\times}/\Delta F$, where ΔF is the change in forcing between the two periods, and ΔT is the difference in temperature between the two periods. For fitting, we include an offset η, so that TCR = sΔT + η, allowing for possible model mis-specification and regression dilution (Hahn, 1977). A hierarchical linear regression was adopted, which includes uncertainty in both ΔT and TCR (see Appendix A). The choice of 1975 for the starting period minimises the uncertainty in our estimate of TCR. However, uncertainty is relatively flat for starting periods between 1975 and 1990. We also investigated the sensitivity of our TCR constraint to the final year, the length of the running mean, the model selection and the method of regression (see Fig. 5).
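A quick numerical check of the quasi-equilibrium reasoning: if every model obeys T = F/(λ + εγ), then TCR and ΔT are exactly proportional, with slope s = F2×/ΔF, whatever the spread in λ, γ and ε. The sketch draws a hypothetical ensemble of parameters, computes both quantities and fits TCR = sΔT + η by ordinary least squares. The OLS fit is a simple stand-in for the hierarchical regression described above, and the forcing values are assumptions.

```python
import numpy as np

F2X = 3.7      # W m-2, hypothetical forcing at CO2 doubling
DELTA_F = 1.1  # W m-2, hypothetical forcing change between the two periods


def quasi_equilibrium_response(forcing, lam, gamma, eps=1.0):
    """Upper-layer warming with T0 = 0 and dT/dt = 0: F / (lambda + eps * gamma)."""
    return forcing / (lam + eps * gamma)


def fit_emergent_line(n_models=30, seed=0):
    """Fit TCR = s * dT + eta across a hypothetical parameter ensemble."""
    rng = np.random.default_rng(seed)
    lam = rng.uniform(0.6, 1.8, n_models)    # W m-2 K-1
    gamma = rng.uniform(0.5, 0.9, n_models)  # W m-2 K-1
    eps = rng.uniform(1.0, 1.5, n_models)    # dimensionless efficacy
    delta_t = quasi_equilibrium_response(DELTA_F, lam, gamma, eps)
    tcr = quasi_equilibrium_response(F2X, lam, gamma, eps)
    slope, eta = np.polyfit(delta_t, tcr, 1)
    return slope, eta


slope, eta = fit_emergent_line()
print(f"fitted s = {slope:.3f}, F2x/dF = {F2X / DELTA_F:.3f}, eta = {eta:.1e}")
```

In real ESMs the points scatter about the line because, for example, T0 and dT/dt are not exactly zero and ΔF differs between models; the offset η and the residual spread absorb such departures.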

Equilibrium climate sensitivity
Similarly to the constraint on TCR, we use the warming between 1975-1985 and 2009-2019 to find an emergent constraint on ECS. The relationship between climate sensitivity and observed warming or TCR is not expected to be linear, as a smaller fraction of equilibrium warming is typically realised within the first decades of warming in models with high climate sensitivity (Hansen et al., 1985; Rugenstein et al., 2020). Using Eq. (2), ECS = $F_{2\times}/\lambda$, and again assuming the upper ocean to be in equilibrium and the deep-ocean temperature not to change, TCR and ECS are related via
$$\mathrm{ECS} = \frac{\mathrm{TCR}}{1 - e\,\mathrm{TCR}}. \qquad (4)$$
So the relationship between ECS and ΔT ends up as
$$\mathrm{ECS} = \frac{\Delta T}{s' - e\,\Delta T}. \qquad (5)$$
The forcing parameter is denoted by s', defined as $\Delta F/F_{2\times}$, and e is the ocean heat uptake parameter, defined as $\varepsilon\gamma/F_{2\times}$. The function has an asymptote at $s' - e\,\Delta T = 0$ and turns negative for larger ΔT values. As negative ECS values are unphysical, we modify the equation by keeping ECS at infinity for $\Delta T > s'/e$. The appearance of negative ECS for high ΔT is an artefact of the "no deep-ocean temperature rise" assumption: it corresponds to an equilibrium in which the heating effect of $F - \lambda T$ is balanced by $-\varepsilon\gamma T$. In reality, this last term cancels out completely with $\varepsilon\gamma T_0$ at equilibrium. Fitting is done using orthogonal distance regression.
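The two relations can be coded directly: ECS = TCR/(1 − e·TCR) and ECS = ΔT/(s' − eΔT), with ECS held at infinity on and beyond the asymptote ΔT = s'/e. The sketch below is illustrative only: it evaluates the functional form but does not perform the orthogonal distance regression used to fit s' and e, and the parameter values in the usage line are hypothetical (e = 0.24 is the model mean quoted for Fig. 2; s' = 0.3 is an arbitrary illustrative choice).

```python
import numpy as np


def ecs_from_tcr(tcr, e):
    """ECS = TCR / (1 - e * TCR), the quasi-equilibrium TCR-ECS link."""
    return tcr / (1.0 - e * tcr)


def ecs_from_warming(delta_t, s_prime, e):
    """ECS = dT / (s' - e * dT), held at +inf on and beyond the asymptote dT = s'/e."""
    delta_t = np.asarray(delta_t, dtype=float)
    denom = s_prime - e * delta_t
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = delta_t / denom
    return np.where(denom > 0, ratio, np.inf)


# Hypothetical parameters for illustration only.
print(ecs_from_warming([0.3, 0.5, 1.0, 1.3], s_prime=0.3, e=0.24))
```

Because TCR = ΔT/s' under the same assumptions, both functions give the same ECS for a given warming, which is a simple consistency check on the algebra.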
To test the validity of these assumptions, we perform two checks. Firstly, by explicitly simulating the two-box model, we investigate to what extent the analytical functional form deviates from the true functional form. We are especially interested in the upper region of this functional form, which, if too steep, could lead to an upper estimate of ECS that is biased high.
Figure 2 (caption fragment): … Table S3, and related ECS and TCR via ECS = TCR/(1 − e·TCR), with e = 0.24, the model mean.
Secondly, we fit the ocean heat uptake and forcing parameters for all CMIP6 models, following the two algorithms described in Geoffroy et al. (2013a, b), with slight modifications, described in the Supplement to this paper, to ensure that solutions exist for all models.
Using these fitted parameters, we investigate the physical basis of Eq. (5) with the EBM-ε and EBM-1 models. If this function derived from the two-box model is a faithful representation, $\Delta T/(s' - e\,\Delta T)$ should be better related to ECS when computed with individual model parameters than with the bulk fitted parameters. Figure 2 plots model TCR versus ECS, related via Eq. (4), using the ensemble mean of the fitted ocean parameters.
Figure 3a shows the temperature anomaly over the period 1880 to 2019 simulated by 26 different CMIP6 models running a total of 127 simulations, smoothed with an 11-year running mean. The reference period in this case is 1880-1910. Model runs have been colour-coded by their TCR value, with darker red indicating models with higher TCR and darker blue indicating lower TCR. Black lines are observational global warming datasets over the same period (Morice et al., 2012; Rohde et al., 2013; Lenssen et al., 2019; Zhang et al., 2020). Models with higher TCR either show large warming at the end of the period or portray a strong aerosol cooling over the 20th century, particularly visible as a dip around 1960-1970 (notably CNRM-ESM1, UKESM1-0-LL and EC-Earth-Veg). Figure 3b shows the same information for the end of the historical period, although the reference period is now chosen to be 1975-1985, after the temperature dip. The positive correlation intuitively expected between TCR and temperature increase ΔT is much clearer for this time interval.

Transient climate response
The ΔT for each model simulation in Fig. 3b is used for the emergent constraint on TCR in Fig. 4a. Observational warming (black vertical dashed line) is the mean of HadCRUT4 (Morice et al., 2012), Berkeley Earth (Rohde et al., 2013), GISTEMP v4 (Lenssen et al., 2019) and NOAA v5 (Zhang et al., 2020). The 90 % observational confidence interval (grey shaded vertical area) is a combination of the observational uncertainty and the internal variability. To avoid double-counting observational uncertainty, the 90 % regression confidence interval describes the uncertainty in the best-estimate relationship between ΔT and TCR (see Appendix A for details). The models from the previous CMIP5 generation generally fall within the prediction interval of the CMIP6 emergent constraint: the emergent constraint is therefore robust across generations (Klein and Hall, 2015). The best estimate (1.68 K) from this emergent constraint is higher than the best estimate using the larger set of models that have historical simulations up to 2014 but no future scenarios (median: 1.54 K; 5 %-95 % range: 0.76-2.30 K). This is primarily because the 2004-2014 period overlaps with the slowdown in surface temperature increase over 2000-2012, but the wider range of models also affects the regression. Figure 4b shows the probability density functions (pdfs) of TCR derived from the emergent constraint for both the CMIP6 and the earlier CMIP5 model ensembles; for CMIP5, RCP8.5 is chosen as the continuation of the historical simulations. For comparison, the raw model range in each CMIP is plotted as a histogram, as well as the reported IPCC AR5 likely range (assuming a normal distribution). The CMIP5 and CMIP6 pdfs are very similar (central estimates differ by 0.1 K), even though CMIP6 contains many more high-TCR models. The tighter constraint in CMIP5 is mostly a consequence of differences in internal variability, which is 42 % larger in CMIP6 than in CMIP5, in line with the findings of Parsons et al. (2020).

Period selection
Estimates of TCR depend on the final year chosen for the emergent constraint. As shown in Fig. 5a, the uncertainty in the estimate of TCR decreases as the end year increases, and the central estimate converges. Later end years are favoured as the signal-to-noise ratio of the net radiative forcing increases monotonically after 1975 (see Fig. 1). In the 21st century, the climate impact of volcanoes has been dominated by smaller eruptions (Stocker et al., 2019). The ScenarioMIP simulations used for 2015-2019 include a small background forcing from volcanoes (O'Neill et al., 2016). We estimate errors from a potential mismatch between model and real forcing to be relatively small.
To mitigate the effect of internal variability, we use a running mean of GMSAT. Figure 5b shows the likely range of TCR as a function of the length of the running mean. Since we use all available simulations, including multiple realisations of the same model, in the emergent constraint, the effect of internal variability is already reduced, and the effect of the running-mean length on the estimate of TCR is small: the central estimate and the likely range remain relatively invariant past a window length of 8 years. Figure 5c shows the effect of the start year on the emergent constraint. Uncertainty in the estimated value of TCR is relatively flat between start years of 1975 and 1990. For start years from 1990 onwards, the uncertainty increases until, at later start years, the estimate and its uncertainty revert towards the raw CMIP6 ensemble statistics (no predictive power).

Regression method
When only one realisation per model is used for ordinary least-squares regression, regression dilution takes place, in which the slope is underestimated (Cox et al., 2018b). This has the potential to lead to a slight overestimation of TCR (Fig. 5d), as the observed warming is at the lower end of the model range. JM19 used the average warming for models with multiple simulations. As not all models provide a sufficient number of simulations, they state that this leads to a minor inflation of the estimated uncertainty. Although we use a hierarchical Bayesian model as the default (details in Appendix A), we have investigated three other regression methods used in the emergent constraint literature: ordinary least squares (OLS) with only one realisation per model, OLS on the mean warming per model and orthogonal distance regression (Fig. 5d). While the hierarchical model and the two OLS variants give very similar results, orthogonal distance regression gives a somewhat lower estimate of TCR. Orthogonal distance regression assumes that there are errors in both the predictor and the predictand, which leads to a steeper slope. As our observation lies below the ensemble average, a steeper slope results in a smaller mean TCR value. Orthogonal distance regression is known to sometimes overcompensate for errors in the independent variable, for instance when the statistical model is not perfectly known, such as when the true relationship deviates from a straight line (Carroll and Ruppert, 1996).
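The contrast between ordinary least squares and orthogonal distance regression can be illustrated with a straight-line fit. The sketch below is a simplified stand-in, not the fitting code used here: for a line with equal error variances in x and y, orthogonal distance regression reduces to total least squares, which can be solved with a singular value decomposition. It ignores any weighting of the two error variances, and the synthetic ensemble is hypothetical. For a positive relationship the orthogonal slope is never shallower than the OLS slope, which is the steepening described above.

```python
import numpy as np


def ols_line(x, y):
    """Ordinary least squares fit of y on x; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


def orthogonal_line(x, y):
    """Total least squares line (orthogonal distances, equal error variances)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    centred = np.column_stack([x - xm, y - ym])
    # The first right-singular vector is the direction of the best-fit line.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    dx, dy = vt[0]
    slope = dy / dx
    return slope, ym - slope * xm


rng = np.random.default_rng(42)
true_dt = rng.uniform(0.4, 0.8, 40)              # hypothetical best warming per model, K
tcr = 2.5 * true_dt + rng.normal(0.0, 0.05, 40)  # hypothetical TCR, K
noisy_dt = true_dt + rng.normal(0.0, 0.05, 40)   # one noisy realisation per model
print("OLS slope:", round(ols_line(noisy_dt, tcr)[0], 2))
print("orthogonal slope:", round(orthogonal_line(noisy_dt, tcr)[0], 2))
```

Because the equal-variance assumption makes the orthogonal fit depend on the units of x and y, a real application would scale each variable by its error standard deviation first.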

Model selection
Model selection can prevent double-counting of very similar models (Sanderson et al., 2015; Cox et al., 2018a). As models from the same centre can have very dissimilar climate sensitivities (Chen et al., 2014; Jiménez-de-la Cuesta and Mauritsen, 2019) and sensitivity can change drastically with only small adjustments to parameters (Zhao et al., 2016), we initially use all available models in the CMIP5 and CMIP6 ensembles. Figure 5e shows that this choice does not significantly change the best estimate of the transient response and that using only one model per modelling centre very slightly increases the variance, even though models from one modelling centre are relatively similar (Fig. 2).

Equilibrium climate sensitivity
Figure 6a shows the emergent constraint on ECS. For CMIP5, the 5 %-95 % confidence interval lies between 0.96 and 4.09 K. The constraint is stronger for CMIP6, with the … Table 2. The results are highly dependent on the time interval chosen. For shorter intervals, the theoretical functional form shows an increased steepness for higher values of ΔT, making it more difficult to constrain. For instance, taking the time period in line with JM19, i.e. 1970-1989 versus 1994-2005, we obtain a 5 %-95 % interval of 0.70-8.41 K for CMIP5, significantly wider than found in JM19, which reported a 5 %-95 % confidence interval of 1.72-4.12 K. The major differences lie in the definition of the theoretical function, where we have cut off the unphysical branch, and a correction of a coding error in JM19.
In Fig. 6b, the dark green dots represent the expected ECS from observed warming (using Eq. 5) against the true ECS, using the fitted parameters from Fig. 6a. The yellow and red crosses denote the same, but now every model uses its own ocean parameters, $F_{2\times}$ and model forcing computed using Eq. (1). The yellow data show the expected ECS computed from the EBM-1 model. Full parameter fits for both models are found in Tables S2 and S3.
The EBM-ε model performs poorly for large values of the ocean heat uptake efficacy parameter ε. Models with ε around 1.8 in particular show an expected ECS far above a realistic range, with one expected ECS reaching a value of 89 K. Equation (5) is non-linear, and small errors in parameter estimation quickly lead to large errors in ECS. For the EBM-ε model in particular, high internal variability may skew the parameter estimate upwards.
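Why small parameter errors matter so much near the asymptote can be seen from the elasticity of ECS = ΔT/(s' − eΔT) with respect to e, which works out to d ln(ECS)/d ln(e) = eΔT/(s' − eΔT): the percentage change in ECS per 1 % change in e grows without bound as the denominator approaches zero. The sketch below evaluates this in plain Python; the values e = 0.24 and s' = 0.3 are hypothetical illustrative choices.

```python
def ecs_eq5(delta_t, s_prime, e):
    """ECS = dT / (s' - e * dT), valid below the asymptote dT < s'/e."""
    return delta_t / (s_prime - e * delta_t)


def elasticity_wrt_e(delta_t, s_prime, e):
    """d ln(ECS) / d ln(e): percentage change in ECS per 1 % change in e."""
    return e * delta_t / (s_prime - e * delta_t)


# Hypothetical values: e = 0.24 with an illustrative s' = 0.3.
for dt in (0.5, 1.0, 1.2):
    print(dt, round(ecs_eq5(dt, 0.3, 0.24), 2), round(elasticity_wrt_e(dt, 0.3, 0.24), 2))
```

With these numbers, a 1 % error in e changes ECS by a fraction of a per cent at ΔT = 0.5 K but by several per cent at ΔT = 1 K, which is the amplification behind the very large expected ECS values.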
The EBM-1 fit leads to an improved estimation of ECS compared to the Eq. (5) fit in 53 % of the cases, whereas the EBM-ε model leads to an improvement in 34 % of cases. This pattern is similar when only historical models are used, with 66 % and 42 % of cases improved, respectively.

Functional form
Explicitly simulating the two-layer model shows that the steepness of the graph is overestimated: assuming no deep-ocean temperature rise ($T_0 = 0$) dampens the temperature response of the upper ocean. Geoffroy et al. (2013a) derived an analytical solution to the two-box model of Eq. (2) under the weaker assumption of a linearly increasing forcing, which also showed a less steep increase in ECS with ΔT for high values of ΔT. This raises the question of whether the upper range of ECS is overestimated. In Fig. S1 in the Supplement, we show that this is not the case: by using a decreased ocean heat uptake parameter e and forcing, the two analytical solutions do overlap. This demonstrates that using the approximate Eq. (5) in the regression should not bias the emergent constraint; it simply means that the fitted parameters will differ slightly from the model parameters. This also explains why the regression using model parameters in Fig. 6b is not significantly better than using the overall fitted parameters of Fig. 6a.

Discussion and conclusion
The emergent constraint found on TCR in this paper is very similar to the ones found in JM19 and Tokarska et al. (2020) (see Table 3). The most important determinant of the constraint is the choice of periods. Compared to Tokarska et al. (2020), we have slightly expanded the number of models, taken a different period and compared further regression choices.
Our best estimate for TCR from the CMIP6 models is 1.68 K, which remains close to the centre of the likely range (1-2.5 K) given in the IPCC AR5 (IPCC, 2013b). The emergent constraint on TCR from the CMIP6 models is, however, strong enough to indicate a much tighter likely range of TCR (16 %-84 %, 1.29-2.05 K).
We find a consistent emergent constraint from the CMIP5 models against observed global warming from 1975 to 2019 (16 %-84 %, 1.27-1.88 K). Furthermore, both of these likely ranges overlap strongly with the emergent constraint on TCR derived by Jiménez-de-la Cuesta and Mauritsen (2019) using a similar method but only considering global warming from 1970 to 2005 (5 %-95 %, 1.17-2.16 K). In terms of the classification proposed by Hall et al. (2019), we therefore now have a confirmed emergent constraint on TCR, with consistency across generations and a sound theoretical framework.
Equilibrium climate sensitivity is likely between 1.9 and 3.4 K (16 %-84 % range). This finding strengthens previous evidence that ECS is very unlikely to be above 4.5 K (Cox et al., 2018a; Jiménez-de-la Cuesta and Mauritsen, 2019; Goodwin et al., 2018). For instance, Goodwin et al. (2018) used history matching, a simple emulator, and observations of surface temperature, ocean heat uptake and carbon fluxes to estimate climate sensitivity and concluded that there is a …
Does the presence of many models with ECS over 4.5 K mean that the CMIP5 generation was better or more useful for understanding climate sensitivity than CMIP6? From the point of view of emergent constraints, the answer is clearly no, as model spread helps capture the shape of the emergent relationship.
In the future, we hope that this TCR constraint will become the basis for constraints also on TCRE (transient climate response to emissions), but this will require the inclusion of additional constraints on land and ocean carbon uptake.
However, we are now in a position to answer the questions that we posed in Sect. 1:
a. Are such high climate sensitivities consistent with the observational record? No; models with high ECS (> 4.5 K) and high TCR (> 2.5 K) do not appear to be consistent with observed global warming since 1975 (Fig. 3b).
b. If so, do the CMIP6 models demand an upward revision to the IPCC likely ranges for climate sensitivity? No; instead, emergent constraints on TCR (Fig. 4) and ECS (Fig. 6) suggest narrower likely ranges for TCR (1.3-2.1 K) and ECS (1.9-3.4 K).
Figure A1. Schematic of the hierarchical Bayesian model employed. The data layer models a best estimate of historical warming for each model. With this estimate, a regression is performed between historical warming and TCR in the process layer. Using information from both layers and observed warming, a probability density function is estimated for TCR as the final step.

Appendix A: Hierarchical linear regression
To systematically include the information from all model realisations, we use a hierarchical Bayesian model (Sansom, 2014). This model includes two layers: the normal linear regression (process layer) and a layer that computes the expected warming per model from all its initial value realisations (data layer). To include the initial value ensemble, we assume that each model m has a "true" or "best" value for warming over the last decades, denoted by $\Delta T_m$. We further assume that every realisation j of a model gives a value of warming $\Delta T_{m,j}$ that is drawn from a normal distribution with mean $\Delta T_m$ and a standard deviation $\sigma_x$ that is the same across all models. Our hierarchical model consists of two steps: for each model, the best estimate of historical warming is computed, and with this value a simple linear regression is performed:
$$\Delta T_{m,j} \mid \Delta T_m, \sigma_x \sim \mathrm{normal}(\Delta T_m, \sigma_x), \qquad (A1)$$
$$\mathrm{TCR}_m \mid \alpha, \beta, \sigma_y \sim \mathrm{normal}(\alpha + \beta\,\Delta T_m, \sigma_y). \qquad (A2)$$
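Conditional on the regression parameters, the two layers act like two Gaussian measurements of the same best warming ΔT_m: the n realisations give their mean with variance σx²/n, and the model's TCR gives (TCR_m − α)/β with variance σy²/β². The sketch below combines them by precision weighting to show why the best estimate is pulled towards the regression line, more strongly for models with few realisations. It is a simplification: in the full hierarchical model α, β, σx and σy are inferred jointly rather than fixed, and the flat prior on ΔT_m and the function name are assumptions made here for illustration.

```python
import numpy as np


def conditional_best_warming(dt_runs, tcr_m, alpha, beta, sigma_x, sigma_y):
    """Posterior mean and SD of a model's best warming given its runs and its TCR.

    Assumes alpha, beta, sigma_x, sigma_y are known and a flat prior on dT_m.
    """
    dt_runs = np.asarray(dt_runs, dtype=float)
    n = dt_runs.size
    prec_runs = n / sigma_x**2         # data layer: mean of n realisations
    prec_reg = beta**2 / sigma_y**2    # process layer: TCR_m ~ N(alpha + beta*dT_m, sigma_y)
    mean_runs = dt_runs.mean()
    mean_reg = (tcr_m - alpha) / beta  # warming implied by TCR_m on the regression line
    prec = prec_runs + prec_reg
    mean = (prec_runs * mean_runs + prec_reg * mean_reg) / prec
    return mean, prec**-0.5
```

Calling it with one run and then with ten runs of the same mean shows the single-run estimate moving further from the run mean towards the warming implied by the regression line.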
The probability density function for TCR is then sampled from the observed warming between 1975-1985 and 2009-2019, $\Delta T_{\mathrm{obs}}$, using the emergent constraint. The observational uncertainty $\sigma_{\mathrm{obs}}$ is taken as the sample standard deviation of the four observational datasets:
$$\mathrm{TCR}_{\mathrm{pred}} = \mathrm{normal}\!\left(\alpha + \beta\,\mathrm{normal}\!\left(\Delta T_{\mathrm{obs}}, \sqrt{\sigma_x^2 + \sigma_{\mathrm{obs}}^2}\right), \sigma_y\right). \qquad (A3)$$
The second layer corresponds to normal linear regression, while the first layer makes an estimate of the true $\Delta T_m$. Note that, especially for models with only a few initial value members, the best estimate $\Delta T_m$ does not necessarily correspond to the mean value of these ensemble members but will instead lie closer to the regression line.
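The predictive step, in which observed warming is perturbed by internal variability and observational spread, mapped through the regression and then given regression scatter, can be sketched with NumPy's random generator. This is a simplified illustration: it plugs in point values for α, β and σy, whereas the full hierarchical model would propagate their posterior draws (passing equal-length arrays of such draws with n set to their length would do this). All numbers in the demo are hypothetical placeholders, not the values found in this study.

```python
import numpy as np


def observational_spread(dataset_warming):
    """Sample standard deviation across observational datasets (sigma_obs)."""
    return np.std(dataset_warming, ddof=1)


def sample_tcr_pred(alpha, beta, sigma_y, dt_obs, sigma_x, sigma_obs, n=200_000, seed=0):
    """Draw TCR_pred = normal(alpha + beta * normal(dT_obs, sqrt(sx^2 + sobs^2)), sy)."""
    rng = np.random.default_rng(seed)
    dt = rng.normal(dt_obs, np.sqrt(sigma_x**2 + sigma_obs**2), size=n)
    return rng.normal(alpha + beta * dt, sigma_y)


if __name__ == "__main__":
    obs = np.array([0.70, 0.72, 0.68, 0.71])  # hypothetical placeholder warming values, K
    draws = sample_tcr_pred(0.2, 2.0, 0.15, obs.mean(), 0.04, observational_spread(obs))
    lo, mid, hi = np.percentile(draws, [16, 50, 84])
    print(f"likely range {lo:.2f}-{hi:.2f} K, median {mid:.2f} K")
```

For fixed parameters the draws are Gaussian with mean α + βΔT_obs and variance β²(σx² + σobs²) + σy², so the Monte Carlo step mainly matters once parameter uncertainty is propagated.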