Emergent constraints on Equilibrium Climate Sensitivity in CMIP5: do they hold for CMIP6?

An important metric for temperature projections is the equilibrium climate sensitivity (ECS), which is defined as the global mean surface air temperature change caused by a doubling of the atmospheric CO2 concentration. The range for ECS assessed by the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report is between 1.5 and 4.5 K and has not decreased over the last decades. Among other methods, emergent constraints are potentially promising approaches to reduce the range of ECS by combining observations and output from Earth System Models (ESMs). In this study, we systematically analyze 11 published emergent constraints on ECS that have mostly been derived from models participating in the Coupled Model Intercomparison Project Phase 5 (CMIP5). These emergent constraints are, except for one that is based on temperature variability, all directly or indirectly based on cloud processes, which are the major source of spread in ECS among current models. The focus of the study is on testing whether these emergent constraints hold for ESMs participating in the new Phase 6 (CMIP6). Since none of the emergent constraints considered here has been derived using the CMIP6 ensemble, CMIP6 can be used to cross-check the emergent constraints on a new model ensemble. The application of the emergent constraints to CMIP6 data shows a decrease in skill and statistical significance of the emergent

Such an extension is not needed for CMIP6 models as their historical simulations cover a longer time period until 2014.
To evaluate the resulting constrained probability distribution of ECS, we use the following nomenclature: let $x_m$ be the x-axis variable (i.e. the observable, constraining variable) of climate model $m$ and $y_m$ its corresponding target variable (ECS in our case). Following Cox et al. (2018), we use ordinary least squares regression to fit the linear model

$$\hat{y}_m = a + b\,x_m,$$

where $\hat{y}_m$ is the predicted target variable for predictor $x_m$, $a$ the intercept of the linear regression line and $b$ the slope of the linear regression line. Fitting the regression model includes minimizing the standard error $s$ of the estimate,

$$s^2 = \frac{1}{M-2}\sum_{m=1}^{M}\left(y_m - \hat{y}_m\right)^2,$$

where $M$ is the total number of climate models.
In the standard emergent constraint approach, the constrained best estimate for the target variable (here ECS) is given by the regression $\hat{y}(x_0)$ evaluated at an observed or observationally based (in case of using reanalysis data) value $x_0$. In that case, the uncertainty in predicting that best estimate is given by the standard prediction error

$$\sigma_{\hat{y}}(x_0) = s\,\sqrt{1 + \frac{1}{M} + \frac{(x_0 - \bar{x})^2}{\sum_{m=1}^{M}(x_m - \bar{x})^2}},$$

where $\bar{x}$ is the mean of the $x_m$. Assuming Gaussian errors, this defines the conditional probability density of $y$ given $x_0$,

$$P(y\,|\,x_0) = \frac{1}{\sqrt{2\pi\,\sigma_{\hat{y}}^2(x_0)}}\exp\left(-\frac{\left(y - \hat{y}(x_0)\right)^2}{2\,\sigma_{\hat{y}}^2(x_0)}\right).$$

This distribution can be interpreted as the posterior distribution in the regression model based on climate model output but constrained by matching the observation $x_0$. However, the observation of $x_0$ is not error free and has uncertainties associated with it. Assuming again Gaussian uncertainties, the resulting probability density for $x$ given the observation $x_0$ is given by

$$P(x\,|\,x_0) = \frac{1}{\sqrt{2\pi\,\sigma_0^2}}\exp\left(-\frac{(x - x_0)^2}{2\,\sigma_0^2}\right),$$

where $x_0$ is the best estimate because the error is assumed unbiased, and $\sigma_0^2$ the variance of the observation about the true value. In a final step, numerical integration is used to calculate the marginal probability density for the constrained prediction of the target variable $y$:

$$P(y) = \int_{-\infty}^{\infty} P(y\,|\,x)\,P(x\,|\,x_0)\,\mathrm{d}x,$$

where $P(y\,|\,x)$ is the Gaussian conditional density defined above, evaluated at $x$ instead of $x_0$. In assigning probabilities via equation (6) we have assumed that $P(x\,|\,x_0) = P(x_0\,|\,x)$ (with $x$ the unknown true value of the observable) and thereby have implicitly assumed a uniform prior probability density in ECS; in other words, that an ECS near 8 K is just as probable as one near 4 K if both are equally consistent with the observation $x_0$. We do this for simplicity. Note that if we had instead applied this procedure to the climate radiative feedback parameter, which is arguably more physical than applying it to ECS (e.g. Roe and Baker (2007)), the resulting ECS PDFs would have non-symmetric shapes and different means. The conclusions of this study regarding changes from CMIP5 to CMIP6 would, however, be unchanged in either case. Even though it is also possible to use previous information (e.g., the range seen in GCMs) to inform a prior, resulting in a narrower posterior PDF with the emergent constraint added, here we focus on the constraining power of emergent constraints on their own. To assess the effectiveness of a constraint, the emergent-constrained PDFs are compared with published ECS ranges based on expert judgements and other information: if the constraint-based PDF is wider than the one from previous estimates, the constraint brings little added value, even if its validity has been shown.
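The regression-and-integration procedure of this section can be illustrated with a minimal Python sketch. It is not the ESMValTool implementation used for this study: the function names (`fit_ols`, `gauss`, `constrained_pdf`), the trapezoidal integration and the integration window of ±5 observational standard deviations are assumptions made purely for illustration. The sketch fits the ordinary least squares line, evaluates the standard prediction error at each candidate value of the observable, and integrates the Gaussian conditional density of the target variable against the Gaussian density of the observation.

```python
import math


def fit_ols(x, y):
    """Fit y ~ a + b*x by ordinary least squares.

    Returns intercept a, slope b, standard error s of the estimate,
    the mean of x and the sum of squared deviations of x.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ssr / (n - 2))
    return a, b, s, x_mean, sxx


def gauss(z, mean, std):
    """Gaussian probability density evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def constrained_pdf(x, y, x_obs, sigma_obs, y_grid, n_x=401, width=5.0):
    """Marginal density of the target variable on y_grid.

    Integrates P(y | x) * P(x | x_obs) over x with the trapezoidal rule,
    using a window of +/- width * sigma_obs around the observed value
    (the window and grid size are illustrative assumptions).
    """
    a, b, s, x_mean, sxx = fit_ols(x, y)
    n = len(x)
    x_lo = x_obs - width * sigma_obs
    dx = 2.0 * width * sigma_obs / (n_x - 1)
    pdf = [0.0] * len(y_grid)
    for i in range(n_x):
        xi = x_lo + i * dx
        # Trapezoidal weights: half weight at both end points.
        weight = dx * (0.5 if i in (0, n_x - 1) else 1.0)
        weight *= gauss(xi, x_obs, sigma_obs)
        # Standard prediction error of the regression at xi.
        sigma_pred = s * math.sqrt(1.0 + 1.0 / n + (xi - x_mean) ** 2 / sxx)
        y_hat = a + b * xi
        for j, yj in enumerate(y_grid):
            pdf[j] += weight * gauss(yj, y_hat, sigma_pred)
    return pdf
```

Called with the per-model values of the observable and of ECS, an observed value and its standard deviation, the function returns a density on the chosen ECS grid from which a best estimate and a likely range could then be read off.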

Statistical significance of emergent constraints
In this study, we evaluate the statistical significance of the different emergent constraints on ECS. The term "statistical significance" refers to the sensitivity of the regression model to changes in the input data, i.e. the removal or addition of datasets. The basis for this evaluation is a non-parametric bootstrapping approach (similar to the one used by Zhai et al. (2015)). For this, we generate 100,000 bootstrap samples of size $M$ for every emergent constraint by randomly drawing from the original sample $\{(x_m, y_m)\}$ with replacement ($M$ is the total number of climate models for which all data required for the considered emergent constraint are available). For each bootstrap sample, the linear (Pearson) correlation coefficient $r$ is calculated. Using the probability distribution of the bootstrap samples of $r$, we define a p-value $p$ as the probability that $r$ exhibits the opposite sign to that originally expected from the emergent relationship (see shaded areas in the right columns of Figure 2 to Figure 5). In other words, $p$ ≔ CDF(0) ($p$ ≔ 1 − CDF(0)) for an expected positive (negative) emergent relationship, where CDF refers to the cumulative distribution function of the bootstrap samples of $r$. In this context we introduce the following nomenclature: an emergent relationship is called "highly significant" if $p$ < 0.02, "barely significant" if 0.02 ≤ $p$ < 0.05, "almost significant" if 0.05 ≤ $p$ < 0.1, and "far from significant" if $p$ ≥ 0.1.
https://doi.org/10.5194/esd-2020-49 Preprint. Discussion started: 17 July 2020 © Author(s) 2020. CC BY 4.0 License.
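The bootstrap test described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used for the figures: the function names (`pearson_r`, `bootstrap_p_value`, `classify`), the fixed random seed and the redrawing of degenerate samples (samples in which one variable has zero variance, so that the correlation is undefined) are choices made here and are not taken from the study.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient, or None if it is undefined."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    if sxx == 0.0 or syy == 0.0:
        return None
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def bootstrap_p_value(x, y, expected_sign=+1, n_boot=100_000, seed=0):
    """Fraction of bootstrap samples whose r has the unexpected sign.

    For an expected positive relationship this is CDF(0), i.e. P(r <= 0);
    for an expected negative one it is 1 - CDF(0), i.e. P(r > 0).
    """
    rng = random.Random(seed)
    m = len(x)
    n_valid = 0
    n_opposite = 0
    while n_valid < n_boot:
        # Draw m models with replacement from the original sample.
        idx = [rng.randrange(m) for _ in range(m)]
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if r is None:
            continue  # assumption: redraw samples with undefined correlation
        n_valid += 1
        if expected_sign > 0:
            n_opposite += r <= 0.0
        else:
            n_opposite += r > 0.0
    return n_opposite / n_boot


def classify(p):
    """Map a p-value to the nomenclature introduced in this section."""
    if p < 0.02:
        return "highly significant"
    if p < 0.05:
        return "barely significant"
    if p < 0.1:
        return "almost significant"
    return "far from significant"
```

For example, `classify(0.0219)` returns "barely significant". A pure-Python loop over 100,000 samples is slow; a vectorized implementation would be preferable in practice.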

ESMValTool
All figures in this paper are produced with the Earth System Model Evaluation Tool (ESMValTool) version 2.0 (v2.0; Lauer et al., 2020; Righi et al., 2020). The ESMValTool is an open-source community diagnostics and performance metrics tool for the evaluation of Earth system models (https://www.esmvaltool.org/). An ESMValTool recipe (configuration file defining input data, preprocessing steps and diagnostics to be applied) is available that can be used to reproduce all figures in this paper. This also allows the analysis presented in this study to be repeated once new model simulations from CMIP6 or other model ensembles become available.

Comparison of emergent constraints on ECS for CMIP5 and CMIP6
In this section we describe and discuss the 11 emergent constraints on ECS summarized in Table 1 using CMIP5 and CMIP6 data (sections 3.1 to 3.11) and provide a best estimate for ECS and the statistical significance of the 11 emergent constraints in section 3.12. While most of these emergent constraints have been derived using data from the CMIP5 and/or CMIP3 ensembles, to our knowledge none of them has been evaluated on the CMIP6 ensemble so far. The results for the individual emergent constraints described in the following are shown in Figure 2 to Figure 5. Table 4 shows the corresponding IPCC likely ranges (i.e. 66% confidence interval) of ECS derived from the probability distributions given by equation (8) and the p-values used to assess the significance of the emergent relationships.

3.1 Response of shortwave cloud reflectivity to changes in sea surface temperature (BRI)
In this emergent constraint proposed by Brient and Schneider (2016), ECS is correlated with the tropical low-level cloud (TLC) albedo. Differences in the TLC albedo account for more than half of the variance of the ECS in the CMIP5 ensemble.
Following Brient and Schneider (2016), TLC regions are defined as grid points that are in the driest quartile of 500 hPa relative humidity of all grid cells over the ocean between 30°S and 30°N. The albedo of the TLC is obtained by calculating the ratio of TOA shortwave cloud radiative forcing and solar insolation averaged over the TLC region. The regression coefficients of deseasonalized variations of TLC shortwave albedo and sea surface temperature (SST; in % per K) are then used as an emergent constraint for ECS. Here, we use observational data from HadISST for SST (Rayner et al., 2003), ERA-Interim for 500 hPa relative humidity (Dee et al., 2011) and CERES-EBAF (Loeb et al., 2018) (Smith and Reynolds, 2003), the original publication used similar observation-based datasets.
Our analysis yields a likely range for ECS of 3.72 K ± 0.56 K for CMIP5 ($R^2$ = 0.38) and 4.36 K ± 1.16 K for CMIP6, with a much lower $R^2$ = 0.15. The original publication stated a best estimate of 4.0 K, with a very low likelihood of values below 2.3 K (90% confidence). The PDFs of the Pearson correlation coefficient obtained by the non-parametric bootstrapping approach (right column in Figure 2) show that the emergent relationship exhibits the expected negative correlation for CMIP5 and CMIP6. The emergent relationship is highly significant for CMIP5 ($p$ = 0.0002) and barely significant for the CMIP6 ensemble ($p$ = 0.0219).

Temperature variability (COX)
The emergent constraint on ECS proposed by Cox et al. (2018) uses the interannual variation of global mean temperature calculated from its variance (in time) and one-year-lag autocorrelation. In contrast to the majority of emergent constraints, which focus on cloud-related processes, this constraint is based on the fluctuation-dissipation theorem, which relates the long-term response of the climate system to an external forcing (ECS) to short-term variations of the climate system (climate variability). This arguably places the constraint on a more solid theoretical foundation, although several questions were raised about the robustness of the results (Brown et al., 2018; Po-Chedley et al., 2018; Rypdal et al., 2018). As observational data, here we use the HadCRUT4 dataset (Morice et al., 2012) over the time period 1880-2014. For the COX constraint, we assess a likely ECS range of 3.03 K ± 0.71 K for CMIP5 ($R^2$ = 0.31) and 3.44 K ± 1.15 K for CMIP6 ($R^2$ = 0.08). Cox et al. (2018) derived a likely range of 2.8 K ± 0.6 K from a different subset of CMIP5 models. For CMIP6, the distribution of $r$ evaluated from the bootstrap samples (right column of Figure 2) shows high probability densities around $r$ = 0, which means that many bootstrap samples show a very low correlation coefficient. Moreover, while the majority of bootstrap samples indicate a positive $r$, a considerable fraction of bootstrap samples show a negative correlation (orange shaded area). In contrast, the distribution of $r$ for the CMIP5 ensemble supports a clear positive correlation.
Consequently, the COX emergent relationship is highly significant for the CMIP5 ensemble ($p$ = 0.0010), but only almost significant for the CMIP6 ensemble ($p$ = 0.0545).
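As an illustration of how a variability metric can be built from the variance and the one-year-lag autocorrelation of a temperature series, the following Python sketch computes the quantity that Cox et al. (2018) write as Ψ = σ_T / √(−ln α), with σ_T the standard deviation and α the lag-1 autocorrelation. The function name `psi_metric` is hypothetical, and any detrending or windowing of the time series applied in the original study is deliberately omitted, so this is a simplified sketch rather than a reproduction of the published diagnostic.

```python
import math


def psi_metric(temps):
    """Variability metric sigma_T / sqrt(-ln(alpha)) for an annual series.

    sigma_T is the standard deviation of the series and alpha its
    one-year-lag autocorrelation. No detrending is applied here
    (simplifying assumption of this sketch).
    """
    n = len(temps)
    mean = sum(temps) / n
    anomalies = [t - mean for t in temps]
    sum_sq = sum(a * a for a in anomalies)
    variance = sum_sq / n
    # Lag-1 autocorrelation: lagged products normalized by the sum of squares.
    alpha = sum(anomalies[i] * anomalies[i + 1] for i in range(n - 1)) / sum_sq
    if not 0.0 < alpha < 1.0:
        raise ValueError("metric is only defined for 0 < alpha < 1")
    return math.sqrt(variance) / math.sqrt(-math.log(alpha))
```

The guard on `alpha` reflects that the logarithm is only meaningful for positive autocorrelation below one; series that violate this condition would need separate treatment.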

Southern hemisphere Hadley cell extent (LIP)
The results of Lipat et al. (2017) show that the multi-year average extent of the Hadley cell correlates with ECS in CMIP5 models. The Hadley cell edge is defined as the latitude of the first two grid cells from the equator going south where the zonal average 500 hPa mass stream function calculated from December-January-February means of the meridional wind field changes sign from negative to positive. Lipat et al. (2017) explain this correlation by tying it to the observed correlation of the interannual variability in mid-latitude clouds and their radiative effects with the poleward extent of the Hadley cell. For the calculation of the emergent constraint, we use reanalysis data from ERA-Interim (Dee et al., 2011) for the meridional wind speed over the time period 1980-2005. Our application of this emergent constraint gives ECS likely ranges of 2.97 K ± 0.76 K for CMIP5 ($R^2$ = 0.18) and 3.66 K ± 1.27 K for CMIP6 ($R^2$ = 0.03). The original publication does not specify an ECS range. The emergent constraint by Lipat et al. (2017) is highly significant for CMIP5 ($p$ = 0.0043) but far from significant for CMIP6 ($p$ = 0.2039).
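The edge definition used for this constraint can be sketched as a search for the first sign change of the stream function when moving south from the equator. In the Python sketch below, the function name `hadley_edge_sh` and the linear interpolation between the two grid cells that bracket the sign change are assumptions made for illustration; the source text only states that the edge is located at the first two grid cells where the sign changes from negative to positive.

```python
def hadley_edge_sh(lats, psi500):
    """Latitude of the southern Hadley cell edge, or None if not found.

    lats: latitudes in degrees north (negative in the Southern Hemisphere).
    psi500: zonal-mean 500 hPa mass stream function at those latitudes.
    The edge is placed at the first negative-to-positive sign change
    encountered when going south from the equator.
    """
    # Keep Southern Hemisphere points, ordered from the equator southward.
    points = sorted(
        ((lat, psi) for lat, psi in zip(lats, psi500) if lat <= 0.0),
        key=lambda item: -item[0],
    )
    for (lat1, psi1), (lat2, psi2) in zip(points, points[1:]):
        if psi1 < 0.0 and psi2 >= 0.0:
            # Assumption: linear interpolation to the zero crossing.
            fraction = psi1 / (psi1 - psi2)
            return lat1 + fraction * (lat2 - lat1)
    return None
```

For a profile that is negative at 20°S and positive with equal magnitude at 30°S, the sketch places the edge at 25°S.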

3.4 Large-scale lower-tropospheric mixing (SHD)
Sherwood et al. (2014) proposed that the degree of mixing in the lower troposphere determines the response of boundary-layer clouds and humidity to climate warming, as the associated moisture transport would increase rapidly in a warmer atmosphere due to the Clausius-Clapeyron relationship. The large-scale component D of this mixing is defined as the ratio of shallow to deep overturning. D is calculated from the vertical velocities averaged over two height regions: 850 hPa and 700 hPa for shallow overturning and 600, 500 and 400 hPa for deep overturning. Both quantities are averaged over parts of the tropical ocean region away from the regions of highest SST and strongest mid-level ascent, in particular the region 160°W-30°E, 30°S-30°N, wherever air is ascending at low levels. As observationally based data, we use vertical velocities from ERA-Interim (Dee et al., 2011) over the time period 1989-1998, similar to the original publication. We derive ECS likely ranges of 3.65 K ± 0.63 K for CMIP5 ($R^2$ = 0.28) and 3.74 K ± 1.11 K for CMIP6 ($R^2$ = 0.04). Sherwood et al. (2014) do not give a best estimate for ECS based on the large-scale component of mixing D or its small-scale counterpart S (section 3.5) but for the sum of D+S only (see section 3.6). The evaluation of the bootstrap distribution of $r$ indicates that the SHD constraint is highly significant for the CMIP5 ensemble ($p$ = 0.0006) but far from significant for the CMIP6 ensemble ($p$ = 0.1120).

Small-scale lower-tropospheric mixing (SHS)
The small-scale mixing S (Sherwood et al., 2014) is calculated from the differences in relative humidity and temperature between 700 and 850 hPa. The differences are averaged over all grid cells within the upper quartile of the annual mean 500 hPa ascent rate (within ascending regions) in the tropics. The tropics are defined as the region between 30°S and 30°N. In the Cloud Feedback Model Intercomparison Project models (CFMIP, Webb et al. (2017)), for which convective tendencies were available, upward moisture transport by parameterized convection was shown to increase more rapidly with warming for higher values of S. We use reanalysis data from ERA-Interim (Dee et al., 2011) for temperature and relative humidity to calculate the observationally based constraint (1989-1998). Our analysis shows a likely range of ECS of 3.07 K ± 0.68 K for CMIP5 ($R^2$ = 0.13) and 3.41 K ± 1.13 K for CMIP6 ($R^2$ = 0.15). The correlation of S and ECS is almost significant for both the CMIP5 ($p$ = 0.0581) and the CMIP6 ($p$ = 0.0638) ensemble. The SHS constraint is one of the two analyzed emergent constraints (ZHA being the other) that show a higher coefficient of determination $R^2$ for the CMIP6 than for the CMIP5 ensemble.

Lower tropospheric mixing index (SHL)
The lower tropospheric mixing index (LTMI) formulated by Sherwood et al. (2014) is defined as the sum of the small-scale mixing S (see section 3.5) and the large-scale mixing D (see section 3.4), which are supposed to capture complementary components of the total mixing phenomenon. Sherwood et al. (2014) argue that the increase in dehydration depends on initial mixing, linking it to cloud feedbacks and thus also to ECS. For this constraint, we derive an ECS likely range of 3.42 K ± 0.63 K for CMIP5 ($R^2$ = 0.41) and 3.65 K ± 1.05 K for CMIP6 ($R^2$ = 0.19). Sherwood et al. (2014) give a best estimate of about 4 K with a lower limit of 3 K. As illustrated by the right column of Figure 3, the SHL emergent relationship is highly significant for both considered climate model ensembles, CMIP5 ($p$ = 0.0001) and CMIP6 ($p$ = 0.0114), although weaker in the newer ensemble.

Error in vertical profile of relative humidity (SU)
Another emergent constraint on ECS that targets uncertainties in cloud feedbacks was proposed by Su et al. (2014). They show that changes in the Hadley circulation are physically connected to changes in tropical clouds and thus ECS.
Consequently, the inter-model spread in the change of the Hadley circulation in an ensemble of climate models is well correlated with the corresponding changes in the TOA cloud radiative effect. Moreover, Su et al. (2014) found a correlation between a model's ECS and its ability to represent the present-day Hadley circulation. The latter is calculated from the tropical (45°S-40°N) zonal-mean vertical profiles of relative humidity from the surface to 100 hPa. These profiles are then used to define the x-axis of the SU constraint by calculating a performance metric based on the slope of the linear regression between a climate model's relative humidity profile and the corresponding observational reference. Similarly to the original publication, we use humidity observations from AIRS (Aumann et al., 2003) for pressure levels greater than 300 hPa and MLS-Aura data (Beer, 2006) for pressure levels of less than 300 hPa. Our analysis yields a constrained likely range of ECS of 3.30 K ± 0.90 K for CMIP5 ($R^2$ = 0.08) and 3.69 K ± 1.59 K for CMIP6 ($R^2$ = 0.03). The original publication gives a best estimate of 4 K with a lower limit of 3 K. Figure 4 shows that in addition to the low $R^2$ values, the emergent relationship shows different slopes for CMIP5 and CMIP6. For CMIP5, the expected positive correlation is found, while for CMIP6, a negative correlation is found. This suggests that the constraint no longer works when applied to the CMIP6 data. Consequently, the SU constraint is almost significant for the CMIP5 ensemble ($p$ = 0.0919) and far from significant for the CMIP6 ensemble ($p$ = 0.8573).

Tropical mid-tropospheric humidity asymmetry index (TIH)
Tian (2015) found a link between mid-tropospheric humidity over the tropical Pacific and simulated moisture, precipitation, clouds, and large-scale circulation and thus ECS in CMIP3 and CMIP5 models. The study explains this link with the similarity of mid-tropospheric humidity and precipitation patterns, as both are related to the ITCZ. The proposed tropical mid-tropospheric humidity asymmetry index to constrain ECS is defined as the relative bias (in percent) in simulated annual mean 500 hPa specific humidity averaged over the Southern Hemisphere (SH) tropical Pacific (30°S-0°, 120°E-80°W) minus the bias averaged over the Northern Hemisphere (NH) tropical Pacific (20°N-0°, 120°E-80°W) when compared with observations. Here, we use humidity observations from AIRS (Aumann et al., 2003) over the time period 2003-2005 as reference dataset. We assess a likely ECS range of 3.88 K ± 0.78 K for CMIP5 ($R^2$ = 0.24) and 4.07 K ± 1.21 K for CMIP6

Southern ITCZ index (TII)
In addition to the humidity index, Tian (2015) proposed an emergent constraint on ECS based on the southern ITCZ index (Bellucci et al., 2010; Hirota et al., 2011). This index is defined as the climatological annual mean precipitation bias averaged over the south-eastern Pacific (30°S-0°, 150°W-100°W). The southern ITCZ index is calculated in mm day$^{-1}$ and dominated by the so-called double ITCZ, a common problem in many CMIP5 climate models. Tian (2015) found a link between double-ITCZ bias and simulated moisture, precipitation, clouds, and large-scale circulation in CMIP3 and CMIP5 models. He argues that this could explain the link found between the double-ITCZ bias and ECS. As reference data, we use observed precipitation data for the years 1986-2005 from GPCP (Adler et al., 2003). We calculate an ECS likely range of 3.87 K ± 0.70 K for CMIP5 ($R^2$ = 0.33) and 4.11 K ± 1.17 K for CMIP6 ($R^2$ = 0.06). Tian (2015) specifies a best estimate of 4.0 K. As shown by the right column of Figure 4, the TII emergent relationship is highly significant for the CMIP5 ensemble ($p$ = 0.0004), but only almost significant for the CMIP6 ensemble ($p$ = 0.0634).

Difference between tropical and mid-latitude cloud fraction (VOL)
The study by Volodin (2008) addresses the geographical distribution of clouds in climate models. It was the first published emergent constraint on ECS, relying on models from CMIP3, such that both CMIP5 and CMIP6 are out-of-sample tests for this constraint. He shows that high ECS models tend to simulate a higher total cloud cover over the southern mid-latitudes and a lower total cloud cover over the tropics (relative to the multi-model mean). This can be used to establish an emergent relationship between the ECS and the difference in tropical total cloud cover (28°S-28°N) and the southern mid-latitude total cloud cover (56°S-36°S). Analogous to the original study, we use the ISCCP-D2 data (Rossow and Schiffer, 1991) as observational reference. For the VOL constraint, we calculate a constrained likely range of ECS of 3.74 K ± 0.64 K for CMIP5 ($R^2$ = 0.38) and 4.14 K ± 1.13 K for CMIP6 ($R^2$ = 0.16), whereas the original publication gives a range of 3.6 K ± 0.4 K (standard deviation) for an ensemble of CMIP3 models. The emergent constraint by Volodin (2008) is one of the two emergent constraints that are highly significant for both the CMIP5 ($p$ = 0.0004) and the CMIP6 ($p$ = 0.0057) ensemble.

Response of seasonal marine boundary layer cloud fraction to SST changes (ZHA)
Zhai et al. (2015) focus on the variations of marine boundary layer clouds (MBLC), which largely contribute to the shortwave cloud feedback and thus to the uncertainty in modeled ECS. Their central quantity is the response of the MBLC fraction to changes in the sea surface temperature (SST) in subtropical oceanic subsidence regions for both hemispheres (20°-40°). On short (seasonal) and long (centennial under a forcing) time scales, this quantity is well correlated with ECS among an ensemble of CMIP3 and CMIP5 models. Together with observations of cloud fraction from CloudSat/CALIPSO (Mace et al., 2009), SST from AMSRE SST (AMSR-E, 2011) and vertical velocity from ERA-Interim (Dee et al., 2011), the seasonal response of MBLC fraction to changes in SST forms an emergent constraint on ECS. We assess a likely ECS range of 3.35 K ± 0.72 K for CMIP5 ($R^2$ = 0.05) and 3.81 K ± 0.60 K for CMIP6 ($R^2$ = 0.61). In their original publication, Zhai et al. (2015) found an ECS range of 3.90 K ± 0.45 K (standard deviation) for a combination of CMIP3 and CMIP5 models.
In terms of statistical significance, the results of the ZHA constraint are somewhat surprising: although CMIP5 data (in combination with CMIP3 data) were successfully used in the original publication, our approach finds that the emergent relationship for the CMIP5 models is far from statistically significant. The reason for this disagreement is the set of climate models used. For our analysis, we use 11 additional CMIP5 models that were not used in the original publication (i.e. ACCESS1-0, ACCESS1-3, bcc-csm1-1, bcc-csm1-1-m, CCSM4, GFDL-ESM2G, GFDL-ESM2M, IPSL-CM5A-MR, IPSL-CM5B-LR, MPI-ESM-MR and MPI-ESM-P). Due to a lack of publicly available data, the model CESM1-CAM5 that is used in the original publication is not included in our analysis. The effect of choosing different subsets of CMIP5 models on the emergent relationship is illustrated in Figure 6. Using only the CMIP5 models from the original publication gives a considerably higher correlation ($R^2$ = 0.38) than using all available CMIP5 models ($R^2$ = 0.05). This result shows a strong dependency of this emergent constraint on the subset of climate models used. Moreover, strangely and uniquely among the metrics examined here, the ZHA constraint is highly significant for the CMIP6 ensemble ($p$ = 0.0022) but far from significant in the updated CMIP5 ensemble ($p$ = 0.1195).

Best estimates for ECS and statistical significance of the 11 emergent constraints
In most cases, the emergent relationships (left columns of Figure 2 to Figure 5) show the same sign of the slope for CMIP5 and CMIP6, with the SU constraint being the only exception. However, the coefficient of determination ($R^2$) is considerably lower for CMIP6 compared to CMIP5 for all but two constraints: SHS and ZHA. The probability distributions of the constrained ECS that we obtain (middle columns of Figure 2 to Figure 5) give similar results: except for the ZHA constraint, the constraint on the CMIP6 ensemble is weaker, i.e. the constrained PDFs derived from the CMIP6 ensemble are broader than their respective CMIP5 counterparts. As shown in Table 4, for CMIP5, the range of the best estimates for ECS is 2.97 K to 3.88 K, while the corresponding CMIP6 best estimates are higher for each of the tested emergent constraints, resulting in an ECS range of 3.41 K to 4.36 K.
The $R^2$ of the emergent relationships and the constrained range of ECS each show a strong dependency on the climate model ensemble used, even though a physical explanation is given for each emergent constraint that is thought to be valid for every climate model ensemble describing the real world. In order to assess changes in the skill of the emergent constraints when moving from CMIP5 to CMIP6, we use the degree of statistical significance relative to a standard $p$ = 0.05 threshold ("highly", "barely", "almost", "far from") of the different emergent constraints shown in the right columns of Figure 2 to Figure 5 as a proxy. We consider reductions in the skill of the constraint as significant if the interquartile ranges of the bootstrapped correlation coefficient for CMIP5 and CMIP6 data do not overlap (colored boxes in Figure 7). This failure to overlap is seen for all emergent constraints but SHS, confirming that most of the constraints have lost more skill than could be explained by sampling uncertainty alone if the models were independent. Only two constraints (SHL and VOL) show skill with high significance for both the CMIP5 and CMIP6 ensembles. Two more emergent constraints (BRI and TIH) are highly significant for CMIP5 but barely significant for CMIP6. Three other constraints are far from significant on CMIP6 (LIP, SHD and SU), but only one fails a significance test on CMIP5 (ZHA).
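The overlap criterion for the interquartile ranges can be written down compactly. The following Python sketch is illustrative only: the function names (`interquartile_range`, `iqr_overlap`) and the "inclusive" quantile interpolation of the standard library are assumptions of this example and may differ from the quantile definition used for Figure 7.

```python
import statistics


def interquartile_range(values):
    """Return the first and third quartile of a sample."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3


def iqr_overlap(r_first, r_second):
    """True if the interquartile ranges of two samples overlap.

    r_first, r_second: bootstrapped correlation coefficients of the
    two ensembles that are compared (e.g. CMIP5 and CMIP6).
    """
    lo_first, hi_first = interquartile_range(r_first)
    lo_second, hi_second = interquartile_range(r_second)
    # Two closed intervals overlap if each starts before the other ends.
    return lo_first <= hi_second and lo_second <= hi_first
```

Under the criterion of this section, a change in skill between the two ensembles would be called significant when `iqr_overlap` returns `False`.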

4 Discussion
As shown in the previous sections, most emergent relationships show smaller coefficients of determination when evaluated on the new CMIP6 ensemble compared to the CMIP5 ensemble. In this section, we discuss possible reasons for these differences. As reported by Caldwell et al. (2014), the large amount of data provided by modern ESMs can generate spurious correlations of variables between past climate and ECS just by chance, especially when only a small number of climate models is considered. This would cause the performance of the emergent constraint to be reduced on out-of-sample data (like the new CMIP6 ensemble), since the emergent relationship appeared just by chance and not because of a physically based mechanism.
A further reason for the weaker emergent relationships in CMIP6 may be the increased complexity of the participating ESMs. Each emergent constraint approach is based on the assumption that a single observable process or physical aspect in the current climate dominates the uncertainty in ECS. Some emergent constraints such as ZHA and BRI relate changes in cloud properties (here: low-level cloud fraction and cloud reflectivity) on seasonal or interannual time-scales (here: driven by changes in SST) to ECS. This means that it has to be implicitly assumed that the observable changes in these properties on seasonal or interannual time-scales are basically driven by the same mechanisms as the changes in cloud properties as a result of climate forcing. While this assumption seems to make sense, we do not know whether it actually holds in a significantly different climate or if other or additional mechanisms also might become important. For this reason it also remains unclear if the regions or cloud regimes that have been selected based on present-day climate and that are used to calculate the emergent constraints will be equally important under significant climate change. For example, Lauer et al. (2010) showed with a regional climate model that the relationship between cloud amount and lower tropospheric stability in the stratocumulus deck over the Southeast Pacific derived from present-day data will be altered under global warming.
For CMIP6 models, Zelinka et al. (2020) showed that cloud feedbacks and thus ECS in high-sensitivity models are dominated by changes in clouds over the Southern Ocean, while in CMIP3 and CMIP5 the uncertainty in cloud feedbacks is dominated by clouds in the subtropical subsidence regions. One might speculate that a possible reason for this is an improved simulation of clouds over the Southern Ocean in some models (Bodas-Salcedo et al., 2019; Gettelman et al., 2019a), as shown for some pre-CMIP6 model versions evaluated by Lauer et al. (2018). The findings of Zelinka et al. (2020) could also at least partly explain the larger inter-model spread in climate sensitivity, as more and different regions and cloud types dominate the cloud feedbacks, resulting in a weaker emergent constraint compared with CMIP5 models. They found that on average, the shortwave low cloud feedback is larger in CMIP6 than in CMIP5, which they primarily relate to changes in the representation of clouds. As a possible explanation, Zelinka et al. (2020) give an increase in mean-state supercooled liquid water (i.e. increase in the cloud water liquid fraction) in mixed-phase clouds, resulting in less pronounced increases in low-level cloud cover and water content with warmer SSTs, particularly in mid-latitudes.
We note that observational uncertainties can also play a role, as using different observational datasets for a given variable might lead to different emergent constraints. As this study uses only one combination of observational dataset(s) to calculate the emergent constraints, as in the original published emergent constraint studies, the error estimates given by our analysis are expected to underestimate the true error. This could be investigated by systematic tests using different observational datasets and/or combinations thereof as a proxy for observational uncertainty. Where available, observational uncertainty estimates could additionally be used to give better estimates of the likely constrained range of ECS. A major challenge associated with this is, however, to determine how observational uncertainties propagate to the space-time scales represented by the models, because the correlation of observational errors in space and time is typically not well known (e.g. Bellprat et al. (2017)).

Summary
This paper assesses 11 different emergent constraints on ECS, of which most are directly or indirectly related to cloud feedbacks, by applying them to results from ESMs contributing to CMIP5 and CMIP6. Of particular interest are the results from CMIP6, since all analyzed emergent constraints were published prior to the availability of CMIP6 data. In summary, we assess a range of 2.97 K to 3.88 K for the best estimates of ECS for CMIP5 and a range of 3.41 K to 4.36 K for CMIP6. This increase in the best estimate of ECS can be found for every constraint that we analyzed, and can be at least partly explained by the increased multi-model mean ECS of CMIP6, which was not accompanied by systematic changes in the constraint variables that could explain this increase, leading to regression fits with higher intercept values at observed constraint values. This is also illustrated by the CMIP5 and CMIP6 multi-model means in the left columns of Figure 2 to Figure 5 (colored dots), in which the connecting line between the CMIP5 and CMIP6 multi-model mean is not parallel to the CMIP5 emergent relationships for all emergent constraints. However, these results need to be treated with great care, as the analysis showed that all considered emergent relationships are sensitive to outliers and the subset of the climate model ensemble used to fit the emergent relationship. Moreover, our results also show that except for ZHA and SHS, all emergent relationships are weaker (in terms of the coefficient of determination $R^2$) in CMIP6 compared to CMIP5, which means that the corresponding emergent relationships are able to explain less of the ECS variation simulated by the newer CMIP6 models than by those of CMIP5.
Of the 11 emergent constraints analyzed, four are found to be "working" in the sense that they show statistically significant skill on both the CMIP5 and CMIP6 ensembles: BRI, SHL, TIH and VOL. In contrast, the three emergent constraints LIP, SHD and SU are found to be "not working anymore", as their p-values are well above 0.1 in CMIP6 (far from significant). COX, TII and SHS are somewhat in between and could be grouped as "indeterminate", as their p-values in CMIP6 dropped from highly or barely significant to almost significant. It is noteworthy that in the "working" group three out of the four emergent constraints point to rather high ECS values of above 4 K in CMIP6, while in the "not working anymore" group two out of three emergent constraints point to rather small ECS values of 3.3 K and less. This might indicate that emergent constraints on ECS point to higher rather than lower values in CMIP6.
Typically, studies proposing a single emergent constraint on ECS do not explicitly take into account model interdependency, and all approaches discussed above apply a linear regression of some kind to the model data. This means that the individual data points (i.e. climate models) are implicitly assumed to be independent. As some modeling groups provide output from multiple ESMs and some ESMs from different modeling groups share components and code, this is clearly not the case. Duplicated code in multiple models is expected to lead to an overestimation of the sample size of a model ensemble and may result in spurious correlations (Sanderson et al., 2015). This limitation also applies to this study, as the tests for significance assume that all models are independent. Possible approaches could be to stop treating all models equally, either by applying a model weighting based on a model's interdependence with the other models or by simply reducing the ensemble size, taking into account only models that are above a given (yet to be defined) interdependence score. Promising approaches to quantify the model interdependency that could be followed include, for example, the studies of Sanderson et al. (2015), Sanderson et al. (2017) and Knutti et al. (2017b).
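The weighting idea can be sketched with a weighted least squares fit. This is only an illustration under strong assumptions: the data are synthetic, the helper `weighted_ols` is hypothetical, and the weights are simply 1/k for k exact copies of a model. It is not one of the interdependence metrics proposed in the cited studies.

```python
def weighted_ols(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    w_sum = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / w_sum
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / w_sum
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean)
              for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


# Synthetic ensemble of four distinct models (constraint variable, ECS in K).
x = [0.8, 1.0, 1.3, 1.6]
y = [2.7, 3.1, 3.4, 4.2]

# Same ensemble, but the last model is submitted in three identical variants.
x_dup = x + [1.6, 1.6]
y_dup = y + [4.2, 4.2]

# Reference fit: every distinct model counted once.
a_ref, b_ref = weighted_ols(x, y, [1.0] * len(x))

# Naive fit: all six entries treated as independent models.
a_naive, b_naive = weighted_ols(x_dup, y_dup, [1.0] * len(x_dup))

# Weighted fit: each of the three copies receives weight 1/3.
weights = [1.0, 1.0, 1.0, 1 / 3, 1 / 3, 1 / 3]
a_w, b_w = weighted_ols(x_dup, y_dup, weights)

print(f"reference slope: {b_ref:.3f}")
print(f"naive slope:     {b_naive:.3f}")
print(f"weighted slope:  {b_w:.3f}")
```

Here the naive fit is pulled toward the triplicated model, while the weighted fit recovers the reference line. With real models, which are only partly dependent, choosing appropriate weights is the hard part and remains an open choice.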
A further limitation of this study involves the calculation of significance for the different climate model ensembles: our non-parametric bootstrapping approach calculates the spread of sample values one might obtain if the truth looked like the sample, not the spread of true values consistent with the sample. This is particularly relevant for datasets with a large $R^2$ and a small sample size, in which case the bootstrap spread will be overconfident. However, since most emergent constraints show small $R^2$, this effect is expected to be small in our study. The calculation of the ECS itself is also a source of uncertainty: even though it is widely used in the literature, the Gregory regression method (Gregory et al., 2004) is known to be only an approximation of the true climate sensitivity. A recent paper (Rugenstein et al., 2020) shows that the true equilibrium warming obtained from integrating the climate models until a new equilibrium is reached is 17% (median) higher than the one estimated from the first 150 years of the simulation, as done in the Gregory method. However, only a few ESMs provide simulations long enough to assess the true climate sensitivity. The CMIP-endorsed LongRunMIP (Rugenstein et al., 2019) could be a promising way to estimate the true climate sensitivity, which can then be used to reevaluate emergent constraints and their proposed underlying physical mechanisms.
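A generic non-parametric bootstrap of the coefficient of determination can be sketched as follows. This is not necessarily the exact procedure used in this study: the data are synthetic, and the function names, the number of resamples and the percentile choice are assumptions made for illustration. Models are resampled with replacement as (constraint, ECS) pairs, and $R^2$ is recomputed for each resample.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation of x and y; None if a variance is zero."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        return None  # degenerate resample, e.g. one model drawn n times
    return sxy ** 2 / (sxx * syy)


def bootstrap_r2(x, y, n_boot=2000, seed=0):
    """Approximate 5th and 95th percentiles of R^2 over paired resamples."""
    rng = random.Random(seed)
    n = len(x)
    values = []
    while len(values) < n_boot:
        # Draw n models with replacement, keeping (x, y) pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        r2 = r_squared([x[i] for i in idx], [y[i] for i in idx])
        if r2 is not None:
            values.append(r2)
    values.sort()
    lower = values[int(0.05 * n_boot)]
    upper = values[int(0.95 * n_boot) - 1]
    return lower, upper


# Synthetic ensemble of eight models (constraint variable, ECS in K).
x = [0.8, 1.0, 1.1, 1.3, 1.4, 1.6, 1.7, 1.9]
y = [2.9, 2.6, 3.4, 3.1, 3.9, 3.5, 4.3, 4.0]

r2_sample = r_squared(x, y)
lower, upper = bootstrap_r2(x, y)
print(f"sample R^2: {r2_sample:.2f}")
print(f"bootstrap 5-95% range: {lower:.2f} to {upper:.2f}")
```

The resulting range describes how $R^2$ would scatter if the ensemble itself were the full population, which is the caveat raised above: for small ensembles with high $R^2$ it tends to understate the real uncertainty.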
ECS is the product of the complex interactions of the many components of the climate system. Thus, constraining ECS with a single physical process might oversimplify the problem. With increasing computational resources available to climate science, more and more details of these interactions can be taken into account in a modern ESM. In contrast, the models of the predecessor generations CMIP3 and CMIP5 were less complex and had fewer components, so constraining the uncertainties of a single dominant process may have allowed for a more successful constraint on ECS than in more complex models. In conclusion, we argue that to constrain ECS for the latest generation of climate models, it might be beneficial to apply multivariate approaches that are able to consider multiple (different) relevant physical processes at once and thus to obtain a broader picture of the complex reality. New machine learning techniques are a promising avenue forward for such multivariate approaches and for constraining uncertainties in multi-model projections (Schlund et al., in review), with the aim of further improving climate modelling and analysis (Reichstein et al., 2019).

[Table 1 (fragment): Specific humidity (hus), AIRS (Aumann et al., 2003), 2003-2005. TII (Tian, 2015): Southern ITCZ index [mm day$^{-1}$], precipitation (pr), GPCP (Adler et al., 2003), 1986-2005. VOL (...)]

[Figure caption (fragment): ... The shaded areas around the regression lines correspond to the standard prediction errors (equation (5)). [...] (... Table 1 for details on the individual observational datasets used) with its uncertainty range given as standard error (gray shaded area). The horizontal dashed lines show the best estimates of the constrained ECS for CMIP5 (blue) and CMIP6 (orange). The colored dots mark the CMIP5 (blue) and CMIP6 (orange) multi-model means. Middle column: probability densities for the constrained ECS following equation (8) (solid lines) and the unconstrained model ensembles (histograms). Note that for each individual emergent constraint, a different subset of climate models is used due to the availability of data (see Table A1 and Table A2).]

Code availability
The corresponding ESMValTool recipe that can be used to reproduce the figures of this paper will be included in ESMValTool v2.0 (Lauer et al., 2020; Righi et al., 2020) at the time of publication of this paper.

Data availability
CMIP5 and CMIP6 model output (see Table 2 and Table 3) is available through the Earth System Grid Federation (ESGF; e.g. https://esgf-data.dkrz.de/projects/esgf-dkrz/) and can be directly used within the ESMValTool. Downloading instructions and preprocessing scripts for the observational datasets detailed in Table 1 are included in the ESMValTool distribution.

Author contribution
MS led the writing and analysis of the paper. MS and AL coded the emergent constraints in the ESMValTool. VE, PG and SCS contributed to the concept of the study and the interpretation of the results. All authors contributed to the writing of the manuscript.

Competing interests
The authors declare that they have no conflict of interest.

Acknowledgements
This work has been supported by the European Union's Horizon 2020 Framework Programme for Research and Innovation