Climate sensitivity estimates – sensitivity to radiative forcing time series and observational data

Inferred effective climate sensitivity (ECSinf) is estimated using a method combining radiative forcing (RF) time series and several series of observed ocean heat content (OHC) and near-surface temperature change in a Bayesian framework using a simple energy balance model and a stochastic model. The model is updated compared to our previous analysis by using recent forcing estimates from IPCC, including OHC data for the deep ocean, and extending the time series to 2014. In our main analysis, the mean value of the estimated ECSinf is 2.0 C, with a median value of 1.9 C and a 90 % credible interval (CI) of 1.2–3.1 C. The mean estimate has recently been shown to be consistent with the higher values for the equilibrium climate sensitivity estimated by climate models. The transient climate response (TCR) is estimated to have a mean value of 1.4 C (90 % CI 0.9–2.0 C), and in our main analysis the posterior aerosol effective radiative forcing is similar to the range provided by the IPCC. We show a strong sensitivity of the estimated ECSinf to the choice of a priori RF time series, excluding pre-1950 data and the treatment of OHC data. Sensitivity analysis performed by merging the upper (0–700 m) and the deep-ocean OHC or using only one OHC dataset (instead of four in the main analysis) both give an enhancement of the mean ECSinf by about 50 % from our best estimate.


Introduction
A key question in climate science is how the global mean surface temperature (GMST) responds to changes in greenhouse gases or other forcings. Climate sensitivity is determined by complex feedbacks that operate on very different timescales and may depend on the transient climate state. The standard metric for climate sensitivity is the equilibrium climate sensitivity (ECS) (or Charney sensitivity) given as the change in temperature at equilibrium for a doubling of CO 2 , neglecting long-term feedbacks associated with vegetation changes, carbon feedbacks and ice sheet dynamics. Estimates of the ECS are either based on complex climate models or observations of past climate Knutti et al., 2017). The Intergovernmental Panel on Climate Change (IPCC) presented a likely (> 66 % probability) range for ECS of 1.5 to 4.5 • C .
Regarding the Earth as a climate laboratory and the changes in atmospheric composition and land use over the historical record as a perturbation experiment, observationally based analyses of Earth's energy budget have been used to infer the climate sensitivity (Forster, 2016). Since the current climate is in a nonequilibrium state, observationally based methods can only account for the feedbacks operating during the historical period. These methods using the historical period with observations are referred to as inferred estimates (Armour, 2017;Forster, 2016) and only have the capability to derive an effective climate sensitivity and are generally significantly lower than ECS estimates from atmosphere-ocean general circulation models (AOGCMs) (Armour, 2017;Knutti et al., 2017).
Since the IPCC's fifth assessment report (AR5), there has been an improved understanding of the causes of the Published by Copernicus Publications on behalf of the European Geosciences Union. 880 R. B. Skeie et al.: Climate sensitivity estimates differences in estimates of climate sensitivity from climate models and observationally based methods for two main reasons. First, recent analysis of time-varying feedbacks in AOGCM simulations from Coupled Model Intercomparison Project Phase 5 (CMIP5) (Proistosescu and Huybers, 2017;Armour, 2017;Andrews et al., 2015) has indicated that in most AOGCMs the net feedbacks become more positive over time as a new equilibrium is approached. This is most likely due to the evolution of the pattern of sea surface temperature increase in the Pacific and Southern oceans and associated cloud feedbacks. Whether this slow warming has manifested itself in the climate record used for the analysis is the difference between effective and equilibrium climate sensitivity (Armour, 2017;Knutti et al., 2017). Second, ECS formally refers to global near-surface air temperature ("tas" in CMIP5 nomenclature), and in observationally based methods observed surface temperature records that are a blend of air temperature over land and sea surface temperature (SST) over ocean are used in the estimation. Several observed surface temperature records exist with different methods to account for gaps in the observations. Differences in historical surface temperature warming among various analyses is more than 0.1 • C (Haustein et al., 2017), arising mainly from approaches taken in regions that are missing or have limited spatial coverage of observations. According to Richardson et al. (2016), there is a general bias in the surface temperature records since water heats more slowly than the air above and due to undersampling in fast-warming regions (e.g., the Arctic). Taking both effects into account, Armour (2017) shows that previous estimates of inferred effective climate sensitivity (ECS inf ) of about 2.0 • C are consistent with estimates of ECS of 2.9 • C from climate models.
Although it is now established that the ECS estimated by the use of complex climate models and ECS inf estimated by using historical observations would differ, there is still considerable spread in ECS estimates from models and between observationally based ECS inf estimates. Using observationally based methods and complex models are complementary approaches to quantifying the net effect of the feedbacks that determine the climate sensitivity. Complex climate models include processes that are highly parameterized, in particular the representation of clouds, precipitation and convection, and associated feedbacks, which are crucial for estimating the ECS (Bony et al., 2015;Tan et al., 2016). There is also a large spread in observationally based estimates (Knutti et al., 2017). A better understanding of the feedbacks in the complex models as well as improvements and understanding differences among the observationally based methods are needed.
Observational estimates of climate sensitivity can be improved using longer data series of higher quality (e.g., correcting for observational biases in temperatures or better forcing estimates) (Urban et al., 2014). Estimates can also be improved by including observational data on other climate variables which were not previously available. Several stud-ies indicate that the temporary slowdown in GMST at the beginning of the millennium coexisted with increased accumulation of heat in the deep ocean (e.g., Meehl et al., 2011Meehl et al., , 2013Balmaseda et al., 2013;Watanabe et al., 2013;Chen and Tung, 2014;Lyman and Johnson, 2013). Johansson et al. (2015) found that if ocean heat content (OHC) change below 700 m over this period were included in their observationally based methods, the mean value of ECS inf would increase.
In this study we use our estimation model that was first documented in Aldrin et al. (2012) and further developed in Skeie et al. (2014). Our method is more complex than the common energy-balance-based estimates (Forster, 2016) in that we embed a simple climate model into a stochastic model with radiative forcing time series as input, treating the Northern and Southern Hemisphere (NH and SH) separately and including a vertical resolution of the ocean (40 layers). The radiative forcing time series are linked to the observations of OHC and temperature change through the simple climate model and the stochastic model, using a Bayesian approach. A unique feature with our method is that we use several observational datasets. The method estimates not only the ECS inf but simultaneously also provides posterior estimates of the radiative forcing, as well as posterior uncertainty estimates in the observations datasets and correlations between them. In this study we further develop our estimation model with additional observational datasets, including heating rates of the deep ocean (below 700 m), new forcing time series from the IPCC AR5 as well as extended time series from 2010 to 2014 to update our estimate of ECS inf . We carry out a number of sensitivity experiments to investigate causes of differences in observationally based ECS inf estimates due to differences in the input data (observations of surface temperature, OHC and radiative forcing (RF)).

The model
Our full model consists of a simple climate model (SCM) with an idealized representation of the Earth's energy balance, a data model that describes how observations are related to the process states and finally a parameter model that expresses our prior knowledge of the parameters (Aldrin et al., 2012).
The core of our model framework is the SCM, a deterministic energy balance/upwelling diffusion model (Schlesinger et al., 1992). The SCM calculates annual hemispheric nearsurface temperature change (blended SST and surface air temperature) and changes in global OHC as a function of estimated RF time series. The vertical resolution of the ocean is 40 layers down to 4000 m. The output of the SCM can be written as m t (x 1750:t , θ ), where x 1750:t (the RF from 1750 until year t) and θ are the true, but unknown, input values to the SCM. θ is a vector of seven parameters, each with a physical  meaning. One of these parameters is the climate sensitivity, and the other parameters determine how the heat is mixed into the ocean, which includes the mixed layer depth, the air-sea heat exchange coefficient, the vertical diffusivity in the ocean and the upwelling velocity (see Schlesinger et al., 1992 andAldrin et al., 2012 for details). The true state of some central characteristics (g t ) of the climate system in year t with corresponding observations can then be written as g t = m t (x 1750:t , θ) + n t , where n t is a stochastic process, with three terms, representing long-term and short-term internal variability and model error. For the short-term internal variability, we use the Southern Oscillation index (Table 1) to account for the effect of El Niño-Southern Oscillation (ENSO). The term for the long-term internal variability was implemented in Skeie et al. (2014), and the dependence structure of this term (i.e., correlations over time and between the three elements) is based on control simulations with a general circulation model from CMIP5 (see Skeie et al., 2014, for details) This term will also represent other slowly varying model errors due to potential limitations of the SCM and forcing time series. The third error term is included to account for more rapidly varying model errors.
For the (available) long-term observational data that defines g t we consider the surface temperatures separately for the Northern and Southern Hemispheres and the OHC separately for 0-700 m and below 700 m. Each of these elements of g t are associated with one or more corresponding observationally based data series (Table 1), with individual error terms. To gain as much information as possible, we use several datasets for the same physical quantity (e.g., OHC above 700 m) simultaneously (Aldrin et al., 2012;Skeie et al., 2014). Most of the data series are provided with corresponding yearly standard errors (Fig. S7a in the Supplement). However, we only use the temporal profiles of the reported errors and estimate their magnitudes within the model, taking into account the possibilities that the reported standard errors may under-or overestimate the true uncertainty (Appendix A and Aldrin et al., 2012;Skeie et al., 2014).
The unknown quantities are given prior distributions as presented in Skeie et al. (2014). The ECS inf is given a vague prior, uniform (0, 20), and the informative priors for θ based on expert judgment are listed in Table S1. We apply a Bayesian approach in the spirit of Kennedy and O'Hagan (2001) to the calibration of computer models and  Figure S2 show the corresponding probability density functions.
use Markov Chain Monte Carlo (MCMC) techniques to sample from the posterior distribution (Aldrin et al., 2012).

Setup
The starting point, here called case A, is the main result from Skeie et al. (2014) (hereafter named Skeie14) with some modifications (see Appendix A). These modifications changed the mean ECS inf value from 1.8 • C (median 1.7 • C, 90 % credible interval (CI) 0.92-3.2 • C) to 2.0 • C (median 1.8 • C, 90 % CI 1.0-3.4 • C) (Fig. 1a, case A). The transient climate response (TCR) is calculated by running the model with 1 % per year increase in CO 2 using the joint posterior distribution of the model parameters. These modifications increased the mean value of TCR from 1.4 to 1.5 • C and the 90 % CI shifted slightly to larger values (Fig. 1b).
In case A, we used four hemispheric pairs of observationally based estimates of surface temperatures from about 1880 to 2010 and three series for OHC above 700 m from about 1950 to 2010 and RF from Skeie et al. (2011Skeie et al. ( , 2014 (Table 1). The forcing time series used in case A are hereafter named Forc_Skeie2014 and the priors of each forcing mechanisms included (Table S2) are described in detail in the Appendix D of Skeie14.
The potential for improving the constraint of the estimate of the climate sensitivity using observationally based methods depends crucially on the quality of the input forcing data and the quality and amount of observational data. In case B, we include new and improved knowledge of the forcing time series and add new data for OHC below 700 m, and observational data are extended to 2014. More specifically, in case B we did the following: 1. replaced the Forc_Skeie14 prior with the AR5 effective radiative forcing (ERF) estimates (Myhre et al., 2013), hereafter named Forc_AR5. The priors for the forcing mechanisms included (Table S2) are constructed to be consistent with the uncertainties provided in AR5 and the same relative uncertainty for the prior forcing is used over the entire time period. ERF includes rapid adjustments allowing the full influence on clouds except through surface temperature changes (Sherwood et al., 2014;Boucher et al., 2013;Myhre et al., 2013).
2. included data for OHC below 700 m (ORAS4) and added one extra data series for OHC above 700 m (also ORAS4). Note that the deep-ocean OHC is added as a separate dataset and not merged with the upper ocean. Including data on OHC in the deep ocean thus has the potential to better constrain the parameters in the SCM that determine how the heat is mixed into the ocean as well as the posterior estimates of the effective radiative forcing.
3. used updated versions of the data prior to 2010.

extended the time series from 2010 to 2014.
Previous studies using similar methods have obtained different results with respect to the estimated ECS inf (Knutti et al., 2017). We perform three sensitivity experiments to investigate the effects of different choices about how to use OHC data (cases C and D, Sect. 4.1) and how sensitive the results are to pre-1950 data (case E, Sect. 4.2).

Figure 2. Observed and fitted (posterior mean) values for the GMST for cases A to E (a-e).
The shaded areas show the 90 % CI for fitted values, i.e., the sum of the output from the deterministic SCM and the short-term internal variability excluding the terms for long-term internal variability and model error. Figure S3 shows three sets of fitted values for the GMST for the main analysis that include the long-term internal variability and model error.

Improved estimate of inferred effective climate sensitivity
Here we present our revised estimate of ECS inf by replacing the RF prior with IPCC data, including OHC data below 700 m and extending the time series to 2014 (case B). We consider this analysis using the IPCC forcing estimates, including deep-ocean OHC and extending the length of the input data series as the most trustworthy and physically based case and thus regard it as our main estimate of the ECS inf , with a mean of 2.0 • C (median 1.9 • C, 90 % CI 1.2-3.1 • C). The mean value is similar while the 90 % CI is narrower compared to the refined Skeie14 estimate (Fig. 1a). The individual influence of the four major updates between cases A and B is shown in Fig. S1 and described at the end of this section. The mean value of TCR in case B is 1.4 • C (median 1.3 • C, 90 % CI 0.9-2.0 • C) (Fig. 1b). As for the ECS inf estimate, the TCR mean value is similar and the 90 % CI is narrower compared to the refined Skeie14 estimate (Fig. 1b).
The GMST change is well reproduced (Fig. 2, case B), and less of the recent GMST change is attributed to long-term internal variability compared to the refined Skeie14 estimate ( Fig. S5a-b).
The rate of change in anthropogenic forcing is larger between 1940 and 1970 using Forc_AR5 compared to Forc_Skeie14 (Fig. 3). The fit to the GMST in the 1980s-1990s improved ( Fig. 2 case B vs. A), where the root mean square error between 1980 and 1999 decreased from 0.12 to 0.077 • C. Figure S5 shows posterior estimates of the longterm internal variability, the ENSO term and the model errors. Parts of the increase in GMST over the last decades are explained as long-term internal variability, but the amplitude decreases in case B compared to case A ( Fig. S5a-b). In case B, the estimated amplitude of the multi-decadal internal variability (about 0.2 • C in each hemisphere, cf. Fig. S5) is in good agreement with the decadal trends in global surface temperatures found in unforced control simulations in the multi-model ensemble from CMIP5 (0.2-0.4 • C; Palmer and McNeall, 2014).
The prior anthropogenic mean forcing in 2010 increased from 1.5 to 2.3 W m −2 from case A to case B when Forc_AR5 replaced Forc_Skeie14. For case A, the posterior forcing is shifted to higher values compared to the prior, suggesting that the historical data and our method support higher forcing than the Forc_Skeie14 prior. When the prior is changed to Forc_AR5 in case B, the posterior for the anthropogenic forcing is much closer to the prior (Fig. 3), which in- dicates that the method and observational data are more in accordance with the new prior than the old one. The same holds for the total forcing (Fig. S4). The 90 % CI for the posterior anthropogenic forcing was 1.3 to 2.8 W m −2 in case A compared to 1.3 to 3.4 W m −2 in case B. The upper limit of the 90 % CI is shifted to larger values. The most uncertain part of the forcing time series is associated with aerosols. The difference between the two forcing priors is mainly due to a much weaker aerosol forcing in Forc_AR5 than in Forc_Skeie14 (compare the two dashed-dotted bars in Fig. 4a). While the posterior aerosol forcing was shifted to smaller negative values in case A, the prior and posterior for aerosol forcing are similar in case B (Fig. 4b). A relatively weak aerosolcloud interaction as included in Forc_AR5 is consistent with the recent findings in Malavelle et al. (2017) on how sulfate aerosols from volcanic emissions influences clouds.
The ERFs in AR5 are based on an assessment of several studies reflecting improved knowledge of the forcing mechanisms compared to the one-model RF results used in Skeie14. The new ERFs gave a better posterior estimate of GMST (Fig. 2) and reduced change from prior to posterior forcing (Fig. 3). Note that the number of forcing time series that can be combined was 18 in Skeie14, including 3 time series for volcanic and 8 for aerosols, compared to only 1 time series for each of these forcing mechanisms in Forc_AR5 (Table S2). This gives less flexibility in the time development of the forcing in case B compared to case A; however, the GMST change is better reproduced in the 1980s-1990s using Forc_AR5 compared to Forc_Skeie14.
Ultimately, global climate change is governed by the radiative imbalance at the top of the atmosphere (TOA) and modulated by the internal variability. Forcing by greenhouse gases and aerosols as well as albedo changes, feedback processes and the radiative responses to temperature changes determine this imbalance. With a positive net imbalance at TOA, energy accumulates in the Earth system, mainly as increasing OHC (Church et al., 2011). Since OHC is the dominant energy storage in the system, these data series have profound influence on the ECS inf estimates (Tomassini et al., 2007;Skeie et al., 2014;Aldrin et al., 2012;Johansson et al., 2015). In case B, we have extended our use of OHC data, so in addition to the three OHC data series above 700 m, we have included the ORAS4 data above and below 700 m (Table 1) as two separate data sources. Including the deep-ocean OHC data gives a stronger constraint on the overall accumulation of heat in the system, and the posterior estimates of the parameters of θ that determine the vertical transport of heat in the ocean -the effective diffusivity and the upwelling velocity -increase by 44 and 31 %, respectively. Having separate data series for the two ocean layers also provides information that influences the balance between negative (by aerosols) and positive forcings, since these forcings have different evolution over time (cf. Sect. 4.1).
In Fig. 5 the observed and fitted OHC for cases A and B are shown. Including data on OHC change below 700 m increases the total heat uptake. The increase in the fitted OHC above 700 m over the last decade is larger in case B compared to case A. In case B the increase in the fitted OHC above 700 m is larger than the observational data, while below 700 m, the observed OHC increase is higher than the fitted one (Fig. 5). This is to be expected since the parameters of θ do not change over time. Thus, the observed rapid change in OHC below 700 m over the last years with corresponding slower warming above 700 m is attributed to long-term internal variability (a part of the n t term) in the model (Fig. S5cd). Note that the Ishii and Kimoto series is outside the 90 % CI. The reason is that the assumed observational errors for all series are much larger back in time than in the recent years (see Appendix A). Therefore, the various data series are aligned quite closely to each other in recent years, and since the Ishii and Kimoto series has a much weaker trend than the others, it lies above the 90 % CI in the first part of the data history.
The update of the ECS inf from case A to B was done stepwise in four steps (Fig. S1f, g, i and j). The new ERFs were first implemented. The posterior forcing is much closer to the prior using Forc_AR5 instead of Forc_Skeie14, and the fit to the GMST in the 1980s-1990s also improved with a decrease in the root mean square error between 1980 and 1999 from 0.12 to 0.087 • C compared to case A. The stronger forcing resulted in a shift of the ECS inf estimate to lower values (Fig. S1f vs. e), with an ECS inf mean value of 1.5 • C (90 % CI 0.9-2.3 • C). So far, only OHC data in the upper 700 m were used, leaving the model unconstrained with respect to the heating of the deeper ocean.
We then included the ORAS4 data above and below 700 m as two separate data sources. Similar to Johansson et al. (2015) we found that including the OHC change be-low 700 m increases the total heat uptake and thus the mean value of ECS inf from 1.5 to 1.7 • C (Fig. S1g vs. f). The 90 % CI shifted to larger values ranging from 1.0 to 2.8 • C.
The last two steps to update the ECS inf estimate from case A to case B were to use the most recent version of the data prior to 2010 and to extend the data series used from 2010 to 2014 (Table 1). Some of the observational data series have been updated by the data suppliers, so first we use refined data up to 2010 before we extend the data series to 2014 (cf. Appendix B). Using the refined data up to 2010, the estimated mean ECS inf increased from 1.7 to 2.0 • C (Fig. S1i) and the 90 % CI was shifted again to larger values ranging from 1.1 to 3.3 • C. Further, when the data series were extended from 2010 to 2014, the upper bound of the 90 % CI decreased from 3.3 to 3.1 • C while the lower bound remained unchanged and the mean estimate slightly reduced (Fig. S1j).
In total, the changing from case A to case B did not change the mean value of ECS inf (it is 2.0 • C in both cases), but the 90 % CI was reduced from 1.0-3.4 to 1.2-3.1 • C. The reduc- tion in ECS inf in the first step of the update is more or less counteracted by the subsequent steps.

Sensitivity tests -the use of input data
We now investigate possible causes of differences in observationally based ECS inf estimates due to the use of input data. We analyze the impacts of different usage of the OHC data (cases C and D) and the treatment of uncertainties in the GMST data (case E).

The role of the use of OHC data
The vertical transport of heat in the SCM (with 40 vertical layers) is quite simple. Turbulent diffusion mixes heat down from the surface, while downwelling transports heat directly to the deepest layer, i.e., no detrainment to intermediate layers (Aldrin et al., 2012). Therefore, it is of interest to investigate a constraint of the model with OHC data for the total depth of the ocean instead of above and below 700 m. In case C we do not separate the 0-700 m from the deeper ocean. We use four datasets for total OHC by adding the ORAS4 below 700 m data to each of the four OHC above 700 m estimates. Merging the OHC above and below 700 m (case C) results in a substantial decrease in the posterior ERF from 2.5 to 1.8 W m −2 (Fig. S6b-c) and an increase in the ECS inf estimate from a mean value of 2.0 • C (median 1.9 • C) to 3.2 • C (median 2.9 • C) (Fig. 1a). Without the separate constraint on the OHC above and below 700 m, the posterior warming of the ocean increases faster (compared to case B) over the last 20 years (Fig. 6). This is mainly caused by enhanced warming in the upper 700 m (Fig. 7). This allows for a stronger negative ERF estimate for aerosols (Fig. 4a). While the prior and posterior radiative forcing in case B is similar, in case C the posterior aerosol ERF is shifted to lower values (Fig. 4a) and the posterior net forcing is shifted towards lower values (Figs. 4a and S6c) and hence a higher estimated ECS inf (Fig. 1) compared to case B. This anticorrelation between aerosol forcing and ECS inf is illustrated in Fig. 4c for case B. However, the observations show a stronger recent increase in heat in the deep ocean (cf. Sect. 3) and not in the upper 700 m, so this test where this information is not used is likely to overestimate the aerosol forcing strength and hence overestimate the ECS inf . Since the IPCC best estimate of −0.9 W m −2 was published in 2013 for aerosols ERF, studies point towards weak aerosol-cloud interaction (Gordon et al., 2016;Malavelle et al., 2017;Toll et al., 2017). These recent studies indicate that there is no firm evidence to revise the IPCC AR5 aerosol ERF best estimate yet. We therefore keep case B as our best estimate, since having separate data series for the two ocean layers provides information that constrains the balance between negative and positive forcings due to their different time evolution.
A unique feature with our method is that we use data from more than one observational dataset. It is obvious that, as long as the various data series for the same quantity (here OHC above 700 m) differ, it is easier to fit a model to one data series, thus giving less uncertainty in the posterior esti-  mates. In case D we test the effect of using one alternative time series for OHC. We choose to use the Levitus2000 time series, which is the same OHC data as used in Johansson et al. (2015). The pentadal heat content is used from 1955 to 2012, treated as annual observations and extended to 2014 using the yearly OHC data for the upper 2000 m from the same data source. We use the OHC data for the upper 2000 m as they were data for the total OHC. Observed energy stored below 2000 m is not included in the estimation, and hence the ECS inf might be underestimated. Energy stored below 2000 m is uncertain. Purkey and Johnson (2010) found an increase in OHC in the abyssal and deep Southern Ocean in the 1990s and 2000s based on sparse observations from ships, but it is not clear if this is a long-term trend. Llovel et al. (2014) could not detect a deep-ocean (below 2000 m) contribution to sea level rise and energy budget between 2005 and 2013 using ocean observations and satellite measurements; however, the uncertainties are large.
As in case C, we do not separate the OHC data above and below 700 m. Quite similar to case C, there is a more rapid increase in the posterior estimate of total OHC (Fig. 6) compared to case B; the increased warming is mostly in the upper 700 m (Fig. 7), and the posterior forcing is shifted to lower values than in the prior (Figs. 4a and S6d). In case D the estimated mean ECS inf is 2.8 • C (median 2.6 • C, 90 % CI 1.5-4.6 • C) (Fig 1a, case F). This is higher than in case B, but lower than for case C.
The estimated total OHC has a narrower range when OHC above and below 700 m are merged (Fig. 6a). The range is also narrower in case D than in case C. As expected, using several data series for OHC (case B: 5; case C: 4; case D: 1) increases the posterior observational error. Note that the magnitude of the observational errors is estimated (Aldrin et al., 2012;Skeie et al., 2014). In case D, the posterior standard deviation of the observed OHC is similar to the reported standard deviation (Fig. S8), while when using several OHC time series, the posterior standard deviation is larger (Fig. S7) and arguably more correct than reported due to the large variability among the datasets (Appendix A). Hence, larger uncertainties in the observed OHC data result in larger uncertainties in the estimated OHC. Johansson et al. (2015) used the same OHC data series as in our case D and a similar method; however, their 90 % CI for the OHC in the upper 2000 m (their Fig. S5) is even narrower. This might not only be due to the use of one OHC dataset. While we estimate the magnitude of the observational error, Johansson et al. (2015) use the error given by the OHC data provider. In Johansson et al. (2015) the estimated uncertainties in OHC were smaller than the given observational uncertainties (their Fig. S5). The narrower ECS inf range may primarily be because Johansson et al. (2015) assumed very small measurement errors in the most informative data (OHC); secondly, they ignored time correlation in observational errors and did not take into account long-term internal variability to the same degree as in our method.
To sum up, using several observational series (and estimated observational errors) increases the estimated observational errors to more realistic values, since data series are not well correlated and hence increase the range of estimated OHC with implications for estimated ECS inf .

The role of uncertainty estimates in the temperature series
The prior standard deviation for the surface temperature data is quite different among the datasets (Fig. S7a). The NCDC data have 3 to 5 times larger standard error prior to 1950 compared to after 1950, while it is more constant back to the 19th century for the three other datasets.
To investigate this, we reestimated our model using data only after 1950, which is equivalent to assuming a very large uncertainty prior to 1950. The estimated magnitude of the ENSO signal increases (Fig. S5a-b) since the data series are more correlated in the latter part of 20th century. For temperature, the model fits the observations of GMST well, but with a larger 90 % CI range (Fig. 2), and the observed NH and SH temperatures are well within the 90 % CI of the model (Fig. S9). The mean ECS inf increases from 2.0 (median 1.9 • C) to 2.2 • C (median 2.1 • C), and the upper 90 % CI limit increases from 3.1 to 3.8 • C (Fig. 1a, case E vs. B). The mean TCR increases from 1.4 to 1.5 • C and the 90 % CI is shifted slightly to lower values compared to the range from IPCC by 0.1 • C (Fig. 1b). Johansson et al. (2015) used only the NCDC data for GMST; thus, the data prior to 1950 were given little weight when fitting the model. Our ENSO signal is now (case E) of a similar magnitude as in Johansson et al. (2015) (their Fig. 1b). The ECS inf uncertainty in this study is still larger, and our mean value is slightly higher than their lower limit of 2 • C.
Excluding data before 1950 also excludes the late 19thcentury period with a large volcanic eruption where the signal in the GMST data is small and quite uncertain (Santer et al., 2016). Santer et al. (2016) argued that the method in Johansson et al. (2015) down-weights the volcanic forcing due to the small response of the Krakatau eruption in the temperature data. Johansson et al. (2016) responded that the observational uncertainty was large, so the GMST data at that time will have a limited effect. In our results, excluding observations before 1950, the GMST response following the Pinatubo eruption in 1991 increases (Fig. 2) and is similar to observations due to the larger ENSO signal and stronger posterior volcanic signal.
In the early period, the aerosol forcing had a larger relative contribution to total ERF causing a more uncertain forcing trend in the early period. Uncertainty in the temporal trend of the forcing is not included, and better representation of forcing uncertainties than the scaling approach is needed (Tanaka et al., 2009). Omitting data before 1950 (case E), when the net forcing is more uncertain (Stevens, 2013), makes it easier to fit the model to observations, but the uncertainty in estimated ECS inf , TCR and GMST and increases (Figs. 1 and  2).

Discussions and conclusions
Causes of differences in observationally based estimates of ECS inf due to the use of input data are analyzed, and an updated ECS inf estimate is presented using our Bayesian estimation model. Adding observational data from 2011 to 2014 and OHC data below 700 m and replacing forcing data with IPCC AR5 ERFs, the ECS inf posterior mean was 2.0 • C (median 1.9 • C, 90 % CI 1.2-3.1 • C). The mean value is similar and the range is slightly narrower than the refined Skeie14 estimated ( Fig. 1 case B vs. A). The mean ECS inf estimate is larger than in Skeie14. Although the estimate in cases A and B is quite similar, the ECS inf estimate shifted to lower values when Forc_AR5 replaced Forc_Skeie14 (from a mean ECS inf estimate of 2.0 to 1.5 • C), and it shifted to larger values when OHC data below 700 m were included (to a mean ECS inf value of 1.7 • C). The ECS inf estimate was very sensitive to the forcing data used, and we showed that the ECS inf estimate was also sensitive to the assumed uncertainties in the GMST data (case E: ECS inf mean value increased from 2.0 to 2.2 • C) and how the OHC data were treated (cases C and D, with mean ECS inf of 3.2 and 2.8 • C, respectively).
Bayesian methods have recently been reviewed by Annan (2015) and Bodman and Jones (2016), and limitation by assuming constant sensitivity over time, the role of the ECS inf prior distribution and equal efficacy for different forcings have been discussed. Implementing an alternative prior for ECS inf as in Skeie14, where 1 / ECS inf is uniformly distributed, shifted the mean ECS inf to lower values from 2.0 • C (median 1.9 • C, 90 % CI 1.2-3.1 • C) to 1.6 • C (median 1.6 • C, 90 % CI 0.97-2.5 • C). The ECS inf estimate is sensitive to the prior; however, one could argue against this alternative prior because it has a high probability of low climate sensitivities that may not be realistic, with 76 % probability for ECS inf being lower than the pure black-body radiation sensitivity of 1.1 • C (Aldrin et al., 2012;Skeie et al., 2014). Recently, studies have suggested that assuming equal efficacy for all forcings biases the ECS estimate low (Marvel et al., 2015;Shindell et al., 2015) even when ERFs are used. In our approach, the efficacy is implicitly included in the forcing uncertainty and thus accounted for. However, if we apply an efficacy of 1.5 for ozone, surface albedo, BC on snow and aerosols, which is the efficacy found in the analysis of Shindell (2014), the probability density function of the ECS is shifted to larger values (Fig. S1l), with a 90 % CI ranging from 1.2 to 3.7 • C.
The fit to the temperature data in the 1980s and 1990s improved using Forc_AR5 instead of Forc_Skeie14, indicating that the forcing trend over this period is better represented in Forc_AR5 compared to Forc_Skeie14. The trend in the forcing is more uncertain in the first half of the 20th century due to less dominance of CO 2 , and in our method the same relative uncertainty for the prior forcing is used over the entire time period. A sensitivity simulation omitting observations before 1950, similar to making these observations very uncertain, gave better representation of the GMST in the latter part of the 20th century and an increased mean ECS inf . Future work should include uncertainties in the temporal development of the forcing, and there is a clear need for an international effort to establish forcing time series, using a consistent forcing definition and allowing for uncertainties in emissions to give a better representation of the temporal uncertainties.
Including OHC data below 700 m shifted the ECS inf to higher values. The estimated ECS inf was found to be very sensitive to how the OHC data were used. Including four OHC time series but merging the data above and below 700 m (case C), the ECS inf mean value increased from 2.0 to 3.2 • C. The probability of ECS inf above 4.5 • C increased to 13 %, values that are practically excluded in our main estimate (case B). Previous studies have used total-column OHC data, and due to the simple representation of the ocean one can argue that this might be more appropriate. However, in case C most of the recent increase in OHC in the model occurred in the uppermost 700 m, allowing a stronger aerosol cooling (Fig. 4a) and hence a larger ECS inf , while the observations indicate that the ocean was warming mainly be-low 700 m. Using only the total-column OHC might therefore overestimate the aerosol forcing strength and hence the ECS inf . We recognize structural uncertainties in the model, and a multi-model intercomparison of observational methods using identical input data would be of great value to investigate these uncertainties.
Using only the Levitus2000 series for OHC for the totalocean column (case D), the ECS inf 90 % CI was shifted to lower values with a range of 1.5-4.6 • C and the range shrunk compared to case C. The historical measurements of ocean temperatures are sparse (Abraham et al., 2013), with large differences between the datasets. The temporal structure of the reported uncertainties differs, and the full uncertainties are often not assessed. Hence, relying on only one OHC series and its reported uncertainty may underestimate the observational uncertainties and hence overestimate the certainties in the estimated OHC with implications for the ECS inf estimate.
Recent studies indicate that the upper-ocean warming is underestimated due to the gap-filling methods (Durack et al., 2014;Li-Jing et al., 2015), in which case the ECS inf will also be underestimated. When refining historical OHC estimates, not only the best value, but also the uncertainty is crucial for observationally based ECS inf estimation.
Other priorities are to improve the GMST series, including uncertainties -not only for the recent trend (Karl et al., 2015;Cowtan and Way, 2014) but also for earlier time periods. Assuming a very large uncertainty prior to 1950, the GMST fit improved and the ECS inf mean increased while the estimated uncertainty ranges increased.
Our ECS inf posterior mean was 2.0 • C with a 90 % CI of 1.2 to 3.1 • C. This is consistent with a mean ECS of 2.9 • C (Armour, 2017), which compares reasonably well with climate model estimates (Andrews et al., 2012;Forster et al., 2013). A final remark is that it is not obvious that the true ECS is a more relevant metric for the climate sensitivity than the ECS inf in a policy context (i.e., the Paris Agreement). The United Nations Framework Convention on Climate Change (UNFCCC) has not adopted a predefined definition of GMST and the stronger long-term feedbacks found in an analysis of CMIP5 simulations (Proistosescu and Huybers, 2017) operate on a timescale longer than the timescale for reaching 2 • C. Data availability. Several publicly available datasets were used in this study. The specific references to the data sources are given in Table 1. Model outputs are available upon request.

Appendix A: Refinement of Skeie14
A few updates/corrections to Skeie14 (Fig. S1a) had to be made prior to the analyses presented in this study. In the Skeie14 study, the standard error of observed OHC above 700 m for two out of the three series was constant in time, while for the third dataset the standard error decreased with time. Due to the limited observational data back in history (e.g., Abraham et al., 2013), it is reasonable to assume that the shape of the standard error of observed global OHC increases back in time, as for the CSIRO series. Therefore, we now assume a common observational uncertainty temporal profile for OHC above 700 m equal to CSIRO for all the OHC time series (Fig. S1b). Note that the magnitude of the observational errors are estimated in our approach (Aldrin et al., 2012;Skeie et al., 2014); i.e., we account for the possibilities that the reported observational errors may be biased upward or downwards compared to the real observational errors.
In fact, the results from Skeie et al. (2014, Appendix B) indicated that the reported standard errors for the Levitus and the Ishii and Kimoto OHC series were too low. We have investigated this further by the following simple analysis.
Let y 1t and y 2t be two different estimates of the true OHC in year t. Then y 1t = "true OHC" + e 1t and y 2t = "true OHC" + e 2t . Here, e 1t and e 2t are error terms, with reported standard deviations s 1t and s 2t , and with true, but unknown standard deviations σ 1t and σ 2t . The difference of the series is y 1t −y 2t = e 1t −e 2t , so even if we cannot observe the errors, we can observe their difference. If the two data series are based on more or less the same data, as for the OHC series used here, one can expect that e 1t and e 2t are positively correlated. Then Var(y 1t − y 2t ) = Var(e 1t − e 2t ) <= (σ 2 1t + σ 2 2t ) We can estimate the average variance of the differences y 1t − y 2t over all time points by Var obs = 1/(n − 1) t (y 1t − y 2t − m) 2 , where m is the average of y 1t − y 2t and n is the number of years. This could be compared to the corresponding reported variance under the assumption of uncorrelated errors, by Var rep = 1/n t (s 2 1t + s 2 2t ), and if the reported standard deviations are correct, then the variance ratio Var obs /Var rep should be less than or equal to 1. For differences of the Levitus, Ishii and Kimoto and ORAS4 (above 700 m) series, the variance ratios are between 2.13 and 3.74 (Table A1), indicating that the reported observational errors for these series are too low, and the real uncertainty may be larger. This is an additional argument for using the CSIRO standard errors for all OHC series.
Another update of Skeie14 that was needed was to use monthly volcanic RF data (Fig. S1c) compared to yearly data in Skeie14. In addition to the three global mean surface temperature (GMST) time series used in Skeie14, another time series for GMST has been published recently (Cowtan and Way, 2014). This time series finds a stronger increasing trend in temperature over the last decade compared to the Had-CRUT4 data, due to the method of accounting for the unsam- pled regions in the world. This data series is now included (Fig. S1d).
Our previous studies showed that the correlation between the observational errors in temperature data was almost uncorrelated with the observational errors in the OHC data. Therefore, to simplify the numerical computations, we from now on assume that these correlations are exactly zero (Fig. S1e).
The estimated ECS inf for each step in the refinement of Skeie14 is presented in Fig. S1a-e.

Appendix B: Extending data up to and including 2014
When extending the analysis from 2010 to 2014, not all the time series used in the estimation is available up to and including the year 2014. Below is a description of how the different datasets are extended if not available up to 2014.
AR5 ERF: The end year for the forcing time series presented in AR5 is 2011 and has to be extended to 2014. For long-lived greenhouse gases the time series are extended using recent observations of global mean concentrations and the formulas relating concentrations and forcing used in Skeie et al. (2011). Tropospheric ozone, stratospheric ozone, aerosol ERF, land use change, BC on snow and volcanoes are kept constant between 2011 and 2014. Stratospheric water vapor follows methane RF. Contrails RF is extended using aircraft traffic data (http://airlines. org/dataset/world-airlines-traffic-and-capacity/, last access: March 2015). Solar RF is extended using the Physikalisch-Meteorologisches Observatorium Davos (PMOD) composite (Frohlich and Lean, 2004).
CSIRO: Data up to and including 2012 were downloaded. The time series were extended from 2012 to 2014 using the mean rate of change of the other OHC data. The uncertainty in 2014 and 2013 is set equal to the uncertainty in 2012.
ORAS4: Balmaseda et al. (2013) investigated the time evolution of global OHC at different depths of the ocean from 1958 to 2009 using the European Centre for Medium-Range Weather Forecasts ocean reanalysis system 4 (ORAS4). Five ensemble members of ORAS4 are generated that sample plausible uncertainties in the wind forcing, observation coverage and the deep ocean. The ORAS4 system runs auto-matically in operations, with numerical weather prediction forcing and observations that are not manually quality controlled. The 1 × 1 • ocean potential temperature up to December 2014 are made available through the APDRC (http: //apdrc.soest.hawaii.edu/datadoc/ecmwf_oras4.php, last access: March 2015) for one ensemble member. The trend in OHC for the total depth and upper 700 m from 2010 to 2014 based on the one ensemble member is used to extend the corresponding OHC data for all the five ensemble members from Balmaseda et al. (2013) up to 2014. The data after 2009 are based on the automatic ORAS4 system and are not quality controlled, and the results in this paper using the data after 2009 should be interpreted with caution. The same method is used to extend the ORAS4 data from 2009 to 2010 (Fig. S1gi). From the five ensemble members, the estimate with uncertainty is calculated as the annual average and standard deviation of OHC above and below 700 m. The standard deviations are modified by smoothing the curve (9-year moving average) since the curve was otherwise very static.