A lower and more constrained estimate of climate sensitivity using updated observations and detailed radiative forcing time series

Equilibrium climate sensitivity (ECS) is constrained based on observed near-surface temperature change, changes in ocean heat content (OHC) and detailed radiative forcing (RF) time series from pre-industrial times to 2010 for all main anthropogenic and natural forcing mechanisms. The RF time series are linked to the observations of OHC and temperature change through an energy balance model (EBM) and a stochastic model, using a Bayesian approach to estimate the ECS and other unknown parameters from the data. For the net anthropogenic RF the posterior mean in 2010 is 2.0 Wm−2, with a 90 % credible interval (C.I.) of 1.3 to 2.8 Wm−2, excluding present-day total aerosol effects (direct + indirect) stronger than −1.7 Wm−2. The posterior mean of the ECS is 1.8 °C, with a 90 % C.I. ranging from 0.9 to 3.2 °C, which is tighter than most previously published estimates. We find that using three OHC data sets simultaneously and data for global mean temperature and OHC up to 2010 substantially narrows the range in ECS compared to using less updated data and only one OHC data set. Using only one OHC set and data up to 2000 can produce results comparable to previously published estimates using observations in the 20th century, including the heavy tail in the probability function. The analyses show a significant contribution of internal variability on a multi-decadal scale to the global mean temperature change. If we do not explicitly account for long-term internal variability, the 90 % C.I. is 40 % narrower than in the main analysis and the mean ECS becomes slightly lower, which demonstrates that the uncertainty in ECS may be severely underestimated if the method is too simple. In addition to the uncertainties represented through the estimated probability density functions, there may be uncertainties due to limitations in the treatment of the temporal development in RF and structural uncertainties in the EBM.


Introduction
To link long-term targets of climate policy, e.g. the 2 °C target (UNFCCC, 2009, 2010), to a more specific emission mitigation policy, a key task in climate science is to quantify the sensitivity of the climate system to perturbation in the radiative forcing (RF). The equilibrium climate sensitivity (ECS) is defined as the global mean surface temperature change following a doubling of the CO2 concentration when the system has reached a new equilibrium. However, the ECS has been poorly constrained, with significant probabilities of high values. The ECS was given a likely (> 66 % probability) range of 2 to 4.5 °C, with a best estimate of 3 °C, by the Intergovernmental Panel on Climate Change (IPCC) in 2007, and values substantially higher than 4.5 °C could not be excluded (Meehl et al., 2007). To constrain the ECS there are two main approaches: a "bottom up" approach performing Monte Carlo simulations or a multi-model experiment with general circulation models (GCMs) (Murphy et al., 2004; Piani et al., 2005; Stainforth et al., 2005; Andrews et al., 2012) and a "top down" approach constraining the ECS using RF estimates and observed data on past climate change on various timescales: 20th century warming (Andronova and Schlesinger, 2001; Forest et al., 2002; Gregory et al., 2002; Knutti et al., 2002; Frame et al., 2005; Annan and Hargreaves, 2006; Forest et al., 2006, 2008; Tomassini et al., 2007; Meinshausen et al., 2009; Libardoni and Forest, 2011; Huber and Knutti, 2012; Olson et al., 2012; Ring et al., 2012; Lewis, 2013; Otto et al., 2013), the last millennium using proxy data (Hegerl et al., 2006), the last glacial maximum (Annan et al., 2005; Schneider von Deimling et al., 2006; Schmittner et al., 2011; Hargreaves et al., 2012), or using data further back in time (Royer et al., 2007; Kohler et al., 2010; Hansen and Sato, 2012; Hansen et al., 2013).

Published by Copernicus Publications on behalf of the European Geosciences Union.
The main challenge in determining the climate sensitivity is that it is governed by complex feedback mechanisms. Bottom-up estimates use prescribed CO2 perturbations (i.e. the RF is known with small uncertainty), but the uncertainties in the representation of the physics and thus the feedbacks lead to large uncertainties in the ECS (Andrews et al., 2012). For the top-down approach the Earth can be considered as a "laboratory" in which all feedbacks are by definition perfectly represented, although the impact of very slow feedbacks like melting of ice caps might not be fully captured. The problem is that the human-induced "climate experiment" is not very well set up in that neither the RF nor the response is well known. There is a combination of positive and negative forcings and the documentation of the changes in the system is less than perfect. However, this is an ongoing experiment and over time the net positive forcing increases as CO2 continues to increase, while the concentrations of scattering aerosols have more or less stabilized (Wild et al., 2009; Skeie et al., 2011b). Our understanding of the physics and magnitude of the forcings (e.g. the aerosol forcings) has also improved, leading to less uncertainty in the RF estimates (Myhre, 2009), and the time series of observations have become longer and of improved quality. This combination is expected to provide a better constraint on the ECS. Estimating the climate sensitivity using historical data implicitly assumes that the feedbacks do not change over time, which is equivalent to assuming that the effective climate sensitivity (Murphy, 1995; Frame et al., 2005) and the ECS are equal. This assumption adds some additional uncertainty to the estimate of the tail of the ECS towards higher values (Armour et al., 2012), since the slow feedbacks are not fully represented. However, these changes are slow and the climate sensitivity estimated here (i.e. the average effective climate sensitivity over the 1750-2010 period) is what is required for analysis of climate change on a century timescale (Raper et al., 2001; Sokolov, 2006).
In this study, RF time series with uncertainty of all well-established mechanisms are linked to the observations of ocean heat content (OHC) and temperature change through an energy balance model and a stochastic model, using a Bayesian approach to estimate the climate sensitivity following the method described in Aldrin et al. (2012), but with certain improvements (Sect. 2). Observational data up to and including the year 2010 are used. This is at least ten additional years compared to the majority of previously published studies (summarized in Hegerl et al., 2007; Knutti and Hegerl, 2008). A key feature of both the near-surface air temperatures and the OHC (upper 700 m) is an apparent flattening during the last decade (Easterling and Wehner, 2009; Palmer et al., 2010). Thus the last decade, with its possibility for better quantification of the net RF and its additional observations, can give significant new information.
A Bayesian statistical approach also provides posterior estimates of all RF mechanisms and an estimate of the magnitude and timescales of unforced natural variability in the system. The ECS probability density function (PDF) with a heavy tail has recently been discussed (Frame et al., 2005; Roe and Baker, 2007; Fiore et al., 2009; Hannart et al., 2009; Annan and Hargreaves, 2011; Roe and Armour, 2011) and Allen and Frame (2007) called off the quest to find the upper bound of the ECS. In recent years, the transient climate response (TCR, defined as the global mean temperature change at the time of CO2 doubling under a scenario of a 1 % per year increase in CO2) has therefore received more attention (Stott et al., 2006; Forest et al., 2008; Gregory and Forster, 2008; Knutti and Tomassini, 2008; Padilla et al., 2011; Gillett et al., 2012). The TCR is non-linearly related to ECS (Allen et al., 2000). The ECS temperature response will eventually be realized on a timescale of centuries to millennia, when the system has reached a new equilibrium. The uncertainty in TCR may therefore be more relevant for near-term transient climate change (Cubasch et al., 2001; Hegerl et al., 2007).
Based on the updated model we also present a PDF for the TCR.

Methods
In this study the climate sensitivity is estimated following the method described in Aldrin et al. (2012), where RF time series with uncertainty of all well-established mechanisms are linked to the observations of OHC and temperature change through an energy balance model and a stochastic model, using a Bayesian approach.
The main differences between Aldrin et al. (2012) and this work are (i) inclusion of a term representing the long-term internal variability in the stochastic model, (ii) use of three OHC series simultaneously, (iii) updated RF time series from Skeie et al. (2011b), including updated RF priors and the cloud lifetime and semi-direct effects, and (iv) using data up to the year 2010.

The model
We give here an overview of the model framework, and more details are given in Appendix A and in Aldrin et al. (2012). The core of our model framework is a deterministic EBM (Schlesinger et al., 1992), which calculates annual hemispheric and global mean near-surface temperature change and changes in global OHC (and can divide it into OHC above and below 700 m) as a function of estimated RF time series. The ECS is an explicit parameter in the EBM and can therefore be constrained in a Bayesian framework. The deterministic model is combined with a stochastic model and fitted to observations of annual hemispheric mean temperature change and OHC. The ECS is given a vague prior, uniform on [0, 20] °C, while the other model parameters and the RF time series are given informative priors based on expert judgment.
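To illustrate what it means for the ECS to be an explicit parameter of an energy balance model, the following minimal sketch integrates a one-box model, C dT/dt = F(t) − (F2X / ECS) T. This is a deliberately simplified stand-in and not the EBM of Schlesinger et al. (1992) used in the study; the forcing for doubled CO2 (`F2X = 3.7`) and the heat capacity are assumed, illustrative values that do not come from the text.

```python
F2X = 3.7  # assumed forcing for doubled CO2 in W m^-2 (not given in the text)


def one_box_ebm(forcing, ecs, heat_capacity=8.0, dt=1.0):
    """Integrate C dT/dt = F(t) - (F2X / ecs) * T with annual time steps.

    forcing       : sequence of annual-mean RF values (W m^-2)
    ecs           : equilibrium climate sensitivity (degC per CO2 doubling)
    heat_capacity : effective heat capacity (W yr m^-2 K^-1), hypothetical value
    Returns the temperature anomaly (degC) at the end of each year.
    """
    lam = F2X / ecs  # climate feedback parameter (W m^-2 K^-1)
    temps = []
    temp = 0.0
    for f in forcing:
        # Backward (implicit) Euler step, stable for any step length
        temp = (temp + dt * f / heat_capacity) / (1.0 + dt * lam / heat_capacity)
        temps.append(temp)
    return temps


# A constant forcing equal to F2X drives the model towards T = ECS
response = one_box_ebm([F2X] * 500, ecs=1.8)
print(f"warming after 500 yr: {response[-1]:.2f} degC")
```

Because the feedback parameter is F2X / ECS, the equilibrium warming under a sustained doubled-CO2 forcing equals the ECS by construction, which is what allows the parameter to be constrained by fitting the model output to observations.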
The EBM does not capture internal natural variability in the climate system, such as the El Niño-Southern Oscillation (ENSO). In the stochastic model we account for the effect of ENSO using the Southern Oscillation index. There may also be internal variability on multidecadal scales (e.g. Hegerl et al., 2007). This is taken into account by an explicit term for long-term variability. This term also represents other slowly varying model errors due to potential limitations of the EBM and forcing time series. Another error term is included to account for more rapidly varying model errors.
The sum of these terms then defines our model for the yearly values of the underlying true global OHC (in the upper 700 m unless otherwise indicated) and hemispheric temperatures, here put into a vector $g_t$ for year t:

$$ g_t = m_t(x_{1750:t}, \mathrm{ECS}, \theta) + \beta_1 e_t + n_t^{\mathrm{liv}} + n_t^{m} \qquad (1) $$

Here, all terms are three-dimensional vectors corresponding to the hemispheric temperatures and the OHC in year t. The term $m_t(x_{1750:t}, \mathrm{ECS}, \theta)$ is the EBM with RF time series from 1750 until year t ($x_{1750:t}$) as input, and the ECS is a parameter in addition to other physical parameters ($\theta$). Furthermore, $e_t$ is the Southern Oscillation index (http://www.bom.gov.au/climate/current/soihtm1.shtml, Bureau of Meteorology, Australia) and $\beta_1$ is a coefficient vector with one value for each hemisphere, and which is 0 for OHC. The two coefficients are estimated from observational data. The long-term internal variability is represented by the term $n_t^{\mathrm{liv}}$, which also accounts for potential other slowly varying model errors. The dependence structure of this term (i.e. correlations over time and between the three elements) is based on control simulations with GCMs from CMIP5 (Appendix A), but the standard deviations (or the amplitude) are estimated from the observational data. In the main analysis the Canadian ESM (CanESM2) is used; however, in a separate sensitivity test the Norwegian ESM (NorESM) is used. Finally, the term $n_t^{m}$ is the (short-term) model error. This term is modelled by a vector autoregressive process, which accounts for correlations both between years and between the three elements within the same year.
The true values of $g_t$ are not known exactly, but several observation-based data series have been published for each of the three components of $g_t$. There is no consensus that one of these is considerably more precise than the others, and we believe that combining them will be more informative than using each series alone, because this will reduce the influence of observational errors (i.e. the combined sampling and analysis errors). One simple way of combining them could be to take the yearly average over data series for the same physical component (e.g. OHC), but this would give an inconsistent average series if the data series cover different periods (i.e. the series actually included in the average would then vary over time). Therefore, we instead use several data series (here three, see Sect. 2.2) for each component simultaneously. One advantage of this approach is that it also gives information on the observational errors, since the difference between two estimates of the same true quantity must necessarily be due to observational error. Now, let $y_t$ denote the vector of observations for year t. The first six elements are hemispheric temperature estimates from three different data sets (see below), and the last three elements are estimates of OHC in the upper 700 m, from three OHC data sets (see below), so the dimension of $y_t$ is nine in the present analysis. Furthermore, let $g_t^{*}$ be a corresponding vector of true but unknown values of temperatures and OHC, obtained by copying each element of $g_t$ three times. Then, the observations and the underlying truth are related by

$$ y_t = g_t^{*} + n_t^{o} \qquad (2) $$

where $n_t^{o}$ is a vector of observational errors. It is reasonable to expect that the elements of $n_t^{o}$ are correlated, both between the three elements corresponding to the same physical quantity because they use basically the same raw data, and between elements representing different hemispheres. It is also reasonable to believe that the $n_t^{o}$ are correlated over time. Therefore, $n_t^{o}$ is also modelled by a vector autoregressive process, but with standard deviations that vary over time according to temporal profiles of the error estimates supplied by the data providers. However, the actual levels of the observational errors are estimated from the data within the model framework, taking into account the possibility that the reported errors are under- or overestimated (Aldrin et al., 2012).
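The construction of the nine-element observation vector can be sketched as follows, assuming the additive relation implied by the text (observations equal the replicated truth plus observational error). The ordering of the elements within the vector and all numerical values are hypothetical and chosen only for illustration.

```python
def stack_truth(g):
    """Copy each element of g_t = [T_NH, T_SH, OHC] three times.

    Returns the nine-element vector g*_t, ordered (by assumption) as
    three NH temperatures, three SH temperatures, three OHC values.
    """
    return [value for value in g for _ in range(3)]


def observe(g, obs_errors):
    """Return y_t = g*_t + n^o_t for a single year."""
    g_star = stack_truth(g)
    if len(obs_errors) != len(g_star):
        raise ValueError("need one observational error per data series")
    return [truth + err for truth, err in zip(g_star, obs_errors)]


# Hypothetical true state: NH and SH temperature anomalies (degC), OHC (10^22 J)
g_t = [0.5, 0.3, 10.0]
# Hypothetical observational errors for the nine data series
n_obs = [0.05, -0.02, 0.01, 0.03, 0.00, -0.04, 1.0, -2.0, 0.5]
y_t = observe(g_t, n_obs)

# The difference between two series of the same quantity contains no truth,
# only the difference of their observational errors
ohc_diff = y_t[6] - y_t[7]
print(ohc_diff, n_obs[6] - n_obs[7])
```

The last two lines show why using several series at once carries information on the observational errors: the unknown true value cancels in the difference between two estimates of the same quantity.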
All unknown parameters in the EBM and in the other parts of the model are estimated from the available observations using a Markov Chain Monte Carlo (MCMC) technique, and the posterior distributions of the RF time series are obtained simultaneously (see the Supplement in Aldrin et al., 2012, for details on the MCMC algorithm).
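The idea of sampling a posterior for the ECS under a uniform [0, 20] °C prior can be illustrated with a minimal random-walk Metropolis sampler. This is a toy sketch and not the MCMC algorithm of Aldrin et al. (2012): the "model" is a hypothetical equilibrium response, only one parameter is sampled, the data are synthetic, and `F2X = 3.7` is an assumed value.

```python
import math
import random

F2X = 3.7  # assumed forcing for doubled CO2, W m^-2 (not given in the text)


def toy_model(ecs, forcing):
    """Hypothetical stand-in for the EBM: equilibrium response ECS * F / F2X."""
    return [ecs * f / F2X for f in forcing]


def log_posterior(ecs, obs, forcing, sigma):
    """Log of (uniform prior on [0, 20] degC) x (Gaussian likelihood), up to a constant."""
    if not 0.0 <= ecs <= 20.0:
        return -math.inf  # outside the prior support
    resid = [y - m for y, m in zip(obs, toy_model(ecs, forcing))]
    return -0.5 * sum(r * r for r in resid) / sigma ** 2


def metropolis(obs, forcing, sigma, n_iter=20000, step=0.1, seed=1):
    """Random-walk Metropolis sampler for a single parameter (the ECS)."""
    rng = random.Random(seed)
    ecs = 10.0  # start at the centre of the prior range
    lp = log_posterior(ecs, obs, forcing, sigma)
    chain = []
    for _ in range(n_iter):
        proposal = ecs + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal, obs, forcing, sigma)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            ecs, lp = proposal, lp_new
        chain.append(ecs)
    return chain


# Synthetic data generated with a known "true" ECS of 2.0 degC
data_rng = random.Random(0)
forcing = [2.3 * i / 99 for i in range(100)]
obs = [t + data_rng.gauss(0.0, 0.1) for t in toy_model(2.0, forcing)]

chain = metropolis(obs, forcing, sigma=0.1)
samples = sorted(chain[2000:])  # discard burn-in
mean = sum(samples) / len(samples)
lo = samples[int(0.05 * len(samples))]
hi = samples[int(0.95 * len(samples))]
print(f"posterior mean {mean:.2f} degC, 90 % C.I. {lo:.2f} to {hi:.2f} degC")
```

In the full analysis the same principle is applied jointly to all unknown parameters and the RF time series, so the posterior for the ECS reflects the uncertainty in all of them.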
The estimates of the $n_t^{o}$ process partly decide the influence of each data series. A data series with small observational errors will tend to have more impact than a data series for the same physical quantity (OHC, say) with higher observational errors. On the other hand, if there is a high correlation between the observational errors of two of the data series, but they are both uncorrelated with a third one, the combined importance of the two correlated data series will tend to be less than double that of the third series. Therefore, including many more data series for one physical quantity, for instance OHC, may be useful, but only to a certain extent, because they will share the same information.
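The effect of correlated observational errors on the influence of each series can be illustrated with the textbook minimum-variance combination of several unbiased estimates, whose weights are w = C⁻¹1 / (1ᵀC⁻¹1) for an error covariance matrix C. This is a simplified analogue and not the estimation procedure of the paper, in which the weighting arises implicitly within the Bayesian model; the covariance values below are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def min_variance_weights(cov):
    """Weights of the minimum-variance unbiased linear combination."""
    z = solve(cov, [1.0] * len(cov))  # z = C^-1 1
    total = sum(z)
    return [v / total for v in z]


# Three series with equal error variance; series 1 and 2 have correlated
# errors (hypothetical correlation 0.8), series 3 is independent of both
rho = 0.8
cov = [[1.0, rho, 0.0],
       [rho, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
w = min_variance_weights(cov)
print([round(v, 3) for v in w])
```

With these numbers the two correlated series together receive a weight only slightly larger than that of the single independent series, rather than twice as much, which is the behaviour described in the paragraph above.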
Note that the standard deviations of all stochastic terms are treated as unknowns and are estimated from the data, which ensures that the modelled variance on the right side of Eq. (1) is consistent with the variance of the data. This differs from the approach of Meinshausen et al. (2009) or Huber and Knutti (2012), whose model contains some of the same stochastic terms, but where all modelled variances are kept fixed to values based on external sources. However, even if we think it is conceptually useful to divide the errors into separate terms, each with a distinct interpretation, it is of course possible that the estimated error terms are mixed, so one should be careful with too strict an interpretation of each term.

R. B. Skeie et al.: A lower and more constrained estimate of climate sensitivity
Our model is of course an extreme simplification of the real climate system. Therefore, to investigate if the model is useful for estimating the ECS from observations, we have previously validated its performance on artificial data generated from GCMs in the CMIP3 experiment. The estimates of ECS were, in light of their corresponding uncertainties, comparable with the "true" values of ECS for two different GCMs (Aldrin et al., 2012).

Observational data
Three different sets of annual hemispheric mean near-surface temperature data are used simultaneously: HadCRUT3 (Brohan et al., 2006), NCDC (Smith and Reynolds, 2005; Smith et al., 2008) and GISS (Hansen et al., 2006, 2010); GISS and HadCRUT3 were downloaded in March 2011 and NCDC in June 2011. An additional analysis has been performed with the updated HadCRUT4 data (Morice et al., 2012). For global mean OHC between 0 and 700 m three different data series are also used: Levitus et al. (2009, downloaded February 2011), CSIRO (Domingues et al., 2008; Church et al., 2011) and Ishii and Kimoto (2009). Observational data for OHC below 700 m are limited. However, we perform a sensitivity study using recent OHC data for the deeper layers, cf. Sect. 4.3.

Radiative forcing
The input to the EBM consists of RF time series (natural and anthropogenic). The anthropogenic RF time series are from Skeie et al. (2011b), where the RF of all well-established mechanisms from 1750 to 2010 was estimated. Observed concentrations of long-lived greenhouse gases are used in forcing calculations, and thus the possible impact of biogeochemical feedbacks is not included in the climate sensitivity estimate. For short-lived climate components, tropospheric ozone and anthropogenic aerosols, detailed atmospheric chemistry, aerosol and radiative transfer modelling has been performed using emissions from Lamarque et al. (2010). Natural forcing mechanisms included are changes in total solar irradiance and explosive volcanic eruptions (Appendix C). All forcings are listed in Table 1, with a mean RF value and the 90 % uncertainty range for the year 2010. The uncertainty ranges for the anthropogenic RF are based on Fig. 1d in Skeie et al. (2011b) (Appendix D). Separate RF time series for each hemisphere are used as input to the EBM (Appendix D).
For the total direct aerosol effect the uncertainty is better constrained for recent years, utilizing both models and satellites (Myhre, 2009). We adopt the same relative uncertainty as in Forster et al. (2007) for each aerosol component, but the standard deviation for each component is multiplied by a factor for the years 2000-2010 to match the total direct aerosol effect uncertainty in Skeie et al. (2011b). Going back in time from 2000, the factor increases linearly, reaching 1.0 in 1950, and remains constant before that (Appendix D). For the other components we assume the same relative uncertainty for all years, unless otherwise stated in Table 1.
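The time-dependent scaling of the aerosol standard deviations can be written as a simple piecewise-linear function. The sketch below is one reading of the description above; the value of the factor for 2000-2010 is not given in the text, so `recent_factor` is a hypothetical parameter, and the behaviour before 1950 (constant at 1.0) is an interpretation of "back in time".

```python
def std_scale(year, recent_factor):
    """Multiplier applied to the relative standard deviation of an aerosol RF component.

    recent_factor : value used for 2000-2010 (hypothetical, not given in the text)
    The factor is recent_factor from 2000 onwards, changes linearly back in
    time to 1.0 in 1950, and is assumed to stay at 1.0 for earlier years.
    """
    if year >= 2000:
        return recent_factor
    if year <= 1950:
        return 1.0
    fraction_back = (2000 - year) / 50.0  # 0 in 2000, 1 in 1950
    return recent_factor + fraction_back * (1.0 - recent_factor)


for yr in (1900, 1950, 1975, 2000, 2010):
    print(yr, round(std_scale(yr, recent_factor=0.6), 3))
```

A factor below 1.0 for recent years expresses that the total direct aerosol effect is better constrained when satellite data are available.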
We use effective radiative forcing (Boucher et al., 2013; Myhre et al., 2013) by also including forcing mechanisms that are not strictly radiative forcings according to the IPCC AR4 (i.e. the cloud lifetime and the semi-direct effects, see Table 1) since they alter the hydrological processes and act much more rapidly than the timescale of global surface temperature change. We include the semi-direct effect as a uniform distribution of −0.25 to +0.50 Wm−2 in 2007 (Isaksen et al., 2009), assuming it is proportional to the RF of black carbon (BC) from contained combustion.
We use the common assumption that the RF mechanisms are additive and independent (Boucher and Haywood, 2001). The prior distributions for the RF time series are shown in Fig. 1. The mean value for the anthropogenic RF in 2010 is 1.5 Wm−2, with a 90 % C.I. of 0.27 to 2.5 Wm−2. The mean value is weaker than the mean value of the net anthropogenic RF in IPCC AR4 of 1.6 Wm−2 (Forster et al., 2007), and our prior is wider. This is reasonable since the IPCC AR4 RF estimate did not include the cloud lifetime and the semi-direct effects. The prior for the total aerosol effect, which includes the direct effect, the cloud albedo effect, the cloud lifetime effect and the semi-direct effect, has a mean value of −1.5 Wm−2 in 2010 and a 90 % C.I. of −2.7 to −0.63 Wm−2. This prior is more strongly negative than the AR5 estimate of −0.9 (−1.9 to −0.1) Wm−2 (Boucher et al., 2013).

Results
In this section we present the results of our analysis where the parameter values in the EBM are updated using observations of hemispheric temperature and OHC (0-700 m) to the year 2010 and detailed RF time series from 1750 to 2010 (the main analysis). We investigate the effect the last 10 years of observational data have on the estimated ECS, and calculate PDFs of the TCR using the joint posterior distribution of the model parameters.

Figure 1. Prior and posterior distributions of the RF time series and PDF of RF in 2010 for total RF (upper panel), anthropogenic RF (middle panel) and total aerosol effect (direct effect, cloud albedo effect, cloud lifetime effect and semi-direct effect) (lower panel) from the main analysis. Red colour for the posterior distributions and black lines and grey shadings for the prior distribution.

Main analysis
The posterior RF time series and PDFs for the RF in 2010 are shown in Fig. 1. The posterior mean of the total RF is higher than the prior mean (Fig. 1, upper panel), mainly due to the weakening of the magnitude of the total aerosol effect when the model is updated with data (Fig. 1, lower panel). In the mid-20th century (1940s-1970s) the posterior mean of the total anthropogenic RF time series shows a much weaker change compared to the decreasing RF for the prior assumptions (Fig. 1, middle panel). Our analysis suggests that the net anthropogenic RF did not cause the global cooling that is observed (Trenberth et al., 2007) during this period.
The posterior RF for the total aerosol effect is weaker than the prior assumptions (Fig. 1, lower panel), with a mean posterior value in 2010 of −1.06 Wm−2 and a 90 % C.I. of −1.7 to −0.40 Wm−2. Inverse estimates of net aerosol RF over the 20th century that are consistent with observed warming (Hegerl et al., 2007) gave a similar likely range of −1.7 to −0.1 Wm−2. The 90 % confidence interval of the total aerosol effect in IPCC AR5 is slightly broader than our posterior, −1.9 to −0.1 Wm−2 (Boucher et al., 2013). Our result is in accordance with the residual forcing (all aerosol effects and any unknown mechanisms) between the pre-industrial and 1970-2000 periods of −1.1 ± 0.4 Wm−2 (1σ) found by Murphy et al. (2009). This corresponds to a 90 % C.I. of −1.8 to −0.44 Wm−2, which is similar to the total aerosol RF posterior mean of −1.12 Wm−2 (90 % C.I. of −1.7 to −0.53 Wm−2) for the 1970-2000 average in our analysis.
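The conversion of the residual forcing of −1.1 ± 0.4 Wm−2 (1σ) into a 90 % interval can be checked with a short calculation, assuming that the uncertainty is Gaussian (an assumption made here for the illustration; the text does not state the distribution).

```python
from statistics import NormalDist


def central_interval(mean, sigma, level=0.90):
    """Central interval of a normal distribution with the given mean and 1-sigma."""
    dist = NormalDist(mean, sigma)
    tail = (1.0 - level) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


low, high = central_interval(-1.1, 0.4)
print(f"{low:.2f} to {high:.2f} W m^-2")  # -1.76 to -0.44 W m^-2
```

The 5th and 95th percentiles lie 1.645σ from the mean, which gives −1.76 and −0.44 Wm−2, consistent with the rounded range of −1.8 to −0.44 Wm−2 quoted above.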
A strong historical aerosol cooling implies a high ECS to be consistent with the observed temperature trend and vice versa (Andreae et al., 2005). Our results show a less negative aerosol forcing than our prior assumption. The ECS posterior mean is 1.8 °C (Fig. 2a), which is below the lower limit of the likely range (> 66 % probability) for the ECS of 2 to 4.5 °C in IPCC AR4 (Meehl et al., 2007), but within the AR5 likely range of 1.5 to 4.5 °C (Collins et al., 2013). The 90 % C.I. of the posterior ECS is 0.9 to 3.2 °C, and the heavy upper tail often seen in estimates of ECS is less pronounced. The probability of ECS being larger than the upper limit of the IPCC likely range of 4.[...] including observations for the last decade (Lewis, 2013; Otto et al., 2013) and using data from the last glacial maximum (Schmittner et al., 2011) that also find climate sensitivity in the lower range of IPCC.
The fitted posterior mean and the observed hemispheric temperatures and OHC are compared in Fig. 3. For the Southern Hemisphere (SH) the model reproduces the long-term trend of the observations. In the Northern Hemisphere (NH), the fitted temperature increase over the last two decades is not as rapid as in the observations, leaving the observations just outside the 90 % C.I. of the fitted temperatures. The fitted temperature and OHC include only the results from the deterministic model ($m_t(x_{1750:t}, \mathrm{ECS}, \theta)$) and the effect of ENSO ($\beta_1 e_t$) on the right side of Eq. (1). In the NH, much of the discrepancy is accounted for by the long-term internal variability represented by the term $n_t^{\mathrm{liv}}$ (Fig. 4, left panel). This is further discussed in Sect. 4.4. Figure 4 also shows the posterior estimates for the ENSO term $\beta_1 e_t$ and the model error $n_t^{m}$. For the OHC the model reproduces the long-term trend of the observations (Fig. 3). In the 1950s one of the observational time series is outside the 90 % C.I. of the fitted OHC. The CSIRO group reports a large standard error of up to 10 × 10^22 J in this period (Fig. B3), and the posterior estimates of the standard errors are of the same magnitude (Fig. B4), so the observational error term ($n_t^{o}$) will explain these discrepancies. This is discussed further in Sect. 4.3.

Updating the model with data between the years 2000 and 2010
Our 90 % C.I. for ECS is tighter compared to previous estimates of the ECS using observations from the 20th century. To investigate the influence of the last ten years' data on our model's estimate, we have re-estimated the model using only data up to the year 2000. The resulting posterior distribution of ECS is shown in Fig. 2f. The estimated mean of ECS of 2.26 °C is 23 % higher than in the main analysis. The upper limit of the 90 % C.I. is 5.[...]

[Figure 3, panels (c) and (d): temperatures, Southern Hemisphere (observed HadCRUT3, GISS and NCDC; fitted; 90 % credible interval) and OHC [10^22 J], plotted against year.]

[...] model is updated with data from 2003 and 2004, and when the model is further updated with data from 2005 and 2006 the ECS estimate is still the second highest of these six estimates. The reason is probably that the total RF (both prior and posterior) remains at the same level between 2002 and 2006 (Fig. 1), whereas the OHC increases, especially between 2002 and 2004 (Fig. 3). After 2006, the RF increases again, while there is little or no increase in the OHC. The drastic reduction in uncertainty (R90 reduced from 2.20 to 1.23) with ten more years of data may be surprising. We believe there are two main reasons for this. First, the RF increased significantly in this period (Fig. 1), so these ten years of data are more informative than data from a period of the same length, but with less variation in RF. Second, while the temperature series is lengthened by 6-8 %, the OHC series are extended by about 20 %. Since the OHC data are more informative (cf. Sect. 4.3), ten years of data therefore makes an important contribution to the information content in the data.

Transient climate response
To determine a PDF for the TCR the EBM is run with a 1 % per year increase in CO2 using the joint posterior distribution of the model parameters. The TCR from the main analysis has a posterior mean estimate of 1.4 °C and a 90 % C.I. of 0.79 to 2.2 °C, while using the model parameters constrained by data only up to the year 2000 gives a wider distribution with a 90 % C.I. of 0.54 to 2.9 °C. Recently, Padilla et al. (2011) also found a narrowing of the TCR range over the last decade, with a 90 % C.I. of 1.3 to 2.6 °C in 2008. Also, Gillett et al. (2012) estimated a narrower range of TCR using observations over the period 1851-2010 rather than 1900-1999. Our results are in line with other estimates of TCR (Stott et al., 2006; Gregory and Forster, 2008; Knutti and Tomassini, 2008) and are slightly shifted to weaker values when the model is updated with data up to 2010. The IPCC AR5 concluded that TCR is likely (66-100 % probability) in the range of 1.0 to 2.5 °C and extremely unlikely greater than 3 °C (IPCC, 2013).
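The TCR experiment described above can be sketched with a simple two-layer energy balance model forced by a 1 % per year CO2 increase. This is an illustrative stand-in and not the EBM used in the study; the forcing for doubled CO2 (`F2X`), the heat capacities and the heat exchange coefficient `gamma` are assumed values. In the study the calculation is repeated for samples from the joint posterior of the model parameters, which yields a PDF rather than a single number.

```python
import math

F2X = 3.7  # assumed forcing for doubled CO2, W m^-2 (not given in the text)


def tcr_two_layer(ecs, gamma=0.7, c_upper=8.0, c_deep=100.0, steps_per_year=20):
    """Surface warming at the time of CO2 doubling under a 1 % per year increase.

    ecs     : equilibrium climate sensitivity (degC)
    gamma   : heat exchange coefficient between layers (W m^-2 K^-1), hypothetical
    c_upper : heat capacity of the upper layer (W yr m^-2 K^-1), hypothetical
    c_deep  : heat capacity of the deep layer (W yr m^-2 K^-1), hypothetical
    """
    lam = F2X / ecs  # feedback parameter
    years_to_double = math.log(2.0) / math.log(1.01)  # about 69.7 yr
    n_steps = int(round(years_to_double * steps_per_year))
    dt = years_to_double / n_steps
    t_surface = t_deep = 0.0
    for i in range(n_steps):
        # RF is logarithmic in concentration, so it grows linearly in time here
        forcing = F2X * (i * dt) / years_to_double
        flux_down = gamma * (t_surface - t_deep)
        t_surface += dt * (forcing - lam * t_surface - flux_down) / c_upper
        t_deep += dt * flux_down / c_deep
    return t_surface


for ecs in (0.9, 1.8, 3.2):
    print(f"ECS {ecs:.1f} degC -> TCR {tcr_two_layer(ecs):.2f} degC")
```

Because part of the forcing is still being taken up by the ocean at the time of doubling, the TCR is always smaller than the ECS in such a model, and the ratio between the two decreases as the ECS increases.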

Discussion
The results from our main analysis give a lower and better constrained estimate of the climate sensitivity compared to the majority of previous estimates. In this section we investigate the role of several factors that would impact the expected value as well as the uncertainty in the estimated climate sensitivity. This includes structural features of the EBM, uncertainties in the RF and in the surface air temperatures and OHC data, and the role of internal variability. We perform a sensitivity analysis including recent observational data on OHC trends for depths below 700 m. Beyond the uncertainties in the parameters of the deterministic model and in the observations, which together are propagated to give the PDF of the climate sensitivity, there might be other limitations in the method and sources of uncertainty that are not quantified in the estimated PDFs. These include, e.g., uncertainties due to the simplified structure of the EBM or the a priori estimates.

Uncertainties in radiative forcing
To constrain the climate sensitivity, the treatment of RF uncertainties is important. Tanaka et al. (2009) suggested that the probability of high climate sensitivity is even higher than previous estimates because of insufficient handling of the historical development of the RF uncertainty. The uncertainties in the RF time series are treated simply in this study, albeit in a more sophisticated way than in many previous studies that have used a scaling approach related only to the aerosol RF (Andronova and Schlesinger, 2001; Gregory et al., 2002; Knutti et al., 2002; Forest et al., 2006). We include the uncertainty for all components (Table 1), as in Tomassini et al. (2007), who individually scaled the nine RF mechanisms they considered. The uncertainty in the temporal pattern of each RF mechanism is not included, but the net RF time series will have uncertainty in the temporal structure when all the RF mechanisms are combined.
Since the ECS is better constrained by adding data for the last 10 years, we first investigate the RF over the last decade. Our prior total RF increased by 0.29 Wm−2 from 2000 to 2010. Several studies have investigated the possible changes in RF related to the apparent flattening of the temperature trend during the last decade. Solomon et al. (2010) explained some of the recent trend in temperature by a reduction in stratospheric H2O. They calculated an RF of −0.1 Wm−2 in 2000-2005 relative to 1996-2000. Data before the mid-1990s are sparse, but observations indicate an increase in stratospheric H2O between 1980 and 2000, possibly being an important driver of decadal climate change (Solomon et al., 2010). The reason for the recent decrease is not clear, but if the change in stratospheric H2O is due to natural variability or climate feedback (Dessler et al., 2013), it should not be included as an RF in our setup. Only stratospheric water vapour change from CH4 oxidation is taken into account in our analyses. Stratospheric aerosols have increased since 2000, contributing −0.1 Wm−2 (Solomon et al., 2010). Vernier et al. (2011) related this to recent tropical volcanic eruptions. We have used the updated values for stratospheric aerosol optical thickness from Sato et al. (1993), which give a volcanic RF that is largest in magnitude, −0.14 Wm−2, in 2003 and 2005. In addition, the Sun experienced a minimum in activity around 2010, and the prior mean solar RF in 2010 relative to the maximum in 2000 is −0.14 Wm−2. In comparison, the net anthropogenic RF increased by 0.44 Wm−2 over the last decade for our prior assumptions (0.33 Wm−2 from LL-GHGs). For both the prior and the posterior, the increase in total RF over the last decade is less than the increase in anthropogenic RF (Fig. 1), but there is still a clear increase of 0.29 Wm−2 for the prior mean and 0.31 Wm−2 for the posterior mean.
It has also been suggested that a strengthening of the sulfur RF due to increased emissions in China contributed to the flattening temperature trend (Kaufmann et al., 2011). For our prior RF time series for the direct aerosol effect, the strengthening of the sulfur direct aerosol effect has been offset by the strengthening in the BC direct aerosol effect from 2000 to 2010 (Skeie et al., 2011b). The BC effect was not considered by Kaufmann et al. (2011). The BC emissions from China used in Skeie et al. (2011b) increased by ∼20 % between 2000 and 2010, in agreement with the inventory by Zhang et al. (2009), where they increased by 13 % between 2000 and 2005, but weaker than the 46 % increase between 2000 and 2010 from Lu et al. (2011). This inventory also shows an increase of 46 % for the sulfur emissions between 2000 and 2010, in agreement with the ∼50 % increase in Chinese emissions over the same period used in Skeie et al. (2011b). To test how the temporal structure of the RF over the last years affects the results we did a sensitivity test (Appendix E) where the prior mean for the direct aerosol effect strengthens between 2000 and 2010, compared to the main analysis where the direct aerosol effects weakened over the same period. This had a minor effect on the estimated ECS (Fig. E1).
We also tested the sensitivity to changes in the temporal development of the RF early in the simulation period and to the role of uncertain data before 1900 (update with data only between 1900 and 2010 to exclude the uncertain early period including the 1883 Krakatoa volcanic eruption). Both sensitivity tests had only very minor effects on the estimated ECS (Fig. E1). Limited information regarding the uncertainty in the rate of change of RF is available. In Fig. D2 the prior and posterior anthropogenic RF time series are plotted together with RCP historical RF time series and AR5 forcing estimates with uncertainties (IPCC, 2013) for the years 1950, 1980 and 2011 (see Appendix). Our posterior is in good agreement with the AR5 values; however, we have larger uncertainty in the 1950s, a lower mean value in the 1980s, and we do not include the upper range of IPCC AR5 for the year 2011 estimate. This indicates that the low estimate for the ECS in our main results is not due to unreasonably high RF estimates; however, large changes to the historical RF path (e.g. due to indirect aerosol effects) may change the ECS estimate.
There are other proposed RF mechanisms that are not included here due to large uncertainties and a lack of scientific understanding, which could possibly alter the ECS estimate. For the indirect aerosol effects we have included the cloud albedo effect, cloud lifetime effect and the semi-direct effect. The prior time series for the indirect effects, constructed in a simple way, are based on aerosol effects on liquid water clouds. Aerosols may also influence mixed-phase or ice clouds (Denman et al., 2007) and the indirect effect of aerosols on these clouds is very uncertain, but possibly of great importance (Penner et al., 2009). However, if all indirect effects have a similar temporal pattern, there is a clear signal that the data do not allow large negative values for the total aerosol effect (Fig. 1, bottom panel).
We have assumed that the RF mechanisms have equal temperature responses, and are additive and independent, which may not be entirely valid. Climate efficacy, i.e. the dependence of the climate sensitivity on the type of forcing (Joshi et al., 2003; Hansen et al., 2005), is not considered. However, we include the semi-direct effect and the cloud lifetime effect as forcings, and these effects are partly reasons for including differences in climate efficacy in GCMs (Forster et al., 2007). There are few studies considering efficacy, and the efficacy for different forcing mechanisms generally lies in the range 0.6 to 1.3 (Forster et al., 2007). The efficacy can be assumed to be partly included in the RF uncertainty, but a proper inclusion of efficacy will increase the uncertainty in the estimated ECS. We have also assumed that the RF errors are independent. This may not be true, since e.g. the forcing mechanisms related to emissions due to fossil fuel use will be dependent. It is also plausible that the magnitudes of the different aerosol effects are related. However, insufficient information is available to include dependent radiative forcing error estimates.
The RF due to CO2 is calculated based on measured changes in CO2 concentrations. With the standard formal definition of ECS as the equilibrium temperature response to a CO2 doubling, carbon-cycle feedbacks are by definition not included. There is however no scientific reason for this definition; it is based on a technical approach to setting up GCM experiments in a simple way. A more reasonable definition of the ECS would be as the equilibrium temperature response to an external forcing with the magnitude and distribution equal to that of a CO2 doubling. In our RF estimate for CO2 there is an implicit assumption that any contribution to the historic CO2 change from climate-carbon feedbacks is negligible. However, Arora et al. (2009) estimate that as much as 15-20 ppm of the observed CO2 increase may be due to climate feedbacks. Allowing for such a feedback in our analysis would lead to a lower RF estimate and thus a somewhat higher estimate of the sensitivity.
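The size of this effect on the forcing can be gauged with a short calculation. The sketch below is not the RF calculation used in this study: it assumes the widely used simplified expression RF = 5.35 ln(C/C0) Wm−2, a pre-industrial concentration of 278 ppm and a 2010 concentration of about 389 ppm, none of which are taken from this paper. Only the 15-20 ppm feedback range comes from the text above.

```python
import math

ALPHA = 5.35       # W m-2, coefficient of the simplified CO2 expression (assumption)
C_PREIND = 278.0   # ppm, assumed pre-industrial CO2 concentration
C_2010 = 389.0     # ppm, assumed approximate CO2 concentration in 2010


def co2_forcing(c_ppm, c0_ppm=C_PREIND, alpha=ALPHA):
    """Radiative forcing (W m-2) of CO2 at c_ppm relative to c0_ppm."""
    return alpha * math.log(c_ppm / c0_ppm)


def forcing_without_feedback(feedback_ppm, c_ppm=C_2010):
    """Forcing if feedback_ppm of the observed increase is treated as a
    climate feedback rather than as an external forcing."""
    return co2_forcing(c_ppm - feedback_ppm)


if __name__ == "__main__":
    print(f"all of the increase as forcing: {co2_forcing(C_2010):.2f} W m-2")
    for ppm in (15.0, 20.0):
        print(f"{ppm:.0f} ppm as feedback: {forcing_without_feedback(ppm):.2f} W m-2")
```

Under these assumed values the CO2 forcing is reduced by roughly 0.2 to 0.3 Wm−2, which illustrates the direction of the effect described above rather than its magnitude in the full analysis.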

Surface temperature observations
Our analysis suggests an almost stable total anthropogenic RF in the middle of the 20th century (Fig. 1, middle panel) and that the decrease in the observed temperature in this period (all three records used in the main analysis) is not caused by anthropogenic RF. The fitted global mean temperature during this period is in accordance with Thompson et al. (2010), who related the decrease in the observed temperature between the 1940s and the 1970s to two distinct periods. The first one was a discontinuity in the mid-1940s, due to uncorrected instrumental biases in the sea surface temperature (SST) (Thompson et al., 2008). The second period was around 1970, when an abrupt drop in Northern Hemisphere SST was observed, which is real and not related to any instrumental bias. The bias in the mid-1940s is not corrected in the surface temperature data sets used in our main analysis. A sensitivity test replacing HadCRUT3 with HadCRUT4 (Morice et al., 2012), which includes this SST correction, has been carried out and gave almost identical results (Fig. 2c vs. 2a).

The role of ocean heat content
Including observations from the last decade had a large effect on the ECS estimate. The OHC time series is extended by about 20 % and previous studies (Tomassini et al., 2007; Urban and Keller, 2009; Aldrin et al., 2012) have shown that OHC data have the potential to constrain the ECS estimate. In Appendix E we show that when adding only near-surface temperatures between 2000 and 2010 (excluding the OHC data for this decade, Fig. E1g) the ECS estimate is only slightly narrower than using data up to 2000 (Fig. E1c), highlighting the information provided by the OHC data.
The fitted OHC from the model is compared to the three historical estimates in Fig. 3. The fitted OHC is a smooth curve compared to the observations but with dips related to volcanoes, as is also seen in the observations, and two of the three OHC data sets used in this study show a flattening of the OHC since 2004. There is good agreement with the long-term trend in OHC between the model and the observations. Except for responses to volcanic eruptions there is low correlation between the shorter term variability in the three observational data sets (as opposed to the near-surface temperature data). As noted in Sect. 3.1, one of the observation curves lies outside the 90 % C.I. in the 1950s. The reported and estimated standard errors for this data set increase back in time (Figs. B3 and B4) and in the 1950s the standard errors are larger than the difference between our estimate and the data. The further back in time, the poorer the spatial coverage of the observations (e.g. Fig. 1 in Abraham et al., 2013). In the early 2000s the Argo floats were launched (http://www.argo.ucsd.edu/), which significantly improved the spatial coverage of ocean observations. Prior to the launch of Argo the main data used were collected from expendable bathythermographs (XBT), which have systematic data errors (Gouretski and Koltermann, 2007), but the Argo data also have known biases that need to be corrected (Abraham et al., 2013). Lyman et al. (2010) found that XBT bias correction was the main source of uncertainty in the warming trend from 1993 to 2008. The differences among the three data sets are larger than between the surface temperature data series (Fig. 3d), and a further effort in estimation of historical OHC data and its uncertainty is needed.
Since the sea level has continued to increase, it has been suggested that the recent flattening in the OHC in the upper ocean has been compensated for by an increase in the heat content of the deep ocean. Meehl et al. (2011) used model simulations, and found that periods with no increase in temperature in the upper ocean are accompanied by an increasing temperature in the deeper ocean. Purkey and Johnson (2010) found that in the 1990s and 2000s there was an increase in OHC in the abyssal and deep Southern Ocean, based on sparse observations from ships, but it is not clear if it is a long-term trend. Palmer et al. (2011) highlighted the importance of deep ocean observations to monitor the Earth's energy balance; however, our main estimate is not constrained by deep ocean data. In the EBM heat is however transported to the deep ocean, and between 1961 and 2010, 11 % of the total increase in OHC occurred below 700 m in the main analysis. This is lower than in Hansen et al. (2011), who used 15 % for the period 1993-2008 and 19 % for the period 2005-2010, assuming constant heat uptake in the deep ocean from the work by Purkey and Johnson (2010). Since added energy to the climate system is almost exclusively stored as heat in the oceans, a non-zero global radiative imbalance is approximately equal to the rate of change in OHC. The inferred planetary energy imbalance from Hansen et al. (2011) was 0.58 ± 0.15 Wm−2 during the 6 yr period 2005-2010 assuming a stronger aerosol RF (−1.6 ± 0.3 Wm−2) and a larger climate sensitivity (3 ± 1.0 °C) than the posterior means in this study. We find a similar planetary imbalance of 0.46 ± 0.16 Wm−2 over the same time period.
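The link between a radiative imbalance and the rate of change in OHC is a unit conversion, which can be sketched as follows. The Earth surface area and year length used here are standard approximate values supplied for illustration, not numbers from this study, and the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # s
EARTH_SURFACE_AREA = 5.1e14             # m2, approximate total surface area of the Earth


def ohc_rate_to_flux(joules_per_year):
    """Convert a global OHC trend (J per year) to a global-mean flux (W m-2)."""
    return joules_per_year / (SECONDS_PER_YEAR * EARTH_SURFACE_AREA)


def flux_to_ohc_rate(watts_per_m2):
    """Convert a global-mean flux (W m-2) to a global OHC trend (J per year)."""
    return watts_per_m2 * SECONDS_PER_YEAR * EARTH_SURFACE_AREA


if __name__ == "__main__":
    # Imbalance quoted in the text for 2005-2010, expressed as an OHC trend.
    print(f"{flux_to_ohc_rate(0.46):.2e} J per year")
```

With these values an imbalance of 0.46 Wm−2 averaged over the globe corresponds to roughly 0.7 × 10^22 J of heat uptake per year.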
Very recently OHC data for the deeper ocean have become available, for the layer 700-2000 m for the period 1955-2010 (Levitus et al., 2012), and below 3000 m between 1985 and 2006 (Kouketsu et al., 2011). We have made an additional simplified sensitivity test with our model using the three OHC data sets for 0-700 m as in the main analysis and the new deep ocean OHC data for the two decades of data. The sum of these OHC deep ocean trend data is included to constrain the total OHC in our model between 700 m and the ocean floor. Including the deep ocean data leads to an increased mixing of heat down to the deep ocean and a small increase in the estimated ECS (from 1.84 to 1.92 °C, Fig. 2d vs. Fig. 2a). For the 2005 to 2010 period the estimated increase in OHC for the entire ocean is 0.37 ± 0.14 Wm−2 for the main analysis and 0.40 ± 0.16 Wm−2 for the sensitivity test with data for the deep ocean. This is in good agreement with the heat gain of 0.39 Wm−2 (averaged over the whole globe) in the upper 1500 m of the ocean estimated by von Schuckmann and Le Traon (2011) based on the Argo measurement network.
As discussed above, the OHC data have the potential to constrain the ECS estimate. However, while the temperature series are quite similar, the three OHC series differ considerably more, indicating that the observational errors for the OHC data can be large (and perhaps larger than some of the data providers report, see Figs. B3 and B4 in Appendix B). Therefore, using three OHC series simultaneously instead of only one should decrease the uncertainty of the ECS estimate due to the reduced influence of observational errors. This is demonstrated in a sensitivity test where we use only one OHC data set (from Levitus et al., 2009) and data only up to 2000 as in many previous studies (Fig. 2g). Using only one OHC data set the PDF for the ECS is remarkably wider (Fig. 2g) than using three OHC data sets (Fig. 2f), with a posterior mean shifted towards the prior mean (which is 10 °C). Using one OHC data set and data up to the year 2000, the 90 % C.I. is 1.1 to 14.5 °C, with a mean value of 4.5 °C, in line with or even higher than previously published estimates (Andronova and Schlesinger, 2001; Forest et al., 2002; Gregory et al., 2002; Knutti et al., 2002; Frame et al., 2005; Annan and Hargreaves, 2006; Forest et al., 2006; Tomassini et al., 2007) that used observational data for time periods ending between 1994 and 2003. This indicates that the narrow range of the ECS in the main analysis is not due to an artifact of the model used, but indeed due to the added observational information from the two additional OHC data sets and the 10 additional years. In the case with one OHC data set the posterior distribution of the total aerosol effect in 2000 has a mean of −1.4 Wm−2, with a 90 % C.I. of −1.9 to −0.71 Wm−2. The estimated total aerosol effect is stronger than in the main analysis, which had a mean value of −1.1 Wm−2 in 2000.

Multidecadal oscillations
In the North Atlantic the observed SSTs show a multidecadal oscillation, known as the Atlantic Multidecadal Oscillation (AMO) (Kerr, 2000). Whether the AMO is due to external forcing or internal variability is however not clear (e.g. Knight, 2009; Ottera et al., 2010). DelSole et al. (2011) explicitly identified a significant unforced multidecadal component using climate simulations and observations, and found that the AMO is dominated by internal dynamics. Using observed temperature, Wu et al. (2011) separated the temperature trend into a secular trend related to fossil fuel emissions and a multidecadal variability, and found a significant contribution to the late 20th century warming from this multidecadal variation.
In the stochastic model multidecadal variability is represented by a separate term ($n_t^{\mathrm{liv}}$) based on results from long control simulations with CanESM2 (main analysis) or NorESM. The difference between the ECS estimates using these two GCMs (Fig. 2b vs. 2a) is minor. To investigate the impact of prior knowledge about the multidecadal variability we have performed a sensitivity test ignoring the explicit term for long-term internal variability (Fig. 2h). In such a simplified model, temporary increasing or decreasing trends in temperature or OHC over 10-20 yr may falsely be accounted for as permanent trends, giving a too optimistic uncertainty estimate. As expected, adding unforced long-term variability gives a larger uncertainty in the estimated climate sensitivity. Also, the expected value becomes somewhat higher (Fig. 2a vs. 2h). This is reasonable since in general larger uncertainty will move the posterior mean towards the prior mean. The posterior estimate for the unforced multidecadal variability is shown in Fig. 4.
The results indicate that during the periods 1910-1940 and 1970-2000 a global warming of about 0.2 and 0.12 °C, respectively, can be attributed to internal variability (Fig. 4). The magnitude of the long-term internal variability is in reasonable agreement with the findings of DelSole et al. (2011), who found a significant component of unforced multi-decadal variability in the recent acceleration of global warming, with ±0.08 °C per decade for a 30 yr trend. Wu et al. (2011) estimated that up to one third of the late twentieth century warming could have been a consequence of natural variability. Our results indicate that 23 % of the increase in the near-surface NH mean temperature between 1976-1985 and 2001-2010 is explained by internal variability, in agreement with Wu et al. (2011).
There are studies indicating that there might be a forced component in the long-term variability, e.g. the AMO (Booth et al., 2012). In the climate system (and in GCMs) this would mean that the forcing would affect the variability of mixing of OHC from the surface layers to the deeper ocean. With the simple structure of our EBM we cannot represent this possibility since the parameters of the EBM are fixed over time (although the values are estimated from the data), and long-term variability not explained by variations in forcing will be attributed to internal variability. There are large uncertainties in historical forcings, and in the temporal development of the RF. However, the forcing histories from Skeie et al. (2011b) applied here are based on recent estimates of historical emissions and detailed modelling of atmospheric chemistry, and are significantly more detailed than RF histories applied in many previous studies. Shortcomings in the temporal development of the historical RF could lead to either too much or too little of the response being attributed to the internal variability term. If too little of the response is attributed to the internal variability, we would expect that the uncertainty in the ECS estimate is underestimated, and vice versa.

Interhemispheric differences
The fitted hemispheric temperatures in Fig. 3 show that the deterministic parts of the model (the EBM and the ENSO terms in Eq. 1) underestimate the recent hemispheric difference in the warming compared to the observations. Previous studies (Andronova and Schlesinger, 2001; Forest et al., 2002) have highlighted the role of the inter-hemispheric temperature difference as a key diagnostic to determine the aerosol forcing. However, there are several factors that influence the interhemispheric temperature asymmetry (ITA). Anthropogenic and natural aerosols, hemispheric differences in climate feedbacks, differences in response time (through e.g. land/ocean fractions) and differences in internal variability could all play a role. Anthropogenic aerosols mainly cool the NH; thus too low an ITA could indicate that the net negative RF from aerosols is underestimated, which in turn would mean that the ECS is underestimated. However, results from the CMIP5 models (Friedman et al., 2013) indicate that since about 1975 anthropogenic aerosols have not contributed to an increase in the ITA, but that the increase is mostly due to increased GHGs. In the EBM we calculate the temperature response in each hemisphere, but we assume that the climate feedbacks are equal. This is a simplification, as e.g. the snow-albedo feedback can be expected to be stronger in the NH due to more land. Other EBMs (e.g. Raper et al., 2001) have imposed a fixed hemispheric difference in the climate sensitivity parameter to emulate the response of specific GCMs.
In a sensitivity test we have allowed for hemispheric differences in the climate sensitivity parameter. The difference between the hemispheric ECS values estimated from the data was minor (10 %), and the posterior estimate for the ECS was very close to the main analysis (Fig. 2e).
The EBM used here accounts for different ocean volumes due to different land fractions in the two hemispheres, thereby imposing a different effective heat capacity and thus a different temporal response to short-term RF changes in the two hemispheres. However, our EBM does not include an explicit representation of the energy balance for land and ocean areas in each hemisphere (Olivie and Stuber, 2010).
The model error terms are shown in Fig. 4. In the NH and for the global OHC the error term is mainly short-term fluctuations, while for the air temperatures in the SH there is also a multidecadal signal indicating that the correlation between the variability in the two hemispheres is different in the data than in CanESM2.The model error term accounts for several factors such as lack of explicit representation of the energy balance for land and ocean areas in each hemisphere, and possible shortcomings in the RF, etc.
Overall it is difficult to determine which factors are responsible for the discrepancy between the observed and fitted ITA, and thus it is also difficult to assess the impact of this shortcoming on the estimated ECS.

Comparisons with results from a similar approach
Huber and Knutti (2012) used a similar approach and similar data to ours, but their PDF of ECS was remarkably different from ours, with a posterior estimate of 3.6 °C and a much wider 90 % C.I. from 1.7 to 6.5 °C. Since their approach (Huber, 2011) is seemingly very similar to ours, it is worthwhile to discuss potential reasons for the differences. We will focus on two details that may lead to larger uncertainties, which means that the ECS will be more similar to its prior distribution. These differences are: (i) although they use basically the same data series as us, they use only one temperature series and one OHC series per analysis, and make a simple average of these separate analyses at the end. As we have argued above, using multiple observational series for the same quantity reduces the influence of the observational errors, especially for OHC; (ii) they do not use a simple climate model such as our EBM, but instead they use a so-called emulator. This emulator is based on a neural network model and is an approximation to a medium complex climate model. Therefore they have to introduce an extra error term to account for the approximation. For OHC, the approximation error is much larger than other error components (Fig. 4.2b in Huber, 2011). We believe that the effect of this is that the OHC data are considerably down-weighted compared to the temperature data, resulting in a high uncertainty in their ECS estimate.

Summary and conclusions
In this study, detailed RF time series for all well-established forcing mechanisms and observed OHC (0-700 m) and near-surface temperature changes to the year 2010 are combined in a Bayesian framework using a simple EBM and a stochastic model. The heavy tail often seen in PDFs of the ECS constrained by observed temperature change over the 20th century (summarized in Hegerl et al., 2007; Knutti and Hegerl, 2008) is substantially reduced. The posterior mean estimate of the ECS is 1.8 °C, in the lower part of the likely range given in IPCC AR5 (IPCC, 2013), and the probability of values larger than 4.5 °C is only 1.4 %. The majority of previous studies have not included temperature and OHC data over the last decade. Here we have used observational data including 2010, and we have shown that the combination of using multiple data series for surface temperatures and OHC and the additional 10 yr of data since the year 2000, especially the OHC data, improves the constraint of the ECS. Using data only up to the year 2000 and using one OHC data set as in previous studies gave a significantly wider posterior distribution with a 90 % C.I. of 1.1 to 14.5 °C, with a heavy tail towards larger values. One of the reasons why it is difficult to find the upper bound of the ECS is that the ECS is nonlinearly related to the climate response time (Hansen et al., 1985; Wigley and Schlesinger, 1985). For lower values of the ECS the ECS is more linearly related to the TCR, allowing a narrower uncertainty range for the ECS. The estimated PDF for the TCR was also narrowed using observational data over the last 10 yr and three OHC data series. The 90 % C.I. for TCR is 0.8 to 2.2 °C, slightly shifted towards lower values than the likely range of 1 to 2.5 °C from IPCC AR5 (IPCC, 2013). The analysis also suggests that there is a significant contribution of internal variability on a multi-decadal timescale to the global mean temperature change, and that both anthropogenic forcing and internal variability contributed to the temperature increase at the end of the 20th century. The analysis excludes the possibility of a very large negative aerosol RF. The posterior 90 % C.I. for the total aerosol effect in 2010 was −1.7 to −0.4 Wm−2. From the data we estimate an almost stable total anthropogenic RF in the middle of the 20th century.
www.earth-syst-dynam.net/5/139/2014/ Earth Syst. Dynam., 5, 139-175, 2014
There are limitations in the prior RF, including the uncertainties in the temporal development of the RF. Therefore, to obtain better knowledge of the ECS and future climate change, further efforts in estimating the historical RF are needed. Especially the historical development of the indirect aerosol effects, which changed most from prior to posterior, needs to be estimated using a more detailed approach. There are also limitations and uncertainties that are difficult to quantify related to the necessary simplifications that must be made to the climate model that is the core of the method. In future studies several alternative models, including a model of intermediate complexity, should be applied in a controlled experiment. This would allow us to better quantify the role of these structural uncertainties.
We have shown that it is especially the simultaneous use of three OHC data series and the inclusion of the last 10 yr that have constrained the ECS. However, there are large uncertainties in the estimates of OHC time series (Lyman et al., 2010). Therefore further efforts in re-evaluating the OHC data and correcting for instrumental biases should be a high priority, including the monitoring of the deep ocean. It should be noted that the estimated ECS in this study does not include very slow climate feedbacks like melting of ice sheets. Also, the ECS estimates do not include biogeochemical feedbacks for the LLGHGs, which may have affected historic concentrations and can make a large contribution to future climate change (Arneth et al., 2010).

Model description
In a Bayesian approach, parameters are assigned prior uncertainties, accounting for uncertainties in the knowledge of the parameter values. By combining this prior knowledge with observational data, the parameter values of a computer model can be constrained (Kennedy and O'Hagan, 2001). Bayes' theorem is as follows:

$$P(\theta \mid \mathrm{data}) \propto P(\theta)\, P(\mathrm{data} \mid \theta),$$

where $P(\theta \mid \mathrm{data})$, the posterior distribution of the parameters ($\theta$) given the observational data, is proportional to the prior distribution of the parameters $P(\theta)$ multiplied by the likelihood of the data $P(\mathrm{data} \mid \theta)$. Knowledge of the parameters can be gained from the observational data. It is important that the observational data used to constrain the parameters have not been used when the prior distributions of the parameters are decided.
The parameter of interest in this paper is climate sensitivity, one of several parameters in the EBM (see Table A1 below and Aldrin et al., 2012).The EBM is a deterministic model which calculates hemispheric mean temperature and ocean heat content with RF time series as input, and we combine the EBM with a stochastic model to make an inference.
The combined process model is given by Eq. (1) in the main paper. However, in the data process model given by Eq. (2), we have for simplicity deleted a term $\beta_0$, so the correct process model includes this term. Here, $\beta_0$ is a vector of intercepts and is included because the measurements and output of the computer model are given relative to the mean of the different reference periods.
The model errors are modelled as a vector autoregressive process of order 1 (VAR(1) process), while the observational errors are modelled as a scaled VAR(1) process, where the scaling factor is given by a vector of standard errors.
The long-term internal variability term $n_t^{\mathrm{liv}}$ is modelled as a vector autoregressive process of order 3 (VAR(3) process), i.e.

$$n_t^{\mathrm{liv}} = \phi_1^{\mathrm{liv}} n_{t-1}^{\mathrm{liv}} + \phi_2^{\mathrm{liv}} n_{t-2}^{\mathrm{liv}} + \phi_3^{\mathrm{liv}} n_{t-3}^{\mathrm{liv}} + \varepsilon_t^{\mathrm{liv}}, \qquad \mathrm{Var}(\varepsilon_t^{\mathrm{liv}}) = \Sigma^{\mathrm{liv}} = \mathrm{diag}(\sigma^{\mathrm{liv}})\, C^{\mathrm{liv}}\, \mathrm{diag}(\sigma^{\mathrm{liv}}),$$

where the $\phi^{\mathrm{liv}}$ are coefficient matrices and where $\sigma^{\mathrm{liv}}$ and $C^{\mathrm{liv}}$ are the standard deviations and correlation matrix of the covariance matrix, respectively. To estimate the parameters of this process, we use results for the Canadian CanESM2 in the CMIP5 experiment (Yang and Saenko, 2012) with simulations over 900 yr with zero RF. First we subtract a linear trend from each of the temperature and OHC series to account for drift. Then we apply a 10 yr running mean to each of the three resulting time series (two for temperature, one for OHC). Furthermore, we estimate a VAR(3) process from these data, such that for instance the temperature in the Northern Hemisphere depends on the values of itself and the two other quantities in the three preceding years. We include this VAR(3) process as an extra term ($n_t^{\mathrm{liv}}$) in our model, with all parameter values, except the standard deviations $\sigma^{\mathrm{liv}}$ of the errors $\varepsilon_t^{\mathrm{liv}}$, kept fixed (see Table A2). However, this is only an estimate of the internal variability. The magnitude may differ between different AOGCMs and Huber and Knutti (2012) claim that models underestimate internal variability by a factor of 3. Therefore, we treat the standard deviations in the VAR(3) process as unknown parameters that we estimate, whereas the correlation structure is kept fixed. Note that in the model presented previously in Aldrin et al. (2012), we did not include the term $n_t^{\mathrm{liv}}$ in the process model. In that model long-term internal variability was accounted for by the model error term $n_t^{\mathrm{m}}$.

To apply a Bayesian approach, prior distributions of the model parameters and input data must be given. The input data, the RF time series for all well-established mechanisms, are estimated in this paper and given prior uncertainties based on the ranges of published estimates and subjective assessments. The priors of ECS and $\theta = (\theta_{\mathrm{VHD}}, \theta_{\mathrm{P}}, \theta_{\mathrm{UV}}, \theta_{\mathrm{ASHE}}, \theta_{\mathrm{OIHE}}, \theta_{\mathrm{M}})$ are given in Table A1, while all remaining parameters of the stochastic model, except the standard deviations in the VAR(3) process for long-term internal variability, are given vague priors. The reasoning behind the choice of priors of ECS and $\theta$ is given in the Supplement of Aldrin et al. (2012). The standard deviations in the VAR(3) process are modelled as $\sigma^{\mathrm{liv}} = \beta_2 \sigma^{\mathrm{liv}}_{\mathrm{GCM}}$, where $\sigma^{\mathrm{liv}}_{\mathrm{GCM}}$ is the standard deviation obtained when estimating the VAR(3) process from the CanESM2 data, and $\beta_2$ is a diagonal matrix where each diagonal element is uniformly distributed between 1/5 and 5.
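A VAR(3) process with a covariance built from standard deviations and a correlation matrix can be simulated in a few lines. In the sketch below the coefficient matrices, the GCM standard deviations and the correlation matrix are placeholder values, not the CanESM2 estimates of Table A2, and all function names are hypothetical. The only element taken from the text is the form of the process and the scaling of the standard deviations by factors drawn uniformly between 1/5 and 5.

```python
import math
import random


def cholesky(m):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def simulate_var3(phis, sigma, corr, n_steps, rng):
    """Simulate n_t = phi1 n_{t-1} + phi2 n_{t-2} + phi3 n_{t-3} + eps_t,
    with Var(eps_t) = diag(sigma) corr diag(sigma), starting from zeros."""
    dim = len(sigma)
    cov = [[sigma[i] * corr[i][j] * sigma[j] for j in range(dim)] for i in range(dim)]
    low = cholesky(cov)
    history = [[0.0] * dim for _ in range(3)]   # most recent state first
    series = []
    for _ in range(n_steps):
        eps = matvec(low, [rng.gauss(0.0, 1.0) for _ in range(dim)])
        lagged = [matvec(phis[lag], history[lag]) for lag in range(3)]
        new = [lagged[0][i] + lagged[1][i] + lagged[2][i] + eps[i] for i in range(dim)]
        history = [new] + history[:2]
        series.append(new)
    return series


if __name__ == "__main__":
    rng = random.Random(42)
    dim = 3                                   # NH temperature, SH temperature, OHC
    zero = [[0.0] * dim for _ in range(dim)]
    phi1 = [[0.5 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    sigma_gcm = [0.02, 0.02, 0.1]             # placeholder GCM standard deviations
    beta2 = [rng.uniform(1.0 / 5.0, 5.0) for _ in range(dim)]   # prior draw
    sigma = [b * s for b, s in zip(beta2, sigma_gcm)]
    corr = [[1.0 if i == j else 0.3 for j in range(dim)] for i in range(dim)]
    print(simulate_var3([phi1, zero, zero], sigma, corr, 5, rng))
```

Keeping the correlation matrix fixed while rescaling only the standard deviations mirrors the choice described above: the structure of the variability comes from the GCM, while its amplitude is left to the data.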
The EBM used in this study is the hemispheric version of the energy-balance climate/upwelling-diffusion ocean model described in Schlesinger et al. (1992) and its global version in Schlesinger and Jiang (1990). This model is a part of the CICERO SCM (Fuglestvedt and Berntsen, 1999) and has been used in several studies (e.g. Fuglestvedt et al., 2003; Rive et al., 2007; Skeie et al., 2009). The model has been shown to reproduce GCM results from idealized experiments with a gradually changing forcing when the model parameters are calibrated (Olivie and Stuber, 2010). In the model the atmosphere is represented by a single layer and the ocean is subdivided into 40 vertical layers where the uppermost ocean layer is the mixed layer. Horizontally the model is divided into a Northern and a Southern Hemisphere part, with a separate energy-balance calculation for each hemisphere. The hemispheric difference in the land/ocean fraction is taken into account by scaling the efficiency of heat uptake by the ocean. This allows for a more rapid response to forcings in the Northern Hemisphere. Each ocean box is divided into a polar and a non-polar region. In the polar region heat is transported from the mixed layer into the deep ocean representing deep water formation. In the non-polar region heat is transported downwards by processes treated as diffusion and advected upwards by slow upwelling. The representation of the ocean mixing is very simplified and does not include entrainment of downwelling water at intermediate depths as some other simple climate models do (e.g. MAGICC6 described in Meinshausen et al., 2011). As a consequence of this, in the sensitivity test with observed deep ocean OHC (Fig. 2d) we treat all ocean below 700 m as one compartment.
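As a rough illustration of the vertical structure described above, the sketch below steps a single column with a mixed layer over a purely diffusive deep ocean. It is not the CICERO SCM: there is only one hemisphere, no polar region and no upwelling, and the layer thickness, mixed-layer depth, diffusivity, heat capacity and the forcing for a CO2 doubling (`f2x`) are all assumed values chosen for illustration.

```python
SECONDS_PER_YEAR = 3.15576e7
RHO_C = 4.1e6        # J m-3 K-1, approximate volumetric heat capacity of sea water


def run_column_ebm(forcing, ecs, f2x=3.7, n_layers=40, dz=100.0,
                   mixed_depth=70.0, kappa=1.0e-4, steps_per_year=10):
    """Return the mixed-layer temperature anomaly (K) at the end of each year.

    forcing: annual radiative forcing values (W m-2)
    ecs:     equilibrium climate sensitivity (K per CO2 doubling)
    """
    lam = f2x / ecs                       # feedback parameter, W m-2 K-1
    dt = SECONDS_PER_YEAR / steps_per_year
    c_mix = RHO_C * mixed_depth           # heat capacity of the mixed layer
    c_deep = RHO_C * dz                   # heat capacity of one deep layer
    temps = [0.0] * n_layers              # temps[0] is the mixed layer
    surface = []
    for f in forcing:
        for _ in range(steps_per_year):
            # Downward diffusive heat flux (W m-2) from layer i to layer i + 1.
            flux = [RHO_C * kappa * (temps[i] - temps[i + 1]) / dz
                    for i in range(n_layers - 1)]
            new = temps[:]
            new[0] += dt * (f - lam * temps[0] - flux[0]) / c_mix
            for i in range(1, n_layers):
                outflow = flux[i] if i < n_layers - 1 else 0.0   # insulated bottom
                new[i] += dt * (flux[i - 1] - outflow) / c_deep
            temps = new
        surface.append(temps[0])
    return surface


if __name__ == "__main__":
    warming = run_column_ebm([3.7] * 100, ecs=1.8)
    print(f"surface warming after 100 yr of constant forcing: {warming[-1]:.2f} K")
```

Because heat keeps draining into the deep layers, the surface warming after a century of constant forcing stays below the equilibrium value, which is the behaviour that lets OHC observations inform the estimate of the sensitivity.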
The process model is updated by observed temperature change and ocean heat content $y_t$, taking into account uncertainties in the observational data, using a Markov Chain Monte Carlo (MCMC) algorithm. Posterior distributions of the climate sensitivity and other parameters are obtained. The statistical method is described in more detail in Aldrin et al. (2012). Table A3 gives the MCMC parameter estimates for the main analysis, and three other key analyses. The estimates of standard deviations of the terms for ENSO, long-term internal variability and model errors are given in Table A4, while standard deviations of observational errors are given in Table A5.
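The idea of updating a prior with data through MCMC can be illustrated with a toy random-walk Metropolis sampler for a single parameter. This is a generic sketch and not the algorithm of Aldrin et al. (2012): the Gaussian likelihood, the synthetic data, the uniform prior on 0 to 20 (chosen only so that the prior mean is 10) and the tuning constants are all assumptions made for illustration.

```python
import math
import random


def log_posterior(theta, data, sigma, lower=0.0, upper=20.0):
    """Log of prior times likelihood, up to an additive constant."""
    if not lower <= theta <= upper:
        return -math.inf                  # outside the support of the uniform prior
    return -sum((y - theta) ** 2 for y in data) / (2.0 * sigma ** 2)


def metropolis(data, sigma, n_iter=20000, step=0.1, start=10.0, seed=1):
    """Random-walk Metropolis chain for theta, started at the prior mean."""
    rng = random.Random(seed)
    theta = start
    current = log_posterior(theta, data, sigma)
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data, sigma)
        # Accept with probability min(1, posterior ratio).
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        chain.append(theta)
    return chain


def make_synthetic_data(true_value=2.0, sigma=0.5, n=50, seed=0):
    """Synthetic observations scattered around a known value (illustration only)."""
    rng = random.Random(seed)
    return [rng.gauss(true_value, sigma) for _ in range(n)]


if __name__ == "__main__":
    data = make_synthetic_data()
    kept = metropolis(data, sigma=0.5)[2000:]     # discard burn-in
    print(f"posterior mean: {sum(kept) / len(kept):.2f}")
```

With 50 informative observations the chain moves away from the prior mean and settles near the data mean; with fewer or noisier observations the posterior stays closer to the prior, as discussed for the sensitivity tests in the main text.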
Note that in the sensitivity test with two ECSs, one for the Northern Hemisphere ($\mathrm{ECS_{NH}}$) and one for the Southern Hemisphere ($\mathrm{ECS_{SH}}$), the prior for ECS is as in the main analysis, $(\mathrm{ECS_{NH}} + \mathrm{ECS_{SH}})/2 = \mathrm{ECS}$, and $\log(\mathrm{ECS_{NH}}/\mathrm{ECS_{SH}}) \sim \mathrm{Uniform}(-\log(1.5), \log(1.5))$. In the sensitivity test with two mixed layer depths, one for the Northern Hemisphere ($\theta_{\mathrm{M,NH}}$) and one for the Southern Hemisphere ($\theta_{\mathrm{M,SH}}$), the priors for $\theta_{\mathrm{M,NH}}$ and $\theta_{\mathrm{M,SH}}$ are set equal to the prior for $\theta_{\mathrm{M}}$ in the main analysis. The uncertainty ranges for the RF mechanisms are based on Skeie et al. (2011b). If the 90 % confidence interval for an RF mechanism includes zero, there are no restrictions on the sign of the forcing, and a distribution where the uncertainty is proportional to time is chosen. The uncertainty in 1750 is zero. If the sign of the RF mechanism is restricted, uncertainty proportional to the expected value is chosen. If the given 90 % confidence interval is skewed, a lognormal distribution is chosen. A lognormal distribution is also chosen for forcing mechanisms with a symmetric 90 % confidence interval and a probability greater than 0.005 of RF with the wrong sign. Otherwise, if the probability is less than 0.005, a normal distribution is chosen. For the lognormal distribution, the upper and lower quantiles are dependent and will not exactly match the given 90 % confidence interval.
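The selection rules for the RF priors described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name `choose_prior`, the returned labels, the symmetry tolerance and the example numbers are hypothetical, and the wrong-sign probability is computed from a normal distribution whose mean is the interval midpoint and whose standard deviation is fitted to the given symmetric 90 % interval.

```python
from statistics import NormalDist

# z-value bounding a central 90 % interval of a normal distribution (about 1.645)
Z90 = NormalDist().inv_cdf(0.95)

def choose_prior(mean, lower, upper, tol=1e-6):
    """Pick a prior family from a prior mean and its 90 % interval (hypothetical helper)."""
    width = upper - lower
    if width <= 0.0:
        raise ValueError("upper must be larger than lower")
    # Interval includes zero: sign unrestricted, uncertainty proportional to time.
    if lower < 0.0 < upper:
        return "normal_time"
    # Skewed interval: lognormal, uncertainty proportional to the expected value.
    if abs((upper - mean) - (mean - lower)) > tol * width:
        return "lognormal_value"
    # Symmetric interval: check the probability of the wrong sign under a normal fit.
    sd = width / (2.0 * Z90)
    cdf_zero = NormalDist(mean, sd).cdf(0.0)
    p_wrong_sign = cdf_zero if mean > 0.0 else 1.0 - cdf_zero
    if p_wrong_sign > 0.005:
        return "lognormal_value"
    return "normal_value"

# Hypothetical intervals (W m-2), not values from Table 1.
includes_zero = choose_prior(-0.1, -0.3, 0.1)
well_constrained = choose_prior(1.0, 0.8, 1.2)
wide_symmetric = choose_prior(-0.4, -0.7, -0.1)
skewed = choose_prior(-0.9, -1.5, -0.5)
```

The third example shows why the 0.005 threshold matters: a symmetric interval that excludes zero can still put more than 0.5 % of a fitted normal distribution on the wrong side of zero, in which case the lognormal form is selected.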

D1 Direct aerosol effect
The direct aerosol effect is the sum of five components: sulfate (SO4), black carbon (BC) from fossil fuel and biofuel combustion (FFBF), organic matter (OM) (organic carbon from FFBF and secondary organic aerosols), biomass burning aerosols (BB) and nitrate aerosols (Nit).
For a given year we want the following statement A to be true: "The sum of the expected value for the aerosol components is equal to the expected value for the total direct aerosol effect." Due to nonlinearities the sum of the individual aerosol direct effects is not identical to the total direct aerosol effect (Skeie et al., 2011b). For each year we find a constant a such that statement A is true when we multiply the expected value for BC FFBF by 1 + a and each of the expected values for SO4, OM, BB and Nit by 1 − a. The values of a are between −0.0177 and 0.0141.
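The constant a for a single year follows from one linear equation. The sketch below assumes the scaling is read as 1 + a for BC FFBF and 1 − a for the other four components, which is an interpretation of the text; the function name `solve_a` and all numerical values are hypothetical and are not taken from the paper.

```python
def solve_a(bc, others, total):
    """Solve bc*(1 + a) + sum(others)*(1 - a) = total for a (hypothetical helper)."""
    others_sum = sum(others)
    denom = bc - others_sum
    if denom == 0.0:
        raise ValueError("BC FFBF equals the sum of the other components; a is undefined")
    return (total - bc - others_sum) / denom

# Hypothetical expected values (W m-2) for one year.
bc_ffbf = 0.40
others = {"SO4": -0.40, "OM": -0.10, "BB": -0.05, "Nit": -0.02}
total_direct = -0.18  # expected total direct aerosol effect for the same year

a = solve_a(bc_ffbf, others.values(), total_direct)

# After scaling, the component sum reproduces the total (statement A).
scaled_sum = bc_ffbf * (1.0 + a) + sum(v * (1.0 - a) for v in others.values())
```

Because BC FFBF is positive and the other components are mostly negative, a small a moves all components in the same direction relative to the total, which is consistent with the small magnitudes of a reported above.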
For 2010 we also want the following statement B to be true: "In the year 2010 the sum of the variances for SO4, BC FFBF, OM, BB and Nit is equal to the variance for the total direct aerosol effect." We find a constant b such that statement B is true when we multiply the standard deviations for SO4, BC FFBF, OM, BB and Nit by b. We also multiply the standard deviations before 2010 by this constant b.
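Statement B fixes b in closed form, since scaling every standard deviation by b scales the summed variance by b squared. A minimal sketch, with a hypothetical function name `solve_b` and made-up standard deviations:

```python
import math

def solve_b(component_sds, total_sd):
    """Return b such that sum((b * sd_i)**2) equals total_sd**2 (hypothetical helper)."""
    sum_var = sum(sd ** 2 for sd in component_sds)
    if sum_var == 0.0:
        raise ValueError("all component standard deviations are zero")
    return total_sd / math.sqrt(sum_var)

# Hypothetical 2010 standard deviations (W m-2) for SO4, BC FFBF, OM, BB and Nit.
component_sds = [0.12, 0.15, 0.08, 0.05, 0.04]
total_sd = 0.20  # hypothetical standard deviation of the total direct aerosol effect

b = solve_b(component_sds, total_sd)
scaled_var = sum((b * sd) ** 2 for sd in component_sds)
```

Summing variances in this way corresponds to treating the component uncertainties as uncorrelated.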

D2 The posterior distributions
In Fig. 1 we show posterior RF time series and PDFs for RF in 2010 for the main analysis. Here (Fig. D1) we show the same results, but for the Northern and Southern Hemisphere separately.
Figure D2 shows the prior and posterior for the global anthropogenic radiative forcing time series as in Fig. 1. In addition, the historical RF from the RCP database (http://www.iiasa.ac.at/web-apps/tnt/RcpDb) is plotted as a dashed line. The RCP4.5 value is used for the year 2010 as in Skeie et al. (2011b). The RCP RF time series do not include the effect of land albedo changes, as this is included in our prior. In our prior we also include indirect and semi-direct aerosol effects that are not RF according to the definition in IPCC AR4, and are probably not included in the RCP historical RF time series.
The three error bars in the figure are total anthropogenic RF in 1950, 1980 and 2011 from the IPCC AR5 summary for policymakers (IPCC, 2013). These error bars include the total aerosol effects and are comparable to our forcing time series. Our prior has a weaker forcing (stronger aerosol forcing) than IPCC AR5, but our posterior is more in agreement with AR5. Our posterior has a larger uncertainty in the 1950s, a lower mean value in 1980 and does not span the upper range in 2010 compared to AR5 (AR5 values are for 2011).

Sensitivity tests
Here we describe some sensitivity tests performed with a somewhat simpler model setup using only one OHC data set (from Levitus et al., 2009). Results are shown in Fig. E1.
There is no uncertainty in the RF time development (i.e. the temporal form of the curve) for each component or mechanism. We make sensitivity tests (Test 1 and Test 2) to see how changes in the prior assumptions for the time development of RF affect our result. In sensitivity Test 3 we test the sensitivity of our results to the role of uncertain data before 1900 (update with data only between 1900 and 2010 to exclude the uncertain early period including the 1883 Krakatoa volcanic eruption), while in sensitivity Test 4 we test how the OHC data for the years 2001 to 2010 affect the ECS estimate by excluding OHC data from 2001 to 2010.
E1 Test 1: change the BC direct aerosol effect in the latter part of the simulation period
Skeie et al. (2011a) calculated RF time series for black carbon (BC) from fossil fuel and biofuel (FFBF) sources using emission data from Bond et al. (2007). This emission scenario has a more rapid decrease in emissions in Europe and North America and a less rapid increase in emissions in eastern Asia in the latter part of the 20th century compared to the emission inventory (Lamarque et al., 2010), which is used to construct the RF time series for the main analysis.
As a sensitivity test we replace the RF time series for BC FFBF in the main analysis with data from Skeie et al. (2011a). This RF time series has a less rapid increase in the latter half of the 20th century compared to the RF time series used in the main analysis. The time series in Skeie et al. (2011a) ended in 2000, so the increase from 1990 to 2000 is extrapolated further to 2010. The change in the BC FFBF RF time series will also influence the semi-direct effect. The total direct aerosol effect is assumed to be the sum of all aerosol components in this sensitivity test.
In this sensitivity test the prior mean for the direct aerosol effect strengthens between 2000 and 2010 by −0.04 Wm−2 compared to the main analysis, where the direct aerosol effects weakened between 2000 and 2010 by 0.03 Wm−2. The altering of the temporal structure of the prior RF had only very minor effects on the estimated ECS (Fig. E1d vs. Fig. E1b).

E2 Test 2: change the BC direct aerosol effect in the first part of the simulation period
There are large uncertainties in the historical emission of BC (Bond et al., 2007). We modify the RF time series for BC FFBF (from the main analysis) from pre-industrial times and up to 1960 to see if changes in the RF pattern early in the period affect the estimated ECS. There are indications of too large a BC concentration around 1850 and too low a concentration in the early 20th century (Figs. 11 and 12 in Skeie et al., 2011a). As a sensitivity test we modify the time series of BC FFBF to have a more rapid increase at the end of the 19th century. Between 1750 and 1850 the RF time series is multiplied by 0.2 and between 1910 and 1940 it is multiplied by 1.3. We have linearly interpolated the multiplication factors for the years in between. After 1960 we use the same RF time series as in the main analysis, i.e. a multiplication factor of 1. As in Test 1, the change in the BC FFBF RF time series will also influence the semi-direct effect and the total direct aerosol effect. The altering of the temporal structure of the prior RF had only very minor effects on the estimated ECS (Fig. E1e vs. Fig. E1b).
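The multiplication factors of Test 2 form a piecewise-linear function of the year. The sketch below encodes the breakpoints stated in the text (0.2 for 1750 to 1850, 1.3 for 1910 to 1940, 1 from 1960 onwards) and interpolates linearly in between; the function name `bc_factor` is hypothetical.

```python
# Breakpoints taken from the description of Test 2.
BREAK_YEARS = [1750, 1850, 1910, 1940, 1960]
BREAK_FACTORS = [0.2, 0.2, 1.3, 1.3, 1.0]

def bc_factor(year):
    """Multiplication factor applied to the BC FFBF RF in a given year (hypothetical helper)."""
    if year <= BREAK_YEARS[0]:
        return BREAK_FACTORS[0]
    if year >= BREAK_YEARS[-1]:
        return BREAK_FACTORS[-1]
    segments = zip(BREAK_YEARS, BREAK_FACTORS, BREAK_YEARS[1:], BREAK_FACTORS[1:])
    for y0, f0, y1, f1 in segments:
        if y0 <= year <= y1:
            # Linear interpolation within the segment [y0, y1].
            return f0 + (f1 - f0) * (year - y0) / (y1 - y0)
    raise ValueError("year not covered by the breakpoints")

# Factors for every year of the simulation period.
factors = {year: bc_factor(year) for year in range(1750, 2011)}
```

Multiplying the main-analysis BC FFBF RF for each year by `factors[year]` would give the modified prior mean series under these assumptions.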

E3 Test 3: updating the model with data between 1900 and 2010
Due to the large uncertainties in both the historical RF and observed temperature change, we did another test where the model is only updated with data between 1900 and 2010, excluding the uncertain early period including the 1883 Krakatoa volcanic eruption. This sensitivity test slightly shifts the ECS to larger values, increasing the posterior mean by 0.3 °C to 2.2 °C, with a 90 % C.I. of 1.4 to 3.3 °C, but values larger than 4.5 °C are still basically excluded (Fig. E1f vs. Fig. E1b). The probability of ECS greater than 4.5 °C is 0.005.

E4 Test 4: the role of ocean heat content - excluding OHC data for the years 2001 to 2010
To test how the OHC data for the years 2001 to 2010 affect the ECS estimate, a sensitivity test is performed where these data are excluded, i.e. OHC data between 2001 and 2010 are not used to update the model. The resulting PDF for the ECS (Fig. E1g) is significantly wider than in the corresponding full analysis (Fig. E1b) and only slightly narrower than when data only up to year 2000 are used to estimate the ECS (Fig. E1c). The posterior mean for the ECS is 3.5 °C and the probability of ECS being larger than 4.5 °C is 0.17, so the estimated ECS is significantly constrained using the OHC data after the year 2000.

E5 An additional sensitivity test with two mixed layer depths
We have performed an additional sensitivity test where we included two mixed layer depths instead of one. More precisely, we included one mixed layer depth for the Northern Hemisphere and one for the Southern Hemisphere. Note that in this sensitivity test long-term internal variability is included explicitly in the model and three OHC series are used, i.e. the sensitivity test is as in the main analysis, except for the inclusion of two mixed layer depths. The results are shown in Fig. E2. We observe that the inclusion of two mixed layer depths instead of one had only minor effects on the estimated ECS.
Earth Syst. Dynam., 5, 139-175, 2014, www.earth-syst-dynam.net/5/139/2014/

E6 The role of the ECS prior
In the main analysis the ECS is given a uniform prior (0-20 °C). To test the role of the prior distribution of ECS we have re-calculated the posterior PDF for ECS based on two other priors: one taken from Hegerl et al. (2006) and another that is uniform for 1/ECS. Technically, this is done by reweighting the MCMC samples from the estimation of our main model according to the alternative priors. This can be seen as a variant of importance sampling. Hegerl et al. (2006) calculated a PDF for the climate sensitivity that was a combination of PDFs from several authors, all on the basis of reconstructed temperature data before 1850. Since this PDF is based on data other than those we use in our work, it can be reasonable to use this PDF as an informative prior. The median of this prior is around 3.5 °C, with a 90 % C.I. from 1.2 to 8.6 °C.
The other prior is a uniform prior for 1/ECS, which is equivalent to a prior for ECS that is proportional to $1/\mathrm{ECS}^2$. This prior was discussed in Frame et al. (2005). As we stated in our previous paper (Aldrin et al., 2012), "this prior is strongly informative towards low climate sensitivities with 76 % probability for ECS being lower than the pure blackbody radiation of 1.1 K", and it is perhaps not very realistic.
Using Hegerl's prior gives estimates that are slightly larger than those obtained with a uniform prior, with a mean value of 1.9 °C, while the credible interval is slightly narrower (90 % C.I. from 1.1 to 3.1 °C) compared to the main analysis (Fig. 2a). When using a uniform prior for 1/ECS, the PDF is shifted considerably towards lower values, with a mean value of 1.3 °C and 90 % C.I. of 0.46 to 2.3 °C, compared to 1.8 °C and 90 % C.I. of 0.9 to 3.2 °C in the main analysis.
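The reweighting of MCMC samples under an alternative prior can be illustrated as follows. This is a minimal sketch of self-normalised importance weighting, not the authors' implementation: the function names and the three sample values are hypothetical, and the original prior is taken to be uniform, so the weight of each sample is proportional to the alternative prior density alone.

```python
def reweight(samples, alt_prior_density, orig_prior_density=lambda ecs: 1.0):
    """Normalised weights proportional to alternative prior / original prior."""
    raw = [alt_prior_density(x) / orig_prior_density(x) for x in samples]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_mean(samples, weights):
    return sum(x * w for x, w in zip(samples, weights))

def weighted_quantile(samples, weights, q):
    """Smallest sample whose cumulative normalised weight reaches q."""
    pairs = sorted(zip(samples, weights))
    cumulative = 0.0
    for x, w in pairs:
        cumulative += w
        if cumulative >= q:
            return x
    return pairs[-1][0]

# Hypothetical posterior samples of ECS (degrees C) obtained under the uniform prior.
ecs_samples = [1.0, 2.0, 4.0]

# A uniform prior on 1/ECS corresponds to a density proportional to 1/ECS**2.
weights = reweight(ecs_samples, lambda ecs: 1.0 / ecs ** 2)

uniform_mean = sum(ecs_samples) / len(ecs_samples)
reweighted_mean = weighted_mean(ecs_samples, weights)
reweighted_median = weighted_quantile(ecs_samples, weights, 0.5)
```

Because the weights decrease with ECS, the reweighted mean falls below the unweighted mean, which is the direction of the shift reported for the 1/ECS prior. With real MCMC output the effective sample size after reweighting would also need to be checked.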

Fig. 1. Prior and posterior distribution of the RF time series and PDF of RF in 2010 for total RF (upper panel), anthropogenic RF (middle panel) and total aerosol effect (direct effect, cloud albedo effect, cloud lifetime effect and semi-direct effect) (lower panel) from the main analysis. Red colour for the posterior distributions and black lines and grey shadings for the prior distribution.

Fig. 2. Posterior distributions for the ECS for different analyses. In (a) the main analysis, (b) with NorESM data to estimate $n^{\mathrm{liv}}_t$, (c) sensitivity test using HadCRUT4 instead of HadCRUT3 data, (d) sensitivity test using data for OHC change below 700 m, (e) sensitivity test allowing different ECSs in each hemisphere, (f) updating the model with data only up to 2000, (g) updating the model with data only up to 2000 and using only one OHC data series and (h) sensitivity test without the long-term internal variability (without the $n^{\mathrm{liv}}_t$ term). The estimated mean of ECS, the 90 % C.I. and the probability of ECS being larger than 4.5 °C are given in the text box of each panel. The 90 % C.I. (the error bar) and estimated posterior mean (triangle) and median (black dot) are also indicated in each panel.
6 °C compared to 3.18 °C in the main analysis. Furthermore, we have updated the model sequentially with 2 yr of additional data between 2000 and 2010. Figure 5 shows the sequence of estimated ECS with 90 % C.I. and the relative uncertainty R90. The R90 values decrease steadily, except from 2002 to 2004, as we add more data, showing the value of a longer time series in constraining the ECS estimate. The ECS estimate itself, however, is shifted slightly towards higher values when the

Fig. 3. Observed and fitted (posterior mean) values for the temperature series and the ocean heat content for the main analysis. The shaded areas show the 90 % C.I. for the sum of the first two terms ($m_t(x_{1750:t}, \mathrm{ECS}, \theta) + \beta_1 e_t$) on the right side of Eq. (1).

Fig. 4. Posterior estimates of the long-term internal variability term ($n^{\mathrm{liv}}_t$, left column), the ENSO term ($\beta_1 e_t$, middle column) and the model errors ($n^{m}_t$, right column) for the temperature and ocean heat content.

Fig. 5. Posterior means (triangles), medians (dots), modes (crosses), and 90 % credible intervals for estimates of ECS using various data sets updated between 2000 and 2010 (2 yr intervals). The relative uncertainty measure R90, defined as the width of the 90 % C.I. divided by the posterior mean, is also shown.

Fig. D1. (a) Prior and posterior distribution of the RF time series and PDF of RF in 2010 for the Northern Hemisphere for total RF (upper panel), anthropogenic RF (middle panel) and total aerosol effect (direct effect, cloud albedo effect, cloud lifetime effect and semi-direct effect) (lower panel) from the main analysis. Red colour for the posterior distribution and black lines and grey shadings for the prior distribution; (b) as (a), but for the Southern Hemisphere.
Fig. F1. Pair plots of samples from the posterior distribution for ECS, total RF in 2010 and total aerosol effect in 2010. In (a) the model is updated with data up to 2010, while in (b) the model is updated with data up to 2000.

Fig. F2. Pair plots of samples from the posterior distribution for the model parameters ECS and $\theta = (\theta_{\mathrm{VHD}}, \theta_{\mathrm{P}}, \theta_{\mathrm{UV}}, \theta_{\mathrm{ASHE}}, \theta_{\mathrm{OIHE}}, \theta_{\mathrm{M}})$. In (a) the model is updated with data up to 2010, while in (b) the model is updated with data up to 2000.

Table 1. The RF mechanisms included with information on the prior distribution assumed and the prior mean value and the 90 % confidence interval in the year 2010. The RF values are relative to 1750. a See also Sect. 2.3. b The RF is the average over 2001-2010. c The RF is the average over the last 11 yr compared to the average of 11 yr around 1750.

Table A1. Priors of ECS and the other model parameters $\theta$. * $\theta_{\mathrm{VHD}} = H\,\theta_{\mathrm{UV}}$, where $H$ is the scale depth. Range of $H$: 400-1000 m. $H$ is uniform, $\theta_{\mathrm{VHD}}$ is not.

Table A2. Parameter estimates (posterior means) for the VAR(3) process in the main analysis.

Table A3. Parameter estimates with 95 % C.I. In the table $\phi^{m}$ and $\phi^{o}$ are the diagonal coefficient matrices of the VAR(1) processes for the model and observational errors, respectively, and $\sigma^{m}$/$C^{m}$ and $\sigma^{o}$/$C^{o}$ are the standard deviations/correlation matrices of the covariance matrices $\Sigma^{m}$ and $\Sigma^{o}$ of the error terms of the VAR(1) processes for the model and observational errors, respectively, i.e. $\Sigma^{m} = \mathrm{diag}(\sigma^{m})\, C^{m}\, \mathrm{diag}(\sigma^{m})$ and $\Sigma^{o} = \mathrm{diag}(\sigma^{o})\, C^{o}\, \mathrm{diag}(\sigma^{o})$. For each analysis there are at least 140 million iterations after burn-in in the MCMC estimation algorithm.

Prior and posterior distributions for RF time series
The prior of the RF mechanisms is constructed by first constructing an expected or best guess time series for each hemisphere. These expectation curves are taken from the results in Skeie et al. (2011b). Then the uncertainties around the expectation curves are constructed by adding an error term to each number in the expectation curve or multiplying each number by an error term, which gives a pair of NH and SH time series. The error terms are either the same for all time points and both hemispheres, or they are proportional to the number of years since 1750 times the expected value in 2010. Four kinds of priors are assumed for the different RF mechanisms (Table 1): (1) a normal distribution where the uncertainty/standard deviation is proportional to the expected value; (2) a normal distribution where the uncertainty/standard deviation is proportional to time; (3) a lognormal distribution where the uncertainty/standard deviation is proportional to the expected value; (4) a uniform distribution where the uncertainty/standard deviation is proportional to the values of a time series.
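Two of the four prior types can be sketched as draws around an expectation curve. The parameterisation below is an assumption made for illustration only: a single normal error term is drawn per time series, the multiplicative form gives a standard deviation proportional to the expected value, and the additive form grows linearly with the number of years since 1750 and is scaled by a standard deviation specified for 2010. The function names, the expectation curve and the standard deviations are all hypothetical, and only one series is shown rather than an NH and SH pair.

```python
import random

START_YEAR, END_YEAR = 1750, 2010

def draw_value_proportional(expected, rel_sd, rng):
    """Type (1): one multiplicative error term shared by all time points."""
    eps = rng.gauss(0.0, rel_sd)
    return [x * (1.0 + eps) for x in expected]

def draw_time_proportional(expected, years, sd_end, rng):
    """Type (2): additive error growing linearly with years since 1750."""
    eps = rng.gauss(0.0, sd_end)  # error reached in END_YEAR
    span = END_YEAR - START_YEAR
    return [x + eps * (y - START_YEAR) / span for x, y in zip(expected, years)]

# Hypothetical expectation curve (W m-2) at a few years.
years = [1750, 1850, 1950, 2010]
expected = [0.0, 0.1, 0.3, 0.5]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
draw_value = draw_value_proportional(expected, 0.2, rng)
draw_time = draw_time_proportional(expected, years, 0.1, rng)
```

In both forms the draw coincides with the expectation curve in 1750, and the whole curve is shifted coherently in time because a single error term is used per draw.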