Statistical estimation of global surface temperature response to forcing under the assumption of temporal scaling

Eirik Myrvoll-Nilsen1, Sigrunn Holbek Sørbye1, Hege-Beate Fredriksen1, Håvard Rue2, and Martin Rypdal1
1Department of Mathematics and Statistics, UiT The Arctic University of Norway, N-9037 Tromsø, Norway
2CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Correspondence: Eirik Myrvoll-Nilsen (eirik.myrvoll-nilsen@uit.no)

forcing F, mathematically expressed as
$$\Delta T(t) = \int_{-\infty}^{t} G(t-s)\left(F(s)\,ds + \sigma\,dB(s)\right), \quad (1)$$
where $\sigma\,dB(t)$ represents a white-noise forcing that gives rise to internal climate variability, and G is the response function, or Green's function, characterizing the relation between forcing and the temperature anomaly. A model of this form can arise from the simplest energy-balance model, i.e. the equations
$$\frac{d\Delta Q}{dt} = F(t) - \lambda\,\Delta T \quad (2)$$
and $\Delta Q = C\Delta T$, where $\Delta Q$ is the change in the system's heat content corresponding to a temperature change $\Delta T$, and C is a heat capacity (Rypdal, 2012). If a white-noise forcing term is included on the right-hand side of Eq. (2), it becomes a stochastic differential equation with a stationary solution of the form of Eq. (1), with the response function
$$G(t) = C^{-1} e^{-t/\tau} \quad (3)$$
and $\tau = C/\lambda$. The process has a natural decomposition into the response to the known forcing and a stochastic term
$$X(t) = \sigma \int_{-\infty}^{t} G(t-s)\,dB(s), \quad (4)$$
which for this particular model is an Ornstein-Uhlenbeck process. Rypdal and Rypdal (2014) show how the parameters of the two terms can be estimated simultaneously from time series of forcing and the GMST using the maximum likelihood (ML) method. They also demonstrate that the resulting process is inconsistent with observations. First, the stochastic term X(t) does not exhibit the strong positive decadal-scale serial correlations that are observed in the GMST in the instrumental era, and second, the model's response to reconstructed forcing for the last millennium does not show sufficient low-frequency variability compared to Northern-Hemisphere temperature reconstructions.
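To make the one-box picture concrete, the convolution in Eq. (1) with the exponential Green's function can be sketched numerically. This is only an illustration; the parameter values below (C, λ, the forcing trend and noise amplitude) are hypothetical, not fitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-box parameters: heat capacity C and feedback lambda
C, lam = 7.0, 1.2          # illustrative values, W yr m^-2 K^-1 and W m^-2 K^-1
tau = C / lam              # characteristic time scale tau = C / lambda
n = 200                    # years
t = np.arange(n)

G = np.exp(-t / tau) / C   # discretized Green's function G(t) = C^-1 exp(-t/tau)

F = 0.02 * t               # toy linearly increasing forcing (W m^-2)
sigma = 0.3                # white-noise forcing amplitude
noise = sigma * rng.standard_normal(n)

# Temperature anomaly = deterministic response + stochastic term;
# the noise response is a discretized Ornstein-Uhlenbeck process
dT_forced = np.convolve(F, G)[:n]
dT_noise = np.convolve(noise, G)[:n]
dT = dT_forced + dT_noise
```

The deterministic and stochastic terms are the two components whose parameters Rypdal and Rypdal (2014) estimate jointly.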
The inconsistency of the simple energy-balance model is due to the slow climate response associated with the energy exchange with the deep ocean. One can easily incorporate this effect within the framework of Eq. (1), by generalizing the zero-dimensional (one-box) model to a two-box model that includes a layer representing the deep ocean (Geoffroy et al., 2013; Held et al., 2010; Caldeira and Myhrvold, 2013), or the more general m-box model discussed by Fredriksen and Rypdal (2017).
The generalization from the zero-dimensional (one-box) energy-balance model to the two-box model, or the m-box models, means that the number of free parameters increases. Concerning statistical inference, this could be problematic due to potential over-fitting. Mathematically, the generalization of Eq. (2) is of the form
$$\mathsf{C}\,\frac{d\Delta\mathbf{T}}{dt} = -\mathsf{K}\,\Delta\mathbf{T} + \mathbf{F}(t), \quad (5)$$
where the diagonal elements of C are the heat capacities of each box, and the matrix K contains coefficients describing the heat exchange between boxes and the feedback parameter λ.
The system in Eq. (5) is solved by bringing the matrix $C^{-1}K$ to diagonal form, and the surface temperature anomaly can be written as in Eq. (1), where G(t) is a weighted sum of m exponential functions (Fredriksen and Rypdal, 2017):
$$G(t) = \sum_{k=1}^{m} w_k\, e^{-t/\tau_k}. \quad (6)$$
The characteristic time scales $\tau_k = -1/\mu_k$ are defined from the eigenvalues $\mu_k$ of $C^{-1}K$, and $w_k$ denotes the weight of the kth exponential function.
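The m-box surface Green's function in Eq. (6) is just a weighted sum of exponentials. A minimal sketch, with hypothetical two-box weights and time scales (a fast mixed-layer scale and a slow deep-ocean scale):

```python
import numpy as np

# Hypothetical two-box weights and time scales (tau_k = -1/mu_k)
w = np.array([0.8, 0.2])
tau = np.array([4.0, 150.0])   # fast mixed-layer and slow deep-ocean scales, yr

t = np.arange(500.0)
# G(t) = sum_k w_k exp(-t / tau_k), Eq. (6)
G = (w[None, :] * np.exp(-t[:, None] / tau[None, :])).sum(axis=1)
```

The slow component is what the one-box model lacks: it keeps G(t) well above zero for centuries, producing the low-frequency variability missing from the single-exponential response.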
On the other hand, global temperature variability exhibits a scaling symmetry. For instance, both the forced and the unforced global temperature variability have power spectral densities (PSDs) that are approximate power laws,
$$S(f) \propto f^{-\beta}, \quad (7)$$
for frequencies corresponding to time scales ranging from months to centuries (Rybski et al., 2006; Lovejoy and Schertzer, 2013; Huybers and Curry, 2005; Franzke, 2010; Fredriksen and Rypdal, 2016). The global temperature fluctuations are consistent with a fractional Gaussian noise (fGn), which can formally be defined by the integral analogous to Eq. (4), but with the exponential response function replaced with a scale-invariant response function
$$G(t) = \left(\frac{t}{\gamma}\right)^{\beta/2-1}\xi. \quad (8)$$
Here, γ is a scale parameter with the dimension of time, and ξ is a variable needed in order for G(t) to have the correct physical dimensions. The scaling exponent β (defined from the PSD in Eq. (7)) relates to the so-called Hurst exponent of the fGn via the formula β = 2H − 1. Based on this, Rypdal and Rypdal (2014) proposed a fractional linear response model in the form of Eq.
(1), in which the parsimonious expression in Eq. (8) replaces the linear combination of exponential functions in Eq. (6). The cost of the reduction in model complexity is that the fractional linear response model does not conserve energy, and in general we cannot write the model as a system of differential equations as in Eq. (5). However, on time scales up to approximately $10^3$ years, the model provides an accurate description of both forced and unforced surface temperature fluctuations (Rypdal and Rypdal, 2014; Rypdal et al., 2015), and the millennial-scale climate sensitivity in the estimated fractional linear response model correlates strongly with the equilibrium climate sensitivity (ECS) over the ensemble of models in the Coupled Model Intercomparison Project Phase 5 (CMIP5) (Rypdal et al., 2018a).
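The defining property of the kernel in Eq. (8) is scale invariance: rescaling time by a factor c only rescales G by a power of c. A short numerical check, with illustrative values for β, γ and ξ:

```python
import numpy as np

beta = 0.6                       # scaling exponent; Hurst exponent H = (beta+1)/2
H = (beta + 1.0) / 2.0
gamma, xi = 1.0, 1.0             # scale and dimensional parameters (illustrative)

t = np.arange(1.0, 1001.0)
# Scale-invariant response: G(t) = (t/gamma)^(beta/2 - 1) * xi, Eq. (8)
G = (t / gamma) ** (beta / 2.0 - 1.0) * xi

# Scale invariance: stretching time by c rescales G by the factor c^(beta/2 - 1)
c = 10.0
ratio = ((c * t / gamma) ** (beta / 2.0 - 1.0) * xi) / G
```

Unlike an exponential mixture, this kernel has no preferred time scale, which is what allows a single pair (β, scale) to replace the m weights and time scales of Eq. (6).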

Temporal scale invariance in global temperature fluctuations is an empirical observation and we cannot deduce the parameters in the fractional linear response model from physical principles. This paper presents a statistical methodology that makes it possible to fit the model to observational data and estimate all model parameters. Parameter estimation is done within a Bayesian framework making use of the methodology of integrated nested Laplace approximation (INLA) for latent Gaussian models introduced in Rue et al. (2009). Barboza et al. (2019) use this framework to investigate model formulations and forcing components in paleoclimate reconstructions.
The INLA-methodology and inference for our statistical model, assuming the scale-invariant response function in Eq. (8), is described further in Section 2. This section explains how to compute the marginal posterior distributions of the model parameters. As the model has a non-standard form, this includes certain modifications of the INLA-methodology to ensure computational efficiency. We discuss applications in Section 3.

In Section 3.1 we fit the model to the temperature and forcing data set generated by the GISS-E2-R (Schmidt et al., 2014) ESM. Here we show how to extract the GMST response to the known forcing using a Monte Carlo sampling approach. In Section 3.2, the model is used for temperature forecasting, where the representative concentration pathway (RCP) trajectories describe the future CO2 forcing. Section 3.3 describes how the transient climate response (TCR) can be estimated using our model. We obtain estimates for 19 temperature series and their associated adjusted forcing series. We compare the resulting estimates of TCRs with the TCRs obtained directly from the respective ESMs, and with TCR estimates from the historical HadCRUT4 temperature data set using different forcing data. The applications are incorporated in the R-package INLA.climate. This package also includes the option of using the exponential response functions as defined by Eq.
(3) and Eq. (4) for the one-box model and Eq. (6) for m-box models. A discussion and final conclusions are given in Section 4.

2 Discrete-time modeling and statistical inference

Rypdal and Rypdal (2014) use an ML estimator to estimate the model parameters from the observational yearly time series ∆T = (∆T_1, ..., ∆T_n) of GMST and the corresponding vector of radiative forcing F = (F_1, ..., F_n). Here, we estimate parameters by adopting a Bayesian framework, making use of the INLA methodology (Rue et al., 2009, 2017). This approach implies that parameters are treated as stochastic variables and assigned prior distributions. The information given by the priors is then combined with the likelihood of the observations and updated to give posterior distributions using Bayes' theorem.
In a discrete-time model, we assume that ∆T_t has a Gaussian distribution with a random mean expressed by the linear predictor
$$\eta_t = \mu_t + \varepsilon_t, \qquad \mu_t = \sigma_f \sum_{s=1}^{t} G_{ts}\,(F_s + F_0), \quad (9)$$
where $\sigma_f = \gamma^{-\beta/2+1}$, while $F_0$ denotes a shift parameter which gives the initial forcing value. $G_{ts}$ denotes a discretely indexed element of the response function,
$$G_{ts} = (t - s + 1)^{\beta/2 - 1}. \quad (10)$$
Further, the vector $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)$ denotes a zero-mean fGn process, implying that the covariance between $\varepsilon_t$ and $\varepsilon_s$ is
$$\operatorname{cov}(\varepsilon_t, \varepsilon_s) = \frac{\sigma_\varepsilon^2}{2}\left(|t-s+1|^{2H} - 2|t-s|^{2H} + |t-s-1|^{2H}\right). \quad (11)$$
The covariance matrix of the predictor is $\Sigma = \Sigma(H, \sigma_\varepsilon)$ with the elements in Eq. (11). Notice that the matrix G(H) is lower triangular with elements given by Eq. (10). The given formulation implies that the vector µ represents the GMST response to the known forcing F, while ε is the GMST response to the random forcing, i.e. the unforced climate variability.
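These pieces assemble directly in code. A minimal sketch, assuming a toy forcing series and illustrative parameter values, of the forced mean in Eq. (9)-(10) and the fGn covariance matrix in Eq. (11):

```python
import numpy as np

n = 100
H, sigma_eps, sigma_f, F0 = 0.75, 0.1, 0.05, 0.2   # illustrative parameters
beta = 2 * H - 1
F = 0.02 * np.arange(n)                            # toy forcing series (W m^-2)

# Lower-triangular response matrix with G_ts = (t - s + 1)^(beta/2 - 1)
t = np.arange(n)
lag = t[:, None] - t[None, :] + 1
lagf = np.maximum(lag, 1).astype(float)
Gmat = np.where(lag >= 1, lagf ** (beta / 2 - 1), 0.0)

mu = sigma_f * Gmat @ (F + F0)                     # forced response (mean), Eq. (9)

# fGn covariance: sigma^2/2 (|k+1|^2H - 2|k|^2H + |k-1|^2H), Eq. (11)
k = np.abs(t[:, None] - t[None, :]).astype(float)
Sigma = 0.5 * sigma_eps**2 * (
    np.abs(k + 1) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1) ** (2 * H)
)
```

Note that Σ is dense for every H ≠ 1/2, which is exactly the computational obstacle addressed later by the AR(1)-mixture approximation.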
The statistical regression formulation in Eq. (9) has a hierarchical structure in which the expected temperature anomalies are modelled in terms of the random predictor η with elements specified by Eq. (9). The predictor depends on additional model parameters θ = (H, σ_ε, σ_f, F_0). This set-up implies that we need to assign priors, both to the predictor and to the model parameters. By assigning a Gaussian prior to η, the resulting model becomes a latent Gaussian model, which can be analyzed using the INLA methodology. In general, this class of models introduces a latent Gaussian field x, which contains all the random components of a linear predictor, including the predictor itself. In our case, the latent field is equal to the linear predictor, x = η = µ + ε. However, inference for this model is not straightforward, as the model parameter H appears in both of the terms µ and ε. We choose to circumvent this problem by considering the sum µ + ε as a single model component, i.e., as a fractional Gaussian noise process with mean vector µ and covariance matrix Σ. The dependence between the two components implies that we will not get separate posterior estimates for µ and ε directly.
Using p(·) as a generic notation for probability density functions, we can summarize the three-stage hierarchical structure of latent Gaussian models, including distributional assumptions, as follows:

- The first stage specifies the likelihood of the model. The observed temperature anomaly ∆T_t is assigned a Gaussian distribution with negligible fixed variance and mean η_t. The observations are assumed to be conditionally independent given the latent field x and parameters θ, i.e.
$$p(\Delta T \mid x, \theta) = \prod_{t=1}^{n} p(\Delta T_t \mid \eta_t, \theta). \quad (12)$$
- The second stage specifies the prior distribution for the latent field. Given the parameters θ, the latent field x is assigned a Gaussian prior distribution with mean vector µ and covariance matrix Σ.

- The third stage specifies independent priors for the parameters: The shift parameter F_0 is assigned a zero-mean Gaussian prior, while the other parameters are assigned penalised complexity (PC) priors (Simpson et al., 2017). The class of PC priors represents a recently developed framework to compute priors based on specific principles, including support for Occam's razor. The PC priors of the two scaling parameters σ_f and σ_ε can be computed to equal exponential distributions, while the PC prior of H is computed numerically (Sørbye and Rue, 2018).
The joint posterior for all components of the latent field and all of the model parameters is then summarized by
$$p(x, \theta \mid \Delta T) \propto p(\theta)\, p(x \mid \theta) \prod_{t=1}^{n} p(\Delta T_t \mid \eta_t, \theta). \quad (13)$$
Our main objective is to estimate the marginal posterior distributions for all components of the latent field,
$$p(x_t \mid \Delta T) = \int p(x_t \mid \theta, \Delta T)\, p(\theta \mid \Delta T)\, d\theta, \quad (14)$$
and the marginal posteriors for all the model parameters,
$$p(\theta_j \mid \Delta T) = \int p(\theta \mid \Delta T)\, d\theta_{-j}. \quad (15)$$
Here, the notation θ_{−j} is used to denote the vector θ excluding the jth parameter. The posterior distributions provide a complete description of the latent field components and the parameters in our model. From the marginals in Eq. (14)-Eq. (15) we can extract summary statistics such as the mean, variance, quantiles and credible intervals.
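The marginalization in Eq. (15) is conceptually just numerical integration over the other parameters. A toy two-parameter sketch on a grid (the joint density here is an arbitrary correlated Gaussian, not our model's posterior):

```python
import numpy as np

# Toy joint posterior p(theta1, theta2 | data) evaluated on a grid
t1 = np.linspace(-3, 3, 121)
t2 = np.linspace(-3, 3, 121)
T1, T2 = np.meshgrid(t1, t2, indexing="ij")
joint = np.exp(-0.5 * (T1**2 + T2**2 - T1 * T2))   # unnormalized toy density
joint /= joint.sum() * (t1[1] - t1[0]) * (t2[1] - t2[0])

# Marginal for theta_1: integrate out theta_2, cf. Eq. (15)
marg1 = joint.sum(axis=1) * (t2[1] - t2[0])
```

INLA performs this integration over a small set of carefully chosen grid points in θ-space rather than a full lattice, which keeps the cost low even for several parameters.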
Traditionally, marginal posterior distributions have been approximated using Markov chain Monte Carlo methods (Robert and Casella, 1999). Such methods are simulation-based and can potentially be very time-consuming for hierarchical models.
The INLA methodology represents a computationally superior, but still accurate, alternative and is available through the R-package R-INLA. The posterior p(θ | ∆T) needed in Eq. (14)-Eq. (15) is approximated by employing a Laplace approximation evaluated at the mode x*(θ):
$$\tilde p(\theta \mid \Delta T) \propto \left.\frac{p(x, \theta, \Delta T)}{\tilde p_G(x \mid \theta, \Delta T)}\right|_{x = x^*(\theta)}. \quad (16)$$
This approximation is usually very accurate, as we know that p(x | θ) is already Gaussian. The marginal for each model parameter is then obtained by assuming a normal distribution modified to allow for skewness. The scaling parameters σ_{j+} and σ_{j−} are found using the approximate joint posterior distribution of Eq. (16); see Martins et al. (2013) for details. To compute Eq. (14), the Laplace approximation in Eq. (16) is combined with a simplified and computationally faster version of the Laplace approximation of p(x_t | θ, ∆T). Finally, the integrand of Eq. (14) is evaluated for values of θ in a grid efficiently covering the parameter space for θ; see Rue et al. (2009) and Rue et al. (2017) for details.
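The Laplace idea itself is easy to illustrate on a toy one-parameter posterior (this is not the R-INLA implementation, just the principle): expand the negative log-posterior to second order around its mode and read off a Gaussian approximation.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(1.5, 1.0, size=50)    # toy data with known unit variance

def neg_log_post(theta):
    # Gaussian likelihood for the mean plus a N(0, 10^2) prior (toy choices)
    return 0.5 * np.sum((y - theta) ** 2) + 0.5 * theta ** 2 / 100.0

# Locate the mode by Newton iteration with finite-difference derivatives
theta, h = 0.0, 1e-4
for _ in range(50):
    g = (neg_log_post(theta + h) - neg_log_post(theta - h)) / (2 * h)
    curv = (neg_log_post(theta + h) - 2 * neg_log_post(theta)
            + neg_log_post(theta - h)) / h ** 2
    theta -= g / curv

# Laplace approximation: p(theta | y) ~ N(mode, 1 / curvature at the mode)
post_mean, post_sd = theta, curv ** -0.5
```

For this conjugate toy problem the Laplace approximation is exact; for non-Gaussian posteriors it is the leading-order approximation that Eq. (16) refines by conditioning on the latent field.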

A key assumption for the numerical approximations to be computationally efficient is that the latent Gaussian field x has Markov properties, i.e. x needs to be a Gaussian Markov random field having a sparse precision matrix Q (Rue and Held, 2005). This is not the case for fGn, as the long-range dependence structure of this process gives a dense precision matrix. We resolve this problem by approximating ε by a weighted sum of m independent first-order autoregressive (AR(1)) processes, i.e.
$$\varepsilon \approx \tilde\varepsilon = \sum_{i=1}^{m} w_i\, \tilde{x}_i.$$
To capture the correlation structure between ε̃ and each of the AR(1) processes x̃_i, the latent field must be extended to also include the underlying AR(1) processes, i.e. x = (η, µ + ε̃, x̃_1, ..., x̃_m). The weights {w_i}_{i=1}^m and the first-lag autocorrelation coefficients of the AR(1) processes are selected such that the resulting autocorrelation function of ε̃ best approximates that of fGn. In addition to ensuring computational efficiency, this approximation also proves to be remarkably accurate. For further details about this approximation, see Sørbye et al. (2019), who also provide a discussion from a statistical perspective. For a physical interpretation of this approximation, we refer to Fredriksen and Rypdal (2017).
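The flavor of this approximation is easy to reproduce: fix a few AR(1) correlation scales spanning the range of lags and fit the mixture weights so that the mixture autocorrelation matches the fGn one. This least-squares sketch uses a hypothetical log-spaced choice of scales, not the optimized coefficients of Sørbye et al. (2019):

```python
import numpy as np

H = 0.8
k = np.arange(0, 200, dtype=float)

# Target: autocorrelation function of a unit-variance fGn (cf. Eq. (11))
acf_fgn = 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1) ** (2 * H))

# Fix m = 4 AR(1) correlation scales (hypothetical log-spaced choice) ...
phis = np.exp(-1.0 / np.array([1.0, 5.0, 25.0, 125.0]))
basis = phis[None, :] ** k[:, None]          # AR(1) ACFs: phi_i^|k|

# ... and fit the weights by least squares so the mixture ACF matches fGn
w, *_ = np.linalg.lstsq(basis, acf_fgn, rcond=None)
err = np.max(np.abs(basis @ w - acf_fgn))
```

Because each AR(1) component is Markov, the extended latent field regains a sparse precision matrix, which is what restores the computational efficiency of INLA.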
Currently, there are no built-in model components in R-INLA which suit our specifications. This means that we have to construct one manually using rgeneric, a modeling tool that allows generic model components to be defined for INLA. To make this accessible to applied scientists, we have developed an R-package called INLA.climate which includes functions that take care of the technical part of the fitting procedure and present important information and summary statistics in a readable format. This package contains a versatile and user-friendly interface to fit the model in Eq. (9) and includes functions to replicate all results presented in this paper. The package is available at the GitHub repository www.github.com/eirikmn/INLA.climate.
A detailed description of the package and its features is available in its accompanying documentation.

In this example, we illustrate how we can approximate the marginal posterior distribution for the temperature response µ. For comparison, we apply the same approach to estimate the given marginal posterior distributions under the assumption of an exponential response function in Eq. (3). In this case, the discretized unforced response described in Eq. (4) is an AR(1) process. More generally, the discretized response functions corresponding to Eq. (6) are a weighted sum of m AR(1) processes.
Notice that a mixture of a few AR(1) processes will in general have short-range dependence properties, while the approximation of the fGn is designed to capture long-range dependence. However, we observe significantly wider credible intervals for the model using the single exponential response function. The larger uncertainty suggests that a smaller portion of the variance is explained by the unforced climate variability, leaving more of the variation to be explained by the response to the known forcing. Using the INLA.climate package, we obtain full inference in seconds on a personal computer. The code to run the example is as follows:

data("GISS_E2_R")
y = GISS_E2_R$Temperature
z = GISS_E2_R$Forcing
r.scaling = inla.climate(y, z, compute.mu="full")
r.exponential = inla.climate(y, z, compute.mu="full", m=1, model="ar1")
r.2exponential = inla.climate(y, z, compute.mu="full", m=2, model="ar1")

3.2 Temperature predictions for Representative Concentration Pathway trajectories
Once trained on historical temperature and forcing data, the response model can easily be used to obtain temperature predictions under given future forcing scenarios. In Fig. 7 we compare the scale-invariant model's temperature projections for RCP scenarios to ESM temperature projections.
In this analysis, we tune the statistical model to historical runs of different ESMs in the CMIP5 ensemble. As in the other analyses, we use model-specific forcing for each of the ESMs. Using the same method as for historical forcing, we also estimate model-specific RCP forcing for each of the ESMs and each RCP scenario. From this RCP forcing, and using the estimated parameters, we project GMST under the RCP scenarios and compare with the corresponding projections in the ESMs. Fig. 7 shows the differences between the two types of projections. As seen from the figure, there is a negative trend for almost all models and all scenarios, indicating that the predictions made using the statistical model slightly overestimate the temperature increase in the ESMs. This overestimation is not a statistical bias. Instead, it shows that scale invariance is too crude an approximation for several ESMs. One can interpret the differences as a weakness of the scale-free model. However, not all climate models have scaling properties consistent with temperature reconstructions, and the differences could also reflect weaknesses of the ESMs.

[Figure caption fragment: forcing according to the RCP2.6, RCP4.5, RCP6 and RCP8.5 trajectories, respectively, using a mixture of two exponential response functions. These are compared to the AR5 projections (black boxes).]
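The projection step itself is the same convolution used in fitting. A toy sketch with hypothetical posterior-mean parameters and a stylized forcing ramp standing in for an RCP trajectory (not actual RCP data):

```python
import numpy as np

# Hypothetical posterior-mean parameters from a historical fit
H, sigma_f, F0 = 0.75, 0.05, 0.2
beta = 2 * H - 1

years = np.arange(2015, 2101)
# Stylized future forcing trajectory (W m^-2), standing in for an RCP scenario
F = 2.6 + 0.03 * (years - 2015)

# Discrete scale-invariant kernel and the projected forced response
G = np.arange(1, years.size + 1, dtype=float) ** (beta / 2 - 1)
dT_proj = sigma_f * np.convolve(F + F0, G)[: years.size]
```

Repeating this per posterior sample of (H, σ_f, F_0), rather than at the posterior mean, yields full predictive credible bands for each scenario.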

3.3 Estimating the transient climate response
As a final application, we describe how our suggested method using a scale-invariant response function can be used to estimate the TCR. The TCR is defined as the average temperature response between 60 and 80 years following a gradual CO2 doubling, assuming a 1 % annual increase. In this scenario, the forcing increases linearly according to
$$f(s) = \frac{Q_{2\times\mathrm{CO}_2}}{70\ \mathrm{yrs}}\,(s + F_0), \quad s = 1, \ldots, 80\ \mathrm{yrs}.$$
Here, $Q_{2\times\mathrm{CO}_2}$ is a model-specific coefficient describing the forcing corresponding to a CO2 doubling. We obtain these coefficients for each ESM. As in Section 3.1, the approximate marginal posterior distribution for the TCR is obtained by combining INLA with Monte Carlo sampling. We first generate samples from the joint posterior distribution of the model parameters p(θ | ∆T). For each of these samples, we calculate the TCR, which then gives the approximate posterior distribution for the TCR.
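Per posterior sample, the TCR computation reduces to convolving the scale-invariant kernel with the linear ramp forcing and averaging years 60-80. A self-contained sketch; the posterior samples of (H, σ_f) below are hypothetical stand-ins for draws from p(θ | ∆T), and the shift F_0 is omitted for simplicity:

```python
import numpy as np

rng = np.random.default_rng(2)
Q2x = 3.8                       # W m^-2 per CO2 doubling (IPCC, 2013a)
s = np.arange(1, 81, dtype=float)
F = Q2x / 70.0 * s              # linear ramp forcing (shift F0 omitted)

def tcr_sample(H, sigma_f):
    """TCR for one posterior sample: convolve kernel, average years 60-80."""
    beta = 2 * H - 1
    G = np.arange(1, 81, dtype=float) ** (beta / 2 - 1)
    dT = sigma_f * np.convolve(F, G)[:80]
    return dT[59:80].mean()

# Hypothetical posterior samples of (H, sigma_f); a real analysis would
# draw these from p(theta | Delta T) produced by INLA
H_s = rng.normal(0.75, 0.02, size=10_000).clip(0.55, 0.95)
sf_s = rng.normal(0.5, 0.05, size=10_000)
tcr = np.array([tcr_sample(h, sf) for h, sf in zip(H_s, sf_s)])
tcr_mean, tcr_sd = tcr.mean(), tcr.std()
```

Because each sample's TCR is a deterministic function of the parameters, the Monte Carlo step adds negligible cost on top of the INLA fit.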
For our analyses, we use temperature data sets generated from 19 ESMs in the Coupled Model Intercomparison Project Phase 5 (CMIP5) ensemble; see Table 2. We obtain the forcing by combining the forcing data from Forster et al. (2013) and Hansen et al. (2010) such that the 18-yr moving averages of the two are equal. We use the instrumental HadCRUT4 dataset (Morice et al., 2012), which combines the land temperatures of the CRU dataset with the sea surface temperatures of HadSST3 (Kennedy et al., 2011).
To assess the accuracy of the TCR estimations from Eq. (12), we compare the estimates from each of the 19 ESMs with the TCR obtained from the ESMs directly (Forster et al., 2013). Inference is obtained by producing one hundred thousand Monte Carlo simulations of the TCR. Summary statistics for our model are shown in Tables 3-4, which include the marginal posterior means and 95 % credible intervals for the TCR and the model parameters used to compute it.
To assess the approach using the scale-invariant versus the exponential response functions, we compare the posterior mean estimates with the values obtained directly from the ESMs. Specifically, we calculate the average absolute value of the bias, the root mean square error (RMSE) and the correlation between the posterior mean estimates and the TCR values from the ESMs; see Table 1. We observe that the method using the single exponential response model clearly performs worse than the other two approaches, having higher error and lower correlation with the TCR estimates from the ESMs. The two-exponential response function and the scale-invariant approach perform almost equally well, the latter showing slightly higher correlation.
These two methods have approximately the same average absolute value of the bias, but the two-exponential approach slightly underestimates TCR while the scale-invariant approach slightly overestimates TCR. This is illustrated by the scatter plots in Fig. 8. Using INLA.climate we obtained, for a typical analysis, inference in around 13 seconds using a scale-invariant response and 35 seconds using the exponential response functions.
To obtain estimates for the TCR of the HadCRUT4 data set, we use the 19 different forcing data sets associated with the ESMs listed in Table 2, as well as the Hansen radiative forcing, which we assign ID number 0. The Monte Carlo simulations are carried out separately for each forcing data set, using one hundred thousand samples and forcing slope coefficient $Q_{2\times\mathrm{CO}_2} = 3.8$ W m$^{-2}$ (IPCC, 2013a). This is again performed using both the scale-invariant and the exponential response functions. For the scale-invariant response, the posterior means and credible intervals of the TCR and the parameters used to compute the TCR for each ESM are shown in Tables 5-6. The marginal posterior mean estimates and 95 % credible intervals for the TCR using all approaches are illustrated in Fig. 9, where we observe wider credible intervals when using a single exponential response function. We obtain an estimated posterior distribution for the TCR across all models by aggregating all TCR samples obtained from each analysis, totaling two million simulations. The posterior density is obtained from the Monte Carlo samples using the density function in R. The resulting density function is depicted in Fig. 10, where it is compared with a histogram describing the TCRs obtained directly from the ESMs. We observe a mean of 1.43 K, 1.35 K and 1.61 K, and standard deviations of 0.19 K, 0.40 K and 0.40 K, when using a scale-invariant, a single-exponential and a two-exponential response function, respectively.

4 Discussion and conclusions

This paper presents a Bayesian formulation to analyse a linear temperature response model to radiative forcing, incorporating long-range temporal dependence using a scale-invariant response function. Computational efficiency is achieved by incorporating the model within the R-INLA framework and adopting the approximation introduced in Sørbye et al. (2019). The benefits of this methodology are three-fold.
First, the model is both accessible and adaptable to more advanced models that require additional trends and effects. Second, the approximations ensure low costs in both computational complexity and memory, even for long time series. Third, the method yields full Bayesian inference, giving a complete description of the behavior of the latent variables and model parameters.
In addition to providing parameter estimates, the model has been used to produce temperature predictions as responses to the four RCP forcing trajectories used to describe future radiative forcing. For comparison, we have also included prediction results using the one-box and two-box energy balance models, giving a single exponential or a mixture of two exponential response functions, respectively. We observe that the exponential-response models underestimate the predicted temperature compared to the projections made by the IPCC. On the other hand, the scale-invariant response models tend to overestimate future temperatures but are overall more accurate than the exponential response functions.
We further demonstrate that the model can be used to estimate the transient climate response in instrumental data. Our best 315 estimate is that the TCR is 1.43 K with a standard deviation of 0.29 K. This estimate falls right in the middle of the range put forward in the IPCC report (0.8 − 2.5 K) and the accuracy is consistent with the TCR obtained directly from the ESMs. The presented model has also been seen to give coherent estimates for the equilibrium climate sensitivity, compared with running ESMs (Rypdal et al., 2018a).
Accurate linear response models for global temperature are essential alternatives to ESMs in studies where one needs to 320 explore a large number of emission scenarios, and the modeling framework presented here can easily be included in integrated assessment models. Moreover, since the models are invertible, they can efficiently compute forcing scenarios corresponding to given future scenarios for global temperatures. Hence, in combination with linear models for the CO 2 response to emissions, they can be used to obtain observation-based estimates of the remaining carbon budget in scenarios where we reach the goals of the Paris agreement.

In combination with dedicated ESM experiments, the methods presented in this paper can also be used to estimate global and regional climate sensitivity as a function of background state and time scale. One can use such estimates to study the effect of non-linear responses across time scales and to obtain insight into how sensitivities and fluctuations change in the vicinity of climate tipping points.
Code and data availability. The code and data sets used for this paper are available through the R-package INLA.climate, which can be downloaded from the GitHub repository www.github.com/eirikmn/INLA.climate.

[Table 2 fragment: 1, GISS-E2-R, Schmidt et al. (2014), 1850-2015]