On scales of

The key is to exploit the scaling of the dynamics and the large stochastic memories that we quantify. Since macroweather temporal (but not spatial) intermittency is low, we propose using the simplest model based on fractional Gaussian noise (fGn): the ScaLIng Macroweather Model (SLIMM). SLIMM is based on a stochastic ordinary differential equation, differing from the usual linear stochastic models (such as linear inverse modelling, LIM) in that it is of fractional rather than integer order. Whereas LIM implicitly assumes that there is no low-frequency memory, SLIMM has a huge memory that can be exploited. Although the basic mathematical forecast problem for fGn has been solved, we approach the problem in an original manner, notably using the method of innovations to obtain simpler results on forecast skill and on the size of the effective system memory.

A key to successful stochastic forecasts of natural macroweather variability is to first remove the low-frequency anthropogenic component. A previous attempt to use fGn for forecasts had disappointing results because this was not done. We validate our theory using hindcasts of global and Northern Hemisphere temperatures at monthly and annual resolutions. Several nondimensional measures of forecast skill – with no adjustable parameters – show excellent agreement with the hindcasts, which retain some skill even on decadal scales. We also compare our forecast errors with those of several GCM experiments (with and without initialization) and with other stochastic forecasts, showing that even this simplest two-parameter SLIMM is somewhat superior. In future, using a space–time (regionalized) generalization of SLIMM, we expect to be able to exploit the system memory more extensively and obtain even more realistic forecasts.

Due to their sensitive dependence on initial conditions, the classical
deterministic prediction limit of GCMs (general circulation models) is about
10 days – the lifetime of planetary-sized structures (

For these longer scales, following Hasselmann (1976), the high-frequency
weather can be considered as a noise driving an effectively stochastic
low-frequency system; the separation of scales needed to justify such
modelling is provided by the drastic transitions at

Independent of its origin, the transition justifies the idea that the weather is essentially a high-frequency noise driving a lower-frequency climate system, and the idea is exploited in GCMs with long integrations as well as in Hasselmann-type stochastic modelling, now often referred to as linear inverse modelling (LIM; sometimes also called the “stochastic linear forcing” paradigm), e.g. Penland and Sardeshmukh (1995), Newman et al. (2003), Sardeshmukh and Sura (2009); analogous modelling is also possible on much longer timescales using energy balance models. For a review, see Dijkstra (2013); for a somewhat different Hasselmann-inspired approach, see Livina et al. (2013).

In these phenomenological models, the system is regarded as a multivariate
Ornstein–Uhlenbeck (OU) process. The basic LIM paradigm is based on the
stochastic differential equation
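As a concrete reference point, the OU dynamics behind LIM can be simulated directly. The sketch below (with illustrative parameter values, not taken from this paper) integrates the scalar OU equation dT/dt = -T/τ + σγ(t) with a simple Euler–Maruyama step; its exponentially decorrelating, short-memory behaviour is precisely what SLIMM replaces with a fractional-order equation.

```python
import numpy as np

def simulate_ou(tau=10.0, sigma=1.0, dt=0.1, n_steps=100_000, seed=0):
    """Euler-Maruyama integration of the scalar OU equation
    dT/dt = -T/tau + sigma*gamma(t), with gamma a unit white noise."""
    rng = np.random.default_rng(seed)
    T = np.zeros(n_steps)
    for i in range(1, n_steps):
        T[i] = (T[i - 1]
                + dt * (-T[i - 1] / tau)
                + sigma * np.sqrt(dt) * rng.standard_normal())
    return T

T = simulate_ou()
# The stationary variance is sigma^2 * tau / 2 = 5, and the autocorrelation
# decays as exp(-lag/tau): beyond a few tau, the process carries no
# exploitable memory.
print(T.var())
```

The exponential memory cutoff is the point of contrast: frequencies below 1/τ behave as an unpredictable white noise.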

Fourier transforming Eq. (1) and using the rule
F. T.

The basic problem with the LIM approach is that although we are interested in
the low-frequency behaviour, for LIM models it is simply white noise and this
has no memory (put d/d

While the difference in the value of

ENSEMBLES experiment, LIM and SLIMM hindcasts for global annual temperatures for horizons of 1 to
9 years. The light lines are from individual members of the ENSEMBLES
experiment; the heavy line is the multimodel ensemble adapted from Fig. 4 in
García-Serrano and Doblas-Reyes (2012). This shows the RMSE comparisons
for the global mean surface temperatures compared to NCEP/NCAR (2 m air
temperatures). Horizontal reference lines indicate the standard deviations of

We have discussed the phenomenological linear stochastic models introduced in
atmospheric science by Hasselmann and others from 1976 onwards. Yet there is
an older tradition of stochastic atmospheric modelling that can be traced
back to the 1960s: stochastic cascade models for turbulent intermittency
(Novikov and Stewart, 1964; Yaglom, 1966; Mandelbrot, 1974; Schertzer and
Lovejoy, 1987). Significantly, these models are nonlinear rather than linear,
and the nonlinearity plays a fundamental role in their ability to
realistically model intermittency. By the early 1980s it was realized that
these multiplicative cascades were the generic multifractal processes, and
they were expected to be generally relevant in high-dimensional nonlinear
dynamical systems that were scale invariant over some range. By 2010, there
was a considerable body of work showing that atmospheric cascades were
anisotropic – notably with different scaling in the horizontal and vertical
directions (leading to anisotropic, stratified cascades) – and that this
enabled cascades to operate up to planetary sizes (see the reviews Lovejoy
and Schertzer, 2010, 2013). While the driving turbulent fluxes were modelled
by pure cascades, the observables (temperature, wind, etc.) were modelled by
fractional integrals of the latter (see below): the Fractionally Integrated
Flux (FIF) model. The analysis of in situ (aircraft, dropsonde) and remotely
sensed data, reanalyses as well as weather forecasting models showed that at
least up to 5000 km, the cascade processes were remarkably accurate, with
statistics (up to second order) typically showing deviations of less than

The success of the cascade model up to planetary scales (

Although this (temporally) extended space–time cascade model reproduces the
basic space–time weather statistics well (for scales

To summarize, there are three key empirically observed macroweather characteristics that models should respect: low temporal intermittency, high spatial intermittency and statistical space–time factorization. According to the analysis in Lovejoy and de Lima (2015), the CEFIF (climate EFIF) model approximately satisfies these properties but has some disadvantages. A practical difficulty is that – much like GCMs – it requires explicit modelling at fine temporal (weather-scale) resolution. This is computationally wasteful since, for macroweather modelling, the high frequencies are subsequently averaged out in order to model the lower-frequency macroweather. An arguably more significant disadvantage is that CEFIF's theoretical properties – including its predictability – are nontrivial and largely unknown.

SLIMM is an attempt to directly model space–time macroweather while
respecting the factorization property and while using the comparatively
simple, nonintermittent scaling process – fractional Gaussian noise (fGn) –
to reproduce the low-intermittency temporal behaviour. In the temporal
domain, it is thus based on a linear stochastic model (fGn) with reasonably
well-understood predictability properties and predictability limits. The
strong spatial macroweather variability can be modelled either by using
multifractal spatial variability (representing very low-frequency climate
processes), or alternatively – in the spirit of LIM modelling – it can be
modelled as a system of (fractional-order) ordinary differential equations.
In the former case, developed in Lovejoy and de Lima (2015), it turns out to
be sufficient to take the product of a spatially nonlinear (multifractal)
stochastic model, with a space–time fGn process. The result is a model that
is well defined at arbitrary spatial resolutions and with temporal scaling
exponents that are the same at every spatial location (this restriction is
somewhat unrealistic). In the latter LIM-like case, one fixes the grid scale
(the spatial resolution) and then treats each grid point as a component of an

In this paper, we concentrate on the simplest scalar SLIMM, and we illustrate
this by hindcasting global-scale temperature series. The key change to the
LIM model is thus a modification of the low-frequency scaling: rather than

Alternatively, Eq. (4) can be solved in real space directly. First, operate
on both sides of the above by
(

If we are only interested in frequencies lower than

Formally, the solution to Eq. (8) with

While below we use simple averaging to obtain small-scale convergence of fGn,
for many purposes, the details of the smoothing at resolution

Fractional Brownian motion has received far more attention than fractional Gaussian noise, and it is possible to deduce the properties of fGn from fBm. However, since we are exclusively interested in fGn, it is more straightforward to first define fGn and then – if needed – define fBm from its integral.

The canonical fractional Gaussian noise process

It is more common to treat fBm whose differential d

A comment on the parameter

Some useful relations are

The relationship Eq. (23) can be used to obtain several useful relations for
a finite resolution fGn. For example,

Since

Since fGn is stationary, its spectrum is given by the Fourier transform of
the autocorrelation function. The autocorrelation is symmetric
(
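The long memory can be made explicit through the fGn autocovariance. The sketch below uses the standard Hurst-parameter convention (0 < H < 1, persistent for H > 1/2), which differs from this paper's fluctuation-exponent convention; the value H = 0.8 is purely illustrative.

```python
import numpy as np

def fgn_autocov(k, H=0.8, sigma2=1.0):
    """Exact autocovariance of fGn at integer lag k (standard Hurst
    parameter 0 < H < 1; note this convention differs from the paper's
    fluctuation exponent)."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * sigma2 * ((k + 1) ** (2 * H)
                           - 2 * k ** (2 * H)
                           + np.abs(k - 1) ** (2 * H))

# At large lags, gamma(k) ~ sigma2 * H * (2H-1) * k^(2H-2): an algebraic
# (power-law) decay, in contrast with the exponential decay of the OU/LIM
# memory.
k = np.array([100.0, 1000.0])
print(fgn_autocov(k))
print(0.8 * 0.6 * k ** (2 * 0.8 - 2))  # power-law tail approximation
```

The slow algebraic tail is what makes distant past values worth retaining in a forecast.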

The spectrum is one way of characterizing the variability as a function of
scale (frequency); however, it is often important to have real space
characterizations. These are useful not only for understanding the effects of
changing resolution, but also on a given timescale

An anomaly is the average deviation from the long-term average, and since

The classical fluctuation is simply the difference (the “poor man's wavelet”):

As pointed out in Lovejoy and Schertzer (2012a), the preceding fluctuations
only have variances proportional to
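Haar fluctuations (the analysis used below for the empirical scaling) are straightforward to compute: over each interval of length Δt, take the difference between the means of the second and first halves. The sketch below uses the common factor-2 calibration and a white-noise input purely for illustration, not this paper's data.

```python
import numpy as np

def rms_haar(series, lags):
    """RMS Haar fluctuation at each (even) lag dt: the calibrated
    difference 2 * (mean of second half - mean of first half),
    RMS-averaged over half-overlapping intervals."""
    out = []
    for dt in lags:
        half = dt // 2
        flucts = []
        for start in range(0, len(series) - dt + 1, half):
            seg = series[start:start + dt]
            flucts.append(2.0 * (seg[half:].mean() - seg[:half].mean()))
        out.append(np.sqrt(np.mean(np.square(flucts))))
    return np.array(out)

# For uncorrelated Gaussian noise the RMS Haar fluctuation is 4/sqrt(dt),
# i.e. it scales as dt^(-1/2): a macroweather-like negative fluctuation
# exponent.
rng = np.random.default_rng(0)
print(rms_haar(rng.standard_normal(2 ** 16), [4, 16, 64, 256]))
```

Unlike difference or anomaly fluctuations, the Haar fluctuation captures both increasing and decreasing power-law regimes, which is why it is used for the scaling analyses below.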

Using the definition (Eq. 11) of fGn, we can define the temperature as

We can therefore define the resolution

Using Eq. (35), the

Since an fGn process at resolution

The standard approach that they followed yields nontrivial integral equations
(which they solved) in both the finite- and infinite-data cases. In what
follows, we use a more straightforward method – the general method of
innovations (see, e.g., Papoulis, 1965, ch. 13) – and we obtain relatively
simple results for the case with infinite past data (which is equivalent to
the corresponding Gripenberg and Norros (1996) result). In a future
publication we improve on this by adapting it to the finite-data case. The
main new aspect of the forecasting problem with only finite data is that it
turns out that not only do the most recent values (close to

We now derive the forecast result for resolution

Using Eqs. (43) and (44), the error variance is
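The innovations derivation above assumes infinite past data; a finite-past counterpart can be sketched with the standard normal equations for the minimum mean-square linear predictor of a stationary Gaussian process (weights w = C⁻¹c, skill = cᵀC⁻¹c / γ(0)). The fGn autocovariance below again uses the standard Hurst convention, with H = 0.8 an illustrative value, not a result of the paper.

```python
import numpy as np

def fgn_gamma(k, H=0.8):
    """fGn autocovariance at integer lag k (standard Hurst convention)."""
    k = abs(k)
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H)
                  + abs(k - 1) ** (2 * H))

def gn_predictor(gamma, m, h):
    """Minimum mean-square linear predictor of x(t+h) from the m most
    recent values x(t), x(t-1), ..., given the autocovariance gamma(k).
    Returns (weights, skill), with skill = fraction of variance explained."""
    C = np.array([[gamma(abs(i - j)) for j in range(m)] for i in range(m)])
    c = np.array([gamma(h + i) for i in range(m)])  # cov of x(t+h) with past
    w = np.linalg.solve(C, c)                       # normal equations
    return w, c @ w / gamma(0)

# The skill of a one-step fGn forecast keeps growing as more of the past
# is used -- the signature of the long memory (for an AR(1)/OU process,
# only the most recent value contributes).
for m in (1, 10, 100):
    print(m, gn_predictor(fgn_gamma, m, 1)[1])
```

In the infinite-past limit this recovers the closed-form skill discussed in the text; the finite-past version is what a practical hindcast actually uses.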

This definition of skill is slightly different from the root mean square
skill score (RMSSS) that is sometimes used to evaluate GCMs (see,
e.g., Doblas-Reyes et al., 2013). The RMSSS is defined as 1 minus the ratio
of the RMS error of the ensemble-mean prediction divided by the RMS
temperature variation:
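The two scores differ only in whether the error enters as a variance or as an RMS, so for unbiased forecasts RMSSS = 1 - sqrt(1 - skill). A minimal sketch (synthetic data, not the paper's hindcasts):

```python
import numpy as np

def variance_skill(forecast, truth):
    """Skill as the fraction of variance explained: 1 - err_var / var."""
    return 1.0 - np.mean((forecast - truth) ** 2) / np.var(truth)

def rmsss(forecast, truth):
    """Root mean square skill score: 1 - RMSE / RMS variation."""
    return 1.0 - np.sqrt(np.mean((forecast - truth) ** 2)) / np.std(truth)

# Synthetic check: truth = forecastable signal + equal-variance noise,
# so variance_skill ~ 0.5 while RMSSS ~ 1 - 1/sqrt(2) ~ 0.29.
rng = np.random.default_rng(1)
signal = rng.standard_normal(10_000)
truth = signal + rng.standard_normal(10_000)
print(variance_skill(signal, truth), rmsss(signal, truth))
```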

If the process scales over an infinite range in the data but we only have
access to the innovations over a duration

In the real world, after the removal of the anthropogenic component (see
Lovejoy and Schertzer (2013) and Fig. 4c), the scaling regime has a finite
length (estimated as

It is instructive to compare the skill obtained with the full memory with the
skill obtained if only the most recent variable

In persistence,

In order to test the method, we chose the NASA GISS Northern Hemisphere and
global temperature anomaly data sets, both at monthly and at annually
averaged resolutions. A significant issue in the development of such
global-scale series is the treatment of the air temperature over the oceans, which
is estimated from sea surface temperatures; NASA provides two sets, the
Land–Ocean Temperature Index (LOTI) and Land-Surface Air Temperature
Anomalies only (Meteorological Station Data; d

The prediction formulae assume that the series has the power law dependencies
indicated above, with RMS anomaly or Haar fluctuations following

Therefore, as a first step, using the Frank et al. (2010) data (extended to
2013 as described in Lovejoy, 2014a), we removed the anthropogenic
contribution, using

From Table 2 we see that the sensitivities do not depend on the exact range
over which they are estimated (columns 2–4). As we move to the present
(column 4 to column 2), the sensitivities stay within the uncertainty range
of the earlier estimates, with the uncertainties constantly diminishing,
consistent with the convergence of the sensitivities as the record lengthens.
As a consequence, if we determine

An obvious criticism of the method of effective climate sensitivities is that anthropogenic forcing primarily warms the oceans and, only with some lag, the atmosphere. Systematic cross-correlation analysis in Lovejoy (2014a, b) shows that while the residues are barely affected (see rows 2 and 3 in Table 2 and Lovejoy (2014b) for more on this), the values of the sensitivities are affected (see, e.g., column 4 in Table 2). We may note that using Eq. (52) (no lag), or the same relation but with a lag, are equivalent to assuming a linear climate with Green's function given by a Dirac delta function. This and more sophisticated (power law) Green's functions will be discussed in a future publication.
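A minimal sketch of this anthropogenic-removal step (the no-lag version of Eq. 52): regress the temperature on log2 of the CO2 concentration ratio, used as a linear surrogate for the total anthropogenic forcing. The CO2 series, the pre-industrial value (277 ppm) and the imposed sensitivity below are made-up illustrations, not the GISS/Frank et al. data.

```python
import numpy as np

def remove_anthropogenic(temp, co2, co2_pre=277.0):
    """Regress temperature on log2(CO2/CO2_pre) to estimate an effective
    climate sensitivity lambda (K per CO2 doubling); the residuals are
    the estimate of the natural variability."""
    x = np.log2(co2 / co2_pre)
    lam, intercept = np.polyfit(x, temp, 1)
    return lam, temp - (lam * x + intercept)

# Synthetic illustration: an imposed sensitivity of 2.3 K per doubling
# plus Gaussian "natural" variability.
years = np.arange(1880, 2014)
co2 = 277.0 * 2 ** ((years - 1880) / 300.0)   # smooth CO2-like rise
rng = np.random.default_rng(2)
temp = 2.3 * np.log2(co2 / 277.0) + 0.2 * rng.standard_normal(len(years))
lam, natural = remove_anthropogenic(temp, co2)
print(lam)
```

A lagged variant simply shifts the CO2 series before the regression; as noted above, this barely changes the residuals.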

Finally, we can note that the difference between LOTI and d

In order to judge how close the residues from the CO

A comparison of root mean square (RMS) variances (data residues) and hindcast errors (from deterministic and stochastic models) of global-scale, annual temperatures. See also Fig. 2. Note that the GCM hindcasts are all “optimistic” in the sense that they use the observed volcanic and solar forcings, and these would not be available for a true forecast. In comparison, the stochastic models forecast the responses to these (unknown) future forcings.

The climate sensitivities estimated by linear regression of

As further evidence that residues provide a good estimate of the true natural
variability, in rows 5–10 we also show the annual RMS errors of various GCM
global temperature hindcasts. For example, in rows 5–6 we compare hindcasts
of CMIP 3 (Coupled Model Intercomparison Project, phase
3) GCMs, both with and without annual data
initialization (rows 5 and 6).
Without initialization (row 5), the results are halfway between the CO

The various standard deviations of the temperature residues
(

The hindcast standard deviations (in K) at the finest resolutions (1 month, 1 year) for natural variability temperatures obtained from the unlagged and 20-year lagged climate sensitivities. Note that the lag makes very little difference to the hindcast error variance.

Very similar results are indicated in rows 8–10 for other GCM hindcast
experiments. These are shown graphically in Fig. 2, which is adapted from a
multimodel ENSEMBLES experiment. The hindcasts are discussed in García-Serrano and
Doblas-Reyes (2012). The multimodel mean is consistently close to – but
generally a little above –

Having estimated

Also shown for reference in Fig. 4c is the GISS-E2-R millennium control run
(with fixed forcings) as well as the RMS fluctuations for three
pre-industrial multiproxies. We see that up to about 100-year scales, all
the fluctuations have nearly the same amplitudes as functions of scale, giving
support to the idea that

As a final comparison, Fig. 4d shows RMS Haar fluctuations for the global
averages (from Fig. 4c), land only averages and from the oceans – the
Pacific decadal oscillation (PDO). The PDO is the amplitude of the largest
eigenvalue of the Pacific sea surface temperature autocorrelation matrix
(i.e. the amplitude of the most important empirical orthogonal function:
EOF). For the land-only curve, notice the sharp rise for
scales

The theory for predicting fGn leads to the general equation for the variance in forecast error (

While our approach has the advantage of being straightforward (and it was
tested on numerical simulations of fGn), in future applications improvements
could be made. For example, by using a Girsanov formula, we could rewrite fGn
in terms of a finite integral (see Biagini et al., 2008), and the discretized
numerics would then be more accurate (this is especially important for

In order to obtain good hindcast error statistics, it is important to make
and validate as many hindcasts as possible, i.e. one for each discretized
time that is available. However, due to the long-range correlations, we want
to use a reasonable number of past time steps in the hindcast for memory, so
that the earliest possible hindcast will be later than the earliest available
data by the corresponding amount. The compromise used here consisted of
dividing the 134-year series into 30-year blocks (annual resolution) and
20-year blocks (monthly resolution). In each block in the annual series, the
first 20 years were used as “memory” to develop the hindcast over the next
10 years so that for estimating the hindcast errors a total of
134
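The block-hindcast bookkeeping can be sketched as follows; the persistence stand-in predictor and the sine stand-in series are placeholders for the SLIMM/fGn predictor and the 134-year temperature record.

```python
import numpy as np

def block_hindcasts(series, memory=20, horizon=10, predictor=None):
    """Walk through the series in non-overlapping blocks: use `memory`
    past values to issue a `horizon`-step hindcast, then collect the
    (forecast, verification) pairs. `predictor(past, h)` must return an
    array of h forecasts; a trivial persistence predictor is the default."""
    if predictor is None:
        predictor = lambda past, h: np.full(h, past[-1])  # persistence
    pairs = []
    for start in range(0, len(series) - memory - horizon + 1, horizon):
        past = series[start:start + memory]
        truth = series[start + memory:start + memory + horizon]
        pairs.append((predictor(past, horizon), truth))
    return pairs

series = np.sin(np.linspace(0, 20, 134))  # stand-in for the 134-year record
pairs = block_hindcasts(series)
print(len(pairs))
```

Stepping by the horizon (rather than the full block) maximizes the number of verifiable hindcasts while always reserving the required memory.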

The hindcasts can be evaluated at various resolutions and forecast horizons.
Eqs. (46), (49) and (50) give the general theoretical results. The cases of
special interest are the temperature hindcasts and the anomaly hindcasts with
resolutions and horizons of (

The dimensionless ratios (

Since the anomaly errors are power laws (Eq. 54), they can be conveniently evaluated on a log-log plot (see Fig. 6). Note that the RMS anomaly errors decrease with forecast horizon. The reason is that while forecasts further and further in the future lose accuracy, this loss is more than compensated for by the decrease in the variance due to the lower resolution, so that the anomaly variance decreases. Finally, we may note that the method has been applied to explaining the “pause” or “hiatus” in the global warming since 1998 as well as to make a forecast to 2023 (Lovejoy, 2015b).

Another way to evaluate the hindcasts is to determine their nondimensional
skills, i.e. the fraction of the variance that they explain (see the general
formula Eq. 46). From the formula, we can see that the skill depends only on
the nondimensional forecast horizon

A log-log plot of the standard deviations of the anomaly hindcasts,
with the theoretical reference line corresponding to

The anomaly forecast skill as a function of forecast horizon (horizontal axis) on a log-linear plot for both series
(annual thin, monthly thick; global red, Northern Hemisphere blue). Also
shown are pairs of theoretical predictions (constant skill independent of the
forecast horizon) for various values of

The forecast skill for the temperature at fixed resolutions (one
month, bottom left; 1 year, upper right) for global (red) and Northern
Hemisphere (blue) series. Also shown are the exact theoretical curves (for

The skill in usual temperature forecasts (i.e. with fixed resolution

A final way to evaluate the hindcasts is to calculate the correlation
coefficient between the hindcast and the temperature:

As in the previous hindcast error analyses, the series were broken into
blocks and the forecasts were repeated as often as possible; each forecast
was correlated with the observed sequence and averages were performed over
all the forecasts and verifying sequences (the mean correlation shown by the
solid lines in Fig. 9). The uncertainty in the hindcast correlation
coefficients was estimated by breaking the hindcasts into thirds: three
equal sized groups of blocks with the error being given by the standard
deviation of the three about the mean (dashed lines). Also shown in Fig. 9
are the theoretical curves (Eq. 54) for
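The correlation-based validation, including the uncertainty estimate from thirds, can be sketched as follows (synthetic forecast/verification pairs, not the paper's hindcasts):

```python
import numpy as np

def hindcast_correlation(pairs):
    """Mean correlation between hindcasts and verifying observations,
    with an uncertainty estimated (as in the text) from the spread of
    the means of three equal groups of forecast blocks."""
    corrs = np.array([np.corrcoef(f, t)[0, 1] for f, t in pairs])
    thirds = np.array_split(corrs, 3)
    return corrs.mean(), np.std([g.mean() for g in thirds])

# Synthetic pairs: each "forecast" is the truth plus noise, so the
# correlations cluster around 1/sqrt(1.25) ~ 0.89.
rng = np.random.default_rng(3)
pairs = []
for _ in range(30):
    truth = rng.standard_normal(10)
    pairs.append((truth + 0.5 * rng.standard_normal(10), truth))
mean, err = hindcast_correlation(pairs)
print(mean, err)
```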

The empirical correlations of the forecast temperatures (left
column) and anomalies (right column). The same hindcasts but with
different empirical comparisons and also with comparisons with theory for

As predicted by Eq. (57), the anomaly correlations are relatively constant up to about 5 years for the annual data (top row) and nearly the same for the monthly data (bottom row). In addition, the Northern Hemisphere series (blue) are somewhat better forecast than the global series (red). The temperature forecasts (i.e. with fixed resolutions) have statistically significant correlations for up to 8–9 years for the annual forecasts, up to about 2 years for the monthly global forecasts and nearly 5 years for the monthly Northern Hemisphere forecasts (bottom dashed lines). The anomaly forecasts are statistically significantly correlated at all forecast horizons. Figure 9 thus provides more examples of nondimensional plots with no free parameters, and again the agreement between theory and hindcast validation is remarkable.

Although the results for the anomaly correlations are quite close to those of hindcasts in García-Serrano and Doblas-Reyes (2012), the latter are for the entire temperature forecast, not just the natural variability as here. This means that the GCM correlations will be augmented with respect to ours due to the existence of long-term anthropogenic trends in both the data and the forecasts that are absent in ours (but even with this advantage, their correlations are not higher).

In Table 1 and Fig. 2, we have already compared GCM hindcast errors with
estimates of the natural variability (

Table 2 and Fig. 2 also compare SLIMM RMS errors to those of LIM hindcasts modelled with
20 degrees of freedom (involving

Finally, in Table 1 (rows 12 and 13), we compare the errors with those of an early attempt at scaling temperature forecasts using the autoregressive fractionally integrated moving average (ARFIMA) process (Baillie and Chung, 2002b), along with the corresponding order-1 autoregressive (AR(1)) process. Unfortunately, those forecasts were made by taking 10-year segments and removing a separate linear trend from each, so the low frequencies were not well accounted for (see the footnote to the table for more details). The AR(1) results were poor: their errors were close to the standard deviations of the detrended temperatures. As expected – because they assume a basic scaling framework – the ARFIMA results were somewhat better, yet they are substantially worse than those of the other methods, probably because the anthropogenic component was not removed first.

GCMs are basically weather models whose forecast horizons are well beyond the
deterministic predictability limits, corresponding to many lifetimes of
planetary-scale structures: the macroweather regime. In this regime – that
extends from about 10 days to

In this regard, the problem with the GCM approach is that, in spite of massive improvements over the last 40 years, the weather noise that GCMs generate is not totally realistic, nor does their climate coincide exactly with the real climate. In an effort to overcome these limitations, stochastic models have been developed that directly and more realistically model the noise and that use real-world data to exploit the system's memory, thereby forcing the forecasts to be more realistic.

The main approaches that could potentially overcome the GCM limitations are the stochastic ones. However, going back to Hasselmann (1976), these have only used integer-order differential equations; they implicitly assume that the low frequencies are white noises and hence cannot be forecast with any skill. Modern versions – LIM – add sophistication and a large number of (usually, but not necessarily) spatial parameters, but they still impose a short (exponentially correlated) memory and they focus on periods of up to a few years at most. This contrasts with turbulence-based nonlinear stochastic models, which assume that the system scales over wide ranges. When extended to the macroweather regime (the Extended Fractionally Integrated Flux – EFIF – model), these scaling models have low intermittency, with scaling fluctuations whose temporal exponents are close to those observed in a growing macroweather scaling literature. Contrary to their behaviour in the weather regime, in macroweather they are only weakly nonlinear. However, the empirical spatial macroweather variability is very high, so that Lovejoy and Schertzer (2013) already proposed that the EFIF model be spatially modulated by a multifractal climate process (yielding the CEFIF) whose temporal variability is at such low frequencies as to be essentially constant in time over the macroweather regime.

The CEFIF model is complex both numerically and mathematically, and its prediction properties are not known. In this paper, we therefore construct a simplified model, the ScaLIng Macroweather Model (SLIMM), which can be strongly variable (intermittent) in space and Gaussian (nonintermittent) in time (see Lovejoy and de Lima (2015) for this regional SLIMM). The simplest relevant model of the temporal behaviour is thus fractional Gaussian noise (fGn), whose integral is the better-known fractional Brownian motion (fBm) process. A somewhat different way of introducing the spatial variability is to follow the LIM approach and treat each (spatial) grid point as a component of a system vector. In this case, SLIMM can be obtained as a solution of a fractional-order generalization of the usual LIM differential equations. Although in future publications we will show how to make regional SLIMM forecasts, in this paper we only discuss the scalar version for single time series (here, global-scale temperatures).

In Sect. 2, we situate the process in the mathematical literature and derive
basic results for forecasts and forecast skill. These results show that a
remarkably high level of skill is available in the climate system; for
example, for forecast horizons of one nondimensional time unit in the future
(i.e. horizons equal to the resolution), the forecast skills – defined as
the fraction of the variance explained by the forecast – are 15, 35 and
64 % for land, the whole globe and oceans respectively (Fig. 1b; taking
rough exponent values

The SLIMM forecasts the natural variability. While the responses to solar and
volcanic forcings are implicitly included in the forecast, the responses to
the anthropogenic forcings are not; we must therefore remove the
anthropogenic component, which becomes dominant on scales of 10–30 years. For
this, we follow Lovejoy (2014b), who showed that the CO

Using the method of innovations, we developed a new way of forecasting fGn
that allows SLIMM hindcasts to be made; the long-time forecast horizon RMS
error is thus

This paper only deals with single time series (global-scale temperatures), but it is nevertheless ideal for revisiting the problem of the pause, “slowdown” or hiatus in the warming since 1998, which is a global-scale phenomenon. Lovejoy (2015b) shows how SLIMM hindcasts nearly perfectly predict this hiatus. However, most applications involve predicting the natural variability on regional scales. A future publication will show how this can be done and will quantify the improvement that the additional information (from the regional memory) brings to the forecasts. For horizons from months to a decade or so, SLIMM forecasts are potentially better than the alternatives.

We thank C. Penland and P. Sardeshmukh for helpful discussions. There are no conflicts of interest. This work was unfunded, but L. del Rio Amador thanks HydroQuebec for a scholarship. Edited by: H. A. Dijkstra