We show how factorial regression can be used to analyse numerical model experiments that test the effect of different model settings. We analysed results from a coupled atmosphere–ocean model to explore how different choices in the experimental set-up influence seasonal predictions. These choices included the representation of sea ice and the height of the top of the atmosphere. The results suggested that the simulated monthly mean air temperatures poleward of the mid-latitudes were highly sensitive to the specification of the top of the atmosphere, interpreted as the presence or absence of a stratosphere. The seasonal forecasts for the mid-latitudes to high latitudes were also sensitive to whether the model set-up included a dynamic or non-dynamic sea-ice representation, although this effect was somewhat less important than the role of the stratosphere. The air temperature in the tropics was insensitive to these choices.

The question of whether seasonal forecasting has useful skill is becoming increasingly relevant with the progress in climate modelling. Another question is how we can learn more about such skill, and one strategy is to examine the models used in seasonal forecasting. These include state-of-the-art coupled atmosphere–ocean–land-surface models, built on our knowledge of physical processes and formulated in terms of computer code (Palmer and Anderson, 1994; Stockdale et al., 1998; Palmer, 2004; George and Sutton, 2006). They can be used for seasonal forecasting if a correct initial state is provided, from which the subsequent evolution can be simulated. Their skill depends on several factors, such as the quality of the initial states, the representation of all relevant processes, and whether the seasons ahead truly are predictable in the presence of non-linear chaos (Palmer, 1996). Thus, in order to address the initial question of useful skill for seasonal predictions, we need to understand what is important and what is irrelevant for the outcome of the predictions, which includes choices about the model set-up. Here we look at seasonal forecast results for the air temperature. We know that the atmosphere in the high latitudes is subject to non-linear dynamics, and that the effects of different factors may interfere and amplify or dampen each other (Charney, 1947; Gill, 1982; Lindzen, 1990; Held, 1993; Feldstein, 2003).

It is well known that numerical weather prediction (NWP) has a limited forecast horizon because small initial errors will grow over time in a non-linear fashion (Lorenz, 1963). The case for seasonal forecasting is somewhat different, as it relies on slow changes in the ocean and cryosphere, which act as persistent boundary conditions. NWP and seasonal forecasting represent two types of predictability referred to as “type 1” and “type 2” (Palmer, 1996). Whereas NWP is more of an initial value problem (type 1), seasonal forecasting embeds a degree of the boundary value problem (type 2). Furthermore, seasonal forecasts tend to present the statistics of the weather over a given interval, rather than the exact state at any instant. In other words, seasonal forecasts can be compared with predicting a change in the statistics of a sample of measurements, whereas weather forecasting is more like predicting the details of one specific data point in that sample.

Models used for seasonal forecasting have traditionally involved a model for
the atmosphere coupled to an ocean component, and were originally developed
for the tropical region and the El Niño–Southern Oscillation (Anderson,
1995; Stockdale et al., 1998; Palmer and Anderson, 1994). Aspects such as
sea ice, the stratosphere, and snow cover were not emphasised as they were
not believed to play an important role for the seasonal weather evolution.
More recent studies have looked at the potential influence of sea ice
(Balmaseda et al., 2010; Petoukhov and Semenov, 2010; Overland and Wang,
2010; Francis et al., 2009; Deser et al., 2004; Magnusdottir et al., 2004;
Seierstad and Bader, 2008; Benestad et al., 2010; Orsolini et al., 2012),
especially after the recent dramatic downward trends in the sea-ice extent
(Kumar et al., 2010; Boé et al., 2010; Holland et al., 2008; Wilson,
2009; Kauker et al., 2009; Stroeve et al., 2007, 2008). Other studies have
involved the effect of snow cover on the atmospheric circulation (Cohen and
Entekhabi, 1999; Ge and Gong, 2009; Ueda et al., 2003; Hawkins et al., 2002;
Watanabe and Nitta, 1998; Orsolini et al., 2013) or the influence of
stratospheric conditions on the lower troposphere (Baldwin and Dunkerton,
2001; Baldwin et al., 2003; Thompson et al., 2002). Few of these studies,
however, have looked at how these different factors in combination may
interfere with each other, nor have there been many sensitivity tests for
investigating how the model set-up, with different combinations of the
components representing these different aspects, affects the results. One
question we would like to address is whether the response to these different
factors adds linearly or if the response is a non-linear function of these
factors. Furthermore, it is interesting to find out which of these factors
are more dominant than others. Moreover, our objective was to try to
understand which

The model used in this study was the EC-Earth version 2.1 state-of-the-art
earth system model (Hazeleger et al., 2010), which had been developed by a
consortium of meteorological institutes/universities across Europe. The
atmospheric component of the EC-Earth model was based on ECMWF's Integrated
Forecasting System (IFS) cycle 31R1 with a new convection scheme and a new
land surface scheme. The ocean component was based on version 2 of the NEMO
model (Madec, 2008), with a horizontal resolution of nominally
1

The synthesis experiments consisted of a set of 12 coupled model simulations. Six of these simulations used the L62 vertical resolution for the atmospheric component, which extended up to 5 hPa, while the other six used the higher resolution L91 version, which extended up to 0.01 hPa. These two sets of experiments were designed to determine the sensitivity of model results to a better representation of the stratosphere. Further, to evaluate the sensitivity to the representation of sea ice, the LIM2 sea-ice model was implemented as a standard thermodynamic–dynamic model (DyIce) and as a thermodynamic-only model (NoDyIce). Finally, sensitivity to initial conditions was tested by introducing perturbations to initial conditions corresponding to positive/negative NAO SST (North Atlantic Oscillation sea surface temperature) anomaly patterns over the North Atlantic (Melsom, 2010). All simulations started on 1 January 1990 and lasted 90 days. The initial conditions used in this experiment came bundled with the earlier (test) versions of EC-Earth (up to V2.1) and were based on ERA-Interim. An overview of the model simulations is given in Table A1.
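The 12 simulations correspond to a full factorial design with two model tops, two sea-ice representations, and three SST initial states (2 × 2 × 3). The short sketch below only enumerates these combinations; the label strings and the ordering are illustrative assumptions and need not match Table A1.

```python
from itertools import product

# Factor levels as described in the text; label strings are illustrative.
MODEL_TOP = ["L62", "L91"]          # low top (5 hPa) vs. high top (0.01 hPa)
SEA_ICE = ["DyIce", "NoDyIce"]      # dynamic vs. thermodynamic-only sea ice
SST = ["neutral", "NAO+", "NAO-"]   # unperturbed, positive and negative NAO SST pattern

# Every combination of levels gives one simulation in the design.
design = [
    {"top": top, "ice": ice, "sst": sst}
    for top, ice, sst in product(MODEL_TOP, SEA_ICE, SST)
]

for number, run in enumerate(design, start=1):
    print(number, run)
```

Each factor level appears in the same number of runs (six per model top, six per sea-ice option, four per SST state), so the design is balanced.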

Map of monthly mean air temperature difference at 200 hPa between the high-top and low-top experiments for month 3.

Here the experiments and analysis used an approach known as “factorial design” (Yates and Mather, 1963; Fisher, 1926; Hill and Lewicki, 2005; Wilkinson and Rogers, 1973; Benestad et al., 2010), where a factorial regression was used to assess what influence each of the choices in the model set-up has on the forecasts. It is a technique that can analyse sets of factors which are considered to have potential effects on the outcome in experiments, where an analysis of variance (ANOVA; Wilks, 1995) provides estimates for error bars and the level of statistical significance. Hence, factorial regression offers an alternative to traditional ways of estimating statistical significance used in meteorology and climate sciences, such as difference tests between two ensembles. Factorial regression can be applied to data that are generated by a process involving two or more factors (set-up options or categories) that are difficult to quantify due to their discrete nature (e.g. some factors may either be present or absent). It has been used to analyse the effect of introducing different crop varieties in agriculture (e.g. Baril et al., 1995; Vargas et al., 1999, 2006; Voltas et al., 2005). It is based on the concept of a “factorial experiment”, or factorial design, in statistics, which involves two or more factors, each of which can be assigned a category or a discrete value. This kind of analysis takes into account all possible combinations of levels over all such factors, including their interactions.
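As a rough illustration of the idea, the sketch below fits a main-effects factorial regression to one synthetic value per simulation, using 0/1 indicator variables for each factor level relative to a baseline run (dynamic sea ice, neutral SST, L62). The response values, the "true" effects, and the noise level are all made up for the example, and the small least-squares helper is only a stand-in for a statistics package; in the study the regression would be applied to model output such as temperature at each grid point.

```python
from itertools import product
import random


def solve_normal_equations(X, y):
    """Ordinary least squares via Gauss-Jordan inversion of X^T X.

    Returns the coefficient vector and the inverse of X^T X.
    """
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    # Augment [X^T X | I] and reduce the left half to the identity.
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(p):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    inv = [row[p:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    return beta, inv


# The 12 runs of the 2 x 3 x 2 design (labels are illustrative).
runs = list(product(["DyIce", "NoDyIce"], ["neutral", "NAO+", "NAO-"], ["L62", "L91"]))

# Indicator (dummy) coding relative to the baseline DyIce / neutral / L62.
names = ["baseline", "NoDyIce", "NAO+", "NAO-", "L91"]
X = [
    [1.0, float(ice == "NoDyIce"), float(sst == "NAO+"),
     float(sst == "NAO-"), float(top == "L91")]
    for ice, sst, top in runs
]

# Synthetic response: made-up effects plus random noise (not model output).
random.seed(0)
assumed_effects = [-50.0, 0.5, 1.0, -1.0, 2.0]
y = [sum(b * x for b, x in zip(assumed_effects, row)) + random.gauss(0.0, 0.3)
     for row in X]

beta, inv = solve_normal_equations(X, y)

# Error estimates from the residual variance, as in a standard ANOVA table.
residuals = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
dof = len(y) - len(beta)
sigma2 = sum(e * e for e in residuals) / dof
stderr = [(sigma2 * inv[i][i]) ** 0.5 for i in range(len(beta))]

for name, b, s in zip(names, beta, stderr):
    print(f"{name:9s} {b:8.3f} +/- {s:.3f}")
```

Because the design is balanced and only main effects are fitted, each coefficient equals the difference between the mean over runs with that option and the mean over runs with the corresponding baseline option. Interaction terms could be added as products of the indicator columns, at the cost of fewer residual degrees of freedom.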

The model response to different initial conditions or different model set-ups
with different options for three configurations (SST perturbation, model top,
and sea-ice model) was investigated, and a comparison was made between the
different experiments in terms of vertical and horizontal cross sections of
temperature anomalies. If the final response

We did not know the relative strength of the different factors in terms of an input; however, the factorial regression quantified the differences between output from different combinations of subsets. It was also used to estimate the probability that the response in the different combinations of these subsets would be due to chance. The results from the factorial regression were subsequently used to explore the combined effect of several factors.

The Walker test was used to assess the false discovery rate of the

Figure 1 shows the difference in the forecasts associated with the stratosphere, more specifically between the low-top (L62) and high-top (L91) versions of the atmosphere for month 3. It presents horizontal transects at the 200 hPa level, and shows the monthly mean temperature starting with a 2-month lead time. The left panels show results with no initial perturbation (neutral NAO conditions), the middle panels show results from model simulations with initial conditions set at a positive phase of the NAO, and the right panels show results for which the initial conditions were set at the negative phase of the NAO. All the panels show that there were differences between the low- and high-top results, and the difference between the low- and high-top model simulations was most pronounced for negative and positive NAO-type initial conditions (not shown). Hence, the forecasted air temperature was sensitive to the inclusion of the upper part of the atmosphere, and the effect can be seen extending throughout the entire vertical extent of the atmosphere (not shown). The differences between the upper and lower rows show the effect of dynamic vs. non-dynamic sea-ice representation. With non-dynamic sea ice, the inclusion of a stratosphere resulted in stronger vertical dipole patterns at certain longitudes and for positive NAO initial conditions. For the negative NAO initial conditions, the dynamical sea-ice representation amplified the differences between the L91 and L62 model simulations.

Figure 1 suggests that the inclusion of the stratosphere and the representation of sea ice matter for the mid-latitude to polar regions, and that the choice of the vertical levels had less impact in the tropics. The response suggests mid-latitude wave-like structures in the 200 hPa temperatures, albeit with a tendency towards a coherent anomaly over the North Pole. The choice of the sea-ice representation had a visible impact on the simulation of the monthly mean temperature after 3 months, seen as the difference between upper and lower panels. The horizontal picture at 200 hPa (Fig. 1) suggests a radically different wave structure for the negative NAO phase, whereas for the “positive” and “neutral” NAO states, the differences were seen in both regional details and magnitude. The exact geographical structure in these maps is not the important point here, as the longitude of action will depend on the initial condition. The important information here is the pronounced response in the mid-latitudes to high latitudes.

Coefficients and error estimates from the factorial regression of
air temperature at 60

In summary, it is apparent from Fig. 1 that the effect of different model
aspects such as the choice of model top and sea-ice representation influenced
the model forecasts. Furthermore, we see that the influence varied with the
initial SST conditions, and that different sea-ice representation introduced
changes in the forecast of similar magnitude as the influence of the model
top. It is difficult to compare these effects with that of the initial
conditions merely from Fig. 1; however, we compared the effect from these
different aspects by means of a factorial regression. The ANOVA for the factorial regression yielded a set of
coefficients

Figure 2 presents the coefficients and the error estimates from the factorial
regression. The top panel shows the mean air temperature for the model
forecasts with a model set-up of dynamical sea-ice component, no perturbation
in the SST, and 62 vertical levels (low top). Panels b–e show the
differences in the forecasts due to different choices in the model set-up in
terms of the regression coefficients

The previous results have indicated a high sensitivity to the various choices
in the model set-up; however, we need to examine the relationship between the
regression coefficients and error estimates in order to infer whether any of
them has a systematic effect on the model predictions. Figure 3 shows the
response-to-error ratio for sea ice (upper), positive NAO SST perturbation
(second from the top), negative NAO SST perturbation (third), and the
stratosphere L91 (bottom). Only a small region had a response that was greater
in magnitude than the error estimate for the sea ice, whereas for the SST
perturbations and the stratosphere, the regions where the response-to-error
ratio had a magnitude greater than unity were more extensive. Both large
negative and positive values indicate that the signal is stronger than the
noise
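
The response-to-error ratio can be illustrated with a few lines of code. This is a minimal sketch with made-up coefficient and error values for four hypothetical grid points; the function name and the numbers are assumptions for the example, not values from the study.

```python
def response_to_error(coefficients, errors):
    """Ratio of each regression coefficient to its error estimate."""
    return [c / e for c, e in zip(coefficients, errors)]


# Hypothetical values at four grid points (units of K).
coefficients = [0.2, -1.5, 0.9, 2.4]
errors = [0.5, 0.6, 1.0, 0.8]

ratio = response_to_error(coefficients, errors)

# The sign only gives the direction of the response, so the magnitude
# of the ratio is compared with unity.
signal_exceeds_noise = [abs(r) > 1.0 for r in ratio]
print(signal_exceeds_noise)
```

Taking the absolute value treats strongly negative and strongly positive ratios alike, since both indicate a response larger than its error estimate.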

The ratio of the factorial regression coefficients to the error
estimate for different factors:

The factorial regression gave the highest number of low

Monthly mean air temperature at 60

The question of degree of non-linearity can be addressed by comparing the sum
of the influence from the different factors with simulations with and
without a set of factors combined; i.e. we check for the equivalency:
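
A check of this kind can be sketched as follows, assuming that linearity means that the response to several options changed together equals the sum of the responses to each option changed alone, all measured relative to the same baseline run. The function name and the temperature values are hypothetical, and this formulation is an assumption rather than the exact expression used in the study.

```python
def superposition_residual(baseline, single_factor_runs, combined_run):
    """Departure of the combined response from the sum of single-factor responses.

    A value near zero suggests a near-linear superposition; a large value
    relative to the error estimate suggests a non-linear response.
    """
    additive_response = sum(run - baseline for run in single_factor_runs)
    combined_response = combined_run - baseline
    return combined_response - additive_response


# Hypothetical monthly mean temperatures (degrees C) at one grid point.
baseline = -50.0         # e.g. L62, DyIce, neutral SST
only_high_top = -48.0    # only the model top changed
only_no_dyn_ice = -49.5  # only the sea-ice representation changed
both_changed = -46.0     # both options changed together

residual = superposition_residual(
    baseline, [only_high_top, only_no_dyn_ice], both_changed
)
print(residual)  # 1.5: the combined response exceeds the additive estimate
```

In practice the residual would need to be judged against the error estimates from the regression, since a small sample of runs can produce a non-zero residual by chance.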

The set of sensitivity experiments shows that seasonal forecasts at
mid-latitudes to high latitudes are sensitive to a number of factors concerning the
model set-up, and that the choice of subjective and subtle options can have
as strong an effect on the monthly mean temperature poleward of the
mid-latitudes as the initial conditions. A factorial design experiment allows
us to compare the magnitude of the effect of the model top height with that of
the sea-ice representation or the SST perturbations. We can also test the
response in the model to see if it is close to being a linear
superposition of the different single factors, or if the model response is
highly non-linear. The statistical significance was estimated based on the
factorial regression. The magnitudes of the effects of the sea ice, SST
perturbations, and the model top height were roughly similar, although the
response to the sea ice was somewhat weaker than the others. The lower ratio
of estimate-to-error also reflected the degree of non-linearity, and the
relatively higher

There is previous work in which model sensitivity and uncertainty have been assessed (e.g. Rinke et al., 2000; Wu et al., 2005; Pope and Stratton, 2002; Jacob and Podzun, 1997; Knutti et al., 2002; Dethloff et al., 2001); however, most of these assessments have been carried out for climate simulations as opposed to seasonal forecasts. In seasonal forecasting, the emphasis has been more on multi-model forecasts and their spread (Weisheimer et al., 2009), rather than the configuration of single models. However, Jung et al. (2012) discussed the effect of the spatial resolution on seasonal forecasts based on an experimental design with a single model. The use of factorial regression was also discussed by Rinke et al. (2000) in conjunction with climate simulations, and Benestad et al. (2010) used it in a study of seasonal predictability and the effect of boundary conditions associated with sea ice and initial conditions. This study applied factorial regression to a new set of model configuration options, including the model top, the representation of sea ice, and initial conditions. In this case, we emphasised the individual factors rather than their interaction because of the limited sample of model runs. Including these interaction terms can give an indication of the effects of changing more than one option at a time (given a sufficient sample), e.g. how the combination of different vertical extent, sea-ice model, and initial conditions results in a different outcome. However, we addressed this issue separately in this study by comparing the different terms in Eq. (1), which indeed suggested that changing more than one factor gives a non-linear response. These aspects require more effort to form a better understanding, both in terms of larger ensemble experiments and understanding of the physics involved. The objective here, however, was to try to find potential additional explanations for why seasonal forecasting has been associated with such low skill in mid-latitudes, in addition to the higher degree of non-linear dynamics in connection with weather patterns.

These experiments involved global coupled atmosphere–ocean models that are used for operational seasonal forecasting, especially for the El Niño–Southern Oscillation (ENSO); however, our analysis focused on the mid-latitudes. The results nevertheless allow for a comparison between the tropics and higher latitudes. They suggest that the outcome of the predictions in the mid-latitudes is sensitive to the choice of the top of the atmosphere and the representation of sea ice, but the low latitudes are insensitive to these factors. Hence, they support the hypothesis that the lack of seasonal prediction skill reported in the mid-latitudes may be linked to non-optimal model configuration. Further insights from these experiments include (1) that subjective choices in terms of model set-up (vertical levels and type of sea-ice representation) have an effect on the outcome of the seasonal forecasts in the high latitudes, (2) that factorial regression can be used as a means to describe the effect of different model options, and (3) that the effect of these different choices results in a non-linear response. These aspects have rarely been discussed in the past, perhaps because they do not have a strong effect on the simulation of processes in the tropics (e.g. ENSO).

A set of sensitivity tests revealed that seasonal predictability of the temperature at the mid-latitudes to high latitudes was as sensitive to subjective choices regarding the model set-up as to the initial SST conditions. Hence, these results illustrate the difficulties associated with seasonal forecasting at the higher latitudes and their effect on the forecast skill. The tropical temperatures were insensitive to these choices, and the sea-ice representation and the stratosphere do not have a visible effect on, e.g., ENSO forecasts.

The data presented here are available from

We are grateful to Wilco Hazeleger and the EC-Earth community for providing a stand-alone version of the EC-Earth model, and Simona Stefanescu at the ECMWF for all her assistance. Comments from two reviewers have also improved this paper. This work was carried out under the SPAR project (“Seasonal Predictability over the Arctic Region – exploring the role of boundary conditions”; project 178570, funded by the Norwegian Research Council and the Meteorological Institute) and SPECS (EU Grant Agreement 3038378), and the model simulations used computational resources at NOTUR – the Norwegian Metacenter for Computational Science. The data used in this analysis can be obtained by contacting the authors.

Edited by: B. Kravitz
Reviewed by: two anonymous referees