Recent studies demonstrate that weather and climate predictions can potentially be improved by dynamically combining different models into a so-called “supermodel”. Here, we focus on weighted supermodeling, in which the supermodel's time derivative is a weighted superposition of the time derivatives of the imperfect models. A crucial step is to train the weights of the supermodel on the basis of historical observations. We apply two different training methods to a supermodel of up to four different versions of the global atmosphere–ocean–land model SPEEDO. The standard version is regarded as truth. The first training method is based on an idea called cross pollination in time (CPT), where models exchange states during the training. The second method is a synchronization-based learning rule, originally developed for parameter estimation. We demonstrate that both training methods yield climate simulations and weather predictions of superior quality compared to the individual model versions. Supermodel predictions also outperform predictions based on the commonly used multi-model ensemble (MME) mean. Furthermore, we find evidence that negative weights can improve predictions in cases where model errors do not cancel (for instance, when all models are warm with respect to the truth). In principle, the proposed training schemes are applicable to state-of-the-art models and historical observations. A prime advantage of the proposed training schemes is that, in the present context, relatively short training periods suffice to find good solutions. Additional work is needed to assess the limitations due to incomplete and noisy data, to combine models that are structurally different (for instance, with different resolution and state representation) and to evaluate cases for which the truth falls outside of the model class.

Although weather and climate models continue to improve, they will inevitably remain imperfect

The foundation of modern weather and climate prediction rests on the assumption that, when an estimate of the climate state is available at a particular instant in time, its time evolution can be calculated by a proper numerical discretization of the fundamental laws of physics, supplemented by empirical relationships describing unresolved scales and a complete specification of the external forcing and boundary conditions. Integration in time subsequently yields a predicted climate trajectory into the future and formally frames the climate prediction endeavor as a mixed initial and boundary conditions problem

An illustrative example of this propagation of model errors is presented in

Reducing model errors early in the prediction is precisely what supermodeling attempts to achieve

The supermodeling approach was originally developed using low-order dynamical systems

A crucial step in supermodeling is the training of the connection coefficients (for connected supermodels) or weights (for weighted supermodels) based on data, i.e., the observations. The first training schemes of supermodels were based on the minimization of a cost function that depends on long simulations with the supermodel

Before supermodeling becomes suitable for the class of large-dimensional state-of-the-art weather and climate models, training schemes are needed that are computationally feasible in that context. In this paper, we develop, apply and compare CPT and the synch rule to train a weighted supermodel based on the intermediate-complexity global coupled atmosphere–ocean–land model SPEEDO

In Sect.

To make the supermodeling approach more explicit, we formally write the model equations of a weather or climate model

A weighted supermodel based on two imperfect models is given by

For completeness and for comparison of the weighted supermodels with the connected supermodel from

A connected supermodel allows for more flexibility in the event that the ensemble is not perfectly synchronized
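For contrast with the weighted supermodels studied here, the following Python sketch shows a form of connected supermodel that is common in the supermodeling literature: each imperfect model is nudged towards the others through connection coefficients, and the supermodel solution is taken as the mean of the connected model states. The coefficient matrix `C` and the function names are hypothetical and chosen for illustration; the connected SPEEDO supermodel referred to above may be formulated differently.

```python
import numpy as np

def connected_supermodel_tendency(states, models, C):
    """Tendencies of a connected supermodel.

    states: array of shape (M, n), one state vector per imperfect model.
    models: list of M tendency functions f_i(x).
    C: (M, M) array of hypothetical connection coefficients; C[i, j] nudges
       model i towards model j.
    """
    M = len(models)
    tend = np.empty_like(states)
    for i in range(M):
        # Nudging terms pull model i towards the states of the other models.
        coupling = sum(C[i, j] * (states[j] - states[i]) for j in range(M) if j != i)
        tend[i] = models[i](states[i]) + coupling
    return tend

def supermodel_state(states):
    """The supermodel solution: the mean over the connected model states."""
    return states.mean(axis=0)
```

When the connected models are not fully synchronized, each keeps its own state, which is the extra flexibility mentioned above; a weighted supermodel instead evaluates all imperfect models at one common state.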

The SPEEDO global climate model consists of an atmospheric component (SPEEDY) that exchanges information with a land (LBM) and an ocean–sea-ice component (CLIO) using coupling routines (Fig.

The atmospheric model SPEEDY describes the evolution of the two horizontal wind components

SPEEDY exchanges water and heat with the land model LBM that uses three soil layers and up to two snow layers to close the hydrological cycle over land and a heat budget equation that controls the land temperatures. The horizontal discretization is the same as for the atmosphere model. The land surface reflection coefficient for solar radiation is prescribed using a monthly climatology. Each land bucket has a maximum soil water capacity. The runoff is collected in river basins and drained into the ocean at specific locations of the major river outflows.

SPEEDY exchanges heat, water and momentum with the ocean model CLIO

Formally, the SPEEDO equations can be written as

Schematic representation of the SPEEDO climate model. The atmosphere needs surface characteristics (temperature, roughness, reflectivity, soil moisture) in order to calculate the exchange of heat, water and momentum. Coupler software communicates this information between the components and interpolates between the computational grids.

The training experiments of this study are evaluated in a noise-free observation framework, with perfect observations generated by sampling a reference model trajectory. This “perfect model” provides a set of time-ordered observations, called the “truth”. We consider the SPEEDO climate model with standard parameter values as truth and create imperfect models by perturbing parameter values in the atmospheric component. A supermodel is formed by combining the imperfect atmosphere models through a weighted superposition of their time derivatives (Eq. 2), with each imperfect atmosphere model coupled to the same ocean and land model (Fig.
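A minimal sketch of the weighted superposition of time derivatives, using the Lorenz 1963 equations as a hypothetical stand-in for the imperfect models (this is not SPEEDO). The two imperfect models differ from the truth only in the parameter `rho`, and because `rho` enters the tendencies linearly, equal weights reproduce the true tendency exactly.

```python
import numpy as np

def lorenz63(x, rho=28.0, sigma=10.0, beta=8.0 / 3.0):
    """Lorenz 1963 tendency; here a toy stand-in for an imperfect model."""
    return np.array([
        sigma * (x[1] - x[0]),
        x[0] * (rho - x[2]) - x[1],
        x[0] * x[1] - beta * x[2],
    ])

def model_1(x):
    # Hypothetical imperfect model: rho too small.
    return lorenz63(x, rho=24.0)

def model_2(x):
    # Hypothetical imperfect model: rho too large.
    return lorenz63(x, rho=32.0)

def weighted_supermodel_tendency(x, models, weights):
    """Weighted superposition of the imperfect-model tendencies, all at state x."""
    return sum(w * f(x) for w, f in zip(weights, models))

x = np.array([1.0, 2.0, 20.0])
tend = weighted_supermodel_tendency(x, [model_1, model_2], [0.5, 0.5])
print(np.allclose(tend, lorenz63(x)))  # True, since 0.5 * 24 + 0.5 * 32 = 28
```

Integrating `weighted_supermodel_tendency` with any time-stepping scheme gives the supermodel trajectory. All imperfect models are evaluated at the same supermodel state, which is what distinguishes a weighted supermodel from an MME mean of separately integrated models.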

Schematic representation of the SPEEDO climate supermodel based on two imperfect atmosphere models. The two atmosphere models exchange water, heat and momentum with the perfect ocean and land model. The ocean and land models send their state information to both atmosphere models. The atmosphere models exchange state information in order to combine their time derivatives.

Two different learning strategies are evaluated in this study in order to train the SPEEDO weighted supermodel: learning based on CPT as developed and applied to low-order dynamical systems in

The CPT learning approach is based on an idea proposed by

Adapted from

In the case of a multi-dimensional model, such as SPEEDO, it is possible that at each time step different models are closest to the truth for different state variables and at different grid locations. In this case, we continue per state variable with the model that is closest. This means that the initial state for the next time step can consist of a combination of models. As the values for the different state variables might not be in agreement with each other, this creates imbalances that can lead to numerical instabilities. A (partial) solution is to decrease the time step, as we shall see in Sect.
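The per-variable selection can be sketched as follows, assuming that each model's candidate state is stored as a dictionary of numpy fields and that closeness is measured by the root-mean-square difference over the whole field of each variable (consistent with one global weight per variable). The function name `cpt_select` is hypothetical.

```python
import numpy as np

def cpt_select(candidates, truth):
    """Pick, per state variable, the candidate closest to the truth.

    candidates: list (one entry per imperfect model) of dicts mapping a
        variable name to a numpy field, each integrated one step from the
        current CPT state.
    truth: dict with the same variable names holding the true fields.
    Returns the mixed state and, per variable, the index of the chosen model.
    """
    mixed, choice = {}, {}
    for var, true_field in truth.items():
        # RMS difference over the whole field (assumed closeness measure).
        errors = [np.sqrt(np.mean((c[var] - true_field) ** 2)) for c in candidates]
        best = int(np.argmin(errors))
        mixed[var] = candidates[best][var].copy()
        choice[var] = best
    return mixed, choice
```

The returned mixed state is where the next time step starts. Because its variables may come from different models, it is exactly the kind of state that can be dynamically unbalanced.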

The training period is terminated when the CPT trajectory starts to deviate from the truth beyond a given pre-specified threshold. After training, an optimal trajectory is obtained that is produced by a combination of different imperfect models (Fig.

CPT trajectory after a training period of 20 time steps. Model 1 is used for 6 out of 20 time steps; hence, model 1 will get a weight of 0.3.
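Once the per-step choices are recorded, the CPT weights follow by counting: a model's weight is the fraction of training time steps in which it was closest to the truth. A minimal sketch with a hypothetical function name, reproducing the example in the caption above (6 out of 20 steps gives a weight of 0.3):

```python
from collections import Counter

def cpt_weights(choices, n_models):
    """Turn a sequence of per-step model choices into CPT weights.

    choices: index of the model closest to the truth at each training time
        step (for a single state variable).
    Each weight is the fraction of steps in which that model was chosen, so
    the weights are non-negative and sum to 1.
    """
    counts = Counter(choices)
    n_steps = len(choices)
    return [counts.get(i, 0) / n_steps for i in range(n_models)]

# Model 1 (index 0 here) is used for 6 out of 20 time steps.
choices = [0] * 6 + [1] * 14
print(cpt_weights(choices, n_models=2))  # [0.3, 0.7]
```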

For the training of a supermodel based on synchronization, a learning rule (the synch rule) is used that updates the weights such that synchronization errors between truth and supermodel are minimized. In contrast to CPT learning, initial values for the weights need to be chosen and the weights are updated during training. Under certain conditions, the supermodel will fall into synchronized motion with the truth as the weights are updated and the supermodel is nudged to the truth (black arrows in Fig.

At each observation (dots) of the truth (continuous black line), the weights of the imperfect models (red, blue) are updated which gives a new supermodel solution (green dotted line). The black arrows indicate the nudging to the truth.

The synch rule for the weights is an application of the general synchronization-based parameter estimation approach suggested in

In the context of two dynamical systems that differ in parameter values only, the general synch rule for parameter estimation is given by

In training a supermodel, we assume that the truth can be described by a weighted dynamical combination of imperfect models with the weights as adjustable parameters. In this case, the function

Integration of the synch rule implies that as long as the time series of the synchronization error

In training the SPEEDO supermodel, we regard the atmospheric model with standard parameter values as truth, whereas imperfect atmospheric models are created by perturbing those parameter values. Figure

During training, the truth and imperfect models all share their states. In the case of CPT, this state information is used by each imperfect model to check which model is closest to the truth and continue the integration from that state. In the case of the synch rule, this state information is used to calculate the synchronization error between the supermodel and the truth.
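As a toy illustration of synchronization-based learning, the sketch below trains the weights of a two-model supermodel built from the Lorenz 1963 equations with perturbed `rho` (hypothetical stand-ins for the imperfect SPEEDO atmospheres). It assumes a commonly used form of the synch rule: the supermodel is nudged to the truth with strength `K`, and each weight changes in proportion to the projection of the synchronization error onto that model's tendency, with learning rate `a`. All parameter values are illustrative, and forward Euler is used for brevity.

```python
import numpy as np

def lorenz63(x, rho, sigma=10.0, beta=8.0 / 3.0):
    """Lorenz 1963 tendency; toy stand-in for truth and imperfect models."""
    return np.array([
        sigma * (x[1] - x[0]),
        x[0] * (rho - x[2]) - x[1],
        x[0] * x[1] - beta * x[2],
    ])

def train_synch_rule(rho_models=(24.0, 36.0), rho_true=28.0, w0=(0.5, 0.5),
                     a=2e-3, K=10.0, dt=1e-3, t_end=50.0):
    """Train supermodel weights with a synchronization-based rule (forward Euler).

    Truth:      dx_t/dt = f(x_t; rho_true)
    Supermodel: dx_s/dt = sum_i w_i f_i(x_s) + K (x_t - x_s)   (nudged to truth)
    Weights:    dw_i/dt = a (x_t - x_s) . f_i(x_s)
    """
    w = np.array(w0, dtype=float)      # equal initial weights summing to 1
    x_t = np.array([1.0, 1.0, 20.0])   # truth
    x_s = x_t + 0.1                    # slightly perturbed supermodel state
    for _ in range(int(round(t_end / dt))):
        f = np.array([lorenz63(x_s, r) for r in rho_models])  # (n_models, 3)
        err = x_t - x_s                                       # synchronization error
        dx_s = w @ f + K * err
        dw = a * (f @ err)
        x_t = x_t + dt * lorenz63(x_t, rho_true)
        x_s = x_s + dt * dx_s
        w = w + dt * dw
    return w
```

Because `rho` enters linearly, the weights that reproduce the truth exactly satisfy w1 + w2 = 1 and 24 w1 + 36 w2 = 28, i.e. (2/3, 1/3). Starting from equal weights, `train_synch_rule()` should drift towards these values: the nudging keeps the synchronization error small and proportional to the tendency mismatch, so the weight update acts like a slow gradient descent on that mismatch.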

Schematic representation of the SPEEDO system during training

Application of the synch rule to a weighted SPEEDO supermodel of two imperfect models implies integration of the following set of equations:

In order to be able to compare results of the weighted supermodels of this study to the connected supermodels in

The first supermodel that we will train will consist of a weighted superposition of models 1 and 2. The second supermodel will consist of a weighted superposition of models 1, 3, 4 and 5. The parameter values of these models are chosen such that they form a so-called convex hull around the true parameter values (see

The third supermodel consists of a weighted superposition of models 1 and 6. In this case, both imperfect models have parameter values that are smaller than the corresponding true values. A weighted superposition with positive weights therefore cannot yield effective parameter values closer to the truth than those of the better of the two models. Note that both models overestimate the average temperature and precipitation (Table
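The convex-hull argument can be made concrete with a small calculation. Assuming the perturbed parameters enter the tendencies linearly, a weighted supermodel whose weights sum to 1 behaves like a model with parameters sum_i w_i p_i, so the weights that recover the true parameters solve a small linear system. The sketch below uses hypothetical parameter values, not the SPEEDO ones. With one more model than parameters, the weights are all non-negative exactly when the true parameters lie inside the convex hull of the imperfect ones; a negative weight appears when all imperfect values lie on the same side of the truth.

```python
import numpy as np

def weights_matching_truth(p_models, p_true):
    """Weights w with sum(w) = 1 and sum_i w_i * p_models[i] = p_true.

    p_models: array of shape (M, P), parameter values of M imperfect models.
    p_true: array of shape (P,), true parameter values.
    Solved in the least-squares sense; exact when M = P + 1 and the
    imperfect parameter vectors are affinely independent.
    """
    p_models = np.asarray(p_models, dtype=float)
    p_true = np.atleast_1d(np.asarray(p_true, dtype=float))
    A = np.vstack([np.ones(p_models.shape[0]), p_models.T])  # shape (P + 1, M)
    b = np.concatenate([[1.0], p_true])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

# Hypothetical single-parameter examples with a true value of 1.0:
print(weights_matching_truth([[0.8], [1.2]], 1.0))  # truth inside the hull: close to (0.5, 0.5)
print(weights_matching_truth([[0.8], [0.9]], 1.0))  # both too small: close to (-1, 2)
```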

Parameter values of perfect and imperfect models.

Global mean difference between the imperfect models and the perfect model, calculated over the last 30 years of the simulation.

For both CPT and the synch rule, we choose to work with global weights, which means that for each meteorological variable we use the same weight at every grid point. In principle, one could allow different weights at each grid point, but this could induce dynamical imbalances that pull the model away from its attractor. The model's reaction is then to restore the dynamical balances and return to its own attractor
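Applying global weights amounts to multiplying each model's entire tendency field for a variable by a single number. A sketch, assuming tendencies are held as dictionaries of numpy arrays keyed by hypothetical variable names:

```python
import numpy as np

def combine_tendencies(tendencies, weights):
    """Combine imperfect-model tendencies with one global weight per variable.

    tendencies: list over models of dicts {variable: tendency field (ndarray)}.
    weights: dict {variable: sequence of per-model weights}; the same weight
        is applied at every grid point of that variable's field.
    Only the variables present in `weights` are combined here.
    """
    return {
        var: sum(w_i * tend[var] for w_i, tend in zip(w, tendencies))
        for var, w in weights.items()
    }
```

Per-grid-point weights would replace each scalar `w_i` by a field of the same shape as the tendency, which is the extra freedom that risks the imbalances described above.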

The SPEEDO model has five prognostic variables: temperature, vorticity, divergence, specific humidity and surface pressure (

We found that smaller time steps were required during CPT training than in standard integrations. Gravity waves induced by the state replacement during training require a smaller time step to prevent numerical instabilities. With our choice of imperfect models, a 15 min time step, half the time step of the standard integration, was sufficient.

In CPT training, the sum of the weights is normalized to 1. In the application of the synch rule, on the other hand, the sum of the weights is not explicitly constrained. One can start from zero weights and let the synch rule find the optimal set of weights. Initializing weights with a sum larger than 1 easily leads to numerical instabilities because the weighted mean state becomes more energetic. Imposing the constraint of the sum of weights being 1 during the training also led to numerical instabilities. We chose to initialize with equal weights that sum to 1.

The synch rule contains an adjustable rate of learning scaling factor

In the experimental setup during training, we assume perfect ocean and land models that receive fluxes from the perfect atmosphere. However, in the supermodel setup, perfect fluxes are not available, and we use a weighted combination of the fluxes from both imperfect models instead. In the connected supermodel of

We describe the learning results and the short- and long-term forecast capabilities of the three supermodel configurations separately.

Calculation of weights for a supermodel constructed from two imperfect models using two different training schemes.

We first trained a weighted supermodel based on imperfect models 1 and 2 (see Table

Ideally, both CPT and the synch rule should produce converged weights, i.e., weights that remain stable if the training period is extended. The length of the training period required for convergence turns out to be very different for the two methods. For CPT, a training period as short as a couple of days produces converged weights, whereas the synch rule takes about a year. Note that we limit the CPT training period to a week, as the CPT trajectory starts to deviate significantly from the truth after approximately 10 days. CPT diverges from the truth because the ensemble size is limited: with non-linear processes causing rapid error growth, the truth soon falls outside the limited ensemble. The problem is exacerbated by replacing a model state with state variables mixed from different models, which introduces imbalances that cause additional error growth.

To check how much the CPT weights vary over a year, the CPT method is applied to each week of a 1-year period. After each week, the values of all prognostic variables are reset to the truth, and the procedure is repeated. Figure
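The weekly procedure and the reported statistics for the CPT weights (the mean and, between brackets, the standard deviation over the year) can be sketched as below. `run_cpt_week` is a hypothetical callable standing in for one week of CPT training started from the true state; the use of the population standard deviation (`ddof=0`) is an assumption.

```python
import numpy as np

def weekly_cpt_weights(truth_at_week_start, run_cpt_week):
    """Apply CPT separately to each week of a year and summarize the weights.

    truth_at_week_start: sequence of true states, one per week; each week's
        CPT training restarts from the truth.
    run_cpt_week: hypothetical callable running one week of CPT training from
        a given true state and returning that week's weights (summing to 1).
    Returns the mean weights over the year and their standard deviation.
    """
    weekly = np.array([run_cpt_week(x0) for x0 in truth_at_week_start], dtype=float)
    # Population standard deviation (ddof=0), an assumption.
    return weekly.mean(axis=0), weekly.std(axis=0)
```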

Using the synch rule, the weights for temperature and vorticity converge within the first couple of weeks, whereas the weights for divergence cannot be learned in less than about a year without causing numerical instabilities (see Fig.

Global mean time series for the perfect model, the imperfect models and the two supermodels trained by CPT and the synch rule. The normalized root mean squared error (RMSE) in the climatology of model years 2011–2040 with respect to the climatology of the truth is given in each panel. The normalization is such that the expected value of the perfect model error is 1.

Weights for the supermodel trained by CPT and the synch rule. Between brackets, the standard deviation over the year (CPT) or the standard deviation over the last 10 weeks of training (synch rule) is given.

The imperfect models and the supermodel are integrated for 40 years, starting from 1 January of model year 2001. The climatology is defined as the average over years 11–40. The error in the climatology is defined as the root of the global mean squared error (RMSE) between the model and the truth. In addition, the perfect model is integrated for 40 years from a slightly perturbed initial condition in order to obtain an estimate of the sampling error, i.e., to estimate the representativeness of the errors of the different models. Global mean time series for surface air temperature, precipitation, surface solar radiation and cloud cover for the different models show that both weighted supermodels behave very similarly and remain close to the perfect model (Fig.
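The climatological error measure can be sketched as follows. Cosine-of-latitude area weighting and the (time, lat, lon) array layout are assumptions. The normalization divides by the error of the perfect model run from a perturbed initial condition, so that this sampling-error run scores exactly 1.

```python
import numpy as np

def climatology(fields, years, first_year=2011, last_year=2040):
    """Time mean over the selected model years; fields has shape (time, lat, lon)."""
    mask = (years >= first_year) & (years <= last_year)
    return fields[mask].mean(axis=0)

def global_rmse(a, b, lat_deg):
    """Root of the area-weighted global mean squared difference of two maps."""
    w = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones_like(a)  # assumed cos-lat weights
    return np.sqrt(np.sum(w * (a - b) ** 2) / np.sum(w))

def normalized_rmse(model_clim, truth_clim, perturbed_truth_clim, lat_deg):
    """Climatology RMSE divided by the sampling-error estimate of the perturbed perfect-model run."""
    sampling_error = global_rmse(perturbed_truth_clim, truth_clim, lat_deg)
    return global_rmse(model_clim, truth_clim, lat_deg) / sampling_error
```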

A spatial characterization of the performance of the supermodel in simulating the climatology of the zonal wind at 200 hPa is given in Fig.

Difference in the zonal wind at 200 hPa averaged over model years 2011–2040 for the various models with respect to the truth. Contours denote areas where the difference is larger than the sampling error at 95 % confidence (solid for positive difference; dotted for negative). Positive values imply stronger mean winds blowing eastward.
Units: m s⁻¹

In the context of simpler models,

In order to assess the quality of short-term forecasts, we initialized the various models from slightly perturbed states of the truth and integrated the models for 2 weeks. We selected 25 initial states, 2 weeks apart, starting 1 January, so the forecasts cover almost 1 year. The quality of the forecast is measured by the RMSE in the global surface air temperature forecast, averaged over the 25 forecasts, and is shown in Fig.
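A sketch of the forecast verification, assuming arrays laid out as (start date, lead time, lat, lon), cosine-of-latitude area weighting, and averaging of the per-forecast RMSE over the start dates. It also spells out the start-date arithmetic: 25 starts 14 days apart begin on days 0 to 336 after 1 January, and the last 2-week forecast ends on day 350.

```python
import numpy as np

# 25 initial states, 2 weeks apart, counted in days after 1 January.
START_DAYS = np.arange(25) * 14

def mean_forecast_rmse(forecasts, truths, lat_deg):
    """Global RMSE versus lead time, averaged over the forecasts.

    forecasts, truths: arrays of shape (n_starts, n_leads, lat, lon) holding
        surface air temperature for each start date and lead time.
    Returns an array of length n_leads.
    """
    w = np.cos(np.deg2rad(lat_deg))[:, None]  # assumed area weights, shape (lat, 1)
    sq = (forecasts - truths) ** 2
    global_mse = (sq * w).sum(axis=(-2, -1)) / (w.sum() * forecasts.shape[-1])
    return np.sqrt(global_mse).mean(axis=0)   # average RMSE over start dates
```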

Forecast quality as measured by the RMSE between the truth and a model started from a perturbed initial condition. The control is the difference between the perfect model and the perfect model started from a perturbed initial condition.

As explained in Sect.

CPT weights calculated during a training period of 1 week for 1 year

Weights for the supermodel trained by CPT and the synch rule. Between brackets, the standard deviation over the year (CPT) or the standard deviation over the last 10 weeks of training (synch rule) is given.

The values of the weights for vorticity trained by the synch rule are very close to the values obtained by CPT training. This is not the case for temperature and divergence: for temperature, CPT puts 10 % less weight on imperfect model 3 than the synch rule does, and for divergence, CPT puts 10 % more weight on model 1. For vorticity, the synch rule puts (almost) zero weight on imperfect model 1. This is because imperfect model 1 calculates exactly the same vorticity change as imperfect model 4; hence, the synch rule suggests that imperfect model 1 has no added value in the weighted supermodel. Again, the synch rule training yields weights that sum to 1 as an optimal solution. Using these weights, we compare the climatology and forecast skill of both supermodels.

We repeated similar climate integrations as in the case of the supermodels based on two imperfect models and assessed the climatological errors. By comparing the 40-year time series of global mean values in Fig.

Global mean time series for the truth, the perfect model, the imperfect models and the two supermodels trained by CPT and the synch rule. Included is the RMSE of the model years 2011–2040 with respect to the truth. The normalized RMSE in the climatology of model years 2011–2040 with respect to the climatology of the truth is given in each panel. The normalization is such that the expected value of the perfect model error is 1.

This experiment demonstrates the potential of supermodels to mitigate common errors and thereby clearly outperform the standard MME approach. Since all imperfect models overestimate the global average temperature and simulate too much precipitation, a standard weighted MME approach results in a climatological forecast worse than that of the best imperfect model. When the parameters of the imperfect models form a convex hull around the true parameter values, we may expect that a supermodel can be constructed with a climatology much closer to the truth than that of the best imperfect model. When they do not, allowing negative weights in the weighted supermodel might still improve the climatology and forecast skill. This will be explored in the next section.

We repeated the same forecast experiment as in the case of the supermodel based on two imperfect models. In this case too, the supermodels have forecast errors that are substantially reduced compared to the imperfect models, by up to a factor of 3 (not shown). Both supermodels have comparable forecast skill by this measure.

The CPT training method only produces non-negative weights, since the weights are defined as the frequency with which the solution of a particular model is closest to the truth during the training period. The synch rule training, on the other hand, does not impose any constraint on the weights. So far, the weights came out positive due to the convex hull principle: the imperfect models considered surrounded the truth, so the effect of the true parameter values can be approximated with positive weights. But if the imperfect models all have parameter values smaller (or all larger) than the truth, a linear superposition of imperfect models that is closer to the truth can only be constructed by allowing negative weights. To test whether such a supermodel with negative weights indeed shows the desired physical behavior, and whether we can obtain such a model with the synch rule, we construct a weighted supermodel based on two imperfect models (models 1 and 6) with parameter values on the same side of the true parameter values (Table

After a training period of 1 year using the synch rule, stable weights are obtained, which indicates that at least a local minimum is reached. As expected, the training produces negative weights (Table

Weights for the supermodel trained by the synch rule. Between brackets the standard deviation over the last 10 weeks of training is given.

Global mean difference from the perfect model, calculated over the last 30 years of the simulation.

Stable climate simulations turn out to be possible with a weighted supermodel using negative weights. The climatology of the supermodel has improved significantly compared to both imperfect models, as displayed in Table

Difference in the east–west component of the wind at the 200 hPa pressure level averaged over model years 2011–2040 for the various models with respect to the truth. Contours denote areas where the difference is larger than the sampling error at 95 % confidence (solid for positive difference; dotted for negative). Positive values imply stronger mean winds blowing eastward. Units: m s⁻¹

Forecast quality as measured by the RMSE between the truth and a model started from a perturbed initial condition. The control is the difference between the perfect model and the perfect model started from a perturbed initial condition.

The forecast errors are evaluated in a similar fashion as in the previous cases and shown in Fig.

We conclude this section with a summary of the climatological errors of the weighted supermodels of this study and the connected supermodel of

Overview of the RMSE of the different supermodels (the connected supermodel of

We have demonstrated the potential of weighted supermodeling to improve weather and climate predictions using the global coupled atmosphere–ocean–land model SPEEDO in the presence of parametric error. Weighted supermodels are constructed from versions of SPEEDO with perturbed parameters. The perturbations are chosen such that the spread among the imperfect models realistically reflects the uncertainty in climate models. The weights are trained on data from the perfect model (i.e., our simulated reference truth) using two different training schemes with low computational cost. The first method is based on CPT, where different model trajectories are “crossed” in order to create a larger ensemble of possible trajectories. The second method is a synchronization-based learning rule (synch rule), which adapts the weights of the different imperfect models during training such that the supermodel synchronizes with the perfect model.

Both training methods yield supermodels that outperform the individual imperfect models in short-term forecasts as well as in long-term climate simulations. CPT training required shorter training periods (1 week as opposed to a year for the synch rule), but both are much more efficient than cost-function-based approaches that are known to require many climate simulations in an iterative process to reach convergence on optimal weights

In the application of CPT in this study, we encountered numerical issues due to the partial state replacement. A possible solution is the use of data assimilation techniques to combine state information from different models in a dynamically consistent manner

The weighted supermodels of this study have smaller climatological errors as compared to the connected supermodel based on the same two imperfect models in

In the second supermodel experiment of this paper, the parameter perturbations of four imperfect models were chosen such that they formed a so-called convex hull around the true parameter values. This implies that a linear combination of these four models with positive weights is able to reproduce the model equations with the true parameter values, provided that the parameters enter the equations linearly. This is not exactly true in this case, but the trained weighted supermodel based on these four models turned out to have a climatology close to the truth. As all four imperfect models have a warmer and wetter climatology than the truth, simply taking the MME mean with positive weights does not improve the climatology. This experiment is a clear example of the potential benefit of the supermodeling approach to ameliorate common model errors. This benefit arises because model errors are compensated at an early stage, in the time derivative, and not a posteriori, as in the MME approach, where model errors have propagated spatially across the globe, across scales and across the different meteorological fields and other components of the climate system.

In the final supermodel experiment, we explored the use of negative weights to improve predictions in the case that model errors do not compensate each other, i.e., both imperfect models have parameter perturbations and climatological errors of the same sign. A supermodel trained using the synch rule yielded negative weights. With these weights, stable and credible simulations turn out to be possible, and forecast errors as well as climatological errors are reduced with respect to the imperfect models. Substantial errors remain, as not all prognostic equations are combined (only temperature, vorticity and divergence, not humidity and surface pressure) and the parameters do not enter the equations linearly.

Although the synch rule training does not impose that the weights sum to 1, the training consistently yielded weights that sum to 1. An example based on the Lorenz 1963 equations

The ultimate goal of our research is to apply supermodeling to realistic climate models. But will it work? Based on the current results, we believe that this is possible, although the application is not as straightforward as for SPEEDO. First, state-of-the-art models are far bigger and more complex, making their numerical computation a substantial burden. This makes numerical efficiency a key aspect to consider. Second, the real world is not simply a perturbed-parameter version of these complex models. In this paper, we have worked under the hypothesis that model error originates only from errors in the parameters of the atmosphere model. The imperfect atmosphere models were coupled to the same ocean and land model, which constrains the variability on longer timescales. So far, we have demonstrated that the long-term behavior of the supermodel improves while training only on short-term prediction errors. It remains to be seen how much the long-term evolution will improve in the presence of imperfections in the slow components of the climate system. Furthermore, towards the application with real climate models, it is essential to extend the approach to other sources of model error. In that case, on top of parametric error, model error can arise from unresolved scales, numerical discretization or incorrect physics.

Besides the realism of the models (and of the related model error), that of the observations is also of central importance. In all previous studies with supermodeling, including the current one, observations were assumed to be perfect, i.e., complete and noise free. To use real data, it will thus be necessary to study the robustness of the supermodeling approach to noisy and unevenly distributed observations and to extend the methods to account for observational noise. This latter problem is the subject of ongoing research that makes use of ideas and techniques from data assimilation. Data-assimilation-based supermodeling is also envisioned to account for generic sources of model error in the construction of the supermodel, and it will be the subject of future research.

No data sets were used in this article.

FSc conceived the study, carried out the research and led the writing of the manuscript. FSe provided code and technical advice and, together with AC and NK, provided input for the interpretation of the results and the writing.

The authors declare that they have no conflict of interest.

Alberto Carrassi has been funded by the Trond Mohn Foundation under the project no. BFS2018TMT0.

This research has been supported by the H2020 European Research Council (grant no. STERCP (648982)).

This paper was edited by Andrey Gritsun and reviewed by two anonymous referees.