The degree of trust placed in climate model projections is commensurate with how well their uncertainty can be quantified, particularly at timescales relevant to climate policy makers. On inter-annual to decadal timescales, model projection uncertainty due to natural variability dominates at the local level and is imperative to describing near-term and seasonal climate events but difficult to quantify owing to the computational constraints of producing large ensembles. To this extent, emulators are valuable tools for approximating climate model runs, allowing for the exploration of the uncertainty space surrounding selected climate variables at a substantially reduced computational cost. Most emulators, however, operate at annual to seasonal timescales, leaving out monthly information that may be essential to assessing climate impacts. This study extends the framework of an existing spatially resolved, annual-scale Earth system model (ESM) emulator (MESMER,

Climate model emulators are computationally inexpensive devices that derive simplified statistical relationships from existing climate model runs to then approximate model runs that have not been generated yet. By reproducing runs from deterministic, process-based climate models at a substantially reduced computational time, climate model emulators facilitate the exploration of the uncertainty space surrounding model representation of climate responses to specific forcings. A wide toolset of Earth system model (ESM) emulators exists with capabilities ranging from investigating the effects of greenhouse gas (GHG) emission scenarios on global and regional mean annual climate fields

Recently, the Modular Earth System Model Emulator with spatially Resolved output (MESMER)

Considering the importance of monthly and seasonal information when assessing the impacts of climate change

This study focuses on extending MESMER's framework by a local monthly downscaling module (MESMER-M). This enables the estimation of projection uncertainty due to natural variability as propagated from annual to monthly timescales since MESMER-M builds upon MESMER, which has already been validated as yielding spatio-temporally accurate variabilities. In constructing MESMER-M, we furthermore place emphasis on representing heteroscedasticity of the intra-annual temperature response as well as changes in skewness of individual monthly distributions in a spatio-temporally accurate manner. The structure of this study is as follows: we first introduce the framework of the emulator in Sect. 3.1 and the approach to verification of the emulator performance in Sect. 3.2; we then provide the calibration results of the emulator in Sect. 4 and verification results in Sect. 5, after which we proceed to the conclusion and outlook in Sect. 6.

In the analysis, 38 CMIP6 models

MESMER is an ESM-specific emulator built to statistically produce spatially resolved, yearly temperature fields by considering both the local mean response and the local variability surrounding the mean response. Within MESMER, local temperatures

We divide MESMER-M into a mean response module and a residual variability module, each calibrated on ESM simulation output data at each grid point individually according to the procedure described in this section and summarized in Fig.

Modular framework for the monthly emulator illustrated for an example ESM. The local mean response module is represented using a harmonic model (Fourier series). The local variability module starts with a Yeo–Johnson transformation before fitting the AR(1) process (distinguished into temporally varying

The mean response module was conceived to represent the grid point level mean response of both monthly temperature values and the amplitudes of the seasonal cycle (intra-annual variability) to changing yearly temperatures. We employ a harmonic model consisting of a Fourier series with amplitude terms fitted as linear combinations of yearly temperature, centred around a linear function of yearly temperature as shown in Eq. (

The local residual variability (otherwise simply called “residuals”), i.e. the difference between actual local monthly temperature and its mean monthly response to variations in local yearly temperature (given by the harmonic model), is assumed to be solely a manifestation of intra-annual variability processes. It can thus be thought of as short-term spatio-temporally correlated patterns. Similar to previous MESMER developments, we represent the residual variability using an AR(1) process. Since the residual variability distributions display month-dependent skewness (see example Shapiro–Wilk test for January and July in Appendix

The AR(1) process is applied to the residual variability terms after they are locally normalized as according to Eq. (

The AR(1) process accounts for autocorrelations up to a time lag of 1 and is suited in representing the residual variability, which is assumed to have rapidly decaying covariation such that longer-term patterns (if any) covary with yearly temperature.

Following this,

To evaluate how well the monthly cycle's mean response,

In order to evaluate how well the emulator reproduces the deviations from the harmonic model simulated by the ESMs, 50 emulations are generated per training run. First, we check that short-term temporal features are sufficiently captured for each individual grid point. To distinguish such short-term temporal features, we isolate the high-frequency temporal patterns present within the residual variability: each residual variability sequence is decomposed into its continuous power spectra, from which we verify, by computing Pearson correlation coefficients, that the top 50 highest-frequency bands within the training run residual variabilities appear with similar power spectra in the corresponding emulated residual variabilities. Second, we verify how well the spatial covariance structure is preserved by calculating monthly spatial cross-correlations across the residuals produced by each individual emulation and, by using Pearson correlations, comparing them to those of their respective training runs. Where test runs are available, a similar verification between them and training runs is done, thus yielding an approximation of how actual ESM initial-condition ensemble members would relate to each other.

The full emulator, consisting of both the mean response (i.e. harmonic model) and residual variability modules, is evaluated for its representation of regional, area-weighted average monthly temperatures of all 26 SREX regions (

Calculate the area-weighted regional average of a given SREX region for each ESM run and each of its respective emulations at a given month.

Extract the

Within a defined time period (e.g. 1870–2000), calculate the proportion of time steps for which the regionally averaged ESM value is less than the respective emulated

The quantile deviation of the ESM from the emulator is then given as

Any variability in monthly temperatures that cannot be explained by variability in yearly temperature alone is stochastically accounted for in MESMER-M's local residual variability module (see Sect. 3.2.2), following existing downscaling theory

To isolate the contribution of secondary biophysical feedbacks to the monthly temperature response, we consider them as inducing the residual differences between the ESM and harmonic-model realizations. This follows from the harmonic model representing the expected direct mean response to evolving yearly temperatures, with any systematic departure from it being driven by secondary forcers. To rudimentarily represent these contributions, a simple, physically informed model consisting of a suite of gradient-boosting regressors (GBRs)

To optimize the selection of the biophysical variables used as predictors, we first compare the performance of different physically informed models trained using different sets of biophysical variables for each ESM. The best globally performing model is selected as a benchmark to assess how well the residual variability module, described in Sect. 3.2.2, statistically represents properties within the monthly response arising from secondary biophysical feedbacks. Pearson correlations, between ESM test runs and harmonic-model test results augmented by biophysical variable,

List of biophysical variables used in training the gradient boosting regressor.

When calibrating the harmonic model constituting the mean response module, the highest orders of the Fourier series were found in tropical to sub-tropical regions where the seasonal cycle has a relatively low amplitude (first row, Fig.

Calibration parameters obtained from emulator fittings for all CMIP6 models. For the mean response module, the latitudinally averaged order of the Fourier series considered in the harmonic model is plotted against latitude (row 1). For the monthly residual variability module, parameters are displayed for January (rows 2–4) and July (rows 5–7).

The residual variability module calibration results are shown in Fig.

The lag-1 autocorrelation coefficients (

Localization radii vary from model to model and are generally higher in January than July reflecting seasonal differences in residual behaviour possibly due to northern hemispheric winter snow cover yielding larger spatial patterns. CanESM5 and MIROC6 display notably higher localization radii, which can again be tracked back to them having more training runs: more time steps are available during the leave-one-out cross-validation, thus making it generally possible to robustly estimate spatial correlations up to higher distances, which in turn leads to selecting larger localization radii. It should however be stressed that the ESM itself is the main driver behind the calibration results (e.g. even with only one ensemble member, MCM-UA-1-0 has a high localization radius).

To illustrate the regional behaviour of the calibrated emulator, we focus on four select ESMs. Figure

Regionally averaged temperature time series (rows) of January and July for four example ESMs (columns). Three ESM ensemble runs (coloured), their respective harmonic-model results (black), and 50 full emulations (emus) for each of the three patterns (grey colour scale) are plotted. Temperature values are given as anomalies with respect to the annual climatological mean over the reference period of 1870–1899. The regions are, from top to bottom, global land without Antarctica, western North America (WNA), and West Africa (WAF).

Figure

Regionally averaged variance of intra-annual temperatures (i.e. variance of each year’s monthly temperatures around the yearly mean) scatterplotted against yearly temperature (rows) for four example ESMs (columns).Three ESM ensemble runs (coloured), their respective harmonic-model results (black), and 50 full emulations for each of the three patterns (grey colour scale) are plotted. Temperature values are taken as anomalies with respect to the annual climatological mean over the reference period of 1870–1899. Each dot represents the temperature variance calculated from the monthly values for 1 individual year. The regions are from top to bottom: global land without Antarctica, western North America (WNA), and West Africa (WAF).

We evaluate the ability of the harmonic model, constituting the mean response module, in capturing the monthly cycle's response to evolving yearly temperatures. Pearson correlations between the harmonic model and ESM training runs range from 0.7 up to almost 1 (Fig.

Local mean monthly response verification for all CMIP6 models by means of Pearson correlation between the harmonic model and training runs (indicated by the colour of the lower triangle), over all global land grid points (without Antarctica) for each month. The correlations between the harmonic model and test runs are given in colour in the upper triangles to demonstrate how well the harmonic model performs for data it has not seen yet (a grey upper triangle means that no test run is available for this model).

To establish if temporal patterns within the ESM residual variabilities are successfully emulated, the correspondence of their respective power spectra at a grid point level is considered. Results shown in Fig.

Verification of the time component of the local variability module by means of Pearson correlations between the power spectra of the 50 highest-frequency bands present within the training runs (i.e. considering all months together) and the power spectra at which the same 50 frequency bands appear within the respective emulations (boxplots, whiskers indicating 0.05 and 0.95 quantiles) calculated per grid point. Fifty emulations are evaluated per training run. Where test runs are available, their correlations with training runs are also given (black crosses). The example 2D histogram shows the power-spectra-to-frequency ratio for CESM2 training runs versus the corresponding power-spectra-to-frequency ratio within its emulations.

For verification of the residual variability's spatial component, we consider the spatial cross-correlations within four geographical bands centred around the grid point for which temperature is being emulated (Fig.

Verification of the spatial representation within the local variability module. Pearson correlations between ESM training run and emulated spatial cross-correlations are considered for four geographical bands centred around the grid point for which temperature is being emulated (rows) at individual months of January (black boxplots) and July (red boxplots). Boxplot whiskers indicate 5th and 95th quantiles. Fifty emulations are evaluated per training run. Where ESM test runs are available, their correlations with training runs are also given for January (black crosses) and July (red crosses). Example 2D histograms of the January spatial cross-correlations for CESM2 training runs versus those of its emulations are given for each geographical band.

Regionally aggregated 5 %, 50 %, and 95 % quantile deviations of the ESM training (and where available test) runs from the emulated quantiles (derived from the full emulator consisting of both the mean response and local variability module) are plotted over the periods of 1870–2000 and 2000–2100 for the example months January (Figs.

January 5 % (left), 50 % (middle), and 95 % (right) quantile deviations (colour) of the climate model runs from the emulated quantiles of the ESM training (top block) and test (bottom block) run values from their monthly emulated quantiles, over the period 1870–2000 for global land (without Antarctica) and SREX regions (columns) across all CMIP6 models (rows). The monthly emulated quantile is computed based on 50 emulations per ESM run, and quantile deviations are given as averages across the respective number of ESM training/test runs. The number of test runs averaged across is indicated in brackets next to the model names in the bottom block. Red means that the emulated quantile is warmer than the quantile of the ESM run and vice versa for blue.

Same as Fig.

Same as Fig.

Same as Fig.

Generally, emulated 5 % (95 %) quantiles are warmer (colder) than those of the ESM training and test runs. Such under-dispersivity for regional averages is linked to the localization of the spatial covariance matrix within the residual variability module, such that spatial correlations drop faster within the emulator than they do in the actual ESM. For January over the time period of 1870–2000, the lowest magnitudes in 5 % and 95 % quantile deviations are observed for southern hemispheric regions (e.g. AMZ, NEB, WSA, SSA), along with slight over-dispersivity (see the blue 5 % quantile and red 95

In-depth analysis of the benchmarking approach outlined in Sect. 3.3 is conducted for four selected ESMs which exhibit diverse genealogies

Global land (without Antarctica) and regional performances (columns) of the physically informed model trained using different predictor sets (rows) shown for four selected CMIP6 models, for each SREX region. Acronyms for the predictor sets (

As HACS (biophysical configuration of sensible heat fluxes, latent heat fluxes, albedo, cloud cover, and snow cover) performs the best globally (appears as 1 in global land) across all four ESMs, we choose it as the benchmark physically informed model to compare the residual variability module to. Figure

Comparison of the performance of the harmonic model

We extend MESMER's framework to include the monthly downscaling module, MESMER-M, trained for each ESM at each grid point individually, thus providing realistic, spatially explicit monthly temperature fields from yearly temperature fields in a matter of seconds. We assume a linear response of the seasonal temperature cycle to its yearly mean values and represent it using a harmonic model. Any remaining response patterns are expected to arise from regional-scale, physical, and intra-annual processes, such as snow–albedo feedbacks or the modulation of atmospheric circulation patterns due to changes in ENSO, and have asymmetric, non-uniform (i.e. non-linear, non-stationary, affecting variance and skewness) effects across months. To capture them, we build a month-specific residual variability module which samples spatio-temporally correlated terms, conserving lag-1 autocorrelations and spatial cross-correlations whilst accounting for specificities in the residual variability structure across months. By letting the skewness of the residual sampling space additionally covary with yearly temperatures, non-uniformities in secondary feedbacks are furthermore inferred through their manifestations within the monthly temperature distributions.

Verification results across all ESMs show the emulator altogether reproducing the mean monthly temperature response, as well as conserving temporal and spatial correlation patterns and regional-scale temperature distributions up to a degree sensible to its simplicity. To further assess how well the emulator is able to represent non-uniformities in the monthly temperature response arising from secondary biophysical feedbacks, we compare its performance to that of a simple physically informed model built on biophysical information. The emulator overall reproduces the cdfs of the actual ESM just as well as, and in most cases even better than, the physically informed model, evidencing the validity of such a statistical approach in inferring temperature distributions and thus the uncertainty due to natural variability within temperature realizations. Given that the uncertainty due to natural variability is a property intrinsic to climate models and largely irreducible

In this study, MESMER-M was only trained on SSP5-8.5 climate scenario runs so as to demonstrate its performance over the extreme spectrum of climate response types. A further step would be to investigate the inter-scenario applicability of MESMER-M, and this has already been done for MESMER with satisfactory results

This study demonstrates the advantage of constructing modular emulators such that the emulator framework can be extended according to the area of application. Additional module developments which increase the impact relevance of the emulations and improve the fidelity in global and regional representation of monthly temperatures under different climate scenarios should be given priority. A module that comes to mind would be one representing changes in land cover, such as de/afforestation, which has been
historically assessed to have biophysical impacts of a similar magnitude on regional climate as the concomitant increase in GHGs

Another modular development could include an explicit representation of the main modes of climate variability, such as ENSO, so as to strengthen MESMER-M's inter-annual variability representation. Since these modes are potentially coupled to the overall GHG-induced temperature response however, such an inclusion would be more complicated. One possible approach could be to introduce soil moisture as an additional variable term and investigate its lag correlations to monthly temperature variabilities. Alternatively we could explore building upon existing approaches such as the one of

The original MESMER is publicly available on GitHub (

The CMIP6 data are available from the public CMIP archive at

QL, CFS, and SIS identified the need to extend MESMER's framework by a monthly downscaling module. SN designed the monthly downscaling module with support and guidance from QL and LB. SN led the analysis and drafted the text with help from QL in developing the storyline. All authors contributed to interpreting results and streamlining the text.

The contact author has declared that neither they nor their co-authors have any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We thank Lukas Gudmundsson, Joel Zeder, and Christoph Frei for their invaluable statistical insight into the development and analysis of the emulator modules. Finally, we thank the climate modelling groups listed in Table

This research has been supported by the Bundesministerium für Bildung und Forschung and the Deutsches Zentrum für Luft- und Raumfahrt as part of AXIS, an ERANET initiated by JPI Climate (grant no. 01LS1905A), the European Commission, Horizon 2020 Framework Programme (AXIS (grant no. 776608)), and the European Research Council, H2020 (MESMER-X (grant no. 964013)).

This paper was edited by Gabriele Messori and reviewed by three anonymous referees.