These authors contributed equally to this work.

Bias correction and statistical downscaling are now regularly applied to climate simulations to make them more usable for impact models and studies. Over the last few years, various methods were developed to account for multivariate – inter-site or inter-variable – properties in addition to the more usual univariate ones. Among such methods, temporal properties are either neglected or accounted for in a specific way, i.e. differently from the other properties. In this study, we propose a new multivariate approach called “time-shifted multivariate bias correction” (TSMBC), which aims to correct the temporal dependency in addition to the other marginal and multivariate aspects. TSMBC relies on considering the initial variables at various times (i.e. lags) as additional variables to be corrected. Hence, temporal dependencies (e.g. auto-correlations) to be corrected are viewed as inter-variable dependencies to be adjusted, and an existing multivariate bias correction (MBC) method can then be used to this end. This approach is first applied and evaluated on synthetic data from a vector auto-regressive (VAR) process. In a second evaluation, we work in a “perfect model” context, where a regional climate model (RCM) plays the role of the (pseudo-)observations and its forcing global climate model (GCM) is the model to be downscaled or bias corrected. For both evaluations, the results show a large reduction of the biases in the temporal properties, while inter-variable and spatial dependence structures are still correctly adjusted. However, increasing the number of lags too much does not necessarily improve the temporal properties, and an overly strong increase in the number of dimensions of the dataset to be corrected can even introduce some instability in the adjusted and/or downscaled results, calling for a reasoned use of this approach for large datasets.

Climate and Earth system models (ESMs) and their simulations are the
main physical tools to investigate the potential future evolutions of the
climate system

Because of those issues, impact models (such as for hydrology, energy,
the environment, etc.) cannot directly employ the climate simulations as
input

Over the last two decades, many such post-processing methods were developed, either in a “perfect prognosis” (PP) context, generally for
downscaling (DS), or in a “model output statistics” (MOS) one, generally for bias correction (BC) – see e.g.

However, the univariate correction of simulations (i.e. one variable at a time and one site at a time) may not be enough. Indeed, the use of several
1D corrections separately for different physical variables and/or sites will not correct the dependencies between them

One main conclusion provided in the multivariate bias correction (MBC) comparison study by

All those approaches, although different, share the fact that they try
to correct temporal properties with a (parametric or non-parametric) model that is specific to the variable “time”. In other words, they treat time separately from the other variables (variables at various locations) of interest. However, one can wonder whether there is a real need for such specificity. Indeed, let us take an example in one dimension for the sake of clarity (this can be easily generalized to

To do so, the rest of the paper is organized as follows. Section

To apply any (M)BC method, it is necessary to have a reference dataset – i.e. one supposed to be as close as possible to the real observed climate – and a dataset of simulations (e.g. stemming from a GCM) that is biased with respect to the reference dataset.

Here, the climate simulations to be corrected are daily temperature and precipitation time series over the south-east of France, for the time period 1951–2010, extracted at a

Left panel: map of elevation (in m) over France. The region of interest lies in the south-east box. Right panels: mean summer (JJA) temperature (in

Regarding the reference dataset, regional climate simulations are used in this study, instead of observational or reanalysis data. Those are EURO-CORDEX daily temperature and precipitation from the KNMI-RACMO22E regional climate model

The GCM data are then interpolated with a nearest-neighbour method to
the

In addition to the evaluations that will be done based on those climate simulations, a preliminary analysis will first be performed on synthetic data, i.e. data artificially generated from statistical models. Here, a VAR process is employed. A VAR process is a multivariate extension of the auto-regressive (AR) process, modelling the statistical links between the components of a vector as they evolve in time. In the following, a VAR is used to generate multivariate time series

To generate such synthetic data and analyse them in a comprehensive way, the dimension
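As an illustration, a stationary VAR(1) process can be simulated in a few lines of NumPy. The coefficient matrix below is a hypothetical example (not the one used in this study), chosen with spectral radius below 1 so that the process is stationary:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_var1(A, n_steps, burn_in=200, rng=rng):
    """Simulate a VAR(1) process X_t = A @ X_{t-1} + eps_t with
    standard Gaussian noise, discarding a burn-in period."""
    d = A.shape[0]
    x = np.zeros(d)
    out = np.empty((n_steps, d))
    for t in range(burn_in + n_steps):
        x = A @ x + rng.standard_normal(d)
        if t >= burn_in:
            out[t - burn_in] = x
    return out

# Hypothetical stable coefficient matrix coupling two components
A = np.array([[0.6, 0.2],
              [0.1, 0.5]])
X = simulate_var1(A, n_steps=5000)

# Lag-1 auto-correlation of the first component
acf1 = np.corrcoef(X[1:, 0], X[:-1, 0])[0, 1]
```

The off-diagonal terms of `A` create the cross-auto-correlations between components that TSMBC is designed to correct.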

The main philosophy of the proposed time-shifted multivariate bias
correction (TSMBC) approach has been briefly introduced (with lag 1) in
Eq. (

Such a transformation can be made to create

In this study, we propose a method based on a reconstruction

In this example, the bold rows of the matrix

Note that a reconstruction “by column” could also be performed: each column of
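The time-shifting of the dataset and the reconstruction “by rows” can be sketched as follows. This is a minimal NumPy illustration of the idea, not the SBCK implementation; the function names are ours:

```python
import numpy as np

def time_shift(X, lag):
    """Stack lagged copies of X as extra columns: row t of the result
    holds (X_t, X_{t+1}, ..., X_{t+lag}) flattened, so temporal
    dependencies become inter-variable dependencies."""
    n, d = X.shape
    return np.hstack([X[k:n - lag + k] for k in range(lag + 1)])

def reconstruct_rows(Z, lag, d, start=0):
    """Rebuild a time series from a time-shifted matrix 'by rows':
    non-overlapping rows (spaced lag+1 apart, from row `start`) are
    concatenated, each row contributing lag+1 consecutive time steps."""
    rows = Z[start::lag + 1]
    return rows.reshape(-1, d)
```

For example, with `lag=2` a series of length 10 becomes an 8-row matrix with 3 times as many columns, and `reconstruct_rows` on rows 0, 3, 6 recovers the first 9 time steps exactly.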

Finally, because TSMBC uses an underlying MBC, potentially any MBC
method can be used, such as MBCn

In this section we test our TSMBC method on synthetic data, generated from two VAR processes (see Sect.

Because the reconstruction step preserves the dependence structure, we
propose to test which part of the correction is due to the underlying method
(here dOTC), and which part is due to the reconstruction. To do so, a second
underlying bias correction method is then used as a benchmark. It corresponds to a very naive method: the correction is randomly drawn from the reference
dataset, i.e. for any
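This naive benchmark can be written in a few lines. The sketch below is our own illustration, assuming both datasets are arrays with one multivariate observation per row:

```python
import numpy as np

def rbc(reference, biased, rng=np.random.default_rng(0)):
    """Naive 'random bias correction' benchmark: each corrected value is
    a row drawn uniformly at random from the reference dataset. The
    marginals and dependence structure of the reference are reproduced
    on average, but any link to the biased input is lost."""
    idx = rng.integers(0, len(reference), size=len(biased))
    return reference[idx]
```

By construction, every corrected row is an actual row of the reference, which is what makes this benchmark useful for isolating the contribution of the reconstruction step.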

We fix the number of lags

To measure the similarity of the corrections from different starting
rows, we compute the matrix of Pearson correlations between the pair

The first row

From this experiment we can conclude first that the choice of the starting row has only a very marginal influence on the correction. Therefore, from now on, we use the integer part of

In this section the starting row is fixed at

As for the previous sub-section, the correlations between corrections, reference and biased dataset are computed and represented in
Fig.

Furthermore, we have added in Fig.

The (cross-)auto-correlations between the two dimensions for various lags are also given in Fig.

Generally, from the synthetic VAR dataset, we can see the ability of the TSMBC approach to correct the (cross-)auto-correlations. The choice of the starting row has little influence on the final corrections, and we fix it now at

We now apply the TSMBC method with the underlying dOTC method to the bias correction and downscaling of the IPSL GCM simulations with respect to the RCM simulations taken as references. Following the strategy proposed by

Each variable and grid point are corrected independently. This
approach will be referred to as “L1V” (local 1 variable). The BC method employed here is dOTC in its univariate version (when

The dependence between temperature and precipitation (i.e.
inter-variable dependence) is taken into account in the correction, but not the spatial dependence. This approach is denoted “L2V” (local 2 variables) and employs the bivariate version of dOTC (when

The spatial dependence is corrected, but not the relations between
temperature and precipitation. This approach is denoted “S1V” (spatial 1 variable) and uses dOTC in a 16 (longitude)

All dependencies (i.e. inter-variable and spatial) are corrected.
This approach is denoted “S2V” (spatial 2 variables), and dOTC has thus a 2 (variables)

Furthermore, for each of these approaches, we apply TSMBC to account for various lags, up to some maximum lag: 0 (i.e. corresponding to dOTC,
without any lag), 5 and 10 d, denoted dOTC, TSMBC-5 and TSMBC-10,
respectively. Hence, we finally have 12 correction approaches, with dimensions varying from 1 (dOTC without any kind of dependence) to

Summary of the dimensions of the bias correction for each method used in Sect.

Recall that only the results for summer are given in the rest of this article (winter results are provided in the Supplement); the calibration period is 1951–1980, and the validation and projection period is 1981–2010.

We start by assessing the ability of the different methods to reduce the bias of the first two statistical moments: the mean (noted

Boxplots of bias reduction in mean and standard deviation (

Boxplots of bias reduction in lag-2 (cross-)auto-correlations (

This criterion takes values in the interval [

The same boxplots are now represented in Fig.

Map of lag-1 (first two rows) and lag-4 (last two rows) auto-correlations of precipitation in summer. The first and third rows correspond to the calibration period, and the second and fourth rows to the projection period. The first (and second, third, fourth and last, respectively) column gives the maps of auto-correlations of the RCM to be corrected (the GCM, the dOTC correction, the TSMBC-5 correction and the TSMBC-10 correction with method L2V, respectively). A key point of this figure is to compare the evolution between the calibration and projection periods for the RCM, GCM and corrections. The evolution of the TSMBC corrections is similar to that of the GCM, which differs from that of the RCM, leading to a failure of the correction in the projection period.

Figure

Globally, TSMBC is able to reduce biases in means and standard deviations as well as dOTC but clearly improves the corrections of the auto-correlations. We now propose to further study the dependence structure of the corrections brought by TSMBC.

The present sub-section targets the evaluation of the TSMBC corrections in terms of spatial structure of auto-correlations between variables and grid points. This requires a new tool: the

Auto-correlogram (lag 1) between temperatures and precipitation in calibration period in summer for

In order to evaluate spatial dependencies present in a univariate sample, correlograms (i.e. correlations expressed as function of the distance) are classically used
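A lagged generalization of the correlogram, i.e. a set of distance–correlation pairs (DCP), can be sketched as follows. This is our own illustrative implementation; the function name and the exclusion of same-point pairs are simplifying choices:

```python
import numpy as np

def dcp_set(X, coords, lag=0):
    """Distance–correlation pairs: for every ordered pair of distinct
    grid points (i, j), return the distance between them together with
    the correlation between series i at time t and series j at time
    t + lag (a cross-auto-correlation when lag > 0)."""
    n, p = X.shape
    A, B = X[:n - lag], X[lag:]  # align series for the lagged pairing
    pairs = []
    for i in range(p):
        for j in range(p):
            if i == j:
                continue
            dist = np.linalg.norm(coords[i] - coords[j])
            corr = np.corrcoef(A[:, i], B[:, j])[0, 1]
            pairs.append((dist, corr))
    return np.array(pairs)
```

With `lag=0` this reduces to the classical correlogram; with `lag>0` each point of the cloud summarizes both a spatial and a temporal dependency.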

Bias reduction of dependence (

Figure

As the Wasserstein metric is sensitive to the scale of the multivariate data (here, the DCP sets) it is applied to, two normalizations of the DCP sets are proposed before the computation of the
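For equal-size point clouds with uniform weights, the empirical 2-Wasserstein distance can be computed exactly by optimal assignment. The sketch below uses SciPy and is our own illustration; the per-coordinate standardization is one simple possible normalization, not necessarily the ones used in this study:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def wasserstein2(P, Q):
    """Exact empirical 2-Wasserstein distance between two equal-size
    point clouds with uniform weights, via optimal assignment on the
    matrix of squared Euclidean costs."""
    cost = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return np.sqrt(cost[rows, cols].mean())

def standardize(P):
    """One possible normalization of a DCP set: centre and scale each
    coordinate (distance and correlation) separately."""
    return (P - P.mean(axis=0)) / P.std(axis=0)
```

For instance, shifting a 2-D cloud by (1, 1) yields a distance of exactly sqrt(2), confirming the metric's sensitivity to the scale and location of the clouds.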

Same as Fig.

Starting with the first normalization (for each method and lag separately), which allows us to compare only the pattern of the DCP sets, in Fig.

Continuing with the

The goal of bias correction (BC) is to transform biased climate simulations in order to make their statistical properties more similar to those from reference data. Over the last decades, many univariate BC methods were
developed and applied, working on one climate variable at a time and one
location at a time. Over the last few years, various multivariate bias
correction (MBC) methods were also designed to correct not only some marginal
properties of the simulations (e.g. means, variances, distributions) but also their dependencies (e.g. correlations), either in a multivariate context, inter-site context or both. Some methods were even specifically developed to adjust the temporal properties

From the synthetic data experiment, a comparison with a “reasonably naive” multivariate bias correction method, RBC, based on random sampling, has been proposed. The results showed the following:

TSMBC(dOTC) provides a clear improvement compared to TSMBC(RBC).

The choice of the starting row only has a marginal influence on the
corrections. In the case of a starting row

For a relatively low number

Those first conclusions indicate some robustness of the proposed TSMBC methodology that, despite some choices to make by the user (starting row, number of lags to include), provides stable corrections.

In order to evaluate the results in a fully multivariate manner (i.e. inter-variable, inter-site and temporal aspects), a new statistical criterion has been proposed. It is based on the Wasserstein distance between the set of distance–correlation pairs (DCP) from references and that of a dataset (from corrections or simulations). This distance can be computed on lagged data, using multiple variables and at different locations, hence providing assessments of cross-auto-correlations, generalizing the traditional correlogram tool.

The results obtained by applying TSMBC to climate simulations provided the following conclusions:

In terms of means and standard deviations, for both temperature and
precipitation, the inclusion of lagged data does not strongly modify the results of the dOTC correction method. Although some evidence of
degradation might appear when the number of lags increases (e.g. for

This is also mostly the case for auto-correlation bias reductions

Moreover, the main spatio-temporal patterns of the TSMBC results globally improve on those from the raw GCM.

However, biases in the intensities of the (inter-variable, inter-site
or temporal) correlations might remain. This is typically related to very small differences between two Wasserstein distances very close to zero: if the raw simulations already have a DCP set close to the reference, its Wasserstein distance will be near zero. Therefore, the relative reduction of bias

Finally, although the TSMBC methodology seems to adjust temporal (cross-auto-)correlations reasonably well while still performing well on multivariate properties, when the number of lags increases (e.g. from 5 to 10 d) the gain in the quality of the corrections is not obvious, and the corrections can even be degraded. It is thus advisable to limit the temporal constraints to a few time steps, depending on the variable of interest. This avoids applying the MBC method in an overly high-dimensional context and thus allows robust results.

Despite its promising results, the TSMBC approach can be further investigated and improved. For example, in the present study, only the dOTC multivariate method was used as a correction technique. Other MBC methods exist. Hence, it would be interesting to test how those alternative MBCs – such as
“R

Note also that the chosen lag in TSMBC should be adapted to the type of variable and the area. For example, taking 3 d (

In addition, when dealing with precipitation, the rainfall occurrence is not treated differently from the non-occurrence (dry days) by the TSMBC approach proposed here (i.e. using dOTC as underlying MBC method). However, the sequences of dry days and wet days can bear a major part of the auto-correlation information. Hence, it could be interesting to account for this specific aspect of precipitation when performing the underlying MBC method.

Moreover, some adjustment methods were designed to account specifically for the correction of temporal properties

Finally, the Wasserstein cross-auto-correlation-based metric introduced in this study could be used more generally to compare various datasets and/or assess their diverse properties with respect to a reference. It can then be useful to make evaluations of climate simulations (adjusted or not) in a more holistic way.

A bias correction method is classically defined as a map
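In the univariate case, the classical such map is quantile mapping, T = F_ref^{-1} ∘ F_mod. The following empirical sketch is our own illustration of that classical map, not the dOTC method used in this study:

```python
import numpy as np

def quantile_mapping(ref, mod, x):
    """Empirical univariate quantile mapping T = F_ref^{-1} o F_mod:
    values of `x` are passed through the empirical CDF of the model
    calibration sample `mod`, then through the empirical quantile
    function of the reference sample `ref`."""
    mod_sorted = np.sort(mod)
    # empirical CDF of the model evaluated at x
    u = np.searchsorted(mod_sorted, x, side="right") / len(mod_sorted)
    # empirical quantile function of the reference
    return np.quantile(ref, np.clip(u, 0.0, 1.0))
```

Applied to the calibration sample itself, this map transfers the marginal distribution of the model onto that of the reference, which is the univariate property that MBC methods such as dOTC generalize to several dimensions.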

The dOTC method is given by the

From the probability distribution

The CMIP5 and CORDEX databases are freely available. Source codes of TSMBC are freely available in the R/Python package SBCK under the GNU-GPL3 license (

The supplement related to this article is available online at:

MV and YR had the idea of the method together. They designed the study and the experiments together. YR made the computations and plots. YR and MV jointly analysed the results. MV wrote most of the article with inputs from YR.

The authors declare that they have no conflict of interest.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We acknowledge the World Climate Research Program's Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modelling groups for producing and making available their model output. For CMIP the US Department of Energy's Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals.

We acknowledge the World Climate Research Programme's Working Group on Regional Climate and the Working Group on Coupled Modelling, the former coordinating body of CORDEX and responsible panel for CMIP5. We also thank the climate modelling groups for producing and making available their model output. We also acknowledge the Earth System Grid Federation infrastructure, an international effort led by the US Department of Energy's Program for Climate Model Diagnosis and Intercomparison, the European Network for Earth System Modelling, and other partners in the Global Organisation for Earth System Science Portals (GO-ESSP).

Mathieu Vrac has been supported by the CoCliServ project. Mathieu Vrac and Yoann Robin have been supported by the EUPHEME project. Both CoCliServ and EUPHEME are part of ERA4CS, an ERA-NET initiative by JPI Climate, cofunded by the European Union (grant no. 690462). Mathieu Vrac has also been supported by C3S (grant no. 428J). Yoann Robin has also been supported by project C3S 62 (Prototype Extreme Events and Attribution Service, grant no. 2019/S 102-247355).

This paper was edited by Daniel Kirk-Davidoff and reviewed by four anonymous referees.