Statistical bias correction (BC) is widely used to post-process climate model output in heat-stress impact studies, which are often based on indices calculated from multiple dependent variables. This study compares four BC methods (three univariate and one multivariate) with two correction strategies (direct and indirect) for adjusting two heat-stress indices with different dependencies on temperature and relative humidity using multiple regional climate model simulations over South Korea. The comparison is intended to reduce the ambiguity involved in the practical application of BC for the climate modeling and end-user communities. Our results demonstrate that the multivariate approach can improve the corrected inter-variable dependence, which benefits the indirect correction of heat-stress indices that relies on the adjustment of individual components, especially for indices depending equally on multiple drivers. On the other hand, direct correction of the multivariate indices using the univariate quantile delta mapping approach can produce comparable performance in the corrected heat-stress indices. However, our results also indicate that attention should be paid to the non-stationarity of bias arising from climate sensitivity in the modeled data, which may affect the bias-corrected results unsystematically. Careful interpretation of the correction process is required for an accurate heat-stress impact assessment.

Climate models unavoidably produce biased representations of the simulated
variables, and it is more problematic not to know how these biases translate
into the modeled response to external forcings such as the CO

A variety of BC methods with different levels of complexity and performance have been developed and implemented for both global and regional climate simulations (François et al., 2020; Teutschbein and Seibert, 2012; Kim et al., 2021). Generally, they aim to correct certain features of the target variable's distribution, ranging from simple statistics such as the mean (linear scaling, LS; Teutschbein and Seibert, 2012) and variance (variance scaling, VA; Chen and Dudhia, 2001) to quantiles (quantile mapping, QM), which adjust the entire distribution by parametric (PQM) or empirical (EQM) transformation (Switanek et al., 2017; Gudmundsson et al., 2012). Continuous efforts have also been made to eliminate the drawbacks of existing BC approaches. Quantile delta mapping (QDM; Cannon et al., 2015), for example, is designed to explicitly preserve the long-term trend that may be artificially distorted in QM. Nonetheless, all the approaches described above correct bias in a univariate context. They cannot adjust the inter-variable dependencies, which are important for representing physical processes and estimating compound hazards. Multivariate BC techniques were proposed only quite recently (e.g., Bárdossy and Pegram, 2012; Cannon, 2018; Mehrotra and Sharma, 2015, 2016; Robin et al., 2019; Vrac, 2018), and they have since been applied in various climate change impact studies (Zscheischler et al., 2019; Qiu et al., 2022; Meyer et al., 2019; Dieng et al., 2022). Although it is intuitively recognized that multivariate BC could be more suitable for climate variables characterized by a strong physical linkage, an unambiguous assessment of univariate and multivariate BC methods is essential to understand the potential limitations of individual methods and to avoid misleading applications.
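To make the differences among the univariate methods named above concrete, the following is a minimal NumPy sketch of additive LS, VA, and an empirical additive QDM. It is an illustration under stated assumptions, not the implementation used in this study: the function names, the empirical-quantile estimator, and the plotting-position formula are all choices made here for brevity.

```python
import numpy as np

def linear_scaling(obs_cal, mod_cal, mod_tgt):
    """Additive LS: shift the model so its calibration mean matches the observed mean."""
    return mod_tgt + (obs_cal.mean() - mod_cal.mean())

def variance_scaling(obs_cal, mod_cal, mod_tgt):
    """VA: match both the mean and the standard deviation of the calibration period."""
    scale = obs_cal.std() / mod_cal.std()
    return (mod_tgt - mod_cal.mean()) * scale + obs_cal.mean()

def quantile_delta_mapping(obs_cal, mod_cal, mod_tgt):
    """Additive QDM sketch based on empirical quantiles (assumed estimator)."""
    # Non-exceedance probability of each target value within the target period.
    ranks = np.argsort(np.argsort(mod_tgt))
    tau = (ranks + 0.5) / mod_tgt.size
    # Modeled change relative to the calibration-period model quantile.
    delta = mod_tgt - np.quantile(mod_cal, tau)
    # Observed quantile at the same probability, plus the preserved change.
    return np.quantile(obs_cal, tau) + delta
```

In this sketch, a uniform shift added to the modeled target series is carried through QDM unchanged, which is the trend-preserving property that motivates the method. LS and VA only match the first one or two moments of the calibration period.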

Regardless of the BC method used, a multivariate index representing compound hazards can be either directly adjusted using BC techniques, as in the majority of studies (Schwingshackl et al., 2021; Kang et al., 2019; Coffel et al., 2017), or indirectly corrected, with its components individually corrected prior to the index calculation (Casanueva et al., 2019; Zscheischler et al., 2019). In this regard, there have been few systematic comparisons of how the direct and indirect use of univariate and multivariate BC methods affects the adjustment of multivariate indices. Only Casanueva et al. (2018) tested the direct and indirect use of EQM in correcting the multivariate fire danger index, while several studies compared the indirect use of univariate and multivariate BC methods in impact assessments (e.g., Cannon, 2018; François et al., 2020; Zscheischler et al., 2019). Although Casanueva et al. (2018) pointed out that the direct application of EQM outperforms the indirect one, how it compares with the multivariate BC method remains unknown. Therefore, there is room for a more comprehensive assessment of the effects of univariate and multivariate BC under direct and indirect application strategies. These effects may vary with the dependence structure of the multivariate indices, and the choice of strategy also affects correction efficiency because the multivariate approach has a higher computational cost.
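The distinction between the two strategies can be sketched in a few lines of NumPy. The example below is a hedged illustration only: `toy_index` is a hypothetical nonlinear function of temperature and relative humidity (it is neither WBGT nor AT), and additive linear scaling stands in for whichever BC method is actually applied.

```python
import numpy as np

def toy_index(t, rh):
    """Hypothetical nonlinear heat-stress index (NOT WBGT or AT); illustration only."""
    return t * (1.0 + rh / 200.0)

def linear_scaling(obs_cal, mod_cal, mod_tgt):
    """Additive mean-shift correction, used here as a stand-in BC method."""
    return mod_tgt + (obs_cal.mean() - mod_cal.mean())

def direct_correction(t_obs, rh_obs, t_mod, rh_mod, bc=linear_scaling):
    """Direct strategy: compute the index first, then bias-correct the index itself."""
    idx_obs = toy_index(t_obs, rh_obs)
    idx_mod = toy_index(t_mod, rh_mod)
    return bc(idx_obs, idx_mod, idx_mod)

def indirect_correction(t_obs, rh_obs, t_mod, rh_mod, bc=linear_scaling):
    """Indirect strategy: bias-correct each component, then compute the index."""
    t_bc = bc(t_obs, t_mod, t_mod)
    # Keep relative humidity within its physical range after correction.
    rh_bc = np.clip(bc(rh_obs, rh_mod, rh_mod), 0.0, 100.0)
    return toy_index(t_bc, rh_bc)
```

Because the index is nonlinear in its components, the two routes generally give different corrected series even with the same BC method. With a univariate method, the indirect route also leaves the modeled dependence between the components uncorrected.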

In this study, we investigate the effects of different BC methods
(univariate vs. multivariate) applied with different strategies (direct vs.
indirect) on the statistical adjustment of heat-stress indices that
represent the combined effect of human exposure to temperature (

The 3-hourly data used for BC are the near-surface

Two popular heat-stress indices are evaluated in this study: WBGT (ACSM,
1984) and AT (Steadman, 1984). There are several different formulations for
both indices, and we employ the versions using only

Contour lines of equal-level heat-stress indicators: WBGT (red) and AT (blue).

All 3-hourly data are used for the BC procedure, but the daily maximum of
WBGT/AT during summer (June–July–August, JJA), together with the

The principle of BC is to use observations to calibrate the simulated output
(e.g., climate model output). In this study, four BC methods are applied,
including LS, VA, QDM, and a multivariate BC algorithm with an

During the BC process, univariate BC methods are applied to

As an illustrative example, Fig. 2 provides the quantile–quantile plots of
the WBGT corrected using various approaches for one grid point from one RCM
during 1979–1996. ORI shows a cold bias inherited from the driving global climate model (GCM; M.-K. Kim
et al., 2020), leading to a notable underestimation over the entire
distribution compared to ERA5. For the direct correction of WBGT, LS reduces
the cold bias, but with a non-negligible overestimation, especially in the
range of 30–32.5

For cross-validation of the BC methods, we use a historical period of 1979–2014 and adopt the “jack-knifing” split-sample test that first splits the historical period into two halves and uses one part for calibration and the other for validation, and then we reverse the two parts systematically (Refsgaard et al., 2014). Specifically, the 18-year period of 1979–1996 is first set as the calibration part with the period of 1997–2014 as the validation part; then, the periods are swapped using 1997–2014 for calibration and 1979–1996 for validation. For each test, the ERA5 data in the corresponding calibration period are used to obtain the correcting algorithms that are then applied to the validation period. To distinguish the two tests, the one using 1997–2014 for calibration is marked with a letter “r”, standing for “reverse”, and the default is the one using 1979–1996 for calibration. The statistical metrics used for evaluation are noted in the Supplement.
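The two-fold procedure described above can be sketched as follows. This is a minimal illustration under assumptions: additive linear scaling stands in for the BC method, the absolute bias of the 90th percentile stands in for the evaluation metrics (which are defined in the Supplement and not reproduced here), and the function and argument names are hypothetical.

```python
import numpy as np

def linear_scaling(obs_cal, mod_cal, mod_tgt):
    """Additive mean-shift correction, used here as a stand-in BC method."""
    return mod_tgt + (obs_cal.mean() - mod_cal.mean())

def split_sample_test(years, obs, mod, bc=linear_scaling, split_year=1996):
    """Two-fold test: calibrate on one half, validate on the other, then swap."""
    first = years <= split_year   # e.g., 1979-1996
    second = ~first               # e.g., 1997-2014
    results = {}
    # "default" calibrates on the first half; "r" is the reverse test.
    for label, cal, val in (("default", first, second), ("r", second, first)):
        corrected = bc(obs[cal], mod[cal], mod[val])
        # Illustrative metric (assumed): absolute bias of the 90th percentile.
        results[label] = abs(np.quantile(corrected, 0.9) - np.quantile(obs[val], 0.9))
    return results
```

In this sketch, a model bias that is constant across both halves is removed almost exactly in validation. A bias that differs between the halves leaves a residual error of roughly the size of that difference, which is how non-stationary bias shows up in a split-sample test.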

The quantile–quantile plots of ORI (blue) and data after BC (red)
adjusted by

Figure 3 presents the performance of WBGT and AT in ORI simulations.
Substantial bias can be seen across the entire distribution of the
heat-stress indices. For 1979–1996, both WBGT and AT generally exhibit a
cold bias covering the whole domain. There is more bias in the bottom and
top 15 % of the distribution, but the bias of WBGT is more skewed to the
left tail, whereas that of AT is more skewed to the right. Taking the
90th percentile (90p) as an indicator representing heat events, Fig. 3b
and c show a greater cold bias in the low-elevation regions (e.g., basins
in southeastern Korea), where an RCM with a spatial resolution of
around 20 km is highly unlikely to capture the local high temperatures owing
to an inadequate representation of topography (Qiu et al., 2020). For
1997–2014, however, i.e., the next 18 years within the historical period,
the cold bias is systematically reduced, with a certain area even displaying a
slight warm bias. This can be explained by the high climate sensitivity in
the driving GCM (i.e., UKESM; Zelinka et al., 2020), leading to a different
level of warming between the simulations and ERA5 during this historical
period. According to Fig. 3d and e, the model shows around 0.5

Figure 4 shows the mean absolute error (MAE; Eq. S2 in the Supplement) over South Korea
(land only) in all RCMs after BC using different methods. Two
indicators – the 90p and the mean of monthly maximum (MMX) – are selected to
represent extreme heat events. The diamonds standing for ENS are marked for
ease of comparison. During the calibration period, LS, as the simplest BC
approach used in this study, shows the largest bias among the four methods.
For direct correction of WBGT, the other three methods have a reasonable MAE
of less than 0.25

The MAE over South Korea (land only) for the calibration period
(1979–1996,

Similar results are found in AT and AT

To assess the quantitative differences in the marginal distributions
corrected by different BC methods, Fig. 5a, b, e, and f present the maximum
differences calculated from the Kolmogorov–Smirnov (K–S) test (Eq. S3)
between the observed (i.e., ERA5) and bias-corrected empirical cumulative
distribution functions (CDFs). A smaller value stands for a better
correction output. For the direct correction, QDM and MBCn show better
performances than LS and VA across all the indices and metrics considered.
However, for indirect correction, MBCn shows a distinct advantage for the
multivariate index that depends equally on its components (i.e., WBGT

K–S test

Figure 6 investigates the spatial distribution of bias in the QDM and MBCn
corrections, using the 90p as an example for WBGT and AT. A similar pattern
can also be seen in the case of MMX (Fig. S3). For the calibration period,
the biases are well reduced to less than 0.5

Spatial maps of the bias in the 90p during the calibration period
(C) and validation (V) period corrected by QDM and MBCn in ENS. The first
and third rows are the directly corrected WBGT and AT. The second and
fourth rows are the WBGT

Same as Fig. 6 but for the reverse test.

Spatial patterns of

On the other hand, the spatial maps of bias also clearly demonstrate the
superiority of MBCn for the indirect correction of the heat-stress indices
over the entire domain in both the calibration and validation periods. Since
the heat-stress indices are functions of

Previous studies have challenged the applicability of univariate BC for
adjusting individual components of multivariate hazard indicators and demonstrated
the benefit of multivariate BC in compound event evaluations (François
et al., 2020; Zscheischler et al., 2019). Our study also demonstrates MBCn's
advantage in correcting the interdependence of the relevant variables, which
results in a substantial improvement in the indirect BC of heat-stress
indices. Such an advantage is more prominent for the index relying more
equally on the composing variables (e.g., WBGT), which was also pointed out
by Zscheischler et al. (2019). However, to the best of our knowledge, no
study has been conducted to compare the multivariate BC methods with the
direct application of univariate BC on multivariate indices. Our results
show that QDM applied directly to the multivariate indices can provide a
similar result to MBCn in heat-stress assessments, while MBCn additionally
provides a more reasonable underlying inter-variable dependence. In this
regard, if only the heat-stress indices themselves are of interest, the more
computationally efficient direct QDM correction may be sufficient for the
impact assessment. However, if the relationship between

On the other hand, the study of heat stress under future warming, which is not evaluated here, requires more aspects to be considered. This study uses historical climate simulations that contain non-stationarity, combined with two “jack-knifing” split-sample tests. We find that the non-stationarity of bias in the modeled heat-stress indices, arising from the combined effects of internal climate variability and climate model sensitivity, can significantly affect the BC output. Teutschbein and Seibert (2012) suggested that the more advanced correction methods (e.g., QM) are more robust to a non-stationary bias than the simpler ones (e.g., LS), but our results show no significant difference. In fact, because they rest on the fundamental assumption of stationary bias, current BC approaches may not be able to provide a suitable solution to this issue. Therefore, a case-by-case evaluation of BC approaches for a given climate model and study area, as well as a clear understanding of the relevant processes, including the uncertainties underlying the original model data, is required for reliable data post-processing using BC methods. Meanwhile, for the continued development of future projections of multivariate heat-stress indices, there are also potential problems worth investigating. For example, we may need to consider whether there is any substantial change in the modeled multivariate dependence structure, which is also highly likely under global warming (Singh et al., 2021; Hao et al., 2019). Although both QDM and MBCn are designed to preserve the simulated trend in the corrected variables, MBCn, like other multivariate BC methods, does not consider changes in the multivariate relationships. In this regard, direct correction using QDM may outperform MBCn.
However, as direct correction using QDM may discard the physical consistency among the input variables, in terms of both the variable representation and the projected change, it can hide compensating biases (Schwingshackl et al., 2021) and thus introduce additional uncertainty in the climate change signal (Casanueva et al., 2018) of the multivariate heat-stress indices. To solve these problems, a deeper understanding and continuous improvement of climate models, particularly regarding the uncertainty and credibility of projections, may be prerequisites for better evaluation and application of statistical procedures (i.e., BC approaches).

Near-surface temperature and relative humidity data from the CORDEX-East
domain downscaling product used in this study are archived in the
institutional repository at

The supplement related to this article is available online at:

ESI and SKM conceptualized the study. LQ was responsible for investigation, formal analysis, methodology, software, and visualization. ESI supervised all LQ's work and provided investigations. LQ and ESI wrote the original draft, and ESI and SKM reviewed and edited it. LQ, SKM, YHK, DHC, SWS, JBA, ECC, and YHB created the data used in the study.

The contact author has declared that none of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We thank the reviewers, who have provided valuable comments for the study.

This study was supported by the Korea Meteorological Administration Research and Development Program under grant no. KMI2021-00912.

This paper was edited by Sagnik Dey and reviewed by Nicholas Osborne and one anonymous referee.