Vulnerability of European ecosystems to two compound dry and hot summers in 2018 and 2019

. In 2018 and 2019, central Europe was affected by two consecutive extreme dry and hot summers (DH18 and DH19). The DH18 event had severe impacts on ecosystems and likely affected vegetation activity in the subsequent year, for example through depletion of carbon reserves or damage from drought. Such legacies from drought and heat stress can further increase vegetation susceptibility to additional hazards. Temporally compound extremes such as DH18 and DH19 can, therefore, result in an ampliﬁcation of impacts due to preconditioning effects of past disturbance legacies. 5 Here, we evaluate how these two consecutive extreme summers impacted was mainly explained by the preconditioning role of DH18 on the impacts of DH19 due to interannual legacy effects and possibly by increased susceptibility to biotic disturbances, which are also promoted by warm conditions. Dry and hot summers are expected to become more frequent in the coming decades posing a major threat to the stability 20 of European forests. We show that state-of-the-art process based models could not represent the decline in response to DH19 because they missed the interannual legacy effects from DH18 impacts. These gaps may result in an overestimation of the resilience and stability of temperate ecosystems in future model projections.


25
Extreme dry and hot summers in western and central Europe have become more frequent over the past decades (Coumou and Rahmstorf, 2012;Seneviratne et al., 2014), a trend that is expected to continue as global mean temperatures rise (Barriopedro et al., 2011). Hot extremes in Europe are promoted by changes in atmospheric circulation (Coumou et al., 2015;Drouard et al., 2019) and amplified by strong feedbacks between the land-surface and the atmosphere, being therefore also associated with severe droughts (Miralles et al., 2014;Samaniego et al., 2018), i.e. compound dry and hot events (DH). 30 In Europe, DH events have usually strong negative impacts on ecosystems, such as reduced ecosystem productivity Bastos et al., 2020b). After severe drought and heat stress, plant recovery can be lagged, for example due to reduced growth, or non-reversible losses in hydraulic conductance or carbon reserve depletion (Ruehr et al., 2019). This, in turn may increase vulnerability to another DH, if it occurs before complete recovery. Repeated droughts have been linked to increased forest vulnerability in the northern mid-latitudes, although with variable responses . Impaired 35 functioning during the recovery period can additionally increase the hazard of subsequent disturbances, e.g. insect outbreaks (Rouault et al., 2006). However, reductions in leaf area, increases in root allocation (McDowell et al., 2008) or reduced growth, by reducing evaporative tissue and enhancig water uptake capacity, could also confer an advantage to subsequent droughts (Gessler et al., 2020). It remains unclear whether the increased vulnerability to a subsequent drought can be explained by compounding hazards (e.g. accumulated water-deficits or compound heat) or modulating effects due to vegetation responses to 40 the first event.
In Europe, the summer of 2018 was the hottest since 1500 (Sousa et al., 2020) and associated with an unprecedented area affected by drought (Albergel et al., 2019;Bastos et al., 2020a). This DH event resulted in decreases in ecosystem productivity by up to 50% in central Europe (Bastos et al., 2020a;Buras et al., 2019) and crop yield losses (Beillouin et al., 2020). Part of the central European region affected by DH18 registered another extremely hot and dry summer in 2019 (Boergens et al.,45 2020; Sousa et al., 2020).
From a hydrometeorological perspective, each of the dry and hot summers in 2018 and 2019 (DH18 and DH19, respectively) can be considered a multivariate compound event in that both high temperatures and strong drought conditions were observed (Zscheischler and Fischer, 2020). Taken together, they can be considered a temporally compound event . For example, Boergens et al. (2020) have shown that while soil-moisture deficits in summer 2019 were not as 50 pronounced as in 2018, total water storage was lower in 2019 due to the water storage deficit resulting from the 2018 event.
Given the unprecedented magnitude of DH18, it is likely that at least some ecosystems had not yet fully recovered in 2019.
Therefore, from an ecological perspective, DH19 could additionally be considered a preconditioned compound event, where the impact of DH18 may affect the response to DH19 (Fig. 1). Finally, vulnerability to DH events can further be modulated by long-term environmental changes: water-savings from reduced stomatal conductance should attenuate drought stress (Peters 55 et al., 2018), but concurrent decrease in evapotranspirative cooling along with "hotter droughts" may amplify heat stress Obermeier et al., 2018). compouding atmospheric drivers (synoptic patterns, preceding climate anomalies, land-atmosphere interactions). The DH18 impacts were preconditioned by seasonal legacy effects in ecosystem functioning from a sunny and warm spring. We hypothesise that legacies from the DH18 event also acted as a precondition of the response to DH19. Further modulating effects (not shown) include impacts of anthropogenic climate change on drivers, hazards and vegetation condition and of land-cover in modulating responses to hazards.
Separating the modulating effects controlled by vegetation responses to global change or by legacies from past disturbances (Kannenberg et al., 2020) and seasonal legacy effects (Buermann et al., 2018) in observations is problematic as it requires considering the compounding effects of multiple drivers (e.g., synergistic effects of heat and drought stress) and separating the role 60 of seasonal and inter-annual legacies both in physical variables (e.g., soil-moisture depletion) and in vegetation vulnerability to those drivers. This can be done by designing counter-factual scenarios to force process-based models, as recently done to evaluate seasonal legacy effects of hot and dry springs (Lian et al., 2020;Bastos et al., 2020a). However, it has been argued that Earth System models fail at modelling woody biomass trajectories following droughts (Anderegg et al., 2015), so that they might miss inter-annual legacy effects from DH events, although no simulations designed to isolate the individual impact of 65 drought over subsequent years have been performed. Alternatively, statistical models can be used to separate such effects based on observational data (Chan et al., 2021).
Using both remote-sensing data and an update of the simulations by Bastos et al. (2020a), we attempt to separate these different effects, namely: how the combination of hot and dry conditions affected the vulnerability of ecosystems to the two events (multivariate compound event), how the repetition of a dry and hot summer affected the response to DH19 (temporally 70 compound event) and how inter-annual legacy effects due to impacts of DH18 affected ecosystem vulnerability to DH19 (preconditioned compound event). We first use a statistical modelling approach to evaluate whether signs of increased vegetation vulnerability to DH18 and DH19 can be found and to attribute changes in vulnerability to inter-annual legacies and other modulating effects. We then compare observation-based results to updated simulations by state-of-the-art land-surface models and dynamic global vegetation models (for simplicity referred to as LSMs) designed to isolate the impacts of DH18 and their 75 legacy effects (Bastos et al., 2020a).

Climate variables
In ecological studies, drought is better characterized by soil-moisture anomalies i.e. agricultural drought (Sherriff et al., 2011;Seneviratne et al., 2012;Samaniego et al., 2018) than atmospheric drought indices. We therefore base our drought assessment 80 on two complementary soil-moisture datasets. The first is the observation-based soil moisture data obtained from SoMo.ml (Sungmin and Orth, 2021), used as reference in this study, and the second, for comparison with SoMo.ml, is given by ERA5 volumetric soil-water content (Hersbach et al., 2020).
The SoMo.ml data are generated using a Long Short-Term Memory neural network model trained with meteorological forcing from ERA5 and land surface characteristics as inputs and global in-situ soil moisture measurements (Dorigo et al.,85 2011; Zeri, 2020) as target variables. The data cover soil-moisture in the first 50cm of the soil and are available at 0.25°lat/lon resolution and daily time-steps for the period 2000-2019. We remapped the fields to the finer resolution of the MODIS grid and aggregated the data to monthly means. We then subtracted the mean seasonal cycle and long-term linear trend, and divided by the corresponding standard deviation to obtain standardized soil-moisture anomalies (SM anom ).
Monthly temperature and volumetric soil-water content from the ECMWF ERA5 Reanalysis were obtained from the Coper-90 nicus Climate Change Service at 0.25°lat/lon resolution (Hersbach et al., 2020) at monthly time-steps and selected for the period 2000-2019 (common with SoMo.ml) and remapped to the finer resolution of the MODIS grid using conservative remapping. Standardized anomalies were calculated as described for SM anom for ERA5 temperature and soil-moisture fields (T anom ,SM ERA5 anom ). Soil-moisture anomalies from ERA5 in layers 1-2 (top 28cm) are used for comparison of drought conditions with those estimated by SoMo.ml, although the two datasets are not fully independent.

Vegetation and soil data
We used the 16-day Enhanced vegetation Index (EVI) from the Moderate Resolution Imaging Spectroradiometer (MODIS) sensor from the MOD13C1 CMG product. The MOD13C1 CMG provide continuous cloud-free spatial composites from 1km data projected on a 0.05°lat/lon grid (Didan et al., 2015), and were selected for the period 2001-2019. Standardized EVI anomalies (EV I anom ) were calculated following the same approach as for climate variables. The standardization allows com-100 paring the relative magnitude of anomalies for pixels with distinct temporal variability patterns and with vegetation productivity simulated by LSMs, which have different physical units.
We used land-cover distribution in 2018 from the ESA Climate Change Initiative land-cover (Kirches et al., 2014) (CCI-LC). The data are originally provided in land-cover classes at 300m spatial resolution and were converted to fractional cover at 0.05°lat/lon resolution for forest, grassland, crop classes using the LC-CCI user-tool. 105 We used isohydricity fields from global satellite measurements from Konings et al. (Konings et al., 2017) at 1°lat/lon resolution. Anisohydric plants (low isohydricity) show weak regulation of stomatal opening, and prioritize carbon assimilation over water savings during droughts. High isohydric plants show strong stomatal regulation of productivity and thereby preserve water at the cost of carbon assimilation during drought.
We use soil Available Water Capacity (AWC) from Ballabio et al. (2016) and Panagos et al. (2012), which used the Land 110 Use and Cover Area frame Statistical survey (LUCAS) topsoil database to map soil properties at continental scale.

Drought characterization
We use the observation-based SoMo.ml as a reference dataset to define agricultural drought conditions. Regions with average SM anom below −1σ (Seneviratne et al., 2012) during summer (June, July and August, JJA) are considered drought-affected areas during the DH events. Then, a regional domain affected by both DH18 and DH19 events is selected to evaluate the 125 impacts of two consecutive DH events. Within this region most pixels had negative SM anom and the majority registered SM anom < −1.5σ, but they differ in the magnitude of agricultural drought in DH19. This allows comparing responses across pixels for different combinations of stress between DH18 and DH19. Since we are interested in evaluating how recovery from DH18 affected impacts of DH19, we limit our analysis to pixels with negative EV I anom in DH18.

DH18 and DH19 impact characterization
To characterize different response types to DH18 and DH19, we group pixels using unsupervised clustering of EVI during the two extreme summers. Using an unsupervised method allows avoiding making assumptions about the magnitude of impacts or the trajectory between DH18 and DH19 (DH18→DH19) when grouping pixels. For this, we applied a K-means cluster analysis (Hamerly and Elkan, 2003) using two features, corresponding to the EV I anom fields in DH18 and DH19. Four clusters 135 captured the most significant differences in the impacts of DH18 and corresponding DH18→DH19 responses: moderate/strong DH18 impacts and moderate/strong impacts by DH19. These clusters were then used to evaluate how LSMs simulate the summer GP P anom and SM anom .

Detecting increased vulnerability to drought and heat stress
To better understand the impacts of the two events, we frame them as a combination of temporally and preconditioning com-140 pound events ( Fig. 1): a sequence of two DH events, whose impacts may be preconditioned by ecosystem vulnerability to DH, especially in the case of DH19. Vulnerability to DH is defined as the impact of the physical hazard (hot and dry conditions) on vegetation and assessed by remotely-sensed EVI and modelled GPP anomalies.
The difference between the reference and factorial simulations by LSMs allow separating the modulating effects of DH18 legacies to the DH19 impacts (dashed arrow in Fig. 1). Separating the legacies in observations is more challenging, because 145 the EVI signal at any time-step includes signals from both concurrent climate and past legacies, and possibly also long-term global change. To do this, we hypothesise that preconditioning effects due to past disturbance legacies (modulating DH19) and global change (modulating DH18 and DH19) should be detectable by changes in ecosystem sensitivity to similar hazards.
Increased vulnerability corresponds thus to EV I anom values lower (more negative or less positive) than those expected for a given drought or temperature anomaly based on past sensitivities. Inversely, increased resistance would result in EV I anom 150 being less negative or more positive than expected for a given SM anom .
We assess whether changes in the sensitivity to climate anomalies is detected in DH18 and DH19 using a statistical modelling approach to predict EV I anom in DH18 and DH19 based on 2001-2017 climate-vegetation relationships. We do this in two steps: first by fitting a linear regression model for mean EV I anom in each cluster, and then, for more detailed insights, by fitting a random forest model at pixel scale, in which we include potential seasonal legacy effects. In both cases, the training period 155 includes other DH events Orth et al., 2016), with similar climate anomalies, particularly 2003, thereby reducing the risk of attempting to predict EV I anom based on "unseen" climatic conditions. On a first step, for the spatially-averaged variables within each cluster, we fit the following models: Where EV I Ci anom and V AR Ci anom corresponds to the cluster (C i ) spatial average values of EV I anom and climate variable 160 (growing-season SM anom or T anom ), respectively. b 0 , b 1 are the coefficients of each linear regression trained on 2001-2017 values. Each model is then used to estimate DH18 and DH19 EV I anom . Negative model residuals (observations minus predictions) can indicate increased vulnerability, while positive residuals can be a sign of increased resistance.
However, departures from a linear model could also result from non-linear interactions between soil-moisture and temperature or from legacy effects from spring (Bastos et al., 2020a;Lian et al., 2020). To account for such effects and evaluate 165 potential spatial asymmetries in the departures from long-term climate-vegetation relationships, we fit a random-forest (RF) model using as target variable EV I anom in each pixel (i) from 2001-2017, and the corresponding SM anom and T anom in spring (March, April and May, MAM) and in summer (JJA) as predictors: To reduce the risk of over-fitting due to the small sample size (17 years) and large number of predictors (4), we fit the RF 170 model on 3x3 moving windows centered around each pixel (i.e. 17 × 9 samples). We assess the model performance outside of the training samples by calculating the out of bag scores in addition to the training sample scores. The importance of each predictor is estimated by the Shapley additive explanation values (Lundberg and Lee, 2017). We then predict EV I anom in DH18 and DH19 using the respective anomalies in T spr anom , SM spr anom , T sm anom , SM sm anom . The EV I anom predicted by the RF model for DH18 and DH19 correspond to the expected DH impacts from past relation-175 ships between the hazards and impacts in Fig. 1. As for the linear case, the difference between the RF model predictions and the actual EV I anom (model residuals) provides an indication of changes in ecosystem vulnerability to the DH18 and DH19 impacts.
For comparison with LSM simulations, the EV I anom clusters were remapped to 0.25 degree by largest area fraction calculation, and subsequently GP P anom and SM anom model ensemble means for each cluster were compared with corresponding 180 EV I anom and ERA5 SM anom . We first evaluate the linear relationships between the averaged GP P anom for each cluster and the corresponding climate anomalies, for comparison with EV I anom . Then, we estimate the legacy effects from DH18 on GP P anom during 2019 based on the difference between the reference and factorial LSM simulations.

Modulating effects
To understand how land-cover can contribute to modulate the impacts of DH18 and DH19 we analyse the land-cover com-185 position of each cluster. Given that central Europe is characterized mostly by a very heterogeneous landscape, we calculate land-cover selectivity in each cluster for forests, natural grasslands and croplands. Selectivity is defined as the difference between the probability a given land-cover class being present within a cluster compared to its overall presence in the whole region. The probabilities are calculated by fitting a kernel-distribution function to the fractional cover fields for the whole region and for separate clusters. Positive (negative) selectivity means that a given land-cover type is more (less) likely to be 190 found in a given cluster compared to its overall presence in the region.
For other modulating effects we evaluate how the spatial distribution of EV I anom residuals for DH18 and DH19 relates to climatic and ecological variables: SM anom and T anom in spring and summer, number of dry months in the year of the DH event and the preceding year (i.e. 2017-2018 for DH18, and 2018-2019 for DH19), EV I anom in the preceding summer (EV I yr−1 anom ), the number of dry months in a given year and its preceding year (DM), isohydricity (IsoH) and available water 195 capacity (AWC, related to the maximum amount of water available for plants).
We include some of the drivers used to train the temporal climate-driven RF model to diagnose possible changes in the vulnerability explained by stronger vegetation sensitivity to climate anomalies than in the training period. EV I yr−1 anom is used to evaluate the preconditioning role of past extreme summers or disturbances (summer is the peak of the growing season in this region). The number of dry months and AWC are also included as they may explain non-linear relationships between SM anom 200 and vegetation stress. Isohydricity provides a measure of the degree of stomatal regulation by plants. Since many of these variables have strong spatial co-variation (e.g. T anom and SM anom , we evaluate their relationships with EV I anom residuals by calculating the partial rank correlation (Spearman's ρ ) between each variable, controlling for the others separately. Since these effects might depend on land-cover type, we analyse separately pixels with high and low tree cover.
To further evaluate how inter-annual legacy effects affect long-term vegetation dynamics, we apply a second temporal RF 205 model to pixel-level EV I anom (Section 3.2.2) with EV I yr−1 anom as an additional predictor. The model is trained for the period 2002-2017 on 3×3 moving windows and is then used to predict EV I anom in DH18 and DH19. The resulting model residuals were then compared to those of the climate-driven RF model.

DH18 and DH19 impacts 210
Following the extreme summer in central Europe in 2018, mild temperatures and strong soil-moisture deficits remained until January 2019, when SM anom returned to normal conditions (Fig. B1, Fig. B2). In central Europe, June 2019 was extremely hot, but July and August 2019 were milder (Fig. B1, (Sousa et al., 2020)), and soil-moisture deficits became very pronounced in July (Fig. B2). In this region, except April 2019, the months preceding summer were not particularly dry and were even slightly wetter than average in February, March and May, the latter also colder than average. Therefore, the DH18 and DH19 constitute 215 more a sequence of two compound events than a single drought. The areas experiencing severe dry and hot conditions in both summers correspond to a region covering central and eastern Europe and southern Sweden. This region is our study domain and indicated by the rectangle in Fig.2). Both DH events led to vegetation browning, though negative EV I anom were more widespread in DH18 than DH19. Within the study region, 79% of the area showing negative EV I anom in DH18 (EV I DH18 anom ) also registered negative EV I anom in DH19 (EV I DH19 anom ).

220
The spatial distribution of the clusters resulting from the unsupervised classification based on (EV I DH18 anom , EV I DH19 anom ) pairs and corresponding centroids are shown in Fig. 3 (left and top right panels), as well as the corresponding (SM DH18 anom , SM DH19 anom ) and (T DH18 anom , T DH19 anom ) (center and bottom right panels). The four clusters aggregate pixels according to different impacts in DH18 and DH19. One cluster, covering 20% of the area, includes pixels with moderate impacts in DH18 and further browning in DH19, being therefore referred to as (C Decline ) (dark brown, EV I DH19 anom below the 1:1 line in Fig.3, top right panel). This 225 cluster is associated with mixed cover of forests (10-40%, dominated by needle-leaved) and grasslands (15-60%), (Fig.B3).
Cluster C HighV (high vulnerability, covering 15% of the area) corresponds to pixels experiencing strong impacts in both events and is associated with high grassland and cropland fractions and low tree cover. Pixels with strong impacts in DH18 and weakly negative EV I DH19 anom , i.e. partial recovery in DH19 (C PRecov , 21% of the area), are mainly dominated by croplands. Finally, a group of pixels shows moderate EV I DH18 anom and positive EV I DH19 anom (C Greening , 44%), corresponding mostly to mixed 230 forest-grassland pixels (30-65% of forest, dominated by needle-leaved).
All clusters align along proportional DH18:DH19 values of SM anom and T anom , with predominantly negative SM anom and positive T anom in both DH events but alleviation of soil-moisture deficits and heat stress in DH19 compared to DH18 (Fig. 3). The two recovery clusters (C PRecov and C Greening ) correspond to pixels with less severe drought conditions and milder temperatures in DH19, and C Greening corresponds to pixels where dry and hot conditions in DH18 were also more moderate.

235
C HighV corresponds to pixels experiencing drier and hotter anomalies in both summers and shows accordingly stronger impacts.
Cluster C Decline , however, shows increasing browning in DH19 in spite of drought and heat stress alleviation (Fig.3). The distributions of climate anomalies for each cluster overlap each other and, in some cases, the 1:1 line, indicating that the intensity of the hazards (temperature, drought) cannot account for the resulting impacts alone.

240
All clusters show significant positive linear relationships between summer EV I anom and SM anom and negative linear relationships with T anom in 2001-2017 (Fig. 4). The relationships include the two extreme summers of 2003 and 2015 which had comparable T anom and SM anom to DH18 and DH19 in most clusters. The long-term sensitivities estimated are, though, robust even if these summers are excluded.
The results correspond to a general summer water-limited regime, especially in clusters C Decline , C HighV and C PRecov , which 245 show stronger sensitivities to T anom and SM anom (slopes in Fig. 4) and higher variance explained by both models (R 2 0.58-0.68 for SM anom and 0.49-0.55 for T anom ). For these clusters, EV I anom is below the 95% confidence interval of the longterm linear relationships for DH18 (C PRecov and C HighV ) and DH19 (C Decline and C HighV ). SM anom and T anom in DH18 and DH19 are generally similar to those of 2003, but DH18 was drier than 2003 in C PRecov and C HighV .
These departures may be related with seasonal legacy effects from the warm spring in DH18 and or the onset of non-linear 250 responses to heat and drought. To account for these modulating effects, we model long-term (2001-2017) EV I anom -climate relationships using spring and summer SM anom and T anom as predictors using random forest regression (see Section 3.2.2).
The model is able to predict 48 -90% (median and maximum out of bag score across pixels) of the pixel-level temporal variability of summer EV I anom in 2001-2017 (Fig.B4). Analysis of the variable importance shows that the model estimates summer water limitation and negative legacy effects from spring warming (Fig. B5), consistent with a summer water-limited 255 regime and process-based modeling studies (Bastos et al., 2020a;Lian et al., 2020).
As in the linear case, the RF model estimates less negative or more positive EV I anom in DH18 and DH19 than observations (Fig. 5). The residuals are below the range of the training period for the high impact clusters: C Decline and C PRecov in DH19 and DH18, respectively, and C HighV in both (Fig. 5, bottom panel). In C Greening , residuals are predominantly positive (i.e. observed EV I anom more positive than predicted), but still partly overlap with the range of residuals in the training period (Fig. 5).

260
Pixels with high tree cover tend to show less negative or more positive residuals than pixels with low tree cover (<0.4%) in both DH events (Fig. 6), but in DH19 the range of residuals in high tree cover pixels is larger than DH18, including pixels with strongly negative values. The partial rank correlation of the spatial distribution of EV I anom residuals with respect to different explanatory variables is shown for pixels with high and low tree cover in Fig. 6. Given the large number of pixels, all correlations are significant.

265
In DH18, T anom in spring (T spr anom , + for high and low tree cover) and summer SM anom (SM sm anom , -for high tree cover and + for low tree cover) show the strongest relationships with EV I anom residuals. In DH19, EV I yr−1 anom (+), T spr anom and T sm anom (-) show strong correlations, with consistent sign for both high and low tree cover pixels. DH19 residuals of pixels with high tree cover show strong correlation with SM anom with opposite signs in spring (+) and summer (-) and with AWC (-). In DH19, pixels with low tree cover show negative correlation between IsoH and EV I anom residuals.

270
To test whether the importance of EV I yr−1 anom is particular to DH19, or if it may reflect long-term inter-annual legacy effects of anomalies in vegetation activity, we fit a second temporal RF model where EV I yr−1 anom is used as an additional predictor (Figs. B4 and B6). Including vegetation condition in the previous summer improves the predictive power of the long-term RF model (72-97% out of bag score, compared to 48-90% for the model trained with climate drivers only). Even though the residuals for the training period are considerably reduced relative to the climate-driven model, the residuals for DH18 and DH19 are 275 comparable.  (bottom panels), for pixels with high (dark green, top 5% tree cover fraction) and low (light green, lowest 5% tree cover fraction) tree cover (left panels). High TC pixels have tree cover fractions above 58% and low TC have virtually no trees (T C < 0.4%). The variables considered are: spring and summer Tanom and SManom (indicated by superscripts spr and sm, respectively), EV Ianom in the previous growing season (EV Iyr−1), plant isohydricity (IsoH) and the number of dry months (DM). Because of the large number of pixels considered, all correlations are significant (p − val << 0.01). The right panels show the distribution of residuals for pixels with high and low tree cover.

DH18 and DH19 impacts simulated by LSMs
The GPP from the LSM multi-model ensemble mean matches well the differences in impacts between clusters in DH18 (Fig.   7, top and middle panels) and the temporal evolution of GPP anomalies during the 2018 growing season (April to September, Table 1), with correlations with EV I anom of 0.74-0.90. Even though the root mean squared error (RMSE) is comparable in 280 the two growing seasons, the correlations of GP P anom with growing-season EV I anom are much lower in DH19(-0.09 -0.43).
GP P anom by LSMs is above-average in spring and early summer 2019 for all clusters, and anomalies in DH19 are either more positive or less negative, compared to EV I anom .
LSMs simulate a stronger attenuation of drought compared to the observation-based SM anom , though with consistent relative differences in SM anom between clusters (compare Fig. B7 and Fig. 3). LSMs simulate well the temporal evolution of SM anom in the two growing seasons, with high correlation with both SoMo.ml and SM ERA5 anom (correlations of 0.81-0.98). The RMSE for simulated SM anom is generally lower than that of GP P anom . The sensitivity of GP P anom to simulated SM anom and to T anom (Fig. B8) is consistent with that of EV I anom in all clusters (Fig. 4), although for C PRecov and C Greening LSMs estimate non-significant negative relationships between GP P anom and T anom . The deviations of GP P anom from the linear response for C HighV and C PRecov in DH18 are correctly captured by 290 LSMs, but not that of DH19 in C Decline .

Early signs of increased vulnerability
For three clusters covering 56% of the pixels negatively impacted by DH18, the extremely low EV I anom in response to DH18 and DH19 could not be predicted from EV I-climate relationships in 2001. These departures reveal increased 295 sensitivity to dry and hot conditions, and can be a sign of increased ecosystem vulnerability to such events. It should be noted, though, that we focused on pixels which were negatively impacted by DH18, but some pixels in the regional domain selected showed greening, even in DH18 (Fig. 2). These regional asymmetries result in partial regional compensation of the DH18 impacts, as shown in Bastos et al. (2020b).
In both DH18 and DH19, higher tree cover fraction is associated with more positive or less negative residuals (Fig. 6), 300 indicating that trees were more resistant to DH than grasses and crops. The predominance of crops and grasslands in C HighV , which had strong negative residuals in both events, and of high tree cover in C Greening also support this effect. Trees can better cope with drought with their deeper rooting depth (Fan et al., 2017) and through the use of carbon reserves to support activity under stress conditions (Wiley, 2020). Moreover, some trees and grasses with stronger stomatal regulation can buffer the drought progression and its impacts by avoiding hydraulic failure (McDowell et al., 2020;Teuling et al., 2010). This is 305 reflected in the small but positive relationship between isohydricity and EV I anom residuals in pixels with high tree cover.
Increased vulnerability may be explained by modulating effects of global change on vegetation condition (e.g., "hotter droughts" , Fig. 1) and, in the case of DH19, it may be further linked to inter-annual legacies from the impact of DH18. The first should be expressed by relationships between EV I anom residuals and climatic variables. The latter are more difficult to assess without comprehensive data about different competing factors, .e.g. defoliation or damage from 310 embolism (Ruehr et al., 2019), higher susceptibility to diseases and pests due to reduced health (McDowell et al., 2020) or increased hazard of insect disturbances due to warm conditions (Rouault et al., 2006;Suzuki et al., 2014). The relationships between EV I anom residuals and EV I yr−1 anom provide an approximation, but do not allow to identify the underlying drivers.
In DH18, we find a positive effect of spring warming in vegetation growth, leading to weaker departures from long-term vegetation-climate relationships (observed EV I anom more positive or less negative than modelled), but with associated water 315 depletion amplifying the impacts of DH18 in summer in pixels dominated by grasslands and crops (low tree cover in Fig. 6).
These results are in line with Bastos et al. (2020a) that showed contrasting seasonal legacy effects of warm springs in crop versus forest dominated regions.
On the contrary, spring and summer T sm anom in 2019 (or cooling, see Fig. B1) are negative correlated with EV I anom residuals in both high and low tree cover pixels. This indicates increasing damage from heat stress, for example due to reductions in 320 evapotranspirative cooling  or cascading impacts of compound heat and drought, such as insect attacks (Rouault et al., 2006).
Including EV I yr−1 anom in the long-term RF regression model improves the predictive skill for 2001-2017, but does not reduce the residuals in DH18 and DH19.The high correlation between EV I anom residuals and EV I yr−1 anom in DH19 can indicate either that pixels strongly impacted by DH18 were associated with amplified impacts by DH19 (negative residuals), or that pixels 325 affected moderately by DH18 (less negative EV I DH18 anom ) were associated with positive residuals, i.e. stronger recovery. Damage to roots and tissues or depletion of carbon reserves from DH18 leading to higher vulnerability to DH19 could explain the positive correlation in high tree cover pixels in C Decline . Conversely, the moderate DH18 impacts in C Greening may have resulted in increased resistance to DH19. The strong correlation found in low tree cover pixels is, though, surprising, as European crop species tend to be annual plants, and annual species can also be found in many grasslands. For these pixels, it is more likely that 330 the positive correlation is explained by management practices, e.g. through earlier harvest or active reduction of stand density in DH19 (Bodner et al., 2015). C Decline stands out from the other clusters, in that browning is found in spite of drought alleviation in DH19. The strong negative correlation of residuals with SM sm anom and AWC in forest dominated pixels is counter-intuitive and suggests that other environmental effects not considered in our analysis may modulate DH19 impacts. Insect outbreaks are a potential candidate 335 to explain such effects: the stronger correlation of residuals with EV I yr−1 anom in DH19 could reflect increased susceptibility of impaired trees, combined with favourable climatic conditions for insect growth, reflected in stronger negative effects of T sm anom in DH19 in high tree cover pixels.
Results from field inventories and forest plots support this hypothesis. Increased tree mortality and insect outbreaks in central Europe during 2018 have been reported (Schuldt et al., 2020). A recent assessment by the German Federal Minister 340 for Food and Agriculture (BMEL, 2020) reported crown damage in 36% of all tree types in summer 2019, a 7% increase compared to 2018 and predominating in trees over 60 years of age. According to this report, the mortality rate in both needleleaved and broad-leaved trees almost tripled from 2018 to 2019. Although no large scale data on insect outbreaks is currently available, local authorities in regions where C Decline is prevalent report increase in tree mortality from bark-beetle infestations: the Environment Ministry of North Rhine Westphalia in western Germany reported soaring rates of spruce affected by severe 345 bark-beetle infestations, from about 1% in 2018 to over 12% in 2019 (MULNV-NRW, 2019). In the Czech Republic, rates of spruce damaged by bark-beetles more than tripled, leading to increased mortality (Hlásny et al., 2021). In Belgium, a "bark bettle task force" was created in September 2018 by the economic office of Wallonia (OEW, 2018). Increased tree mortality and bark-beetle infestations have also been reported in eastern France (ONF, 2020).

Implications for earth system modelling 350
Temperate ecosystems are an important global sink of CO 2 (Pan et al., 2011) and are not usually considered hot-spots of drought risk and environmental degradation under climate change (Vicente-Serrano et al., 2020). Our results show that the past two extreme summers in central Europe reveal first signs of large-scale enhanced vulnerability in response to DH events (C HighV , C PRecov ), and of potential degradation trajectories induced by consecutive events (C Decline ). Even though limited to 20% of the study area, the patterns in C Decline highlight the risks associated with more frequent and intense droughts and heatwaves 355 expected in the coming decades (Barriopedro et al., 2011;Boergens et al., 2020;Hari et al., 2020). At the same time progressive warming conditions can increase the likelihood of compound occurrence of multiple disturbances, such as droughts and insect outbreaks, both promoted by warm and dry conditions. Interactions between compounding disturbances can further contribute to forest C losses (Seidl et al., 2017;Kleinman et al., 2019). To anticipate such impacts, process-based modelling of ecosystem response to such events is needed.

360
The LSMs perform well in simulating the magnitude and evolution of productivity anomalies in 2018, but not in 2019. The recovery simulated by LSMs in DH19 can be partly explained by a strong recovery of modelled soil-moisture (Fig. B7), but may also result from limited ability of LSMs in simulating changes in ecosystem vulnerability during the two DH events. The latter is supported by the fact that simulated SM anom shows good agreement in the temporal evolution of soil-moisture anomalies with both observation-based datasets but not of GP P anom (Table 1).

365
The comparison of the reference and factorial simulations allows showing that the poor performance in 2019 may be related with interannual legacy effects. LSMs estimate legacies from DH18 only in the early growing season (March to May 2019), but do not estimate any legacy effects in summer (Fig. 7 bottom panel). The poor relationships between EV I anom and simulated GP P anom in response to DH19 indicate that processes controlling legacy effects such as damage from embolism, carbonstarvation and resulting tree-mortality or disturbances induced by drought and heat such as insect outbreaks, currently missing 370 in LSMs, likely explain the amplified impacts of DH19.
LSMs are known to have limited ability to simulate drought-induced stress and tree mortality (Wang et al., 2012), and lack impacts of biotic disturbances, although rudimentary approaches have been attempted (Kautz et al., 2018). These model shortcomings add to limitations in simulating soil-moisture variability and transitions between energy-limited and water-limited regimes.

Conclusions
The summers of 2018 and 2019 were both exceptionally hot and dry over Central Europe, and both were associated with widespread vegetation browning and tree mortality events. Here we propose an approach that analyses this event as a combination of three types of compound events  that consider: (i) the compound effects of hot and dry conditions; (ii) the effect of repeated stress conditions in 2019 and (3) the legacy effects from DH18 impacts in preconditioning 380 the impacts of DH19. Using statistical and process-based modelling, we quantify these effects and identify modulating effects, e.g. by land-cover composition. This approach can be extended to other types of events that may not fall in a single type of compound event.
Based on remote-sensing data, we find signs of degradation trajectories in 20% of the study area, with vegetation browning in spite of drought alleviation in DH19. We showed that inter-annual legacies from DH18 played an important preconditioning 385 role in amplifying the impacts of DH19. While LSMs simulated well the impacts of the first event (DH18), they showed limited skill in simulating the impacts of the subsequent compound event (DH19).
Our results show that compounding effects of multiple and repeated stressors and ecological dynamics can result in nonlinear and unexpected impacts (Schuldt et al., 2020), that LSMs still cannot realistically simulate. Attribution of inter-annual legacy effects from DH18 and of LSM errors to internal processes (e.g. drought-induced damage and mortality) or others such 390 as insect outbreaks remains challenging because up-to-date datasets on tree mortality, tree carbon reserves or spatially-explicit information on biotic disturbances are very limited.
Since extreme DH events are projected to become more common in the coming decades, better understanding the interactions and feedbacks between climate extremes, natural disturbances and ecosystem dynamics is fundamental to anticipate threats to the stability of forests in the temperate regions and elsewhere. Overlooking these effects may result in an overestimation of the 395 resilience of the CO 2 sink to climate change. Kautz, M., Anthoni, P., Meddens, A. J., Pugh, T. A., and Arneth, A.: Simulating the recent impacts of multiple biotic disturbances on forest carbon cycling across the United States, Global change biology, 24, 2079Global change biology, 24, -2092Global change biology, 24, , 2018 Teuling, A. J., Seneviratne, S. I., Stockli, R., Reichstein, M., Moors, E., Ciais, P., Luyssaert, S., van den Hurk, B., Ammann, C., Bernhofer,   Figure B3. Selectivity of different land-cover composition for each cluster (Fig. 3). Selectivity is evaluated as the difference between the probability distribution of a given land-cover type (forest, left; grassland, middle; cropland, right) and the probability distribution of that land-cover type in the selected region. If selectivity is positive, the cluster is preferentially composed by the given land-cover type and the opposite for negative values. The 2018 land-cover classification maps from from ESA CCI-LC are used. Figure B4. Performance of the temporal RF model in predicting EV Ianom, given by the out of bag scores. The top panel shows the scores for the climate-driven RF model and the bottom panel the corresponding results for the same model, but including EV I yr−1 anom as an additional predictor. Figure B5. Importance of the four predictors used in the RF model to predict EV Ianom, spring (left) and summer (right), SManom (top) and Tanom (bottom), calculated from the Shapley additive explanation values (Methods).  Figure B7. The left panel shows the spatial distribution of the four clusters from unsupervised classification of (EV I DH18 anom ,EV I DH19 anom ) values remapped to the coarser grid of LSMs. The corresponding (GP P DH18 anom ,GP P DH19 anom ) values simulated by the multi-model mean in each cluster are indicated in the top right panel (circles indicate the spatial mean and the lines spatial standard deviation within each cluster).
The corresponding distribution of simulated SManom pairs in each cluster are shown in the bottom right panel. The grey line, indicates similar anomalies in the two DH events. Figure B8. Same as Fig. 4 but for GPP and soil-moisture anomalies simulated by a subset of land-surface models from (Bastos et al., 2020a) extended up to December 2019.