Identifying meteorological drivers of extreme impacts: an application to simulated crop yields

. Compound weather events may lead to extreme impacts that can affect many aspects of society including agriculture. Identifying the underlying mechanisms that cause extreme impacts, such as crop failure, is of crucial importance to improve their understanding and forecasting. In this study, we investigate whether key meteorological drivers of extreme impacts can be identiﬁed using the least absolute shrinkage and selection operator (LASSO) in a model environment, a method that allows for automated variable selection and is able to handle collinearity between variables. As an example of an extreme impact, we investigate crop failure using annual wheat yield as simulated by the Agricultural Production Systems sIMulator (APSIM) crop model driven by 1600 years of daily weather data from a global climate model (EC-Earth) under present-day conditions for the Northern Hemisphere. We then apply LASSO logistic regression to determine which weather conditions during the growing season lead to crop failure. We obtain good model performance in central Europe and the eastern half of the United States, while crop failure years in regions in Asia and the western half of the United States are less accurately predicted. Model performance correlates strongly with annual mean and variability of crop yields; that is, model performance is highest in regions with relatively large annual crop yield mean and variability. Overall, for nearly all grid points, the inclusion of temperature, precipitation and vapour pressure deﬁcit is key to predict crop failure. In addition, meteorological predictors during all seasons are required for a good prediction. These results illustrate the omnipresence of compounding effects of both meteorological drivers and different periods of the growing season for creating crop failure events. Especially vapour pressure deﬁcit and climate extreme indicators such as diurnal temperature range and the number of frost days are selected by the statistical model as relevant predictors for crop failure at most grid points, underlining their overarching relevance. We conclude that the LASSO regression model is a useful tool to automatically detect compound drivers of extreme impacts and could be applied to other weather impacts such as wildﬁres or ﬂoods. As the detected relationships are of


Introduction
Climate extremes such as droughts, heatwaves, floods and frost events can have substantial impacts on crop health (Shah and Paulsen, 2003;Singh et al., 2011;Lesk et al., 2016;Ben-Ari et al., 2018).However, not all climate extremes lead to an extreme impact, and large impacts can be related to moderate drivers (Zscheischler et al., 2016;Van der Wiel et al., 2019a, 2020;Pan et al., 2020).Whether a large impact occurs does not only depend on a climate hazard but also on the vulnerability of the underlying system (Oppenheimer et al., 2015), which varies strongly for crops during the course of the growing season (Iizumi and Ramankutty, 2015;Ben-Ari et al., 2018).The mechanisms that translate a climate hazard into crop failure are often very complex and associated with lagged effects that are difficult to disentangle (Frank et al., 2015).
While climate extremes may lead to large impacts, extreme climate-related impacts are often the result of multiple contributing factors (Tschumi and Zscheischler, 2020).The concept of compound events has recently been promoted to address climate impacts from an impact-centred perspective.For instance, compound events have been defined as extreme impacts that depend on multiple statistically dependent drivers (Leonard et al., 2014) or, more recently, simply as the combination of multiple drivers that contributes to environmental or societal risk (Zscheischler et al., 2018).Drivers in this context refer to climate and weather processes and phenomena.With respect to yields at the local scale, multiple drivers can compound an impact through a sequence of weather events (temporally compounding); one weather event may also change the vulnerability of the crop to a subsequent weather event (preconditioning), or multiple drivers may interact and impact crops at the same time (multivariate events) (Zscheischler et al., 2020).
Understanding the drivers that lead to extreme impacts helps to better predict and mitigate the potential impacts of such events.One way of identifying the relevant drivers of an impact is to perform a bottom-up analysis, that is, start from an impact and identify key drivers through statistical analysis (Zscheischler et al., 2013;Ben-Ari et al., 2018).In this context, linear regression analysis can identify the most relevant drivers of an impact variable and reveal potential interactions between drivers (Forkel et al., 2012;Ben-Ari et al., 2018).More sophisticated approaches such as random forests might yield higher predictive power at the cost of losing explainability (Vogel et al., 2019).When the set of possible predictors is very large, suitable variable selection approaches need to be applied to reduce the number of predictors.In order to be applicable to a large number of locations and a variety of impacts, an automatic approach is desired that only requires a limited amount of expert knowledge and parameter tuning.An example of such an approach is the least absolute shrinkage and selection operator (Tibshirani, 1996), or short LASSO regression, which obtains a reduced number of predictors by penalizing the number of variables in the loss function.
The aim of this study is to present a method that can identify drivers of extreme impacts in an automatic manner and that is suitable for many applications.We use crop failure as an example of an extreme impact in a model environment; that is, we use simulated data from a climate and a crop model.End-of-season crop yield is related to climate drivers via highly complex interactions at different temporal scales.Temperature and precipitation are the two basic climate variables that regulate crop health (Lobell and Asner, 2003;Lobell et al., 2011;Leng et al., 2016).Furthermore, vapour pressure deficit (VPD), the difference of water vapour pressure at saturated condition and its actual value at a given temperature, determines crop photosynthesis and water demand (Rawson et al., 1977;Zhang et al., 2017;Yuan et al., 2019).
Here, we use 1600 years of wheat yield data from a global gridded crop model driven by simulated meteorological data under present-day conditions.Based on this large database of yield data, we showcase approaches to identify multiple drivers of crop failure in different regions of the world and highlight results for the LASSO regression.Using a model environment to explore new analytical approaches to identify drivers of extreme impacts, we circumvent common limitations associated with observational data, such as a small sample size, measurement uncertainties and data coverage.Among the large amount of information provided by the crop model simulations, the statistical model summarizes the link between crop failure and climate conditions.This paper is structured as follows.The data and methods used in this study are introduced in Sect. 2. In this section, the reader can first find a description of the data, including an introduction to the global climate model and the crop model used in this study.We further describe which meteorological variables are considered in the statistical analysis; Sect. 2 also introduces the LASSO logistic regression to predict years of low yield based on meteorological drivers and the metrics employed to assess the performance of the statistical model.The results of the LASSO regression are shown in Sect.3, where the performance and the summary statistics for the variables that have been selected as being critical to predict crop failure events are presented.Finally, we sum-marize and discuss the LASSO regression's results in Sect. 4 and give some perspective to this study in Sect. 5.

Climate and crop model simulations
To investigate the influence of natural variability and climatic extreme events, a large ensemble simulation experiment was set up with the EC-Earth global climate model (v2.3;Hazeleger et al., 2012).We use this climate model data set, consisting of 2000 years of present-day simulated weather, to investigate if we can identify the drivers of extreme low crop yield seasons.Large ensemble modelling is at the forefront of climate science (Deser et al., 2020); due to the computational expenses involved, a balance between ensemble size, horizontal resolution and number of climate models has to be found.We have found the climate data used here to be suitable for the present study.A detailed description of these climate simulations is provided in Van der Wiel et al. (2019b); here, we provide a short overview of the experimental setup.The present day was defined as the 5year model period in which the simulated global mean surface temperature matched that observed in 2011-2015 (Had-CRUT4 data;Morice et al., 2012).Because of a cold bias in EC-Earth, in the model this period is 2035-2039.To create the large ensemble, 25 ensemble members were branched off from 16 long transient climate runs (forced by Representative Concentration Pathway (RCP) 8.5).Each ensemble member was integrated for 5 years.Differences between ensemble members were forced by choosing different seeds in the atmospheric stochastic perturbations (Buizza et al., 1999).This resulted in a total of 16×25×5 = 2000 years of meteorological data at T159 horizontal resolution (approximately 1 • ).
Biases in the EC-Earth simulations result in unrealistic growing conditions for crops.Therefore, minimum and maximum temperatures and precipitation fields were bias corrected.The Agricultural Model Intercomparison and Improvement Project (AgMIP) Modern-Era Retrospective Analysis for Research and Applications (AgMERRA) reanalysis (Ruane et al., 2015) was used as "truth".From AgMERRA, the years 1981-2010 were used as a training set, while EC-Earth uses the long transient runs (16 times for 2005-2034).Daily minimum and maximum temperatures were corrected on a grid point basis; a model bias field was defined as the difference between the model climatology and the AgMERRA climatology.The climatology was defined to be the mean plus the first three annual harmonics.Daily precipitation was corrected towards having the correct number of rainy days and total amount of precipitation.Firstly, for each month, the number of rainy days in AgMERRA was computed (threshold of 0.1 mm d −1 ); then the same threshold was determined for EC-Earth data, which resulted in the same number of rainy days.All days with simulated precipitation lower than this threshold were set to 0 mm d −1 .Lastly, the total amount of precipitation was corrected by means of a multiplicative factor, also on a month-by-month basis.Other meteorological variables were not bias corrected.
Northern Hemisphere winter wheat yields were simulated using the Agricultural Production Systems sIMulator (APSIM)-Wheat model (Zheng et al., 2014), which is a process-based model incorporating wheat physiology, water and nitrogen processes under a wide range of growing conditions.It was previously used for field (Li et al., 2014), regional (Asseng et al., 2013) and global-scale (Rosenzweig et al., 2014) wheat studies.A grid-point-specific sowing date was used based on Sacks et al. (2010).The application of nitrogen was exacted from Mueller et al. (2012).Soil parameters (including pH, soil total nitrogen, organic carbon content, bulk density and soil moisture characteristics curves for each of five 20 cm deep soil layers) were derived from the International Soil Profile Data Set (Batjes, 2012).In addition, we also input the grid-specific thermal time accumulation parameters, which were derived from phenology (Sacks et al., 2010) and AgMERRA data.The atmospheric CO 2 concentration was set to 394 ppm.The growing season of winter wheat spans 2 calendar years (e.g.sowing in November and harvest in June).As such, each climate model integration of 5 years covers four winter-wheat-growing seasons; the 2000 years of EC-Earth climate data thus result in 1600 simulated wheat-growing seasons.Further details on the settings of the APSIM-Wheat model can be found in Appendix A. For model validation, the grid-based wheat yield simulations were aggregated to country level and then validated against the yield statistics during 2011-2015 (FAO-STAT, 2020).Most simulated yields are closely related to observed yields (Fig. B1), indicating good model performance.

Data processing
The APSIM model provided crop data for 995 grid points in the Northern Hemisphere.For our analysis, we chose to discard all grid points for which the annual mean yield is below the 10th percentile of annual mean yield across all grid points because many of these grid points were also associated with unrealistically long (> 365 d) or short (< 90 d) growing seasons, or had an overall average crop yield of 0 kg ha −1 yr −1 .Overall, 895 grid points remained for the analysis.
At each grid point, a year with yield lower than the 5th percentile for this grid point is considered as a year with crop failure and called "bad year" in the remainder, whereas all other years are referred to as "normal years".Grid points for which the 5th percentile yield was equal to 0 were excluded to avoid the co-occurrence of years without yield in the bad and normal years.This excluded six more grid points so that 889 remained for further analysis.Figure 1a shows the simulated mean annual yield, and Fig. 1b displays the relative difference between the 5th percentile and the mean annual yield.These two figures also indicate grid points that were discarded for further analysis.Finally, we discarded individ-154 J. Vogel et al.: Identifying meteorological drivers of extreme impacts: an application to simulated crop yields ual years with a growing season longer than 365 d, leading to a slightly lower number of years than 1600 for 82 grid points, i.e. for about 5 % of the grid points.
The data were split into training and testing data sets by randomly assigning 70 % of the data to the former and 30 % to the latter.For the logistic regression (Sect.2.4), explanatory variables and yield were normalized by rescaling them to a range of [−1, 1] for each grid point individually.

Explanatory data analysis
The APSIM model uses six meteorological variables on a daily basis as input -dew-point temperature (T d ), precipitation (Pr), 10 m wind speed (Wind), incoming shortwave radiation (Rad), maximum temperature (T max ) and minimum temperature (T min ).From these variables, we additionally calculated VPD as an important variable for plant growth (Rawson et al., 1977;Zhang et al., 2017;Yuan et al., 2019).For a given grid point, the sowing date is the same for the 1600 simulated years, but the harvest dates differ.We therefore define the growing season for a given grid point as starting on the month containing the sowing date and finishing with the month containing the latest harvest date.Figure 2 illustrates the temporal evolution of composites of these seven variables over the course of a growing season for normal (blue) and bad years (red) for one grid point in France (47.7 • N, 1.1 • E; Fig. 2a).The composites provide some indication about which of the meteorological variables may contribute to crop failure.In addition, the temporal evolution of the two composites reveals during which part of the growing season the different variables are relevant.The various composites suggests that, for this grid point, 30 d Pr, VPD and T max during the summer (June-August) have a high impact on crop yield (Fig. 2c, f and h).The other variables appear to be less relevant (Fig. 2b, d, e and g).Similar composites for grid points in the US (44.3 • N, 90.0 • W) and in China (30.8 • N, 118.1 • E) are shown in Figs.B2 and B3, respectively.
In addition to the seven meteorological variables, we considered seven climate extreme indicators as potential predictors of crop failure (mean diurnal temperature range, dtr; number of frost days, frs; maximum temperature, TXx; minimum temperature, TNn; maximum five day precipitation sum, Rx5day; number of warm days, TX90p; number of cold days, TN10p; following Vogel et al., 2019) (Table 1).For both the monthly means of the meteorological variables and the growing season means/totals of the indicators of climate extremes, we calculated the Pearson correlation coefficient between the variables and annual yield (Fig. 3a and b for the same grid point as in Figs. 2 and 3c and d as average correlation over all grid points).These correlations are computationally and conceptually very simple, and together with Fig. 2, they serve as a first estimation of the importance of the available variables.Some variables, such as wind speed, do not have a discernible influence on yield and thus can be neglected for this study.We use monthly means of T max , Pr and VPD during the growing season, as well as the seven extreme indicators for further analysis.

LASSO regression
The aim of this study is to provide an interpretable statistical model that is able to predict years with extremely low yields (bad years) with meteorological variables.We use the LASSO (Tibshirani, 1996) logistic regression for an automatic selection of meteorological variables that are statistically linked to low yields.The approach is explained below.
For a given grid point, let Y ∈ {0, 1} n be the binary yield vector, with n the number of years.If the year i ∈ {1, . . ., n} is a bad year, then Y i = 1; otherwise, Y i = 0. Let X 1 , . . ., X p ∈ R n be the explanatory variables vectors (monthly meteorological variables and climate extreme indicators, rescaled as explained in Sect.2.2).Using a generalized linear model and, more specifically, a logistic regression, we can identify how much of the occurrence of bad yields is explained by which explanatory variable: where β 0 , β 1 , . . ., β p are the regression coefficients.However, a simple logistic regression presents two challenges here.Firstly, some variables might be highly correlated (e.g.correlation between temperature in May and temperature in June, or the correlation of extreme indices with meteorological variables).This correlation implies a high variability of the coefficients.For instance, if the variables X j and X k are highly correlated, the information brought by a high absolute value of β j and a low absolute value of β k might be the same as the information brought by a low absolute value of β j and a high absolute value of β k .Another issue is the large number of potential explanatory variables (up to 43 for some grid points).The relatively straightforward relationship of a generalized linear model (simpler than the crop model equations themselves) allows us to reveal which meteorological variables explain bad yields best.However, if the number of a priori explanatory variables is very large, the regression becomes rather complex and many coefficients will be close to zero, rendering an interpretation difficult.
LASSO regression tackles both challenges with an automatic variable selection using a regularization by penalizing the number of coefficients different from 0 using the 1 norm on the vector of coefficients (Tibshirani, 1996).Thus, the regression coefficients are obtained by minimizing an objective function consisting of the sum of the usual loss function for logistic regression and a penalty term on the coefficient norm: for a fixed λ > 0. The penalty term on the coefficient norms prevents a high variability of these coefficients.Furthermore, the 1 norm implies a variable selection.Coefficients associated with non-relevant explanatory variables are set to 0.
We use the R package glmnet (Friedman et al., 2010) to perform the LASSO regression with R version 3.6 (R Core Team, 2019).Through 10-fold cross-validation in the training data set, we obtain the optimal λ min and λ 1SE = λ min +SE with "SE" the standard error of the lambda that achieves the minimum loss, and the coefficients β = (β 1 , . . ., β p ), which is the solution to the optimization in Eq. ( 2) for λ = λ 1SE .Our preference for λ = λ 1SE is motivated by the balance between number of selected variables and accuracy of the loss function minimization (Friedman et al., 2010;Krstajic et   2014).Indeed, less variables are selected with λ 1SE than with λ min , because λ 1SE > λ min and thus the penalty term on the norm of coefficient is stronger, but the minimization of the Eq. ( 2) is still sensible, because λ 1SE lies within the uncertainty range of the optimal λ.

Other models
To compare the performance of the LASSO regression with other regression methods we also perform the analysis with a generalized linear model (GLM) and a random forest binary classification.
For the application of the GLM, a pre-selection of the initial variables is required, since the number of predictors is limited.Only the variables with the highest Pearson correla- tion coefficients (ρ > 0.30) were selected as initial predictors from an initial data set composed of all months of the growing season for each of the three variables (T max , Pr and VPD) and the seven extreme indicators.Next, the subset of best predictor variables is identified with the leaps algorithm (Furnival and Wilson, 1974).We use the implementation of the R package bestGLM (McLeod et al., 2020), using a binomial family with a logit link function.Overall, GLM achieves lower performance (Sect.2.7) compared to the LASSO logistic regression (not shown).The weaknesses of this approach include its sensitivity to outliers and multicollinearity, and overfitting.
Finally, a random forest approach -a common machine learning technique -was also performed using the R package randomForest (Breiman, 2001;Liaw and Wiener, 2002) serving as a benchmark for the model performance of the LASSO logistic regression.The random forest binary classification achieves comparable performance (Sect.2.7) but is not superior to the LASSO approach.

Segregation threshold adjustment
The segregation threshold for assigning a continuous prediction to either a bad or normal year was adjusted in a gridpoint-wise manner to account for the unbalanced data set with 19-fold higher occurrences of normal years than bad years.Let s be the local segregation threshold between a bad year predicted and a good year predicted.In other words, if the probability p = P[Y = 1] predicted for a given grid point by the LASSO logistic regression model is greater or equal to s (lower than s), then the year is predicted as a bad year (normal year).We want to choose s as a good compromise in prediction of normal years and bad years, given that bad years are rare.In other words, we want to find an optimal trade-off between specificity and sensitivity.To this purpose, a cost function C = C(s) is calculated based on the false positive rate R FP = R FP (s), the associated cost for a false positive instance C FP , the sum of observed normal years O NY , the false negative rate R FN = R FN (s), the associated cost for a false negative instance C FN and the sum of observed bad years O BY of the training data set (Hand, 2009).A false positive means that a normal year was observed while a bad year was predicted, and a false negative refers to the observation of a bad year, whereas a normal year was predicted.For a given grid point, FP, FN, TP and TN denote the total number of false positives, false negatives, true positives and true negatives, respectively (Fig. 4).The value of C(s) is given by where R FP = FP FP+TN , R FN = FN FN+TP and C FP = C FN = 100.In this study, the costs associated with false positive C FP and false negatives C FN are given equal weight.
The optimal segregation threshold s * for a given grid point is s * = argmin s∈(0,1) C(s).The segregation threshold selected in this study is the mean value of s * over all grid points.

Model performance assessment and sensitivity analysis
Model performance is assessed using the critical success index (CSI).The CSI is frequently used for evaluating the prediction of rare events, as it neglects the number of correct predictions of non-extremes, which dominate the confusion matrix (Mason, 1989)  of normal years and are therefore not meaningful for the assessment of model performance in unbalanced data sets with under-represented extreme events.The CSI is defined as To evaluate the robustness of our model, in addition to the 5th percentile threshold, we repeated the analysis with thresholds of 2.5 % and 10 %, reaching qualitatively similar performance.In addition to the normalization by rescaling the data to the interval [−1, 1], we also performed a z-score transformation, which yielded comparable results.Therefore, our choice of normalization is arbitrary to a degree and a z-score transformation can potentially also be applied in the LASSO logistic regression model.Moreover, we applied two more combinations of splitting training and testing data sets: a 60/40 and an 80/20 split.With increasing size of the training data set, the CSI increased slightly, however, at the expense of stochastic under-representation of bad yield years in the smaller testing data sets.As a trade-off, we decided for the 70/30 split.
The adjustment of the segregation threshold was carried out with equal weight to false positive and false negative predictions.It can be argued that the latter case -where a normal year is predicted, but crop failure is observed -is more detrimental and should therefore be given a higher weight.Due to the subjectivity in the determination of this weight, an adjustment of the weight term was not applied.However, it should be noted that the attribution of a higher weight of false negative predictions would yield a lower segregation threshold and hence improve the overall CSI.

Overall performance
The LASSO logistic regression model can predict bad years with an average CSI = 0.43 across all grid points.Best per-formance is obtained in the eastern half of the United States with a maximum of CSI = 0.82 (Fig. 5), which decreases westwards in the Great Plains and is lowest in the wheatgrowing regions located close to the Rocky Mountains.Furthermore, especially the most northern and southwestern grid points in North America show a lower performance in general.Also central Europe shows high performance up to CSI = 0.80.A notable regional exception with low performance can be found in the Alps.Many Asian and African growing regions show medium prediction accuracy such as northern China, Myanmar, Turkey and the Maghreb, with exceptions of some regions including Pakistan, southern China and Japan, which show a low performance in general.For 30 grid points, it is not possible to obtain reasonable predictions of bad years with our approach, indicated by a CSI equal to 0. Overall, regions with high prediction accuracy of bad years are often those that also have high mean yields (Fig. 1).CSI is positively correlated with mean yield with a Pearson correlation coefficient of ρ = 0.46 (Fig. 6a); an even stronger correlation is found with yield variability (ρ = 0.57) (Fig. 6b).

Explanatory variables
Here, we summarize properties of the variables selected by the LASSO logistic regression as relevant explanatory variables, i.e. those which are statistically linked to bad years.A median of 11 variables per grid point has been selected as explanatory variables, and for 50 % of grid points the number of selected variables lies between 7 and 14 (Fig. 7a).The inclusion of extreme indicators provides a useful addition to the monthly predictors, shown by a median number of two selected extreme indicators per grid point (Fig. 7b).Grid points without extreme indicators are found only in few areas such as eastern Europe, the Alps and southern China.In total, 72 % of all grid points include monthly predictors of VPD, Pr and T max , and almost all grid points (97 %) incorporate VPD (Fig. 7c).Interestingly, in the Great Plains (USA), in many cases temperature is not included, whereas in most other regions of the US all meteorological variables are selected to achieve a good prediction.In southern China, temperature is not needed by the models, whereas in the northern areas, usually all meteorological variables are part of the model.In most wheat-growing regions, particularly in the northeastern US, southeastern Europe and Turkey, all four seasons contain relevant predictors for predicting bad years (Fig. 7d).Generally, the number of seasons included decreases towards the southeastern regions in the US, whereas in western Europe no clear pattern emerges.In lower latitudes such as in southern Asia, growing seasons are generally shorter (Fig. B4), and consequently often only predictors from one or two seasons are included in the respective models.
At the global scale, VPD in May and June, as well as Pr in April, are the predictors which are most often included in  the LASSO regression, followed by the climate extreme indicators diurnal temperature range (dtr) and number of frost days (frs) (Fig. 8a).In nearly all cases, the sign of the coefficient is positive for VPD in May and June, and negative for Pr in April.This implicates that higher VPD increases the risk of crop failure and is similar for the other variables.In North America and Europe, in addition to dtr and frs, VPD and Pr in spring to early summer are the most frequent monthly predictors (Fig. 8b and c).The growing season for wheat varies with latitude.Consequently, in more northern regions, mostly in Europe and North America, monthly predictors from the months between March and July are included in the LASSO regression, whereas in southern regions such as in Asia and Africa, November to May are usually the most frequent months (Fig. 8d).
This clear latitudinal shift can be visually identified in North America from February to July, especially for VPD (see maps in the Supplement).In central Europe, the growing season ends latest; thus, VPD in August is usually included in the model.In addition to the most common climate extreme indicators, dtr and frs, Rx5day and TXx are among the most frequent predictors in Asia and North America, respectively.Overall, frs is mostly included in northern grid points, with notable exceptions in western Europe (Fig. B5a) and mainly with a positive coefficient (higher frs leads to more crop fail-ure events), which can likely be attributed to the influence of mild maritime climate in those regions.In contrast, dtr is important in most Asian grid points and especially in western Europe and the Maghreb, whereas in the Pannonian Basin and Turkey it is a less common predictor (Fig. B5b).The coefficient associated with dtr in the LASSO regression is mainly positive, except in parts of India and Myanmar.Some variability in the mean diurnal temperature range might be beneficial for regions close to the Equator where the variability in diurnal temperature is usually low.Generally, both low and high dtr values can influence wheat yield beneficially depending on the growing region; e.g. a low dtr can be advantageous because of a reduced occurrence of frost days, whereas a higher dtr might also indicate a favourable effect because of increased solar radiation (Lobell, 2007).Rx5day is predominant in the western US, the western Mediterranean and central Asia (Fig. B5c), which are all growing regions with comparably low average annual precipitation.TX90p is a common variable in low latitudes with a positive coefficient, especially in the southern US and Myanmar (Fig. B5d).This indicates that in these regions physiological temperature thresholds are occasionally exceeded, making TX90p a crucial variable in these areas. https://doi.org/10.5194/esd-12-151-2021 Earth Syst.Dynam., 12, 151-172, 2021

Predicting bad yield years
In this study, we presented a method for identifying drivers of extreme impacts using crop failure as an example.Such approaches are highly sought after to identify compound drivers of large impacts (Zscheischler et al., 2020;Van der Wiel et al., 2020).Our method allows us to investigate potential drivers at a global scale using a highly automated scheme based on LASSO regression.The benefits of LASSO regression include its usage for automatic variable selection, its consideration of correlation between explanatory variables and its performance.Moreover, the statistical model obtained provides a logistic linear relationship between crop failure and selected variables, which is much simpler to interpret than the crop model equations themselves or results obtained with more complex machine learning approaches.We defined bad years as years where the annual crop yield is below the 5th percentile and were able to predict those years by using the LASSO regression with an average CSI of 0.43.This means that on average, the sum of the numbers of false positives and false negatives is slightly higher than the number of true positives (or accurate predictions of bad years).Our model performance is somewhat comparable to results from Vogel et al. (2019), who were able to explain 46 % of variation in spring wheat anomalies using a similar set of predictors based on a random forest algorithm.In our case, more sophisticated machine learning regression models such as random forests did not yield better prediction skill, indicating that performance in the current setup using monthly predictors for a binary classification of bad years likely cannot be much improved.This is probably also related to the fact that predicting extremes of a continuous variable is challenging because no natural separation between extremes and non-extremes exists.Another challenge arises from the highly asymmetric distribution of observed bad and normal years.Even though in our case the total amount of samples per grid point is relatively large (1600, because we used simulated crop yield data) the number of observed bad years is only 80 and thus can still be considered fairly small.
We analysed the robustness of our results using (a) the 10th percentile as a threshold to discriminate between bad and normal years and (b) a smaller data subset with only 400 entries per grid point (i.e. a quarter of the available data).The spatial patterns of the selected predictors and the CSI using the 10th percentile threshold are very similar compared to those of the 5th percentile, and the average CSI increases slightly from 0.43 to 0.52.Using a sample size of 400 we still obtain an average CSI of 0.33, indicating that performance decreases only slightly with decreasing data size, while the spatial patterns remain consistent (results not shown).Furthermore, the spatial coherence of our results (Fig. 7) additionally suggests robustness of our analysis.An application of the approach on real data might still be challenging, as observational sample sizes generally are much smaller than even 400 years.In addition, observational data are often not available at such spatial resolution and extent as is the case for the crop model data used in this study.This will make it difficult to use spatial coherence of the identified drivers as an indicator of model robustness when using observational data.Furthermore, modelling winter wheat yield is particularly challenging due to its long growing season (Vogel et al., 2019).
A limitation to our study design is the pre-selection of potential predictor variables.Here, we used monthly mean values and a number of climate extreme indicators.More flexible averaging time periods for the predictors might result in higher prediction accuracy due to better overlap with sensitive periods of the impact variable.For instance, in our crop yield example, meteorological predictors need to coincide with the respective phenological development stage because their impact can vary depending on the phenophase.Wheat, for example, is known to require wet conditions in the vegetative phase; however, it prefers dry conditions during ripening (Seyfert, 1960).Therefore, the application of monthly meteorological predictors might be insufficient for accurate matching of meteorological drivers to the respective phenological phases.We explored the option of automatically genhttps://doi.org/10.5194/esd-12-151-2021 Earth Syst.Dynam., 12, 151-172, 2021 erating optimal time periods for the meteorological predictors by maximizing the difference between the composites between normal and bad years.For instance, 30 d cumulative precipitation differs between normal and bad years starting in February and ending in August for a grid point in France (Fig. 2c), whereas VPD only differs from May to September (Fig. 2h).Composite plots for a grid point in the US and in China are shown in Figs.B2 and B3, respectively.However, deciding when the separation between normal and bad years is large enough to start and end the optimal time periods is challenging and difficult to generalize and thus automate, which was the aim of our method design.Nevertheless, such a well-tuned selection of predictors has the potential to improve the prediction of bad years significantly and should thus be explored in future research.We find a strong correlation of the yearly mean and standard deviation of annual yield with the LASSO regression performance indicator CSI (Fig. 6).Low model performance at grid points with low yield variability suggests that the distinction between normal and bad years is challenging at these locations, e.g. in southern China and Japan (Figs. 1b and 5).Regions with high annual yield are found primarily in central Europe and the eastern half of the United States, which also represent the regions with highest model performance.In contrast, many regions in Asia generally have lower average yields and lower prediction skill of bad years.This could be related to a calibration bias in the crop model, leading to a better representation of wheat-growing processes in regions where wheat reaches higher yields in the real world.A further explanation for this phenomenon could be that the crop model is primarily designed for crop growth at typical environmental conditions, whereas growing regions with conditions at the edge of the ecological niche of wheat might be less well represented.
Our analysis was based on fitting a local model at each location, which is one of the three principal statistical methods used to link crop yield with weather conditions, along with cross-section models and panel models, which are global models that adjust for spatial variability using fixed or random effects (Lobell and Burke, 2010;Shi et al., 2013).Collinearity between explanatory variables is a recurrent issue when using these methods (Shi et al., 2013) that we addressed with the LASSO regression.One example is VPD and T max , which might be highly correlated but still might individually contribute relevant information because they have a different impact on the plant process, as explained in Kern et al. (2018).LASSO regression did not completely discard one of these two variables, despite their high correlation.

Important predictors
For most grid points, VPD is the most important monthly predictor of bad years, followed by Pr and T max , in that order.While their importance in time differs between grid points, depending on the timing of the respective growing season (Sippel et al., 2016), their order changes little across space.In addition, the order of importance of extreme indicators is quite similar in North America, Europe and Asia.One notable distinction is the higher importance of Rx5day in Asian grid points compared to North America and Europe.The consistent selection of similar predictors across large spatial scales may suggest that the LASSO regression is fairly robust.However, this may also be related to the inevitable simplifications of crop-growing processes in the employed crop model.In particular, the same model is applied at all locations, likely creating certain homogeneity by default.Kern et al. (2018) conducted a comparable analysis on observed winter wheat crop yield in Hungary.With a linear regression using a step-wise selection of monthly meteorological variables, they found that a positive anomaly in VPD and T min during May decreases yield.Additionally, April, May and June appear to be the most relevant months in our global analysis, which is consistent with regional studies (Kern et al., 2018;Kogan et al., 2013;Ribeiro et al., 2020).
Climate extreme indicators are important predictors as the occurrence of an extreme weather event may induce crop failure in a given year.However, in years without such extreme events, crop yields are still governed by the weather during the growing season (Iizumi and Ramankutty, 2015).We found that both climate extreme indicators as well as monthly means of common climate variables have proven to be valuable predictors of years resulting in crop failure.Droughts and heatwaves are well known to affect crop yield (Lesk et al., 2016;Jagadish et al., 2014), and temperature and precipitation explain a large fraction of interannual crop yield variability (Lobell and Burke, 2008).In contrast, VPD is often overlooked in statistical analyses of crop yield variability (Zhang et al., 2017).We show that VPD is a key predictor for crop failure.It is known to play a crucial role in plant functioning and is projected to increase as main limiting driver in the face of climate change (Novick et al., 2016;Grossiord et al., 2020).High VPD values can lead to plant mortality via carbon starvation and hydraulic failure (Mc-Dowell et al., 2011;Grossiord et al., 2020).However, its covariation with temperature and solar radiation makes it difficult to disentangle their respective effects (Stocker et al., 2019;Grossiord et al., 2020).There are well-defined temperature thresholds for wheat; e.g. a temperature of 31 • C before flowering is considered to evoke sterile grains and thus reduces yield (Porter and Gawith, 1999;Daryanto et al., 2016).T max is a secondary predictor in our statistical model, which is in line with results based on observed and simulated yields (Schauberger et al., 2017), and can be attributed to the rare exceedance of critical temperature thresholds in the growing season.Crops are particularly vulnerable during key development stages, so extreme events during that time span can lead to large yield reductions, even in the event of otherwise favourable weather conditions during the growing season (Porter and Gawith, 1999;Moriondo and Bindi, 2007).The vulnerability of wheat to climatic events depends largely Earth Syst.Dynam., 12, 151-172, 2021 https://doi.org/10.5194/esd-12-151-2021 on phenophases, and generally wheat possesses a higher sensitivity to temperature and precipitation during its reproductive phase than during its vegetative phase (Porter and Gawith, 1999;Luo, 2011;Daryanto et al., 2016).Future research could investigate the importance of time of occurrence of extreme indicators (Vogel et al., 2019).For instance, due to climate change, false spring events may become more likely in some regions (Moriondo and Bindi, 2007;Allstadt et al., 2015), and thus the timing of frost days could provide a valuable addition to the model.The frequent inclusion of the extreme indicators such as dtr and frs in our regression model highlights that short-term extreme events can potentially have larger impacts than gradual changes over time (Jentsch et al., 2007).The variable dtr was also identified as an important predictor by Vogel et al. (2019), whereas frs was of minor importance for explaining variation in spring wheat yield.By contrast, frs is one of the most predominant predictors in our study, which might be explained by the differing growing season of winter wheat, which is encompassing primarily the cold seasons.
We explored the relevance of interactions between predictors; however, this did not significantly improve model performance.This might hint at the inability of the APSIM crop model to account adequately for such compound effects, which is consistent with Ben-Ari et al. ( 2018), who linked the crop failure 2016 in France to an extraordinary combination of warm winter temperatures followed by wet spring conditions.The commonly used crop models employed for crop yield forecasts were not able to predict the 2016 yield failure in France (Ben-Ari et al., 2018).
Overall, our results illustrate the omnipresence of compounding meteorological events for crop failure.In nearly all grid points, most seasons and meteorological variables were relevant to predict years with crop failure (Fig. 7).This suggests that the co-occurrence of certain weather conditions as well as the combination of weather conditions in different seasons are associated with crop failure.With our approach we have identified meteorological conditions that are statistically linked to crop failure.Our results confirm prior findings by Van der Wiel et al. (2020) that such conditions are not necessarily extreme but can also be moderate.However, for interpretation of the selected variable set, one should be aware that the variables in our model are selected based on correlation, and thus attributing them as potential physical drivers needs further careful investigation.To identify such causal relationships, more advanced methods from the emerging field of causal inference could be employed (Runge et al., 2019).

Conclusions
In this paper, we presented a robust statistical approachnamely LASSO logistic regression -for predicting crop failure and automatically selecting relevant predictors among a large number of meteorological variables and climate extreme indicators.We illustrated our approach on 1600 years of simulated winter wheat yield for the Northern Hemisphere under present-day climate conditions.LASSO regression can serve as a tool for identifying important variables with automated variable selection while accounting for collinearity and achieving overall good predictive power.Consistent with earlier knowledge, we find that predicting crop failure requires accounting for a number of different meteorological drivers at different times of the growing season, which is illustrated by the large number of variables at all seasons included in our statistical model (Fig. 7).This indicates that compounding effects are ubiquitous across time and meteorological drivers, and highlights the usefulness of approaches such as LASSO regression to reveal multiple meteorological drivers of crop failure.We identified vapour pressure deficit as one key variable to predict crop failure, which underlines the importance of its consideration in statistical crop yield models, in particular because it is often overlooked in statistical analyses of crop yield variability.Furthermore, climate extreme indicators such as diurnal temperature range and the number of frost days have proven to be valuable additions to the predictive models, highlighting the necessity to address not only monthly mean conditions but especially also climatic extremes in such models.Overall, this study helps to enhance the knowledge required to improve seasonal forecasts and undertake adaptation measures against crop failure.The flexibility of our approach allows an application to other climate impacts that are influenced by a large range of variables varying with seasonality, for instance, wildfires or flooding. https://doi.org/10.5194/esd-12-151-2021 Earth Syst.Dynam., 12, 151-172, 2021

Figure 1 .
Figure 1.(a) Mean annual yield over the 1600 years (t ha −1 ).(b) Relative difference between the 5th percentile and the mean annual yield.Grid points discarded for our study are crossed out (specified in Sect.2.2). min

Figure 2 .
Figure 2. Daily evolution of meteorological variables used as input for the APSIM model over the course of the year for an exemplary grid point in France (47.7 • N, 1.1 • E; shown as a red dot in panel a).Red lines indicate the composite mean of the bad years (80 seasons); blue lines indicate the composite mean of the normal years (1520 seasons).Shading shows the range between the 10th and 90th percentiles of the respective years.Variables shown are (b) dew-point temperature, (c) 30 d running sum of precipitation, (d) incoming shortwave radiation, (e) wind speed, (f) maximum temperature, (g) minimum temperature and (h) vapour pressure deficit.

Figure 3 .
Figure 3. Linear correlations between potential meteorological predictors and annual yield.(a) Correlation between the monthly, seasonal and growing season (GS) averages of the meteorological variables and annual yield for a grid point in France (47.7 • N, 1.1 • E).(b) Correlation of the climate extreme indicators (Table1) and annual yield for the same grid point.(c, d) Average of the same correlations across all Northern Hemisphere grid points.Note that panel (a) shows the correlation for all months included in the growing season of the grid point in France, while panel (c) shows the average correlation for a given month computed over all grid points containing this month in their growing season.

Figure 4 .
Figure 4. Confusion matrix for classification of observed and predicted normal and bad years.

Figure 6 .
Figure 6.Correlation between CSI and annual crop yield mean and variability for the 889 grid points included in the LASSO logistic regression model.(a) Scatterplot between CSI and mean annual yield.(b) Scatterplot between CSI and annual yield standard deviation.

Figure 7 .
Figure 7. Maps illustrating the selected predictors by the LASSO logistical regression.(a) Total number of selected variables.(b) Number of selected climate extreme indicators.(c) Combination of selected meteorological variables."None" means that only climate extreme indicators were selected; "All" means that at least 1 month from each of the three meteorological variables (VPD, Pr, T max ) is selected.(d) Number of selected seasons (out of the four seasons -DJF, MAM, JJA, SON).

Figure 8 .
Figure 8.For each possible predictor, we show the percentage of grid points for which this predictor has a non-zero coefficient in the LASSO logistic regression.(a) All continents (889 grid points in total), (b) North America (419 grid points), (c) Europe (233 grid points) and (d) Asia (210 grid points).The extension "Y1" means that the respective month belongs to the first calendar year of the growing season, while "Y2" means it belongs to the second calendar year of the growing season.

Figure B2 .
Figure B2.As Fig. 2 but for a grid point in the United States (44.3 • N, 90.0 • W).

Figure B4 .
Figure B4.Number of months in the growing season (number of months between the earliest sowing date and the latest harvest date).The growing season starts at the month containing the sowing date and ends with the month containing the latest harvest date among the 1600 model years.We discarded years with harvest date later than 365 days after the sowing date.Some growing seasons are 13 months long because we include both the entire first month and the entire last month.

Table 1 .
Meteorological drivers used in the analysis.Note: percentiles are grid point based; i.e. they are representative of the local climate. *