Seasonal weather regimes in the North Atlantic region: towards new seasonality?

Florentin Breton1, Mathieu Vrac1, Pascal Yiou1, Pradeebane Vaittinada Ayar2, and Aglaé Jézéquel3
1Laboratoire des Sciences du Climat et de l'Environnement, UMR8212 CEA – CNRS – UVSQ, Université Paris-Saclay and IPSL, Orme des Merisiers, Gif-sur-Yvette, France
2Institut National de la Recherche Scientifique (INRS), Eau Terre Environnement Centre, Québec, Canada
3LMD/IPSL, Ecole Normale Supérieure, PSL Research University, Paris, France
Correspondence: Florentin Breton (florentin.breton@lsce.ipsl.fr)

The paper is organized as follows: Section 2 describes the reanalysis and climate model data used in this study, as well as the clustering method to define seasonal weather regimes; Section 3 displays the results; and in Section 4, we discuss the findings and conclude.

We use daily fields of geopotential height at 500 hPa (Z500) as a proxy of atmospheric circulation from the ERA-Interim (hereafter ERAI) reanalysis dataset (0.75° x 0.75° spatial resolution; Dee et al. (2011)) and simulations from 12 climate models of the fifth phase of the Coupled Model Intercomparison Project (CMIP5; Taylor et al. (2012)) over the North Atlantic region (22.5 to 70.5°N, 77.25°W to 37.5°E) from 1979 to 2017, and then from 1979 to 2100 (the datasets are briefly described in Table 1).

Data and preprocessing
Daily surface air temperatures (TAS) from the same datasets are also extracted to study temperature features of SWRs.

Raw year-round data are used rather than seasonal (e.g. summer or winter) data or deseasonalized anomalies, in order to capture both the year-round seasonal cycle and any long-term trend. To make the analyses and comparisons easier, all datasets are first given the same format. Calendars are standardized to 365 days per year by ignoring leap days, except for the Hadley Center simulations (360-day years). Historical experiment runs from climate models over 1979-2005 are concatenated to RCP8.5 experiment runs over 2006-2100 (respectively 1981-2005 and 2006-2099 for the Hadley Center model). The spatial grids of data from climate model simulations are bilinearly interpolated to the ERAI grid.
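As an illustration of the calendar standardization described above, the following minimal sketch builds a 365-day calendar by skipping 29 February. The function name `noleap_days` is hypothetical and is not taken from the study; the sketch only checks that the resulting number of days over 1979-2017 matches the sample size quoted in the clustering section.

```python
from datetime import date, timedelta


def noleap_days(first_year, last_year):
    """Return all dates from 1 Jan first_year to 31 Dec last_year, skipping 29 Feb."""
    day = date(first_year, 1, 1)
    end = date(last_year, 12, 31)
    days = []
    while day <= end:
        # Leap days are ignored so that every year has exactly 365 days.
        if not (day.month == 2 and day.day == 29):
            days.append(day)
        day += timedelta(days=1)
    return days


days_1979_2017 = noleap_days(1979, 2017)
print(len(days_1979_2017))  # 39 years x 365 days
```

With this convention, each calendar day has the same index (1 to 365) in every year, which is what makes day-by-day comparisons between years straightforward.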
A principal component analysis (PCA) is applied to the regridded Z500 fields in order to reduce the dimension of the data while keeping most of the variability and seasonality. The raw Z500 data are scaled by the square root of the cosine of the latitude to give equivalent weight to all grid cells when performing the PCA (as in e.g. Cassou (2008)). Only the first principal component (PC1) is kept and used for clustering because it captures between about 49% and 60% of the variance and between about 95% and 99% of the seasonal cycle (spectral power at a frequency of 1/365 per day; 1/360 for Hadley Center) over 1979-2017 for ERAI (similar to Vrac et al. (2014) on another reanalysis) and all climate models (not shown). Including more PCs in the analysis provided similar results (not shown) but brought more noise (more variance but only slightly more seasonality).
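The latitude scaling applied before the PCA can be sketched as follows. This is a minimal illustration, assuming the field is stored as a list of rows with one row per latitude; the function names `latitude_weights` and `weight_field` are hypothetical. Because the PCA works on (co)variances, scaling the data by the square root of the cosine makes each grid cell contribute proportionally to the cosine of its latitude, i.e. to its area.

```python
import math


def latitude_weights(lats_deg):
    """Square root of cos(latitude) for each latitude given in degrees."""
    return [math.sqrt(math.cos(math.radians(lat))) for lat in lats_deg]


def weight_field(field, lats_deg):
    """Scale a 2-D field; field[i] is the row of values at latitude lats_deg[i]."""
    weights = latitude_weights(lats_deg)
    return [[w * value for value in row] for w, row in zip(weights, field)]


# Toy example with two latitude rows and three longitudes (values are arbitrary).
toy_field = [[5500.0, 5520.0, 5510.0], [5300.0, 5310.0, 5290.0]]
weighted = weight_field(toy_field, [22.5, 70.5])
```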

Definition of seasonal weather regimes
We use the Expectation-Maximization (EM) algorithm (Dempster et al. (1977)) based on a Gaussian mixture model (GMM; Peel and McLachlan (2000)) to probabilistically cluster the 14235 days (13320 for Hadley Center) of the 1979-2017 period into Seasonal Weather Regimes (SWRs). The EM algorithm estimates a multivariate probability density function (pdf) $f$ of the data (here, daily PC1 values) as a weighted sum of $K$ Gaussian pdfs $f_k$ ($k = 1, \dots, K$) (Pearson (1894)):

$$f(x) = \sum_{k=1}^{K} \pi_k f_k(x; \alpha_k),$$

where $\alpha_k$ contains the parameters (mean $\mu_k$ and covariance matrix $\Sigma_k$) of $f_k$ and $\pi_k$ is the mixture ratio corresponding to the prior probability that $x$ (i.e. the PC1 value) belongs to $f_k$. The parameters $\alpha_k$ and $\pi_k$ ($k = 1, \dots, K$) of the GMM are unknown and must be estimated (cf. Appendix). Finally, each cluster $C_k$ of days is defined based on the Gaussian pdfs, according to the principle of posterior maximum:

$$C_k = \{ x : \pi_k f_k(x; \alpha_k) \geq \pi_j f_j(x; \alpha_j), \ \forall j = 1, \dots, K \}.$$

In other words, each day is assigned to the cluster for which the probability of belonging is maximum, and the obtained clusters are SWRs, which correspond to a classification of the daily data. The freedom of EM in the definition of the SWRs strongly depends on the number $K$ of clusters and on the constraints applied to the covariance matrices (constraining the geometry of the clusters, cf. Appendix). We tried different values for $K$ (from $K = 1$ to $K = 15$) and evaluated them through the Bayesian Information Criterion (BIC; Schwarz (1978)). Optimizing the BIC achieves a compromise between overfitting the observations with the model and the complexity of the model (cf. Appendix). Four SWRs (hereafter SWR4) correspond [...]

[...] models reproduce a seasonal cycle of SWRs similar to ERAI, with regime 1 (hereafter R1) representing a winter-like season, R4 a summer-like season, and R2 and R3 transitional seasons (R2 around winter and R3 around summer). The composite maps associated with each regime are shown in Figure 2. For climate models, each regime composite map (i.e. average map) is determined individually and the multimodel composite is calculated as the mean of the twelve composites. The spatial patterns of the four average regimes found in the models are very similar to those from ERAI. They are also visually similar to the usual North Atlantic weather regimes from the literature (e.g. Cassou (2008), Yiou and Nogaj (2004)).
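The posterior-maximum rule used to turn the fitted mixture into regimes can be sketched for the univariate case (a single PC1 value per day). This is only an illustration under assumed parameter values: the mixture ratios, means and variances below are made up and are not the fitted values of the study, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def assign_regime(x, ratios, means, variances):
    """Index k maximizing ratios[k] * f_k(x), i.e. the posterior-maximum cluster."""
    scores = [w * gaussian_pdf(x, m, v) for w, m, v in zip(ratios, means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)


# Illustrative (made-up) parameters for four regimes along the PC1 axis.
ratios = [0.25, 0.25, 0.25, 0.25]
means = [-3.0, -1.0, 1.0, 3.0]
variances = [1.0, 1.0, 1.0, 1.0]
regimes = [assign_regime(x, ratios, means, variances) for x in (-2.8, -0.7, 0.9, 3.5)]
```

The denominator of the posterior probability is the same for all clusters, so comparing the products of mixture ratio and density is enough to find the maximum.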
The first regime (R1) corresponds to the positive phase of the North Atlantic Oscillation (NAO+) and the second (R2) to its negative phase (NAO-; Hurrell et al. (2003)). The third and fourth regimes (R3 and R4) respectively represent the Atlantic Ridge (AR) and Scandinavian Blocking (SB) atmospheric conditions, resembling the weather patterns from Yiou and Nogaj (2004) and Vrac et al. (2014). However, note that the temporal patterns of our SWRs are based on full years (like Vrac et al. (2014)), unlike the literature considering weather patterns in winter (Cassou (2008), Yiou and Nogaj (2004)) or in summer (e.g. Guemas et al. (2010)). Thus, although our seasonal weather regimes resemble the usual regimes, they differ in their definition and therefore in their properties.
In general, the climate models reproduce atmospheric weather patterns that are very similar to ERAI, but individual models are less successful (see Supplementary Fig. 1-4 [...] conditions) decreasing in frequency, starting slightly later, ending slightly earlier, and being less persistent, and the opposite for R4 (i.e., summer conditions). The spatial patterns of TAS associated with the regimes are also similar to those of Vrac et al. (2014) (Supplementary Fig. 5).
3.2 Future changes in seasonal weather regimes (1979-2100)
We now use the same method as before to define SWRs but based on the full simulation datasets over 1979-2100 to detect potential future changes. The first approach is to use four regimes (SWR4). Between the first three decades (1979-2008) and the last three decades (2071-2100) of the period, R1 (NAO+) occurs less often but is more intense for both Z500 and TAS (Supplementary Fig. 6-7). The opposite happens for R4 (SB), which occurs more often with less intense patterns, i.e. becoming closer to the seasonal mean. R2 (NAO-) occurs more often but is less intense, while R3 (AR) occurs slightly less often but is more intense. Note that these patterns are relative to the seasonal mean, which increases substantially over the North Atlantic between the first and last three decades (averaging about +90 hPa for Z500 and +4°C for TAS; not shown). [...] later while ending about two months earlier, and persisting less, whereas R4 starts about one month earlier while ending about one and a half months later, and persists more (Supplementary Fig. 9-10).
Over 1979-2100, SWR spatial trends of Z500 and TAS are in agreement between GCMs (Supplementary Fig. 11-12) and are more robust than over 1979-2017. These maps of linear trends are obtained by calculating, for each grid cell, the linear regression of the variable (raw values) against time; grey areas correspond to trends that are not significant (p-value > 0.05).

Both regression values and p-values are calculated individually by climate model, and then averaged over the twelve values.
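The per-model regression followed by a multimodel average can be sketched for a single grid cell as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, the input values are toy numbers, and only the regression slope is computed (the p-value of each regression, which the text also averages, is left out of this sketch).

```python
def ols_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return sxy / sxx


def multimodel_trend(times, series_by_model):
    """Compute the slope separately for each model, then average the slopes."""
    slopes = [ols_slope(times, series) for series in series_by_model]
    return sum(slopes) / len(slopes)


# Toy example: one grid cell, two hypothetical models, four time steps.
years = [0, 1, 2, 3]
mean_slope = multimodel_trend(years, [[0.0, 1.0, 2.0, 3.0], [0.0, 3.0, 6.0, 9.0]])
```

Averaging slopes computed model by model, rather than regressing on the multimodel mean series, keeps each model's trend estimate independent of the others.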
However, these SWR spatial trends show different spatial evolutions between Z500 and TAS within regimes, hence partially decoupled evolutions of atmospheric dynamics and surface temperature.
Even if using four regimes allows us to explore the future with a traditional number of seasons, the low number of clusters limits the freedom of the clustering to allow the appearance or disappearance of significant structures. Therefore, we applied a second approach to overcome this limitation. We tested different numbers of regimes and chose seven regimes as a showcase because it illustrates the clearest transitions between the disappearance of past structures and the appearance of future (new) structures.
With seven regimes (SWR7), the patterns of atmospheric circulation are very similar to those of surface temperatures in both past (1979-2008) and future (2071-2100) (Figures 3-4). Regime patterns seem to follow the seasonal cycle (pale colors) except R1, R2 and R7. Past (1979-2008) R7 corresponds to rare and very intense conditions of Scandinavian Blocking associated with summer heatwaves over Northern continents. Future (2071-2100) R1 corresponds to rare and very intense NAO+ conditions associated with cold spells over Northeastern America, Greenland and Scandinavia.
Overall, we observe a shift in the spatial patterns (Z500 and TAS) of the regimes (Figures 3-4), with past R1 patterns becoming future R2 patterns, past R2 patterns becoming future R3 patterns, and so on until R6, while the R1 pattern becomes seasonally more extreme (rarer and more intense) and the R7 pattern becomes seasonally more normal (more frequent and less intense). We calculated the average seasonal cycle of the seven regimes in a similar way to Fig. 1 but over the first three decades (1979-2008) and the last three decades (2071-2100), shown in Figure 5. R7 is a new summer regime, almost absent in the past period (1979-2008), that replaces R6 and "pushes" all the other regimes towards the winter calendar days, while R1 (the past or old winter regime) collapses until it almost disappears. This shift in the seasonal cycle of the regimes between past and future appears very consistent with the shift in the regime spatial patterns.

The timing of these changes in regime occurrence during the year can be investigated through the monthly frequencies of the regimes over 1979-2100 (winter months in Figure 6 and summer months in Figure 7). Figure 6 shows the collapse of R1 happening throughout the 21st century. R2 takes the place of R1 at the beginning of the 21st century, and is replaced by R3 at the end of the 21st century. Symmetrically, R6 is replaced by R7 during the second half of the 21st century (Figure 7). The evolution of the starting and ending dates as well as persistence of R1 and R7 are very consistent with the evolution of [...]

All regimes except R7 show a similar pattern of Z500 change over the region: increase in the Southern part and decrease in the Northern part, whereas R7 shows widespread increase that is stronger in the South and not robust between climate models in the North of the region (Supplementary Fig. 15). Interestingly, these changes in circulation patterns seem to be opposite to the expected effects from Arctic amplification, such as amplified warming and geopotential height increase over circulation dynamics that are linked to midlatitude weather (Barnes and Polvani (2015), Cohen et al. (2014), Overland et al. (2015)). The strongest warming over the region is observed in R1 and R7, whereas R3 to R6 show (unexpected) cooling over the continents (Supplementary Fig. 16). The origin of this cooling is investigated later in the discussion of the paper (Section 4.3). The appearance and disappearance of regimes observed in SWR7 over 1979-2100 are absent from the 1979-2017 period, where we tested with four up to seven regimes. The increasing trend of Z500 over the North Atlantic region, mainly due to human influence (Christidis and Stott (2015)), is expected to be driving the evolution of the SWRs, but changes in spatial patterns could also play a role. To investigate this, we use SWRs based on detrended data (d-SWRs) and focus on the average d-SWRs of climate models.
This detrending corresponds to removing the calendar (by day in the year) trend of the regional average Z500 (or TAS, see Methods 2.3). By comparison to [...] (Figure 8). However, spatial structures of d-SWRs present some minor variability for Z500 (Figure 9) but major changes for TAS, in which case future patterns are almost symmetrically opposite to past patterns (Figure 10). This small evolution of Z500 spatial patterns in d-SWRs can be explained by spatial trends that are either not significant in individual climate models or in disagreement between climate models, as shown by large greyed areas in Supplementary Fig. 17. However, most of the TAS spatial trends in d-SWRs are robust and show warming over continents and cooling over oceans (Supplementary Fig. 18). This warming contrast can be explained by the higher heat capacity and evaporative cooling potential of the ocean surface compared with the land surface, and by ocean mixing (e.g. Dai (2016)). These trends also show Arctic amplification (i.e. warming stronger at the pole than at lower latitudes), especially in winter (R1 to R3).
To further understand the roles of the large-scale increases in Z500 and TAS (hereafter LGI), and of the seasonal shift of [...]

The contribution of LGI corresponds only to widespread increasing Z500 and TAS in all regimes, whereas the shift of SWRs towards winter corresponds to widespread decreasing Z500 and TAS in most regimes (except R1, R2, and unconditionally to regimes). The two opposing effects of LGI and the seasonal shift can explain the existence of decreasing spatial trends of Z500 and TAS observed earlier within SWRs.

4 Conclusive discussions
We used seasonal weather patterns (Vrac et al. (2014)) by clustering Z500 from the ERAI reanalysis and 12 CMIP5 climate models to study past (1979-2017) and future (1979-2100) seasonal structures of mid-troposphere atmospheric dynamics (Z500) and surface air temperature (TAS) over the North Atlantic region and their evolutions in time. [...] (2013)). The structures (spatial, temporal) and evolution (timing) of SWRs differ between climate models over 1979-2017 and even more over 1979-2100.

Projected evolutions of seasons
When looking at future (1979-2100) evolutions of SWRs with both four and seven regimes, the frequency of historical winter conditions decreases while that of historical summer conditions increases, and occurrences of transitional regimes move towards the winter period. These changes are associated with large increases in the seasonal mean of Z500 and TAS over the North Atlantic.
The results for summer are consistent with those of Cassou and Cattiaux (2016) but not the results for winter, which could be due to the very different methods used to define seasonality. Moreover, allowing for more freedom in the definition of the SWRs by using seven regimes rather than four, we find a collapse of the regime associated with past winter conditions, corresponding to rare cold spells at the end of the 21st century, and the growth of a new summer regime corresponding to past heatwaves that becomes dominant in summer by the end of the 21st century.
These results suggest that past winter conditions are becoming shorter in time and past summer conditions are broadening and intensifying. However, in our case the apparent changes in seasonality seem to correspond rather to a swap between regimes, since occurrences of past R1 are replaced by R2 in the future, past R2 are replaced by R3, and so on until R6. Note that R1 conditions correspond to the past winter pattern that almost disappears at the end of the 21st century. Hence, for the future projections, R1 corresponds to extreme winter (intense and rare NAO+) with respect to the "normal" future seasonality.
Therefore, this regime swap, with symmetry between spatial patterns and seasonal cycle, suggests that the seasonality of the weather patterns does not change in a major way with respect to the evolution of the raw seasonal cycle of Z500 and TAS.
Over the last three decades (2071-2100) relative to the first three decades (1979-2008), SWR4 had about 75% fewer days [...]

The appearance and disappearance of regimes over 1979-2100 do not happen in 1979-2017, probably due to the smaller magnitude of change in Z500 in this period, in contrast to the future, when the full extent of the emission scenarios kicks in within the climate model simulations. We found that spatial trends of increasing and decreasing Z500 within regimes, generally associated respectively with TAS warming and cooling trends, are the result of two opposite processes: the large-scale increase of Z500 due to human influence, and the seasonal shift of regimes towards the winter period, where Z500 and TAS are lower than during the rest of the year. This seasonal shift explains the decreasing Z500 trends, generally associated with cooling, which are observed in several regions within SWRs and would otherwise not be possible. This explanation also covers the cooling trends reported by Vrac et al. (2014), understood here as a temporal shift of the regimes' occurrences towards the winter period with cooler conditions rather than a seasonally-stationary cooling.

The d-SWRs results (i.e., SWRs obtained from detrended Z500) showed almost no temporal evolution between past and future, which means that the Z500 large-scale increase is the main cause for the evolution of SWRs. Christidis and Stott (2015) reported that the large-scale Z500 increase during 1979-2012 was mostly due to human forcings. So, although climate models overestimate the surface warming and Z500 increase over the past period (Christidis and Stott (2015), Jones et al. (2013)), there might be a strong link between the human forcings and the shift in seasonality of the regimes that we detect here, since most of the evolution of the regimes disappears when we remove the calendar large-scale Z500 increase.

Limitations and perspectives
Even if the regimes and their evolutions in climate models in the past period are similar to those from ERAI, we note a few limitations and sources of uncertainty. The representation of the climate in ERAI and models has uncertainties and errors, especially in atmospheric dynamics (Shepherd (2014)) and surface temperature in models (Jones et al. (2013)). Bias correction methods could lead to more realistic seasonal weather regimes but could imply other issues such as modifications of spatial and temporal structures (and trends) that could possibly generate physical inconsistencies (Vrac (2018), François (2020, in review)).
Overall, although our study highlights the value of a clustering approach for comparing (and evaluating) models as well as seasonal structures, the apparent consistency that we find between climate models on the future evolution of seasonal dynamics seems at odds with other studies where the projected circulation response differs strongly between models (e.g. Barnes and Polvani (2015)). Indeed, clustering approaches might hide inter-model variability, or seasonal variability (depending on the number of clusters). Additional sources of uncertainty include the choice of RCP8.5 for the future emission scenarios and the choice of Z500 (i.e., mid-troposphere atmospheric circulation) rather than surface, low- or high-troposphere conditions.
Similar methods to those that we used could be applied to explore changes in weather seasonality at a more local scale by downscaling meteorological variables (e.g. humidity, wind speed, temperature) based on large-scale weather regimes (Vrac and Yiou (2010)) in order to bring more locally-relevant insights for social matters related to the weather. Understanding [...]

Gaussian distributions are ellipsoids in space determined by the mean (location) and covariance matrix (geometric features: volume, shape, orientation). The parameters of the Gaussian Mixture Model (GMM) are the means $\mu_k$, covariance matrices $\Sigma_k$, and mixture ratios $\pi_k$, describing the $K$ Gaussian distributions ($k = 1, \dots, K$). The estimation of the GMM parameters is done iteratively in the Expectation-Maximization (EM) algorithm by maximizing the likelihood that the current statistical model represents the observed data (Fraley and Raftery (2002)). Before being optimized, the GMM parameters are initialized by the result of a hierarchical model-based agglomerative clustering (multivariate), or by separation in quantiles (univariate), rather than by random initialization. This approach avoids poor initial partitioning leading to the convergence of the likelihood function to a local maximum rather than a global one (e.g. Scrucca and Raftery (2015)). The principle of EM is based on the possibility to calculate $\pi$ when knowing $\alpha$ ($\mu$ and $\Sigma$) and vice versa, thus enabling the optimization of both. After the initialization, the Expectation step (or E-step) estimates the posterior probability $p_{ik}$ (update of $\pi_{ik}$) that the observation $x_i$ belongs to $f_k$ with the current parameter estimates (at stage $t$):

$$p_{ik} = \frac{\pi_k^{t} f_k(x_i; \alpha_k^{t})}{\sum_{j=1}^{K} \pi_j^{t} f_j(x_i; \alpha_j^{t})}.$$

The Maximization step (or M-step) then updates the parameters from these posterior probabilities:

$$\pi_k^{t+1} = \frac{1}{n} \sum_{i=1}^{n} p_{ik}, \qquad \mu_k^{t+1} = \frac{1}{n \pi_k^{t+1}} \sum_{i=1}^{n} p_{ik} x_i, \qquad \Sigma_k^{t+1} = \frac{1}{n \pi_k^{t+1}} \sum_{i=1}^{n} p_{ik} (x_i - \mu_k^{t+1})(x_i - \mu_k^{t+1})^{\top},$$

where $n$ is the number of observations. The algorithm repeats the E- and M-steps iteratively until termination, either when model parameters converge and the maximum likelihood is reached (convergence of the log-likelihood function) or after a maximum number of iterations.
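The E- and M-steps described above can be sketched for the univariate case used here (one PC1 value per day). This is a minimal illustration, not the implementation used in the study: the function name `em_gmm_1d`, the small variance floor, the density floor and the toy data are assumptions of the sketch. The initialization by quantiles follows the univariate option mentioned in the text.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(xs, k, n_iter=200, tol=1e-8):
    """Fit a univariate Gaussian mixture with k components by EM."""
    n = len(xs)
    # Quantile initialization: split the sorted sample into k equal-size groups.
    ordered = sorted(xs)
    groups = [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]
    pis = [len(g) / n for g in groups]
    mus = [sum(g) / len(g) for g in groups]
    variances = [max(sum((x - m) ** 2 for x in g) / len(g), 1e-6)
                 for g, m in zip(groups, mus)]
    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(n_iter):
        # E-step: posterior probability that each observation belongs to each pdf.
        resp = []
        ll = 0.0
        for x in xs:
            dens = [p * gaussian_pdf(x, m, v) for p, m, v in zip(pis, mus, variances)]
            total = max(sum(dens), 1e-300)  # floor to avoid log(0) on underflow
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: update mixture ratios, means and variances.
        for j in range(k):
            nk = sum(r[j] for r in resp)
            pis[j] = nk / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nk
            variances[j] = max(
                sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)
        # Stop when the log-likelihood (evaluated in the E-step) has converged.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pis, mus, variances, ll


# Toy data: two well-separated groups of values centred on -5 and +5.
toy = [-5 + 0.1 * i for i in range(-10, 11)] + [5 + 0.1 * i for i in range(-10, 11)]
toy_pis, toy_mus, toy_vars, toy_ll = em_gmm_1d(toy, 2)
```

The returned log-likelihood is the one evaluated in the last E-step, i.e. with the parameters in force just before the final M-step update.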

Appendix B: Model selection with the BIC and covariance matrix
https://doi.org/10.5194/esd-2020-26. Preprint. Discussion started: 25 May 2020. © Author(s) 2020. CC BY 4.0 License.

The Bayesian Information Criterion (BIC) is a criterion for model selection that helps to prevent overfitting by introducing penalty terms for the complexity of the model (number of parameters). In the calculation of the BIC, these penalty terms compete with the likelihood function, which determines whether adding parameters improves the model by better fitting the observed data. In our case, minimizing the BIC achieves a good compromise between keeping the model simple and representing the observed data well.
$$\mathrm{BIC}(K) = -2 \ln(L) + p \ln(n),$$

where $K$ is the number of clusters, $L$ the likelihood of the parameterized mixture model, $p$ the number of parameters to estimate, and $n$ the size of the sample (e.g. 14235 days over 1979-2017). An additional constraint on the definition of clusters is on the covariance matrix. Our GMM is univariate (since we only use PC1), so the variance can be equal or different between clusters (i.e. a constraint on the volume but not on the shape or orientation of clusters).
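The model selection step can be sketched as follows for a univariate mixture, assuming the BIC is written in its "smaller is better" form (minus twice the log-likelihood plus the penalty), consistent with the minimization mentioned above. The function names and the log-likelihood values in the example are hypothetical; the parameter count assumes K - 1 free mixture ratios, K means, and either one shared variance or K variances.

```python
import math


def n_params_univariate_gmm(k, equal_variance=False):
    """Free parameters: (k - 1) mixture ratios + k means + 1 or k variances."""
    return (k - 1) + k + (1 if equal_variance else k)


def bic(log_likelihood, n_params, n_obs):
    """BIC in its 'smaller is better' form."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_k(log_likelihoods, n_obs, equal_variance=False):
    """Pick the number of clusters with the lowest BIC.

    log_likelihoods maps each candidate k to its maximized log-likelihood.
    """
    scores = {k: bic(ll, n_params_univariate_gmm(k, equal_variance), n_obs)
              for k, ll in log_likelihoods.items()}
    return min(scores, key=scores.get), scores


# Made-up log-likelihoods: the gain from k = 2 to k = 3 is too small to pay
# for the three extra parameters, so k = 2 is selected.
best_k, all_scores = select_k({1: -1000.0, 2: -900.0, 3: -899.0}, n_obs=100)
```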

Appendix C: Detrended-data based seasonal weather regimes
We first calculate the trend of the Z500 (and TAS) spatial mean over the whole region for each calendar day, and then remove this trend in each grid cell. A nonlinear cubic smoothing spline is applied for the detrending in order to capture most of the trends correctly (not shown). However, doing only this would result in losing the seasons. Therefore, after removing the trend, we add the estimated seasonal cycle of 2017 as the reference cycle in order to keep a stationary seasonality.
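The detrending procedure can be sketched for one calendar day and one grid cell as below. Two simplifications are assumptions of this sketch and not the method of the study: a straight-line fit replaces the cubic smoothing spline, and the "seasonal cycle of 2017" is read as the value of the fitted trend in 2017 for that calendar day. The function names are hypothetical.

```python
def linear_fit(times, values):
    """Least-squares slope and intercept of values against times."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean


def detrend_calendar_day(years, regional_mean, cell_values, ref_year=2017):
    """Detrend one grid cell for one calendar day.

    regional_mean[i] and cell_values[i] are the values of that calendar day in
    years[i]. The trend is estimated on the regional mean (linear fit here,
    standing in for the smoothing spline), removed from the grid-cell series,
    and the trend level of the reference year is added back.
    """
    slope, intercept = linear_fit(years, regional_mean)
    ref_level = slope * ref_year + intercept
    return [v - (slope * y + intercept) + ref_level
            for y, v in zip(years, cell_values)]


# Toy example: a regional mean rising by 1 unit per year, and a grid cell that
# always sits 20 units above the regional mean.
years = list(range(1979, 2101))
regional = [5500.0 + (y - 1979) for y in years]
cell = [r + 20.0 for r in regional]
detrended = detrend_calendar_day(years, regional, cell)
```

Repeating this for every calendar day gives a series whose long-term trend is removed while the day-to-day seasonal cycle stays fixed at its reference-year level.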