Inter–annual global carbon cycle variations linked to atmospheric circulation variability

,

Recently, Sippel et al. (2019) applied Ridge Regression(RR), a regularized linear regression method (Hastie et al., 2009;Friedman et al., 2010), to quantify the component of precipitation and temperature variability driven by atmospheric variations 70 based on sea level pressure (SLP) fields, rather than teleconnection indices. Their approach allowed them to robustly infer the main spatio-temporal patterns of atmospheric variability influencing these two climate variables. On the one hand, including a field of circulation-based predictors, avoids considering predefined assumptions about their spatial configurations as they are common to teleconnection indices, while at the same time compensating for relatively short historical records. The regularization approach, on the other hand, allows to overcome overfitting and multicollinearity issues due to short time series and a very 75 large number of spatial predictors.
3 2 Data and methods 2.1 CO 2 datasets for the recent past 90 We select the CO 2 time-series datasets from the Global Carbon Budget (GCB) 2018 version 1.0 (Le Quéré et al., 2018): the atmospheric CO 2 growth rate (AGR), the land sink from models (SL DGVMs ), the residual land sink (SL Resid ), and the land sink from two atmospheric inversions.
The difference of annual atmospheric CO 2 in a given year and the previous year (Ballantyne et al., 2012;Dlugokencky and Tans, 2018;Le Quéré et al., 2018) corresponds to the AGR, which is based on direct observations. The AGR is based 100 on the average of well-mixed CO 2 measurements at multiple global stations from the US National Oceanic and Atmospheric Administration Earth System Research Laboratory (NOAA ESRL) (Dlugokencky and Tans, 2018).
F F emissions are based on inventories, while F LU C, SL and SO are estimated by models (SL DGVMs and SO, respectively in Eq. (1), all of which contain uncertainties (Le Quéré et al., 2018). The total emissions from F F and F LU C minus AGR should equal the total SO and SL DGVMs (Eq. (1)). Due to uncertainties in modeled land and/or ocean sinks or in land use 105 estimations (Bastos et al., 2020;Hauck et al., 2020), the budget cannot be balanced and thus an imbalance term (IM B) is introduced to the budget.
The period common to the CO 2 time-series  is selected.

Sea level pressure
We use global monthly mean SLP fields from ERA5 reanalysis with the spatial resolution of 0.25 • ×0.25 • (Bell et al., 2020), 125 at monthly time-steps and covering the period 1950-1978 (Bell et al., 2020) and 1979-present (Hersbach et al., 2019. The period common to other datasets of 1958-2017 is selected here.

Teleconnection indices
In addition to SLP fields, we select 15 teleconnection indices from the atmosphere-ocean variability, Northern Hemisphere(NH), and Southern Hemisphere(SH).
For the CESM simulations, the SLP fields are originally provided at 1.9 • ×2.5 • spatial resolution at monthly mean time-steps, which we then resampled ::::::: resample : to 5 • ×5 • spatial resolution. Annual mean NBP was : is : calculated from the monthly fields.
NBP and SLP fields were :: are : selected for the simulation period 1000-5000 year. 175 2.3 Methods :::::::: Statistical :::::::: analysis The overall goal is to characterize :::::: annual variations in the global C-cycle that can be explained by large-scale atmospheric circulation :::::::: variability. Here, the pixel-based time-series of SLP anomalies are used as predictors (p ≥ 800) of CO 2 time-series (n ≤ 54 years) in a linear regression model. However, the small sample size relative to the large number of predictors (n < p) can cause severe overfitting problems and result in unstable predictions (Hastie et al., 2009). Moreover, the existing spatial 180 correlations among the neighboring pixels of SLP anomalies might cause multicollinearity among the predictors (von Storch and Zwiers, 1999). The potential multicollinearity problem results in unstable regression ::::: Ridge ::::::::: Regression coefficients in least  (Mantua et al., 1997;Mantua and Hare, 2002). Using EOF and regression over 20 • -90 • N in the Pacific (Mantua et al., 1997). Downloaded from NOAA NCEI.  (Rayner et al., 2003;Enfield et al., 2001).  (Pittock, 1980(Pittock, , 1984Jones et al., 1999), data calculated by Henley et al. (2015), data from University of East  Sippel et al. (2019) applied Ridge Regression (RR) to avoid these overfitting and multicollinearity problems. RR ::::: Ridge 185 ::::::::: Regression is a regularized linear regression, whose the fundamental principle is to introduce a constraint (hyper-parameter λ) to regularize the varying regression coefficients in least squares estimation (Hastie et al., 2009;Friedman et al., 2010). The regularized variance comes with a compromise of biased predictions and is addressed as the bias-variance trade-off (Hastie et al., 2009). When selecting the best hyper-parameter λ, this trade-off is considered to achieve stable (low variance) while slightly biased predictions (Hastie et al., 2009). 190 Model performance is evaluated by the R 2 , the Pearson's correlation R, and mean squared error (MSE) of the original CO 2 time-series against predicted values. Pearson's correlation R is selected as the main measure of predictability. At the same time, R 2 is optimized and verified for validity, and , ::: and : the significance P < 0.05 is selected. Given the relatively short period (n < 60), here we use leave-one-out (LOO) cross-validation to achieve optimal model training and testing. For each train and test group splitting, we select the train group as all years excluding three consecutive years and the middle year of those 195 three is then selected as the test sample. We excluded :::::: exclude : the preceding and following year :::: years : to reduce the potential influence of temporal auto-correlation. Here, we refer to the correlation of RR LOO predicted time-series with observed values as ρ predictors , where the predictors are either SLP fields (ρ SLP ) or teleconnection indices (ρ Tele ). The regression coefficients of RR LOO are described as ω SLP and ω Tele with SLP and teleconnection indices as predictors respectively. processes. In this study, DJF SLP is in ::::::::: aggregated :: to 9 • ×9 • with , :: so : p = 800. Note that to reduce the heavy computation load, when conducting spatial and temporal sensitivity study as described in Section 2.4, the range of λ is lower and with smaller steps than the range and step shown in the Panel (b). For example, when only selecting the tropical domain of SLP, the number of predictors is much less than 800. So the λ is selected in range [10, 1000] with a step of 50. When using teleconnection 210 indices instead of SLP anomalies, the predictors are ::::: equal :: or less than 15, the range of λ is selected from [1, 200] with a step of 2.
The first step is to divide datasets into train and test groups, as shown in panel (c). The grouped training datasets are then used for model training and tune the best λ through 5-folds cross-validation, the best λ that achieves the optimal prediction (highest LOO R 2 ) is then selected by the model to predict test datasets. The model then starts another iteration with train and 215 test grouping. Panel (c) describes the LOO :::::::::::: leave-one-out train and test grouping, the years before and after the selected test year are removed for each grouping to reduce the impact of temporal auto-correlations in CO 2 time-series.
8 RR (LOO) ::::: Ridge ::::::::: Regression ::::::::::::: leave-one-out cross-validation is performed using the Python package Scikit-learn "Ridge" and the λ is tuned by Scikit-learn "RidgeCV" (Pedregosa et al., 2011). ::: The :::::: global ::::: maps :::: ( 3. Spatial sensitivity study. We evaluate the predictability of historical annual CO 2 time-series using DJF SLP anomalies The intervals shorter than 100 years are all in 1 year step. We also evaluate the error rate of the model in each sliding window of 15, 20, 30, and 40 year lengths. The error rate is calculated by the number of invalid predictions that with have significance P > 0.05 in ρ SLP divided by the number of total predictions within a given window.
We :::: First, ::: we find the detrended :::::: annual CO 2 time-series are generally consistent with each other, except the SL DGVMs show ::::: shows slight deviation (Fig. 2 a). We find two anomalous years (1987 and 1998), which show deviations larger than 2 standard deviations in most CO 2 time-series, both signifying apparent AGR increases and subsequent lower land sink (Fig. 2 a). These two years correspond to strong El Niño events, which are usually associated with below-average land CO 2 uptake (Keeling 265 et al., 1995;van der Werf et al., 2004;Bonan, 2016).
Here, we refer to the ρ SLP as the LOO correlation of predicted and observed time-series based on RR and with SLP anomalies as predictors. Accordingly, ρ Tele refers to the LOO correlation that has teleconnection indices as predictors. LOO correlation by linear regression based on the single predictor of SOI index is represented by ρ SOI . SLP :::::: Global :::: SLP and teleconnection indices show comparable predictability of :::::::: predictive :::: skill ::: of ::::: global : C-cycle IAV in 270 winter, while teleconnection indices have higher predictability :::::::: predictive :::: skill in spring (Fig. 2 b). In both periods, the value of ρ SLP (except SL DGVMs ) is higher in DJF (0.51-0.70) than in MAM (0.29-0.60). On the other hand, the values of ρ Tele are higher in MAM (0.57-0.79) than in DJF (0.53-0.70). The relative low prediction ::::::: predictive : skill of global SLP anomalies compared to teleconnection indices might result from: 1) limited sample size (less than 60 years) and a large number of predictors (p = 800) for RR ::::: Ridge ::::::::: Regression : training with global SLP anomalies. But for teleconnection indices and SOI, the 275 predictability skills are much less influenced by the limited sample size due to their limited predictors (p ≤ 15 for teleconnection indices and p =1 for SOI). As we increase the sample size to over 100 years, the prediction :::::::: predictive : skill of SLP anomalies increases considerably, as is shown in temporal sensitivity study (Fig. 6a), and 2) the predictability :::::::: predictive :::: skill of SLP anomalies in explaining global C-cycle IAV can be reduced in domains with large local rather than global impacts of atmospheric variations to land carbon sinks (Jung et al., 2017). In such domains, the SLP anomalies might show strong 280 relationship to local C-cycle variations but weaker link to global C-cycle variations. Selecting the :::: SLP domains with higher contribution to the global C-cycle variability could improve the predictability, as is shown by the analyzes of sensitivity of the results to the spatial domain ::: SLP :::::: spatial ::::::: domains : (Fig. 4). Compared to ρ Tele :: the ::::::::: predictive :::: skill :: of :::::::::::: teleconnection :::::: indices, which includes a set of 14 teleconnection indices for period 1959-2017 and 15 for period 1980-2017 as predictors, the ρ SOI :::::::: predictive :::: skill :: of :::: SOI : is slightly lower or similar in both 285 seasons, with 0.53-0.67 in DJF and 0.60-0.74 in MAM (Fig. 2 b). This is consistent with the dominant role of ENSO in driving ::::: global : C-cycle IAV, with other modes showing less contributions. Such interpretation requires caution as the indices cannot fully represent the complex atmospheric dynamics.
Potential explanations for this pattern include trends found in SLP and SLP variability over the Pacific and Southern Atlantic (Schneider et al., 2012;IPCC, 2013;Roxy et al., 2019), or enhanced sensitivity of C-cycle variability to climatic drivers, 400 particularly in semi-arid areas, under progressive climate change (Wang et al., 2014;Poulter et al., 2014).  57 1959 1993 1961 1995 1964 1997 1966 1999 1968 2001 1970 2003 1972 2005 1974 2007 1976 2009 1978 2011 1980 2013 1984 2015 0 Understanding and attributing these changes to given processes is beyond the scope of this study, but these results highlight the importance of the temporal domain when analyzing IAV in the ::::: global : C-cycle. Since ::: the :::::::: observed CO 2 time-series are short and cover only limited temporal domains, results are likely to be affected by multi-decadal internal climate variability, in addition to external forcing. Moreover, the data-driven RR ::::: Ridge :::::::::: Regression method to quantify circulation-induced ::::: global 405 C-cycle variability uses a large number of predictors, while only relatively short time series are available for training, which may negatively affect the model's performance. Therefore, we further test the sensitivity of the results :::::::::: predictability : to the length of the time-series (Fig. 6). We test the predictability of ::::: global : C-cycle IAV for different lengths of the temporal domain: 15, 20, 30 and 40 years for the datasets in GCB2018 ::::: Global ::::::: Carbon :::::: Budget ::::: 2018 and CESM simulations and 100, 500 and 2000 years for CESM only.
From the 15-yr to 30-yr interval, the error rate of ρ SLP of AGR R reduces from 0.4 to 0 in DJF ::::: winter : global and 0.13 to 0 in 450 DJF ::::: winter : tropical. The error rate of ρ SLP decreases to less than 0.16 in 30-yr interval, except ρ SLP of AGR R decreases to 0.24 in MAM ::::: spring global, which also matches the relative low predictability of MAM ::::: spring : SLP anomalies in the period 1980-2017 with different spatial domains (see Appendix A Fig. A7 b). All error rates are reduced to almost zero in a 40-yr interval.