Divergent historical GPP trends among state-of-the-art multi-model simulations and satellite-based products

Understanding historical changes in gross primary productivity (GPP) is essential for better predicting the future global carbon cycle. However, the historical trends of terrestrial GPP, owing to the CO2 fertilization effect, climate, and land25 use change, remain largely uncertain. Using long-term satellite-based near-infrared radiance of vegetation (NIRv), a proxy for GPP, and multiple GPP datasets derived from satellite-based products, Dynamic Global Vegetation Model (DGVM) simulations, and machine learning techniques, here we comprehensively investigated their trends and analyzed the causes for any discrepancies during 1982–2015. Although spatial patterns of climatological annual GPP from all products and NIRv are highly correlated (r > 0.84), the spatial correlation coefficients of trends between DGVM GPP and NIRv significantly 30 decreased (with the ensemble mean of r = 0.49) and even the spatial correlation coefficients of trends between other GPP products and NIRv became negative. By separating the global land into the tropics plus extra-tropical southern hemisphere (Trop+SH) and extra-tropical northern hemisphere (NH), we found that, during 1982–2015, simulated GPP from most of the models showed a stronger increasing trend over Trop+SH than NH. In contrast, the satellite-based GPP products indicated a substantial increase over NH. Mechanistically, model sensitivity experiments indicated that the increase of annual GPP was 35 dominated by the CO2 fertilization effect (Global: 83.9%), albeit a large uncertainty in magnitude among individual simulations. However, the spatial distribution of inter-model spreads of GPP trends resulted mainly from climate and land-use change rather https://doi.org/10.5194/esd-2021-69 Preprint. Discussion started: 23 September 2021 c © Author(s) 2021. CC BY 4.0 License.

The application of satellite-derived GPP proxy datasets provides a breakthrough for estimating global GPP (Running et al., 2004;Badgley et al., 2019;Piao et al., 2020). Many GPP proxy indices, such as normalized difference vegetation index (NDVI), 70 enhanced vegetation index (EVI), and SIF, have been widely used to estimate the global GPP (Frankenberg et al., 2011;Guanter et al., 2014). However, each of them has its shortcomings. For example, NDVI can be saturated in tropical regions, demonstrating its nonlinear relationship with GPP (Badgley et al., 2017;Badgley et al., 2019;Camps-Valls et al., 2021). The EVI index improves the NDVI algorithm, but this index has not entirely solved the saturation problem (Huete et al., 2002).
Without dealing with the problem of distinguishing whether the signal comes from the plant or other interference factors, 75 satellite retrieval of SIF measures the light emitted by chlorophyll in leaves and can be used as a robust proxy of GPP (Frankenberg et al., 2011;Mohammed et al., 2019). However, the time range of global SIF products is short, with direct observations only available from 2007. Representing the proportion of reflected near-infrared radiation attributable to vegetation, NIRv is a relatively recent GPP proxy (Badgley et al., 2017). Compared to NDVI and EVI, the saturation problem of NIRv and GPP in the tropical region is weakened because of the mixed effects of background brightness, leaf area, and the 80 distribution of canopy photosynthetic capacity with depth were largely eliminated. Since NIRv can be directly obtained from observational datasets of the AVHRR sensors, it can be derived from 1982 to the present. Moreover, previous studies have shown that NIRv and SIF are closely related and indicated that NIRv could well represent changes in GPP (Badgley et al., 2017;Badgley et al., 2019;Camps-Valls et al., 2021;Wang et al., 2021c). 85 Although there have been a lot of studies focusing on extreme anomalies, the seasonal cycle, interannual variation, and the climatological pattern of global and regional GPP based on the multiple GPP products and proxy indices (Chen et al., 2017;Madani et al., 2020;Wang et al., 2021b), few efforts have been devoted to evaluate the long-term GPP trends across different GPP sources and to analyze the causes of uncertainties. This study comprehensively investigates historical GPP trends during 1982−2015, based on the satellite-derived GPP proxy (NIRv), TRENDYv6 multi-model simulations, machine-learning 90 products, satellite-based estimates, and site-level observations. Section 2 describes the datasets and statistical methods used.

TRENDYv6 multi-model simulated GPP
We used the model simulation results conducted under the auspices of the "Trends and drivers of the regional scale sources and sinks of carbon dioxide" (TRENDY) Project . We used 10 DGVMs in the TRENDYv6 project for the period of 1982-2015, including CABLE (Haverd et al., 2018), CLASS-CTEM (Melton and Arora, 2016) al., 2010), DLEM (Tian et al., 2015), ISAM (Jain et al., 2013), OCN (Zaehle et al., 2010), ORCHIDEE-MICT (Guimberteau et al., 2018), ORCHIDEE (Krinner et al., 2005), VEGAS (Zeng et al., 2005), and VISIT (Kato et al., 2013). There is a suite of experimental protocols in the TRENDY project, and we here explored GPP trends and their mechanisms using the GPP outputs from three simulations. In detail, DGVMs were run under the varying CO2 concentration, and constant climate conditions and land-use change in S1; the varying CO2 concentration and climate conditions, with constant land-use change in S2; the varying 105 CO2 concentration, climate conditions, and land-use change in S3. Hence, the S1 scenario represents the impact of the CO2 fertilization effect. The contributions of climate change and land-use change (hereafter "LUC") are calculated through the differences between S2 and S1, S3 and S2, respectively. These modelling details are listed in Table 1. In sections 3.2 and 3.3.1, we calculated the ensemble mean of the 7 model simulations, which included all scenarios as the 110 DGVM ensemble GPP and calculated their standard deviation to represent inter-model spread across these models. In other sections which only need results from S3, we used the ensemble mean simulations from 10 models. Simulation datasets in the corresponding experiments (S1, S2, and S3) as available for models indicated with the notation of "√".

2 FLUXCOM GPP
The FLUXCOM datasets comprised of 120 global carbon flux products generated by nine machine learning techniques, based 120 on site-level observed GPP measured by eddy covariance and upscaled with remote sensing information and meteorology data (Jung et al., 2020). This research used the ensemble mean of GPP datasets forced by CRUJRA climate data and generated https://doi.org/10.5194/esd-2021-69 Preprint. Discussion started: 23 September 2021 c Author(s) 2021. CC BY 4.0 License. from three machine learning techniques (random forest, artificial neural network, and multivariate adaptive regression splines) from 1982 to 2015. The original spatial resolution of this dataset is 0.5 º × 0.5 º.

Satellite-based GPP products
In this study, the GLASS GPP and revised EC-LUE GPP estimates were used as representatives of long-term satellite-based GPP products from 1982 to 2015.
GLASS GPP originated from the Eddy Covariance-Light Use Efficiency (EC-LUE) model (Yuan et al., 2007), which 130 considered various impact factors (NDVI, photosynthetically activate radiation, temperature, CO2 concentrations, the Bowen ratio of sensible to latent heat flux, water vapor pressure deficit, direct radiation fluxes, and diffuse radiation fluxes) and nine ecosystem types to accurately estimate the long-term change of GPP (Yuan et al. 2019). The original spatial resolution of this dataset is 0.05 º × 0.05 º.

135
The revised EC-LUE GPP is a long-term GPP dataset based on the LUE equation. Zheng et al. (2020) generated the revised EC-LUE GPP using the following formula: where !"# is the maximum LUE of sunlit leaves and !"$ is the maximum LUE of shaded leaves; is the PAR absorbed by sunlit leaves and "$ is the PAR absorbed by shaded leaves. , , and represent the downward 140 regulation scalars of atmospheric CO2 concentration, temperature, and VPD on LUE with the range from 0 to 1. Specifically, the direct effect of CO2 fertilization on GPP is determined by the following equations: (2) where * represents the CO2 concentration inside the leaf, + is the atmospheric CO2 concentration, means the CO2 145 compensation point in the absence of dark respiration (ppm), and means the ratio of CO2 concentration inside the leaf to that in the atmosphere (Farquhar et al., 1980).
After adding the effect of CO2 fertilization, the GPP generated from this model is closer to the site observation data of more than five years (R 2 = 0.44) than that of other LUE models (R 2 ranged from 0.06 to 0.30) (Zheng et al., 2020). The original 150 spatial resolution of this dataset is 0.05 º × 0.05 º.
The spatial pattern and temporal changes of these datasets are highly consistent (Fig. S1, Fig. S2, and Fig. S3). Therefore, for simplicity, we averaged them to represent satellite-based GPP products.

Site-level GPP observations
We also adopted 20 EC sites from the FLUXNET2015 dataset (Pastorello et al., 2020) with an observation period longer than 15 years to evaluate the performance of different global GPP products. These sites included 5 vegetation types: evergreen broadleaf forest (EBF), evergreen needleleaf forest (ENF), deciduous broadleaf forest (DBF), grassland (GRA), and mixed forest (MF), all distributed over Northern Hemisphere ( Table 2). The GPP variable used in this study is GPP_NT_VUT_REF. 160 When evaluating the global gridded GPP datasets with the site observations, the bilinear interpolation method was used to interpolate the gridded data to the specific site locations.

6 Leaf area index
GLASS LAI version 03 was used to compare the TRENDY model ensemble LAI (S3) because it is an input parameter for GLASS and revised EC-LUE GPP. This dataset is originated from AVHRR product before 2001 and MODIS surface reflectance product (MOD09) after 2001. Biome-specific general regression neural networks were used to fuse these two 170 datasets. Its original spatial and temporal resolutions are 0.05º × 0.05º and eight days, respectively (Xiao et al., 2016).

Statistical methods used
Due to the difference among temporal and spatial resolution of each product, we resampled all GPP datasets into 0.5 º × 0.5 º through the first-order conservative remapping method: 175 where 8 , is the area-averaged destination quantity, , means the area of grid k, and f is the quantity in an old grid with an overlapping area with the destination grid. We resampled NIRv and LAI into 0.5 º × 0.5 º using bilinear interpolation and generated the annual datasets according to the weights of days. We then calculated the global and regional total GPP time series from each GPP dataset and generated the linear trends of each dataset at the pixel level to generate the spatial patterns 180 of GPP trends and at the global and zonal scales to detect the historical changes in GPP. The linear trend was calculated as where k represents the linear trend of the time series, b is the intercept, and is the error term.
Finally, non-parametric Mann-Kendall trend tests were used to evaluate the level of significance for each GPP time series 185 because it does not acquire the data to follow the normal distribution (Khaled H. Hamed, 1998).

Different GPP trends in DGVMs and satellite-based products
During 1982-2015, the spatial patterns of climatological annual GPP from different products are highly correlated with 190 satellite-derived NIRv with their spatial correlation coefficients ranging from 0.84 to 0.94 (Table 3) (Table 3).

200
Although the DGVM ensemble GPP trends are close to those of NIRv than the other GPP products used here, inconsistencies exist in spatial distribution and magnitude of GPP trends among individual model simulations. Firstly, the spatial correlations among individual models and NIRv range from 0.15 to 0.48. Secondly, the GPP simulated by the DLEM shows increasing trends in about three-fourths (77.3%) of the global land areas, while the GPP simulated by the CLASS-CTEM has increasing trends in only about one-third (33.9%) of the land area (Table 3). Finally, in magnitude, the trends of VEGAS GPP appear 205 generally weaker than other models (Fig. S2, Fig. S4a).
The spatial distributions of the trends between NIRv and the remaining non-DGVM products are quite different, ranging from uncorrelated to negatively correlated (Table 3). For FLUXCOM, owing to the lack of CO2 fertilization effect, the GPP trend pattern generally shows no significant trends over 74.1% of the land areas. The most striking differences between NIRv and 210 satellite-based GPP products are located in the low latitudes, especially over Amazon and Indonesia, with the latter indicating significant decreasing trends over these regions.  220 Table 3. Spatial information for NIRv and different GPP products.  As mentioned above, the trends of DGVM ensemble GPP and NIRv show relatively consistent patterns for 1982-2015.

Products
However, the DGVM ensemble mean GPP shows slightly stronger trends over the tropics than over Northern Hemisphere, whereas NIRv has relatively stronger increasing trends over Northern Hemisphere (Fig. 2a). The tropical region shows the 225 most extensive inter-model spread for the DGVM ensemble, with the strongest trends in CABLE and the weakest trends in VEGAS (Fig. S4a, Fig. S5b). The latitudinal distribution of satellite GPP products is quite different from DGVM ensemble and NIRv. Satellite-based GPP shows a significant decrease over the tropical region, rebounding to the most substantial increase located between 15°S and 25°S. The GPP increases in the middle and high latitudes of the northern hemisphere, but its magnitude is weaker than that of the DGVM ensemble in the most northern regions. There are sufficient numbers of pixels 230 available in these regions to lend confidence to our results (Fig. 2c).
The GPP trends are quite different for the long-term  and recent short-term (2001-2015) periods (Fig. 2b, Fig. S4,

245
The relative changes of annual total GPP and the linear trends of GPP among the DGVM simulations, FLUXCOM, and satellite-based GPP products, vary substantially both globally and regionally (Fig. 3, Fig. S3). Based on the analysis over the 34 years, the trend of global GPP was about 0.37 (DGVM ensemble mean), 0.0 (FLUXCOM), and 0.18 (satellite-based GPP products) GtC year −2 , respectively (Fig. 3d). Before 2001, the GPP trend of satellite-based products was generally stronger than that of DGVMs (Fig. 3a). Separating the global land into the tropics plus extra-tropical southern hemisphere (Trop+SH: 250 90°S-23°N) and extra-tropical northern hemisphere (NH : 23°N-90°N), results show that more than half of the increase of GPP in the DGVM ensemble is from the Trop+SH (57%). In comparison, the increase of GPP in satellite-based GPP products is mainly attributed to the NH (60%). In individual models, the majority of them show similar results to the model ensemble mean. With the exception of ORCHIDEE-MICT and VEGAS, the others indicate that Trop+SH largely contributes to the global GPP trend (ranging from 54.3% to 65.3%) (Fig. 3d). 255 After 2000, there were obvious differences in the trend between DGVM ensemble and satellite-based GPP products, with the 265 satellite-based GPP products showing an obvious turning point (Figs. 3a-c). Both GLASS and revised EC-LUE GPP changed from significant increasing trends to significant decreasing trends, resulting mainly from Trop+SH (Figs. 3a-c). Studies based on satellite-based GPP products suggested that this transition was mainly due to the increasing atmospheric vapor pressure deficit in the tropical zones (Yuan et al., 2019;Madani et al., 2020). Meanwhile, the increasing GPP trend in the NH was greatly weakened (from 0.10 to 0.01 GtC year −2 for GLASS and from 0.11 to 0.05 GtC year −2 for revised EC-LUE). In contrast, 270 DGVM ensemble mean GPP and NIRv kept increasing. However, in detail, four out of ten DGVM models (CLASS-CTEM, OCN, ORCHIDEE-MICT, and VEGAS) simulated weakened GPP increasing signals primarily from the Trop+SH (Fig. 3d and e). Similar to the spatial distribution, the FLUXCOM GPP has no noticeable trends in the study period.

Trend attributions in DGVMs 275
We analyzed the contributions of three drivers (CO2 fertilization effect, climate change, and LUC) to the GPP trends during 1982-2015 by using the results of the TRENDY sensitivity experiments (Fig. 4). Globally, DGVM ensemble results suggest that the CO2 fertilization effect is the dominant driver to increasing GPP (0.29 GtC year −2 accounting for 83.9%), followed by The spatial distributions of GPP trends indicate that the CO2 fertilization effect consistently increases global GPP, especially in tropical rainforest areas (Fig. 5a). Climate change has inhomogeneous effects on GPP owing to different regional changes 285 of climate elements ( Fig. 5c and Figs. S8a and b) associated with different vegetation sensitivities to each climate element (Wang et al., 2016;Jung et al., 2017). For instance, in the high northern latitudes, global warming dominates the increase of GPP (Fig. 5c, Figs. S8a, and S9c); the increase in temperature and decrease in precipitation over Amazon lead the GPP to decrease; the increase of GPP over Equatorial Africa appears to be consistent with the increase of the precipitation based on visual comparison against the trends for the CRU observational dataset (Fig. 5c and Fig. S8b). Concentrated in the Trop+SH, 290 LUC basically weakens GPP (Fig. 5e and Fig. S9a), mainly due to deforestation (Friedlingstein et al., 2019).

295
indicate that the trend is significant with p < 0.05 following the non-parametric Mann-Kendall trend test.

DGVM simulations
By analyzing the simulation results driven by each factor, it can be found that though the CO2 fertilization effect has the most 300 considerable contribution to the global GPP trend, it has the largest inter-model uncertainty ( = 0.11 GtC yr -2 ) among three drivers (Fig. 4). A previous study showed that some models might overestimate the CO2 fertilization effect on stomatal closure (Anav et al., 2015). The spatial pattern of the standard deviation among each model indicates that the inter-model spreads are mainly located in the Trop+SH (Fig. 5h). The inter-model spread attributed to the CO2 fertilization effect shows a consistently positive effect on GPP at the global scale (Fig. 5a, b). By contrast, the inter-model spreads driven by climate change and LUC 305 over Trop+SH outweigh the CO2 fertilization effect (Figs. 5b, d, and f). However, because the inhomogeneous impacts on GPP from climatic elements can offset each other to a large extent (Fig. 5c), it makes the largest inter-model uncertainty of CO2 fertilization to the global GPP increase rather than the climate effect. Meanwhile, the largest uncertainty in the impact of LUC on GPP increase concentrates over 20º-40ºS in South America. We further calculated the spatial correlation coefficients among the GPP trend of each model simulation to quantify their spatial consistencies. The correlation coefficient between each model varied from 0.16 to 0.61 (Fig. S10), implying that large uncertainties existed in the distribution of GPP trends among models, which were caused by differences in model structures and parameterizations (Rogers, 2014;Rogers et al., 2017). Furthermore, studies have shown that the global GPP increase will be largely overestimated without nitrogen (N) constraints, especially in the tropical region, where the nitrogen limitation will 315 reduce the photosynthetic capacity of vegetations and weaken its response to the increasing atmospheric CO2 concentration (He et al., 2017;Terrer et al., 2019). Phosphorus (P) availability also limits the extent to which plants respond to the CO2 fertilization effect, which is especially relevant in the Amazon forest (Fleischer et al., 2019). Therefore, DGVM ensemble GPP may overestimate the increasing GPP trend in the tropical regions since not all of the models used in this study take the effect of N limitation into consideration, and no model includes P limitation to the CO2 fertilization effect. 320

Satellite-based GPP products
In this study, GLASS GPP and revised EC-LUE GPP were used as a representative of long-term satellite-based GPP products.
As an essential input in the LUE model (Eq. 1), APAR is a function of LAI, suggesting that LAI is a key parameter in satellite- derived GPP. The spatial correlation coefficient of trends between GLASS LAI and satellite-based GPP (i.e., the mean of 330 GLASS GPP and revised EC-LUE GPP) is 0.42. A previous study over China has shown that satellite-based LAI datasets play a more important role in GPP estimation than meteorological data for all land cover types . Also, the spatial distribution of trends between LAI and GPP simulated from the DGVM ensemble is even more consistent ( = 0.77), confirming the previous studies showing that the trends of GPP and LAI are highly correlated in biome models (Ito et al., 2017;Liu et al., 2019), and the changes of GPP and LAI are consistent in earth system models from CMIP5 (Hashimoto et al., 2019). 335 Figure 6 compares the LAI trends of GLASS and the DGVM ensemble during 2001-2015. The spatial distribution of DGVM LAI indicates significant increasing trends over the boreal forest region, Indonesia, Equatorial Africa, and India, and significant decreasing trends over Kazakhstan, Southeast Asia, and Western Australia. The trends of GLASS LAI were obviously weaker than that of the DGVM ensemble, especially in the equatorial Africa region, northern Amazon region, Indonesia, and Northern 340 high latitudes. The large inconsistencies between the LAI from the DGVMs and those observed in the satellite products could lead to substantial uncertainties in generating/simulating the global GPP ( Fig. 1 and Fig. 6). Xiao et al., (2017) suggested that the trends of four satellite-derived LAI products showed large discrepancies in equatorial Africa from 1982 to 2011 and differed across each vegetation type. Jiang et al., (2017) revealed that NOAA satellite orbit changes and MODIS sensor degradation might cause long-term satellite-derived LAI products inconsistent with each other. Xie et al., (2019) also suggested 345 that satellite-derived LAI datasets can cause uncertainties in GPP estimations through model structure and the complexity of the terrain. Hence, the long-term trends of satellite GPP products based on satellite-derived LAI remain highly uncertain (Smith et al., 2016;Jiang et al., 2017;Liu et al., 2018).  Striped areas indicate that the trend is significant with p < 0.05 following the non-parametric Mann-Kendall trend test.

Evaluation of site-level GPP trends
We further adopted 20 sites with observations longer than 15 years from FLUXNET2015 datasets to evaluate long-term GPP 355 trends from global products and simulations. Compared to the site observations, the magnitudes of GPP at most of site locations were underestimated by satellite-based GPP products, FLUXCOM, and the DGVM ensemble mean. Also, the interannual variation and long-term trend of GPP at sites are more obvious than those of global GPP products and NIRv (Fig. 7). This is possibly due to the fact that climate variables at 0.5-degree grid cells are smoothed out compared to those recorded at individual sites, which leads to a moderate GPP variation. 360 More than half of the sites indicate that FLUXNET GPP has increased on a long-time scale (Fig. 7, Fig. S11), which was mainly caused by rising LUE due to the CO2 fertilization effect and increased green vegetation cover (Cai and Prentice, 2020).
Although FLUXCOM was upscaled from FLUXNET datasets, it did not capture the trends of GPP observed at sites, which was also mentioned by Anav et al. (2015). Sites with significant increasing GPP trends were all captured by DGVM ensemble 365 mean, but some were missed by satellite-based GPP (Figs. 7b,d,k,l,m,s,and t). Furthermore, none of sites with decreasing GPP trends were reflected in the global GPP products and NIRv (Figs. 7f,h,i,j,n,q,r), which may be in part due to the different spatial representativenesses between a tower fetch and a model or satellite grid point. Therefore, the uncertainty remained when using the site-level observed GPP to evaluate the GPP trends of the DGVM simulations and satellite-based GPP products. It is worth mentioning that sites with more than 15 years of observations were all located at NH (from 39.32°N 370 to 50.96°N) ( Table 2). It is hardly for us to evaluate the GPP trends of global products over the Trop+SH by using the GPP observations from sites.

Conclusions
Based on five kinds of GPP or GPP-related datasets, including satellite-based products, machine learning models, DGVM simulations, satellite-observed proxy (NIRv), and site-level observations, we comprehensively assessed the global and regional GPP trends during 1982−2015. The simulated spatial pattern of GPP trends from the DGVM ensemble is highly consistent with NIRv, but shows considerable inconsistency with satellite-based GPP products, especially in the tropical regions. After 385 2000, the GPP generated by the satellite-based GPP products decreased significantly in Trop+SH, and the increasing trend in NH also weakened. However, the results of DGVMs showed that global GPP kept increasing after 2000, even in the tropical region, which was closer to the performance of NIRv. By analyzing the impact of each driving factor in DGVM simulations, the results indicate that the CO2 fertilization effect has the dominant contribution to the global GPP. Spatially, the CO2 fertilization effect makes the global GPP increased consistently, while climate has inhomogenous impact on GPP trends over 390 different regions.
We further explored the uncertainties in GPP trends among these different datasets. For DGVM ensemble, globally, the CO2 fertilization effect causes the largest inter-model spread. At the grid cell level, the uncertainties in simulated GPP trends concentrate over the Trop+SH, which result mainly from climate and LUC. Furthermore, the large discrepancy in the GPP 395 trends between DGVM ensemble and satellite-based GPP products is, to a large extent, induced by the difference of vegetation canopy structure parameter (LAI). Therefore, the highly uncertain satellite-derived LAI data in the tropical regions increase the uncertainty of satellite GPP products and weaken their reliability in explaining changes in the global GPP.
Finally, GPP trends from satellite-based products and DGVM simulations were evaluated by using the FLUXNET2015 dataset. 400 Results show that all of sites with significant increasing GPP trends can be captured by DGVM ensemble mean, but some of them were missed by satellite-based GPP. And none of sites with decreasing GPP trends were reflected in the global GPP products. Therefore, uncertainty remained when using the FLUXNET observed GPP to evaluate the GPP trends of the global GPP products.

405
Generally, the differences among models, observations, and products suggest the importance of the research on the GPP trend and make our caution to interpret the mechanisms of the global carbon cycle by using the long-term GPP products. Data Availability. All data acquired or used in this analysis are available from the links in Table S1.