Parameter uncertainty dominates C cycle forecast errors over most of Brazil for the 21st Century

Identification of terrestrial carbon (C) sources and sinks is critical for understanding the earth system and to mitigate and adapt to climate change results from greenhouse gas emissions. Predicting whether a given location will act as a C source or sink using terrestrial ecosystem models (TEMs) is challenging due to net flux being the difference between far larger, spatially and temporally variable fluxes with large uncertainties. Uncertainty in projections of future dynamics, critical for policy evaluation, has been determined using multi-TEM intercomparisons, for various emissions scenarios. This approach 5 quantifies structural and forcing errors. However, the role of parameter error within models has not been determined. TEMs typically have defined parameters for specific plant functional types generated from the literature. To ascertain the importance of parameter error in forecasts we present a Bayesian analysis that uses data on historical and current C cycling for Brazil to parameterise five TEMs of varied complexity with a retrieval of model error covariance at 1 degree spatial resolution. After evaluation against data from 2001-2017, the parameterised models are simulated to 2100 under four climate change scenarios 10 spanning the likely range of climate projections. Using multiple models, each with per pixel parameter ensembles, we partition forecast uncertainties. Parameter uncertainty dominates across most of Brazil when simulating future stock changes in biomass C and dead organic matter (DOM). Uncertainty of simulated biomass change is most strongly correlated with net primary productivity allocation to wood (NPPwood) and wood mean residence times (MRTwood). Uncertainty of simulated DOM change is most strongly correlated with MRTsoil and NPPwood. Due to the coupling between these variables and C stock dynamics 15 being bi-directional we argue that using repeat estimates of woody biomass will provide a valuable constraint needed to refine predictions of the future carbon cycle. Finally, evaluation of our multi-model analysis shows that wood litter contributes substantially to fire emissions necessitating a greater understanding of wood litter C-cycling than is typically considered in large-scale TEMs.

improved atmospheric transport (van der Laan-Luijkx et al., 2015). The ensemble uses five NEE priors and three fire emission drivers but with a common set of atmospheric constraints and transport model (Schaefer et al., 2008;Bodesheim et al., 2018;200 van Schaik et al., 2018;Haynes et al., 2019;Koren , 2020). By using a range of priors it covers the uncertainty in the seasonal variation of C fluxes in tropical regions (Saleska et al., 2003;Restrepo-Coupe et al., 2013;Koren et al., 2018;Mengistu et al., 2020). In the remainder of this text the CTE and CT-SAM datasets are collectively referred to as CTE. FLUXCOM GPP is estimated by an ensemble of ML approaches driven with meteorological reanalysis and EO derived vegetation indices and calibrated using eddy covariance information drawn from the FLUXNET network (Jung et al., 2020). 205 FLUXCOM has been widely evaluated using eddy covariance information and has been used to evaluate TEMs (Jung et al., 2020). Each GPP estimate has a corresponding median absolute deviation estimate drawn from the ML ensemble which is assumed to represent FLUXCOM's uncertainty for comparisons to CARDAMOM-DALEC.

Analysing the drivers of forecast uncertainty
To project DALEC to 2100, future climate drivers were extracted from the UK Earth System Model (UKESM; Sellar et al., 2019) contribution to CMIP6 (Eyring et al., 2016). This study uses the core scenarios SSP1-2.6W m −2 , SSP2-4.5W m −2 , SSP3-7.0W m −2 , SSP5-8.5W m −2 spanning a mean global warming of 1.7-5 o C (O'Neill et al., 2016). The scenarios are also 220 used to impose future forest biomass extraction. The future climate is imposed by determining the anomaly from the end of our analysis until 2100. The mean temperature (M1-5), incoming shortwave radiation (M1-5), vapour pressure deficit (M2-5), wind speed (M2-5) and precipitation (M2-5) anomalies for each scenario as shown in Figure A3. The time series of future atmospheric CO 2 concentration is prescribed for each scenario.
Model analyses quantified the relative contribution of variation in model parameters, model structure and climate change 225 scenario to overall uncertainty in the simulation of biomass and DOM to 2100. Parameter uncertainty was estimated to be the 90% CI resulting from the simulation of the retrieved parameter ensembles. Model structural uncertainty was estimated as the between model range of the pixel-level median estimates. Climate change scenario uncertainty was estimated as the pixel-level range of median estimates across scenarios for each model. This analysis address research question 2.

235
We conducted MDF analyses to retrieve ensembles of location specific parameters for five DALEC models of varied complexity across Brazil. Each model simulated the calibration data with a good degree of skill, returning similar likelihood scores (R2  Table 2) against independent estimates. The 1 degree spatial parameter ensembles show that there is a strong dependency between net carbon exchange and wood stocks (R2 >0.8; Figure A6). The DALEC parameter ensembles have been 240 projected to 2100 ( Figure 6) and show that parameter, not structure or climate change scenario, dominates overall uncertainty in most areas ( Figure 7). Finally, using the parameter ensembles to quantify the correlation between ecosystem variables and future carbon stock dynamics we identify allocation of NPP to wood (NPP wood ) and MRT wood as targets for further constraint on model forecasts (Figures 8-9).

Calibration constraints 245
All DALEC models match their calibration information with a high degree of skill ( Figure 3). The DALEC models are consistent with FLUXCOM GPP at the 90% CI across 83-94% of Brazil ( Figure 4). Inter-model variation follows the implementation of the differing photosynthesis models and inclusion of carbon-water cycle interactions.
The simplest model, M1, (ACM1, no water cycle; 94%) was the most consistent with FLUXCOM; followed by M2 (ACM2, no water cycle; 90%). M3-5, which use ACM2 and simulate water cycling, are least consistent with FLUXCOM (83-85%) over Brazil. The non-consistent areas for all models are concentrated in the Caatinga and Cerrado (Figure 4), which have strong seasonality in rainfall and more extreme temperatures ( Figure A2). Moreover, the DALEC models all estimate a lower GPP for these regions (by ∼5 MgC/ha yr −1 ) than FLUXCOM suggesting different high temperature and drought sensitivities between analyses ( Figure A7). The activation of water cycling between M2 and M3-5 reduces Brazil's GPP by an average of 7% but varies substantially in space with declines across the Cerrado and Caatinga of ∼30%.  ing the annual net C uptake by 3-30 % over this period ( Figure 5). At the 90% CI the DALEC models are consistent with GFEDv4.1s and GFAS fire emissions estimates over 41-53% of Brazil ( Figure 4). DALEC has, however, a persistent low bias (∼2 MgC/ha yr −1 ) across the boundary between the Amazon and Cerrado ( Figure A7). Spatial consistency is greatest in M5 (53%) and M4 (49%). Moreover, a comparison of total fire emissions show that M1-4 estimates are substantially lower than either GFAS (mean = 65-69 TgC yr −1 ) or GFEDv4.1s (mean = 92-96 TgC yr −1 ) ( Figure 5). Whereas M5 estimates, while still 290 lower than GFAS (mean = 13 TgC yr −1 ) or GFEDv4.1s (mean = 41 TgC yr −1 ), fall between the independent estimates in 11 of 15 years in which these data sets overlap ( Figure 5). Collectively, these results suggest that increasing model complexity by both partitioning R g and R m , and the inclusion of a wood litter pool, improved simulation of fire (Table 1). DALEC-estimated C losses due to forest biomass removals from GFW show substantial inter-annual variation (120-400 TgC yr −1 ) reducing net uptake by 5-32 % ( Figure A10). For further details see SI text S2. 295

Constraints on Brazilian C-cycling
Simulated net biome exchange (NBE = NEE + fire) is dominated by wood stock dynamics. Variation in wood stock dynamics explains 81-92 % of variation of simulated NBE while variation in soil stock dynamics explains 2-16 % ( Figure A6). Carbon dynamics of wood, not soil, is the primary driver of net exchange. Using the per-model, 1 degree resolution parameter ensembles provides quantification across Brazil of whether a given 1 degree pixel is a net source or sink of carbon, i.e. the sign of 300 NBE ( Figure A6). At the 90 % confidence interval our analyses indicate there is currently insufficient observational constraint to confidently determine the sign of NBE or soil C dynamics. The same was largely true for wood stock dynamics, except that the sign of wood stock trajectories could be confidently determined for ∼5 % of Brazil's land area after the inclusion of the water cycle increasing to 11% on the inclusion of a wood litter pool. The areas of statistical confidence are concentrated in the Cerrado ( Figure A6).

325
There are substantial variations between models in estimated MRTs, in addition to biome level differences ( Figure A12).
MRT root shows the greatest between-model variation with short (<1 year) MRTs estimates across the majority of Brazil (i.e. little biome level variation) in M1 but longer MRT in all other models (1-2 years). Models M2-5 have larger biome-level variability in MRT, with longer (> 1 year) MRT root in the drier Cerrado and Caatinga regions compared to other biomes. In models M1-4 MRT wood is ∼10 years across much of Brazil except the Amazon which has MRTs of up to 50 years but notable 330 short MRT along the boundary of with the Cerrado (i.e. the arc of deforestation). Longer residence times were estimated across parts of the Caatinga in M3-5, likely linked to the inclusion of the water cycle in these models. In M5, where an explicit wood litter pool is included, MRT wood increased in the southern Cerrado and Atlantic forest from < 10 years to > 10 years. Finally, the estimate of the combined litter (i.e. foliar, fine root and wood) MRT in model M5 was greater across Brazil than other models due to the explicit inclusion of slowly decomposing wood litter.

340
The median forecasts of all DALEC models simulated a net increase of biomass and DOM by 2100 under the SSP2-4.5 W m −2 climate change scenario ( Figure 6, S3). M1-2 and M5 simulated a larger C accumulation (∼75 PgC) while M3-4 simulate a smaller increase (∼40 PgC). The 90% CI (i.e. the 5% -95% quantiles) is greater than the median predicted accumulation for each model, therefore crossing the source / sink boundary for the next century in all cases ( Figure 6). C accumulation is simulated to be concentrated in the Amazon and to a lesser degree the Atlantic forests (e.g. M5). All other areas (i.e. the drier 345 regions) remain near neutral or decline in carbon storage ( Figure A13, A14).
The analyses indicate that live biomass and DOM stocks will most likely increase under each climate change scenario ( Figure   6). Median C accumulation under SSP1-2.6 W m −2 plateaus by 2080 before turning into a carbon source by 2100 (i.e. begins losing its accumulated carbon), while all other scenarios continue to accumulate carbon to 2100. As expected, accumulation of DOM lags that of biomass, as turnover of biomass provides inputs to DOM in the models. Analysis uncertainty is larger than 350 mean predicted accumulation for all scenarios, with the lower bound of the 90% CI indicating a possible net loss of carbon for the next century. The spatial variation in carbon source / sink distribution indicated for SSP2-4.5 W m −2 is consistent for each scenario ( Figure A13, A14).  with LAI is linked to the sensitivity of the respective models to variations in atmospheric CO 2 (Melnikova and Sasai , 2020; Sun et al., 2019). The sensitivity of GPP to CO 2 remains poorly quantified, especially in the tropics, due to the lack of in-situ data and large observational uncertainties (Sun et al., 2019). As a result, process-model-enhanced GPP may be overestimated, due to missing nutrient limitation (He et al., 2017), acclimation processes (Ainsworth & Rogers, 2007)  Uncertainties in terrestrial carbon cycling have remained large over multiple inter-comparison cycles (Arora et al., 2020).

445
Thus, we argue that a greater focus is needed on refining parameters themselves.

Which ecosystem traits are most strongly correlated with simulated carbon dynamics?
MRT wood was the parameter most tightly coupled to the response of biomass C stocks to climate change between now and function of three factors. First, there is a relatively weak constraint on the MRT wood parameter in each pixel over the calibration period due to lack of repeat observations of wood biomass. Second, woody biomass is typically the largest biomass pool, and MRT wood is a key control on turnover and therefore on decadal changes in the size of this pool. Third, MRT wood is a key control on C inputs to the soil C and wood litter (M5 only) pools in DALEC, generated by modelled wood losses. Higher resolution studies (e.g. at ha or km 2 resolution) over areas of rapid biomass change, such as the arc of deforesta-460 tion in Brazil, will provide added insights into model structure and parameter uncertainties. There are challenges for the CARDAMON-DALEC approach to work at finer scales and with more dynamic ecosystems. For instance, the concept of MRT is less appropriate in modelling successional change and highly dynamic systems, where internal feedbacks may adjust C losses (mortality) with variations in density and age (Peters , 2003;Ge et al., 2019). Chronosequence data at high spatial and temporal resolutions can provide the means to test alternate representations of successional variation in C cycling and storage 465 (e.g., Safar et al., 2020).

Similarly it is important that NPP
Variations in C storage linked to model structure were smaller than those linked to model parameterisation, except in specific areas of Brazil (Caatinga; Figure 7, S15). The selection of five model structures was limited by our choice, so it is perhaps not surprising that the parameter calibration, which allows for multidimensional variation over broad priors, contributes more variation to projections than does model structural variability. However, the variation in model structure was designed to test 470 whether hypothesised key processes were important in projections. For instance, we used models with and without a water cycle simulation to test the importance of carbon-water feedbacks in projections of C storage to 2100. Models M3-5 included dynamic simulation of soil moisture changes and its interactions with canopy processes. Projections with these models thus included the potential development of soil moisture stress, with an impact on GPP. Models M1 and M2 had no direct effect of soil moisture on C cycling. This soil moisture feedback on GPP only manifested itself in projections for north east Brazil, 475 the driest part of the country, in the Caatinga biome, and some nearby parts of Cerrado (Figures 4, S7). This feedback does have an impact on projected C storage (Figure 7; Table A2), but these effects are of similar or less magnitude to parameter uncertainty. We conclude that for much of Brazil, outside of Caatinga, our model-data fusion shows a limited sensitivity of C cycling to future soil moisture stress. This is likely a result of CO 2 fertilisation leading to reductions in plant water demand that are explicit in both ACM GPP models. However, it is possible that land surface models like DALEC are overestimating 480 CO 2 fertilisation effect (Wang et al., 2020) highlighting the need for further evaluation and refinement.
As expected climate change scenario uncertainty is dwarfed by uncertainties in both model structure and parameters ( Figure   7). However, we have only used projections from one earth system model (UKESMv1), and therefore we have not quantified the impact of uncertainty in meteorology / climate change itself for a given emissions scenario. While uncertainty in meteorology has been shown on longer time scales (decadal) to contribute a minority component relative to the overall uncertainty (e.g., Huntingford et al., 2013;Bonan et al., 2019), we do consider multi-model forcing well worth including in future analyses to provide the most robust assessment of observation constrained carbon trajectories possible.

Future avenues to improve observational constraint
Improving constraints on NPP and MRT will reduce uncertainties associated with simulated biomass and DOM change ( Figure   8-9). The reverse is also true: providing information on contemporary biomass and DOM dynamics will improve constraint 490 on NPP and MRT, which in turn reduces uncertainty when simulating into the future. In addition to repeat measurement of biomass stocks there are a diverse range of alternate observational constraints which could provide critical information.
Assimilation of atmospheric inversion estimated net carbon exchange could provide constraint on simulated biomass and DOM change and therefore on their correlated ecosystem variables and parameters. The divergence between estimates of net carbon exchange by different atmospheric inversion systems has reduced substantially over recent years (Gaubert et al., 2019).

495
Moreover, given sufficient observational constraint, posterior estimates of net exchange can converge even with substantially different priors of biospheric exchange indicating a robust analysis (White et al., 2019). But this approach limits ecological learning (e.g. refining MRT to reduce prediction uncertainties). Direct assimilation of atmospheric observations of CO 2 concentrations into TEMs has previously been used to refine a limited number of parameters for specific plant functional types (PFT) (e.g., Reuter et al., 2011). However, key ecosystem traits (e.g. NPP allocation and MRTs) vary within what would clas-500 sically be considered the same PFT (e.g., Exbrayat et al., 2018b) necessitating the development of strategies to allow direct model parameterisation at sub-PFT scales. We recognise that this approach will have substantial technical and computational challenges but the potential benefits are too great to ignore.
Methodologies to estimate potential biomass, i.e. in the absence of direct human disturbance (Exbrayat and Williams , 2015), or potential regrowth rates (Cook-Patton et al., , 2020) have recently gained attention. However, a key weakness of these 505 estimates is their dependence on current biomass ∼ climate relationships, thus lacking the ability to project into new climate or disturbance regimes (Lewis et al., 2019). Nevertheless, we see an opportunity to use potential biomass information as an additional constraint, in conjunction with existing EO biomass maps, on the steady state generated by a given combination of current climate and parameters in the absence of human disturbance (which can be determined analytically). As simulation of biomass change correlates strongly with both MRT wood and NPP wood (Figure 8), we expect that assimilation of potential 510 biomass will also provide constraints on these parameters, increasing confidence in simulations of climate-sensitive carbon cycle trajectories.
Forest biomass removal has a significant impact on the Brazilian C-cycle, resulting in losses between 100-450 TgC yr −1 (Table 2; Figure A10) and the subsequent regrowth of secondary forests. Secondary forests across the Brazilian Amazon alone are estimated to cover an area of 22-28 Mha accumulating 1.5-11.25 MgC ha yr −1 but are estimated to be re-cleared every 515 5-10 years (Poorter et al., 2016;Yang et al., 2020). It is likely that we are missing losses driven by degradation, re-clearance events, and edge effects (e.g., Yang et al., 2020)(e.g. Yang et al., 2020) that are not accounted for in existing EO datasets, such as GFW, that are used to drive disturbance in our models (Milodowski et al., 2017;Silva Junior et al., 2020). Missing these disturbance events would lead to overestimation of long term accumulation of woody carbon, consistent with the likely overestimate of net carbon uptake estimated by the DALEC models already discussed in comparison with CTE. Moreover, 520 improved disturbance drivers can add additional constraint to the C-cycle, potentially reducing uncertainty and refining our best estimates of key ecosystem traits as has previously been demonstrated due to the inclusion of fire disturbance information (Exbrayat et al., 2018b). Therefore, improvements in EO-based estimates of deforestation, degradation and the inclusion of re-clearance information is essential to reflect their associated emissions and therefore improve model calibration efforts.

525
We use a MDF approach in conjunction with 5 related terrestrial carbon cycle models, with differences in key feedbacks and processes, and observational constraints to quantify the current and future state, trajectory and uncertainties of the Brazilian carbon cycle. Our analysis shows that parameter uncertainty exceeded both the structural uncertainty captured within our model ensemble and uncertainties in projected climate except in drier areas of the country. Parameter uncertainty alone was large enough to span the source / sink boundary identifying a clear need to further refine parameter constraint not just model 530 structural complexity. We identify NPP wood , MRT wood and MRT soil as key uncertainties influencing future trajectories. Given the bi-directional nature of these associations we have identified future avenues for new observational constraints on these ecosystem properties. Such constraints include repeat estimations of AGB, estimates of NEE from atmospheric inversion, estimates of potential AGB stocks and improved estimates of fluxes driven by ecosystem disturbance and regrowth. Improving constraints on residence times will greatly improve our ability to make meaningful predictions into the future.

535
Code and data availability.  Table A1) consistent with observational constraints and their uncertainties. The likelihood (i.e. probability) of a given x is estimated with respect to the assimilated observations (P(x|O)) as a function of the likelihood of the observations given the current parameters (P(O|x)) and any prior knowledge. In our analyses we assume a prior range on each parameter defined as P range (x) and ecological and dynamical constraints (EDCs) estimated as a function of DALEC output (PEDC(DALEC(x)))

545
(See Bloom and Williams , 2015, for details). Finally, all models apply a prior value (P prior (x)) on the ratio of R a to Figure A1. Maps of observational constraints and their associated uncertainties used in the CARDAMOM assimilation framework. Mean annual LAI, wood stocks and initial soil stocks.
tissue (equivalent to 28% of NPP; Waring and Schlesinger , 1985). This development provides the first step in explicit simulation of respiratory costs needed for implementation of economic theory within DALEC (e.g. Thomas et al., 2019;Flack-Prain et al., 2020).

580
M5 includes a wood litter pool rather than allocating wood litter directly to the soil. The inclusion of a wood litter pool tests our ability to constrain this potentially large but challenging to observe carbon store (Magnússon et al., 2016). Furthermore, wood litter can play an important role in carbon emissions due to fire (vanderwerf et al., 2006) and its inclusion here allows us to investigate whether M5 has an improved estimate of fire emissions.

585
DALEC estimated carbon losses due to forest biomass removals are estimated to vary between 120 -400 TgC yr −1 reducing the biospheric sink by 5 -32% ( Figure 5). The largest biomass extractions are estimated for 2016 (387 -425 TgC yr −1 ; between model range) and 2017 (302-337 TgC yr −1 ; between model range). In all other years the mean biomass loss was substantially lower at 102-220 TgC yr −1 . The interannual variation follows the GFW estimate as the fraction forest cover loss is derived by GFW. GFW estimates forest losses are larger than the DALEC models at the beginning of the analysis but converging by 2017 590 potentially driven by the accumulation of wood in the DALEC models. As GFW is used as a forest loss driver by the DALEC suite this comparison is not fully independent. However, disagreement between these estimates highlights the importance of the biomass map underpinning the analyses (See discussion for further details). Table A1. Description of parameters estimated for each DALEC model, each parameter is given a name, unit, description. As not all parameters are used by all models the applicable models are also given following the code used in the main text. Note: Gross Primary Productivity = GPP, Autotrophic respiration = Ra, Autotrophic maintenance respiration = Rm, heterotrophic respiration = Rh. Litter is assumed to be the combined foliage and fine root litter pools, where appropriate wood litter will be explicitly stated. Note that GPP allocation fractions are applied sequentially such that GPP allocation to C wood = GPP -(GPP·Ra:GPP) -(GPP·GPP lab ) -(GPP·GPProot).