Comparison of uncertainties in land-use change fluxes from bookkeeping model parameterization

The revised version of the MS has two differences from our replies. The first is that we compare areas between the two models in 2015, not 1700. This is because these values were not available for 1700. The second is that we add more detail about the models in the main text, rather than adding a supplementary section. We found that adding these important details to the supplement would complicate the reading of the manuscript instead.

Reconstructing these changes consistently over the globe for the past centuries (let alone millennia) is, however, challenging and associated with high uncertainties (Hurtt et al., 2020;Klein Goldewijk et al., 2017;Pongratz et al., 2014;Ramankutty and Foley, 1999). This uncertainty in forcing translates directly to uncertainties in F LUC estimates (Gasser et al., 2020;Pongratz et al., 2009;Stocker et al., 2011). Moreover, differences in definitions and terminology, and on how indirect environmental effects such as increasing atmospheric CO 2 concentration are considered lead to large differences in F LUC estimated by different methods (Gasser and Ciais, 2013;Grassi et al., 2018;Pongratz et al., 2014;Stocker and Joos, 2015). Grassi et al. (2018) have shown that by harmonising definitions of managed land, estimates of F LUC by a bookkeeping (BK) model, dynamic global vegetation models (DGVMs) and national inventories can be in part reconciled. The indirect environmental effects (accounted for in DGVMs but not in BK models) can be calculated by factorial simulations, in order to compare estimates from these two methods (Bastos et al., 2020). Whether and how these indirect effects are accounted for in F LUC creates large differences between estimates but can be resolved by a consistent terminology (Grassi et al., 2018;Pongratz et al., 2014). Besides uncertainty in historical land-use change (LUC) areas and terminological issues, studies also differ with respect to which LUC practices are considered. Several studies have shown that including management practices such as shifting cultivation, crop or wood harvesting might increase F LUC by 70 % or more in individual DGVM estimates Pugh et al., 2015) with management processes explaining some of the differences between biospheric fluxes from DGVMs and top-down estimates (Bastos et al., 2020).
In the global carbon budgets since 2017 (Friedlingstein et al., 2019;Le Quéré et al., 2018a, b), F LUC estimates for recent decades are taken as the mean of the estimates of two BK models, the one from Houghton and Nassikas (2017) (HN2017) and the BLUE model described in Hansis et al. (2015). However, even for these similar methods, estimates differ considerably (Bastos et al., 2020;Friedlingstein et al., 2019). Cumulative F LUC from 1850 until the present day by these two BK models is 205±60 PgC in the Global Carbon Budget 2019 (Friedlingstein et al., 2019; GCB2019 in the following). The F LUC uncertainty after 1959 has been defined by best value judgement that there is a 68 % likelihood that actual F LUC lies within ±0.7 PgC yr −1 of the two models' mean, and for earlier periods, the standard deviation of a group of DGVMs was used. This uncertainty range reflects uncertainties in parameterisations of the BK models, in the applied land-use change forcings as well as definitions, processes considered, and is large enough to encompass the two models' estimates.
Besides differences in cumulative F LUC , BLUE and HN2017 also show very different temporal behaviours (Friedlingstein et al., 2019, and see Fig. 2 below). Noteworthy is an increase in F LUC in BLUE but decrease in HN2017 in the 1950s, which is likely attributable to the change in methodology in HYDE  from using FAOSTAT (FAOSTAT, 2015) estimates to populationbased extrapolation in the past (Bastos et al., 2016). This comes on top of a generally steeper increase in F LUC in BLUE in . A second notable difference in temporal dynamics can be observed in the 2000s, as has been shown by Bastos et al. (2020). Here, BLUE shows a strong increasing trend starting 2000, while HN2017 estimates start decreasing after the late 1990s.
The estimated uncertainty of F LUC in the global carbon budgets is thus approximately 0.7 PgC yr −1 or approximately ±50 % of the average value, substantially larger than that of fossil fuel emissions. This uncertainty, in turn, is propagated to global and regional carbon budget estimates and affects the land sink term, which has often been quantified as residual depending on F LUC . Houghton (2020) further noted that while net F LUC can be constrained by the global carbon budgets, the component gross fluxes (sources such as deforestation and sinks, e.g. by afforestation) are even more uncertain.
Differences in initial land-cover distribution and transitions across different forcing datasets can also lead to substantial differences in estimated F LUC (Di Vittorio et al., 2020;Li et al., 2018;Gasser et al., 2020). A detailed analysis of the impact of the forcing datasets on LUC estimated by the OSCAR bookkeeping (BK) model has been performed by Gasser et al. (2020), and Hartung et al. (2021) analysed the effect of the different LUC from the Land-Use Harmonization dataset (LUH2v2.1) (Hurtt et al., 2020) and of various internal model assumptions in BLUE on F LUC .
Despite the relevance of the BLUE and HN2017 estimates for the global carbon budget analyses, stark discrepancies between these two models (Friedlingstein et al., 2019) and the long-standing appreciation of various factors contributing to such differences (Hansis et al., 2015;Houghton et al., 2012), no quantitative analysis on the contribution of model differences to this discrepancy has so far been performed. Both models rely on observation-based estimates for their parameterisations and forcing datasets, and the choices on spatial and plant functional type representation, starting year and other aspects are well justified in both models. However, these multiple differences add to uncertainty in F LUC estimates and make it difficult to attribute differences in F LUC and their trends to specific aspects of the F LUC calculation.
In this study, we fill this gap and assess to which extent the different parameterisations in BLUE and HN2017 affect global and regional F LUC estimates and their trends. We further investigate the effect of the different parameter choices on the gross LUC fluxes.

Model characteristics and datasets used
In this study, we focus on the two BK models used in the GBC2019 as well as in the Intergovernmental Panel on Climate Change's Special Report on Climate Change and Land (IPCC, 2019) to estimate F LUC : the Bookkeeping of Land-Use change Emissions model, BLUE (Hansis et al., 2015) and the model from Houghton and Nassikas (2017), which is referred to as HN2017.
The two models differ in several aspects, the most relevant ones summarised in Table 1. An important difference, which we will account for in this study, is that BLUE estimates F LUC from gross LUC transitions, while HN2017 uses net transitions. Gross transitions resolve that within a unit (grid cell for BLUE, country/region for HN2017) there may be concurrent back-and-forth transitions between a pair of land-use types; for example, 30 % of the unit area may be transformed from forest to cropland, while on 20 % cropland is abandoned and forest regrows. Net transitions would represent this as a 10 % forest to cropland transition. These subunit changes are particularly important for large units (large grid cells or country level; Wilkenskjeld et al., 2014) and in regions where shifting cultivation prevails (in particular in the tropics; Heinimann et al., 2017) or with small-scale dynamics such as in Europe (Fuchs et al., 2015). HN2017 implicitly includes shifting-cultivation effects if these are captured by FAO (2015) data and allows degraded lands start to accumulate carbon again after 10 years of no change. The two models are also forced by distinct LUC datasets: HN2017 calculated F LUC at country level based on statistics of changes in croplands and pastures extent since 1961 and harvest data and changes in forests and other land since 1990 (FAO, 2015;FAOSTAT, 2015), with extrapolations to earlier time periods. BLUE, on the other hand, is forced by spatially explicit transitions and harvest at 0.25 × 0.25 • resolution from LUH2v2.1 (Friedlingstein et al., 2019;Hurtt et al., 2020). LUH2v2.1 calculates cropland, pasture, urban and ice/water fractions between 850 and 2018 based on the HYDE3.1 dataset . HYDE3.1 in turn, also used FAOSTAT (2015) data for country-level agricultural areas (cropland, pasture, rangelands) data after 1961, extrapolated backwards in time using total population and agricultural area per-capita ratios for each country. The cropland and forest area estimates from these two different datasets (LUH2v2.1 vs. FAO) differ considerably in several key LUC areas, for example, South America and SE Asia (Li et al., 2018), which can lead to large differences in F LUC and their trends found in those regions (Bastos et al., 2020;Di Vittorio et al., 2020).
Following a transition, C stocks in the different pools will decay following response curves with characteristic decay times (fast for biomass pools and slow for soil pools). To estimate changes in C stocks, the models rely on values of C den-sity in above-and below-ground pools which are plant functional type (PFT) specific and based on measurements (Table A2). However, the models differ in the number of plant functional types (Table A1) and their spatial distribution (per country in HN2017 and spatially explicit in BLUE).
For harvest and clearing, the dislocated C is distributed between a dead soil pool and three product pools of different lifetimes: 1, 10 and 100 years (Table A3). In the case of BLUE these fractions are fixed and PFT specific, while HN2017 distinguishes between harvested wood use over time (fuel, 1-year, industrial, 10-and 100-year timescales), so that the fraction allocated to each pool changes over time.
Parameters in BLUE and HN2017 are defined on a PFT basis, but HN2017 distinguishes 20 PFTs (3 of them desert PFTs), while BLUE distinguishes 11 PFTs. In order to compare the parameterisations, the different PFTs need to be mapped. Most HN2017 PFTs can be aggregated into the often more broadly defined BLUE PFTs but some of the PFTs in BLUE do not correspond to HN2017 PFTs (e.g. summergreen shrubs) (Table A1). A map of the PFT distribution from HN2017 is not available, as the PFT fractions are defined on a per-country basis. When aggregated globally, the values of BLUE and HN2017 show good agreement in the global extent of croplands (15.3 and 13.8 million km 2 for HN2017 and BLUE, respectively, in 2015) and forests (39.9 and 40.9 million km 2 for HN2017 and BLUE, respectively, in 2015).
When more than one PFT class from HN2017 is aggregated to one PFT in BLUE, we estimate the corresponding parameter value as the average value weighted by the HN2017 PFT fractions within that country. We use therefore spatially explicit values in the model simulations (as in Fig. A1), but they are summarised as spatially averaged values in Table A1.

Factorial simulations
In order to attribute differences in F LUC between the two models to specific aspects from Table 1, we perform a set of factorial simulations with BLUE (see Table 2), in which we replace the BLUE parameters with those from HN2017 (see also schematic in Fig. 1). We then compare these simulations with the fluxes estimated by HN2017, published in Houghton and Nassikas (2017) and Friedlingstein et al. (2019).
The different simulations performed and their justification are as follows (summarised in Table 2): -S BL is the BLUE simulation performed for GCB2019, following the setup described in Table 1, i.e. the standard BLUE configuration.
-S BL-Net (reference simulation) is the BLUE simulation as S BL but starting in 1700 and using net transitions rather than gross transitions. The difference to S BL provides an estimate of the impact of the core setup of HN2017 (net transitions and starting in 1700). In this Table 1. Summary of the most important characteristics of the two F LUC estimates from the two BK models used in the GCB2019 (BLUE and HN2017), including how F LUC is calculated in the standard version and configuration of each model, the processes represented and how they are parameterised. The model assumptions and parameterisations investigated in this study (see Table 2) are highlighted in bold. simulation, net land conversion is taken first from primary land; i.e. abandonment (to secondary land) is allowed to cancel clearing from preferentially primary land in addition to secondary land, which reduces emission estimates more than if abandonment were allowed to cancel clearing only of secondary land (Hansis et al., 2015). The choice for net transition implementation aims to make F LUC estimates more comparable to the approach in HN2017, albeit keeping the different origi-nal forcing (LUH2v2.1 in BLUE as compared to FAO in HN2017). All subsequent simulations are run with this setup but with different parameterisations (Table 2).
-In S HNCdens , BLUE is run using the C densities in vegetation and soil parameters from HN2017. Although C density parameters in HN2017 are defined on a percountry and per-PFT basis, only vegetation C densities differ between countries for a given PFT, while  Table 2). The model is forced by a map of grid-cell-level land-use transitions occurring at time t (gross vs. net). These are then combined with a potential vegetation map of 11 natural vegetation types (Table A1), each having specific carbon densities in vegetation and soil pools (Cdens), to calculate the carbon dislocated by each transition. The mass of dislocated carbon is then distributed among different slash and product pools (Alloc), with specific response curves with different decay times (t).
soil C densities only differ per PFT (for example, for tropical evergreen broadleaved forest in Fig. A1). The global average values per PFT for BLUE and HN2017 are given in Table A2. BLUE has generally higher vegetation and soil C densities in the tropics and most temperate PFTs, and lower vegetation and soil C densities in pastures, and lower soil C densities in croplands, compared to the average values of HN2017.
-In S HNAlloc , BLUE is run using the harvest and clearing allocation fractions, and the slash fractions following clearing from HN2017, but the C densities in vegetation and soil from BLUE. The global average values for BLUE and HN2017 are given in Table A3. In the actual HN2017 model run (Houghton and Nassikas, 2017), the allocations vary over time. Since BLUE uses temporally static fractions, we used an average over the full period . Harvest slash fractions in BLUE (with timescales of 5-15 years in BLUE) are larger in BLUE than in HN2017 for all PFTs. HN2017 allocates more harvest product to the long-lived pool over the period 1850-2015 than BLUE (Table A3). For clearing, the short and long-lived pools are relatively similar between the models but the medium-lived pool is larger in BLUE, depending however on the PFT considered. The slash fractions following clearing from HN2017 are also used instead of those in BLUE.
-In S HNt , the decay times from HN2017 are used in BLUE.
-In S HNFull , BLUE is run using net LUC transitions, starting in 1700 and using HN2017 parameters for C densities, harvest and clearing allocation fractions and decay times (i.e. a combination of S HNCdens , S HNAlloc and S HNt ).
In those simulations where BLUE is run with all or a subset of HN2017 parameters (S HNCdens , S HNAlloc , S HNt ), instead of global values per PFT, the values per PFT from HN2017 are translated into BLUE PFTs and organised into parameter maps that can be read by BLUE. The difference between these simulations and S BL-NET provides an estimate of F LUC differences each including one set of parameters from HN2017 in BLUE. For S HNFull , the difference with S BL-NET is not expected to be simply the sum of the corresponding S HNCdens , S HNAlloc and S HNt differences because of interactions between C densities, allocation fractions and response times, with differences in model structure and LUC forcing, as described in Fig. 1.

Model comparison
We calculate F LUC from the different simulations between 1850 and 2015 (the period common to both datasets) for the globe and for the 18 regions used in Bastos et al. (2020) to evaluate sources of uncertainty in land carbon budgets: Canada (CAN), USA, central America (CAM), northern South America (NSA), Brazil (BRA), southern South America (SSA), Europe (EU), northern Africa (NAF), equatorial Africa (EQAF), southern Africa (SAF), the Middle East (MIDE), Russia (RUS), the Korean Peninsula and Japan (KAJ), central Asia (CAS), China (CHN), southern Asia (SAS), SE Asia (SEAS) and Oceania (OCE). We then evaluate separately the contribution of running BLUE with the reference HN2017 setup, i.e. with net instead of gross transitions and starting in the 1700s (S BL-Net -S BL ). S BL-Net is then used as the baseline for comparison with other simulations, which follow the same setup (net emissions in simulations starting in 1700).
Both BLUE and HN2017 add emissions from peat burning (van der Werf et al., 2017) and drainage (Hooijer et al., 2010) in a post-processing step. For easier comparison of direct model output, we do not include these post-processing steps.
For all simulations, we compare both the interannual variability in F LUC and the resulting cumulative emissions between 1850 and 2015. The discrepancies in interannual variability of estimated F LUC between HN2017 and each simulation from BLUE (S i ) are assessed by the root mean square difference of F LUC from each simulation, calculated as where N is the number of years. In addition, we compare the effect of the different parameterisations on the gross LUC fluxes: fluxes from clearing of primary or secondary natural vegetation, wood harvest (net of decay and regrowth), abandonment of agricultural land (cropland and pasture) and transitions between cropland and pasture.

Global F LUC
We analyse annual F LUC from 1850 until 2015 (Fig. 2,  All BLUE simulations show similar interannual variability patterns, consistent with the use of the LUH2v2.1 forcing, but these variations are dampened when the parameters for C densities, allocation fractions and time constants from HN are used. The BLUE simulation using the full set of HN2017 parameters (S HNFull ) shows F LUC close to those of HN2017 until the 1980s and with a weak peak in emissions in 1960s and relatively stable F LUC rather than an increasing trend in 2000-2015. The resulting cumulative F LUC for S HNFull is 104 PgC, 52 % lower than S BL-NET , at the low end of previous estimates (Hansis et al., 2015;. This value is substantially outside the cumulative budget range of the GCB2019 (205±60 PgC 1850-2018) but still consistent with the uncertainty range of ±0.7 PgC yr −1 provided by GCB2019 after 1959.
The parameters that lead to larger differences in global F LUC are the C densities (S HNCdens , Fig. 2, dark red) and the allocation rules (S HNAlloc , yellow), while changing the decay times have small effect. Both S HNCdens and S HNAlloc result in lower F LUC over the 1850-2015 period, and weaker increasing trends between 2000 and 2015, which indicates that the trends in this period are not only due to forcing differences (Bastos et al., 2020) but in part from model parameterisation. The cumulative F LUC in 1850-2015 is 164 and 142 PgC for S HNCdens and S HNAlloc respectively, i.e. 24 % and 34 % lower than S BL-Net , and closer to the HN2017 estimate on global scale. The lower F LUC with HN2017 C densities can be explained by the HN2017 lower C densities in both vegetation and soil for most PFTs and the smaller difference between primary and secondary forest C stocks (Table A2) compared to BLUE. In particular, BLUE often features higher vegetation carbon in broadleaf forests and higher soil carbon in most other ecosystems than HN2017, which, together with lower soil carbon assumed for cropland and pasture, leads to substantially larger carbon losses in BLUE for many transitions (Table A2). Even though S HNt results in a small positive difference in cumulative F LUC relative to S BL-NET (221 PgC), effect of response-curve times is multiplicative (Fig. 1), therefore the F LUC trends are amplified (Fig. 2, left panel).

Regional patterns
The global differences between simulations result from interactions between the different factors and in the types of LUC occurring in a given point in space and time. We first analyse the temporal evolution of regional F LUC for each simulation (Fig. 3).
The factorial analysis sheds light on the underlying reasons of the diverging trends in the 2000s, where BLUE showed an upward trend, opposing the downward trend in F LUC from HN2017. In absolute terms, the upward trend in BLUE stems foremost from BRA (a peak of about 0.45 PgC yr −1 in the early 2000s, then a decline; similar in HN2017 but peaking at about 0.3 PgC yr −1 ), SSA (also shows an increase in F LUC in CHN for the 2010s, while HN2017 estimates a sink due to afforestation. In all of these regions, adjusting BLUE partly or fully to HN2017 parameters does not obviously bring trends closer together, because a lowering of the 2000s F LUC in BLUE, which results from several of the factorial experiments, would lead to lower F LUC in earlier time periods as well. To summarise these patterns, we calculate the relative average differences in regional cumulative F LUC from S BL-Net and S HNFull with S BL (top panel of Fig. 4a, values in percent change) and the root mean square difference with HN2017 (Eq. 1, RMSD HN-BLUE ), which reflects differences in interannual variability (top panel of Fig. 4b, in TgC yr −1 ).
Even though S BL-Net results in a small (−13 %) decrease in global F LUC compared to S BL as discussed above, regional differences show stronger decreases, especially in regions with intensive shifting cultivation practices, such as SEAS (−40 %), CAM (−22 %), SAF and EQAF (−23 % in both). S BL-Net additionally leads to higher agreement in interannual variability with HN2017 at global scale but also for most regions (i.e. lower RMSD HN-BLUE , Fig. 4b). Europe shows 7 % higher cumulative F LUC for S BL-Net than S BL , likely because of the importance of subpixel post-abandonment recovery and re-/afforestation dynamics in Europe Fuchs et al., 2015). However, this increases the RMSD HN-BLUE by only 2 TgC yr −1 .
As seen for global F LUC , the simulation using HN2017 parameter values (S HNFull ) leads to a reduction of F LUC by 50 % or more compared to S BL in many regions (dark blue colours, see values in the centre of grid cells in Fig. 4a), except for CAS, where an increase of 47 % is estimated, mainly due to differences in C density parameters. The reductions in cumulative F LUC differences in CAM, BRA, EU and SEAS. Decreases in the RMSD HN-BLUE between S HNFull and S BL-Net globally and for 11 of the 18 regions (Fig. 4b), with small increases elsewhere. This shows that differences in setup and parameterisation cancel differences arising from the different land-use forcing in BLUE and HN2017 in some regions. In addition, the reductions in RMSD HN-BLUE in S HNFull compared to S BL are stronger than for S BL-Net , indicating that parameterisation differences have stronger contribution to RMSD HN-BLUE than the impact of simulation net/gross transitions.
The differences between S BL-Net and each of the factorial simulations (bottom panel of Fig. 4a) shows that C densities and allocation rules are the dominant factors not just for global F LUC , but also in most regions, and lead to lower RMSD HN-BLUE , compared to S BL-Net (Fig. 4b). Using HN2017 allocation fractions to pools for harvest and clearing results in lower cumulative F LUC everywhere (S HNAlloc ) and decreases the RMSD HN-BLUE at global scale and in all regions but NSA and SSA. Altering C densities (S HNCdens ) has contrasting effects in cumulative F LUC between regions, increasing cumulative F LUC in 3 out of 18 regions. Strong reductions in RMSD HN-BLUE for S HNFull are found in BRA, RUS, CHN and SAS (top panel in Fig. 4b), explained by RMSD HN-BLUE reductions by changing the C densities in vegetation and soil pools (S HNCdens ) and allocation fractions. In SEAS, cumulative F LUC is reduced when using HN2017 parameters (S HNFull ) but with a higher RMSD HN-BLUE . In this region, C density parameters contribute the most to the reduction of bias, compared to S BL-Net , and both C density parameters and allocation fractions contribute to the increase in RMSD HN-BLUE . This highlights the importance of interac-tions between different parameters to the overall F LUC variability. The decay times generally contribute to small increases in cumulative F LUC compared to S BL-Net , except NAF where they increase F LUC by 22 % and would slightly amplify RMSD HN-BLUE globally and in 13 of the 18 regions.

Effects on gross F LUC component fluxes
To better understand the effects of the different parameterisations on F LUC , we analyse the spatial distribution of the differences between S HNFull , S HNCdens , S HNAlloc , S HNt and S BL-Net decomposed into gross F LUC : fluxes from harvest, clearing, abandonment/regrowth and transitions between crop and pasture (Fig. 5).
In most grid cells, the difference between S HNFull and S BL-Net is dominated by the effects of the parameterisation of C densities in gross fluxes and allocation rules for abandonment fluxes. For F LUC from abandonment and clearing to agriculture (crop and pasture), the differences are mostly negative (i.e. higher uptake from recovery and lower emissions from clearing to agriculture using HN2017 parameterisation), while for the fluxes from transitions between crop and pastures and harvest, regional contrasts between posi-tive and negative differences are found. The lower F LUC from clearing to agriculture for S HNFull in most grid cells is linked with the lower vegetation and soil C densities for most forest PFTs (Table A2). Higher F LUC from wood harvest are simulated by S HNFull in eastern and northern North America, central Europe and Scandinavia and China, mostly related with response-curve time constants. Other transitions (crop to pasture or pasture to crop) result in higher F LUC for S HNFull in most semi-arid regions, which is explained to a larger extent by differences in C densities and time constants between the two models (S HNCdens , S HNt ) than by allocation rules (S HNAlloc ) (Table A2).

Discussion
Fluxes from land-use change and management are one of the most uncertain and least constrained components of the global carbon cycle (Bastos et al., 2020;Friedlingstein et al., 2019;Houghton, 2020). Several sources of uncertainty in F LU C have been previously analysed, such as the choice of gross vs. net LUC transitions Fuchs et al., 2015;Gasser et al., 2020;Wilkenskjeld et al., 2014), the definitions and terminology used (Grassi et al., 2018;Pongratz et al., 2014) or the management processes considered Pugh et al., 2015). The two bookkeeping models used in the global carbon budgets may differ in their F LUC estimates due to differences in the forcing data and differences in model structure, parameterisation and in how certain processes are represented. The impact of the LUC forcing on F LUC has been extensively investigated in previous studies (Bastos et al., 2020;Gasser et al., 2020;Hartung et al., 2021). Both models have a similar structure (Table 2) and both models use parameters from different sources that are based on observations which are, however, uncertain. Here, we evaluate how the different model parameterisations impact F LUC estimates and whether they can explain differences in global and regional average F LUC and on variability between the two models since 1850.
The simulation with net transition (S BL-Net ) reduces differences in the average and interannual variability of F LUC estimates from BLUE and HN2017. The contribution of gross to F LUC is smaller than previous estimates (15 %-38 %, Arneth et al., 2017;Fuchs et al., 2015;Hansis et al., 2015) and also lower than in earlier BLUE simulations that used the same rule. Cancelling of primary and secondary land clearing, with primary first, gave 24 % lower emissions in Hansis et al. (2015). The differences are likely explained by the substantial changes that came in with the change from LUH1 to LUH2 versions, in particular the change to Heinimann et al. (2017) shifting cultivation maps.
Based on observation-based constraints by atmospheric inversions Bastos et al. (2020) pointed out that F LUC estimated by DGVMs and BLUE in BRA, SEAS, EU and EQAF were probably too high. Our analysis shows that F LUC estimates for these regions, except EU would be lower if the setup of HN2017 were used, i.e. starting in 1700 instead of 850 and using net transitions, and all four regions would show even larger reductions in F LUC if the parameterisation of HN2017 were used in BLUE. However, these changes would also bring down F LUC estimates in many regions that were not deemed too high in F LUC based on the constraint by observations. This suggests that neither the BLUE nor HN2017 setup and parameterisation can be judged as being superior to the other for all regions of the world and all time periods.
The rules for allocation of displaced carbon to different pools have the strongest effect on average F LUC , as well as their variability, followed by C density parameters. Contrary to C densities (Sect. 4.1), at the moment no global dataset of allocation parameters exists that could be compared to the allocation fractions used here. BLUE and HN2017 F LUC in 1850-2015 show better agreement in temporal variability, mostly because fact that the C density and allocation parameterisations of HN2017 dampen the effect of differences in land-use change transitions.
This elimination of the 2000s trend difference in some regions comes at the cost of larger divergences in earlier times. With high LUC dynamics in the 20th century in some regions, which is more strongly captured by BLUE with its representation of gross transitions, slightly larger C density losses with the transformation of natural vegetation to agriculture or degradation by wood harvesting and rangelands may lead to an increase in F LUC beyond what would be expected from net land-use areas alone. On top comes a distribution of cleared and harvested material to faster pools (more slash in BLUE, more long-lived products in HN2017), which also emphasises the effects of LUC dynamics. The differences between BLUE and HN2017 are thus a combination of higher LUC dynamics in BLUE (by using LUH2v2.1 and accounting for gross transitions) and of faster material decay than in HN2017. The different trends of BLUE and HN2017 in the 1950s and after 1990 are instead largely attributable to the different LUC forcing (Gasser et al., 2020).

Constraining C densities
The parameterisation of C densities of vegetation and soil pools is the second most relevant parameter but one that affects all flux components. Even though both models were parameterised based on observation-based C densities, these parameters are highly uncertain, as they are derived from sparse plot-level data with high variance across datasets (Brown and Lugo, 1982;Post et al., 1982;Schlesinger, 1984;Zinke et al., 1986). Remote-sensing-based estimates of potential vegetation C stocks in undisturbed lands and well as present-day C stocks have been produced by Erb et al. (2018), including their uncertainty. The values of Erb et al. (2018) can, therefore, be compared to the potential C stocks simulated by HN2017 and by BLUE using the different configurations in this study (circles in Fig. 6), as well as of simulated present-day carbon stocks (small circles, end of arrows). In addition, we compare simulated C stocks with those of Anav et al. (2013) for the present day.
All BLUE simulations, as well as HN2017, have 4 %-6 % lower potential C stocks in vegetation than estimates in Erb et al. (2018) (Fig. 6). Since the values of potential biomass in Erb et al. (2018) were estimated for present day, they include the effect of environmental changes such as CO 2 fertilisation, and are expected to be up to 10 % higher than they would be without these effects . Therefore, the C stocks in vegetation simulated both by BLUE and HN2017 are consistent with these remote-sensing based estimates, if environmental effects are excluded. The methodology of using the highest percentiles in a moving window as a potential value in Erb et al. (2018) could overestimate biomass because it has a bias towards capturing the oldest rather than average forests in a cycle of natural disturbances. Additionally, these simulations result in present-day C stocks in vegetation that are within the range provided by Erb et al. (2018), or close to its upper limit, and are also consistent with the reference value from Anav et al. (2013). All simulations estimate lower C stocks in the soil, compared to Anav et al. (2013). HN2017 has higher C stocks in soil both for the pre-industrial period and the present day, compared to BLUE simulations, which are close to the values estimated by Anav et al. (2013). The two simulations using HN2017 carbon densities (S HNFull , and S HNCdens ) result in too-low C stocks in soils, compared to Anav et al. (2013), and much lower potential vegetation C stocks than Erb et al. (2018). However, present-day vegetation C stocks for S HNFull and S HNCdens are consistent with their values.

Conclusions
We conclude that differences between BLUE and HN2017 arise from the higher allocation of cleared and harvested material to quickly decomposing pools in BLUE, compared to HN2017, combined with higher emissions in BLUE due to often larger differences in soil and vegetation C densities between natural and managed vegetation or primary and secondary vegetation. It should be noted, however, that specific transitions and prevalence of specific PFTs in certain regions prohibits generalising this statement. Together with the larger land-use dynamics which stem from BLUE representing gross transitions and its usage of LUH2v2.1 as LUC forcing, these changes lead to overall higher carbon losses that have a faster decay.
The two reference datasets of global C stocks seem to support the choice of C densities used in the default BLUE configuration and therefore the higher estimates of F LUC by BLUE. However, it should be noted that both models have limited representation of spatial variability in C densities: BLUE ignores spatial variability in vegetation and soil C within each PFT distribution, for example, due to less favourable climate in some regions; HN2017 includes country-specific C densities for vegetation, but not for soil, and no spatial variability within each country.
The large contribution of the C densities to the differences between the F LUC estimates of the two BK models found in our results highlights the importance of deriving spatially explicit maps of vegetation, and soil C densities discriminated per vegetation type would be required. Producing such maps is challenging, especially for the estimates of C densities in undisturbed land, as most of the land surface has been directly or indirectly impacted by human activity. However, observation-based maps of vegetation and soil C densities in both disturbed and undisturbed land would be highly valuable, as they could be used in BK models to reduce uncertainties in F LUC .
Similarly, improvements in allocation can be performed. Bookkeeping models, and many DGVMs, follow very simple assumptions of the fate of cleared or harvested material, often along the lines of the "Grand Slam Protocol" (McGuire et al., 2001) but developed for bookkeeping models earlier (Houghton et al., 1983), which distinguishes only three product pools (fast, medium, slow), with timescales defined rather ad hoc as 1, 10 or 100 years. The fractions going into these and into slash are compiled from individual studies for specific regions (Houghton et al., 1983;Hurtt et al., 2020) but are hard to quantify on the global level throughout several centuries. Such long timescales are needed, however, to capture the slow dynamics of decay and regrowth and thus to capture legacy fluxes accurately. For the last decades, however, more detailed data have become available than those currently used in the models of the global carbon budgets, such as global sets of dynamic carbon-storage factors (Mason Earles et al., 2012) that define a larger number of product pools and time-varying fractions of allocation.   Table A3. Global median values of harvest and clearing allocation rules to the short-, mediumand long-lived pools (1, 10 and 100 years) for BLUE PFTs, from the standard BLUE setup and used for the simulations with parameters from HN2017 (converted to BLUE PFT classes). The slash fraction from clearing is calculated as the 1 minus the sum of the 1-, 10and 100-year pools.  Figure A1. Carbon densities in vegetation (a, c) and soil (b, d) for tropical broadleaved evergreen forests for BLUE (a, b) and HN2017 (c, d) in tC ha −1 . It should be noted that even though C density values are assigned on a per-country basis in HN2017, they do not differ between countries for soil C. Note that C densities are assigned to all countries, even if evergreen broadleaved forest is not present in a given country.
Code availability. The bookkeeping model code can be requested from Richard A. Houghton (HN2017) and Julia Pongratz (BLUE). Other scripts are available upon request to the corresponding author.
Data availability. The global and regional fluxes from HN2017 and the BLUE simulations are provided in the Supplement. The gridded fields of the BLUE simulations can be provided by the contact author upon request.
Author contributions. AB designed the study together with JP, with input from KH, JEMSN and TBN. AB planned and executed the model simulations as well as the overall analysis. RAH and JP provided expert knowledge on each of the two bookkeeping models. TBN prepared Fig. 1. KH, JP, RAH and JEMSN contributed to the discussion of the results. AB prepared the original draft; all other co-authors participated in the review and editing of the paper.
Competing interests. The authors declare that they have no conflict of interest.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.