Bookkeeping estimates of the net land-use change flux – a sensitivity study with the CMIP6 land-use dataset

The carbon flux due to land-use and land-cover change (net LULCC flux) historically contributed to a large fraction of anthropogenic carbon emissions while at the same time being associated with large uncertainties. This study aims to compare the contribution of several sensitivities underlying the net LULCC flux by assessing their relative importance in a bookkeeping model (Bookkeeping of Land Use Emissions, BLUE) based on a LULCC dataset including uncertainty estimates (the Land-Use Harmonization 2 (LUH2) dataset). The sensitivity experiments build upon the approach of Hurtt et al. (2011) and compare the impacts of LULCC uncertainty (a high, baseline and low land-use estimate), the starting time of the bookkeeping model simulation (850, 1700 and 1850), net area transitions versus gross area transitions (shifting cultivation) and neglecting wood harvest on estimates of the net LULCC flux. Additional factorial experiments isolate the impact of uncertainty from initial conditions and transitions on the net LULCC flux. Finally, historical simulations are extended with future land-use scenarios to assess the impact of past LULCC uncertainty in future projections. Over the period 1850–2014, baseline and low LULCC scenarios produce a comparable cumulative net LULCC flux, while the high LULCC estimate initially produces a larger net LULCC flux which decreases towards the end of the period and even becomes smaller than in the baseline estimate. LULCC uncertainty leads to slightly higher sensitivity in the cumulative net LULCC flux (up to 22 %; references are the baseline simulations) compared to the starting year of a model simulation (up to 15 %). The contribution from neglecting wood harvest activities (up to 28 % cumulative net LULCC flux) is larger than that from LULCC uncertainty, and the implementation of land-cover transitions (gross or net transitions) exhibits the smallest sensitivity (up to 13 %). 
At the end of the historical LULCC dataset in 2014, the LULCC uncertainty retains some impact on the net LULCC flux (±0.15 PgC yr−1 at an estimate of 1.7 PgC yr−1). Of the past uncertainties in LULCC, a small impact persists in 2099, mainly due to uncertainty of harvest remaining in 2014. However, compared to the uncertainty range of the LULCC flux estimated today, the estimates in 2099 appear to be indistinguishable. These results, albeit from a single model, are important for CMIP6 as they compare the relative importance of starting year, uncertainty of LULCC, applying gross transitions and wood harvest on the net LULCC flux. For the cumulative net LULCC flux over the industrial period, the uncertainty of LULCC is as relevant as applying wood harvest and gross transitions. However, LULCC uncertainty matters less (by about a factor of 3) than the other two factors for the net LULCC flux in 2014, and historical LULCC uncertainty is negligible for estimates of future scenarios.
Published by Copernicus Publications on behalf of the European Geosciences Union.
K. Hartung et al.: Bookkeeping estimates of the net land-use change flux


Introduction
Globally, the historical net carbon flux due to land-use and land-cover change (net LULCC flux) has been positive (i.e. a ...). Since the net flux from LULCC cannot be directly measured, we can only rely on values calculated by models, for example dynamic global vegetation models (DGVMs) and bookkeeping models. Bookkeeping models (Houghton, 2003; Houghton and Nassikas, 2017; Hansis et al., 2015) combine observation-based carbon densities with LULCC estimates to determine the net LULCC flux. DGVMs, on the other hand, model the evolution of carbon pools on a process-based level and also react to climate impacts and trends.
Differences in model estimates of the net LULCC flux can have different origins, broadly falling into three categories: (i) the underlying LULCC reconstruction and its uncertainties, (ii) the LULCC practices considered (e.g. wood harvest and shifting cultivation) and (iii) model assumptions (e.g. parameterizations of processes like type and lifetime of wood products). Considering point (i), several global multi-century LULCC reconstructions exist (e.g. Pongratz et al., 2008; Kaplan et al., 2011; Klein Goldewijk et al., 2017; Hurtt et al., 2020). Furthermore, several studies isolated and quantified the impact on the net LULCC flux of the individual components of the three categories listed above: (i) the impact of the choice of LULCC dataset (Hurtt et al., 2006; Pongratz et al., 2008; Stocker et al., 2011); (ii) the importance of neglecting or modelling wood harvest (Stocker et al., 2014; Arneth et al., 2017) and shifting cultivation (Hurtt et al., 2011; Wilkenskjeld et al., 2014; Stocker et al., 2014; Arneth et al., 2017); and (iii) the model assumptions, for example using either DGVMs or bookkeeping models (Houghton et al., 2012; Gasser et al., 2020). The starting year of a simulation can either be seen as part of the LULCC itself (category i) or a model assumption (category iii), and is a good example of a very common uncertainty across different model types: despite the Land-Use Harmonization 2 (LUH2) land-use change data being available from 850, Coupled Model Intercomparison Project Phase 6 (CMIP6) simulations start by default in 1850 (Eyring et al., 2016), contributions to the Land Use Model Intercomparison Project (LUMIP) assess different starting dates of 1700 and 1850 (Lawrence et al., 2016), and more recent Global Carbon Budget (GCB) estimates switched from 1860 to 1700 as a starting year for DGVM simulations (Le Quéré et al., 2018; Friedlingstein et al., 2019, 2020).
Example studies from bookkeeping(-like) models comparing the impact of properties across at least two of the above listed categories are Hurtt et al. (2011) and Gasser et al. (2020). In the Hurtt et al. (2011) sensitivity study based on the LUH1 dataset (Land-Use Harmonization, Chini et al., 2014), the authors analysed over 1600 simulations with respect to model "factors" like the simulation start date, the choice of historical and future agricultural land-use and wood harvest scenarios, and inclusion of shifting cultivation. The simulation outputs were compared across a variety of metrics and diagnostic tools including secondary area and mean age, global gross and net transitions, and cumulative gross and net loss of aboveground biomass. Their analysis showed that the most relevant factors were the start date, and the inclusion of both shifting cultivation and wood harvesting. The LUH2 dataset (Hurtt et al., 2020) responded to these findings by developing a dataset that started in 850, with improved representations of the spatial patterns of both shifting cultivation and wood harvesting based on remote-sensing data. Gasser et al. (2020) use a hybrid model (the OSCAR model) combining bookkeeping properties (tracking the effect of LULCC activities) and biogeophysical properties from a DGVM to estimate uncertainties acting on annual and cumulative CO2 emissions. The focus in Gasser et al. (2020) is on the relative importance of biogeophysical parameters, the LULCC dataset (either the LUH2 or the FRA (Forest Resources Assessment, FAO, 2015) dataset) and the inclusion of the LASC (loss of additional sink capacity, Pongratz et al., 2014) to the net LULCC flux. The latter property constitutes one of the main differences of the resulting flux estimates between DGVMs and bookkeeping models and is due to changes in carbon densities caused by varying atmospheric CO2 concentrations. Gasser et al. (2020) find that the largest variation in flux estimates is induced by biogeophysical parameters (mainly carbon densities), followed by the definition of the LULCC flux (i.e. including or excluding LASC). The LULCC dataset is found to cause the least uncertainty cumulatively, though the trend of the annual LULCC flux based on the two datasets has opposing signs in recent years.
The goal of our study is to build upon previous approaches (e.g. Hurtt et al., 2011 and Gasser et al., 2020) to assess a variety of the above mentioned sensitivities of the net LULCC flux with one single underlying LULCC dataset reporting uncertainty (LUH2, Hurtt et al. 2020) and the bookkeeping model BLUE (Bookkeeping of Land Use Emissions, Hansis et al. 2015). The LUH2 dataset (Hurtt et al., 2020) provides historical land-use estimates from 850 with uncertainty estimates for agricultural land area (from the History Database of the Global Environment (HYDE), Klein Goldewijk et al. 2017) and wood harvest (Zon and Sparhawk, 1923; Kaplan et al., 2017). The dataset captures the challenge of reconstructing the LULCC of the past. LUH2 is the land-use dataset that is, besides many other studies, also applied in CMIP6 (Eyring et al., 2016) for simulations with process-based DGVMs, like in LUMIP (Lawrence et al., 2016). Our findings and discussions regarding DGVM studies are therefore also informative for the interpretation of CMIP6 results. BLUE is a data-driven bookkeeping model (Hansis et al., 2015) used in the GCB for LULCC flux estimates (Friedlingstein et al., 2019). We choose a bookkeeping model in contrast to a DGVM because LULCC fluxes due to individual LULCC events can be traced and because of the potential to isolate the net LULCC flux independent of climate variability, among other factors.
Due to the high computational efficiency of the bookkeeping model, several sensitivity experiments can be produced and an exhaustive comparison of common factors impacting the total net LULCC flux is possible. Here, the impact of modelling wood harvest and shifting cultivation as land management processes is compared to the impact of uncertainties of the LULCC dataset and the initialisation year of the LULCC simulation. We design additional artificial sensitivity experiments to disentangle the uncertainty from the initial land cover distribution and the uncertainty from LULCC activities (transitions). By extending the historical simulations under future LULCC scenarios, we can then estimate the impact of past uncertainty on future estimates of the net LULCC flux.
Our study thus provides an extension to previous studies comparing sensitivities across a different set of factors by also disentangling the relevance of the initial land-cover distribution compared to the uncertainties in LULCC activities on the net LULCC flux. In addition, it updates the sensitivities of e.g. wood harvest and shifting cultivation based on a more recent LULCC dataset, which is also the basis for CMIP6, using one bookkeeping model.

The analysis of the simulations is guided by two main questions: 1. How do LULCC uncertainties influence the overall emitted carbon? and 2. What uncertainties remain at the end of the historical period and how much do they influence future projections? For both questions the global net LULCC flux, as well as separation by LULCC activity and by different regions are considered. This analysis can serve as reference for subsequent sensitivity analyses with complex models (DGVMs, ESMs) and points to model and data choices which matter most for modelling of land-use related changes in the carbon cycle.

Model description, LULCC dataset and experiment setup
As a first step, we present the bookkeeping model BLUE used in this study. Then the LUH2 dataset, its high and low LULCC scenarios as well as various future scenarios are introduced. Finally, an overview of the conducted BLUE experiments is given.
A brief description of how the LUH2 dataset is prepared for use with the BLUE model and short discussion of the properties of the LULCC dataset are provided in the Appendix (Sec. A1 and A2).

The bookkeeping model BLUE
BLUE (Hansis et al., 2015) is a data-driven, semi-empirical bookkeeping model. Initial areas of the four land-cover types (primary land, secondary land, cropland and pasture) determine the amount of carbon stored in soil and vegetation biomass prior to tracked LULCC activities. These initial "equilibrium pools" are determined from observation-based carbon densities and are non-zero for the carbon associated with the soil component undergoing slow relaxation processes and the vegetation biomass. LULCC activities, i.e. land-use transitions, take the model state away from equilibrium, increasing or decreasing so-called disequilibrium pools. BLUE considers the four LULCC activities abandonment (cover change from crop or pasture to secondary land), clearing for cropland or pasture (cover change from primary land, secondary land, crop or pasture to crop or pasture) and wood harvest (cover change from primary to secondary land, or land management on secondary land). As wood harvest is the only type of harvest modelled in BLUE it is in the following abbreviated as harvest. Disequilibrium pools exist for vegetation, soil undergoing fast and slow relaxation processes, and for products from harvest and clearing with lifecycles of 1, 10 and 100 years. Response curves characterise the temporal adjustment of the disequilibrium pools after a transition to the new equilibrium, where the difference in carbon stocks, namely the content of the disequilibrium pools, is steadily emitted to the atmosphere. The version of BLUE used here and in the GCB is based on 11 natural plant functional types (PFTs), of which six represent forested biomes, and two agricultural PFTs (crop and pasture). More information on the BLUE model can be found in Hansis et al. (2015).
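The pool-and-response-curve logic described above can be illustrated with a minimal sketch. It assumes a simple exponential response curve and treats the three product lifecycles (1, 10 and 100 years) as e-folding times. The function name, the pool size of 10 units and the equal split between product pools are hypothetical choices for illustration and are not taken from BLUE.

```python
import math

def release_pool(pool, tau, n_years):
    """Emit a disequilibrium pool along an assumed exponential response curve.

    pool    -- carbon in the disequilibrium pool at the time of the transition
    tau     -- assumed e-folding time of the response curve in years
    n_years -- number of years to track after the transition
    Returns the list of annual fluxes and the carbon still left in the pool.
    """
    fluxes = []
    remaining = pool
    yearly_fraction = 1.0 - math.exp(-1.0 / tau)
    for _ in range(n_years):
        emitted = remaining * yearly_fraction
        fluxes.append(emitted)
        remaining -= emitted
    return fluxes, remaining

# Hypothetical example: 10 units of carbon split equally across
# product pools with lifecycles of 1, 10 and 100 years.
share = 10.0 / 3.0
results = {tau: release_pool(share, tau, 50) for tau in (1, 10, 100)}
left_after_50_years = {tau: rem for tau, (_, rem) in results.items()}
```

In this sketch the 100-year pool still holds more than half of its carbon after 50 years, which hints at why emissions from long-lived products can extend beyond the end of a simulation.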
For the analysis it is useful to note a few additional model assumptions. If two simulations are based on the same LULCC dataset but start in different years (y2 > y1), then areas of the four cover types will be identical in year y2 but the disequilibrium carbon pools and the resulting flux to the atmosphere will not be identical. As the simulation started in y2 is based on the initial land cover of that year, it will only track LULCC activities occurring after y2 and not all the activities that have happened since y1, as in the first case. Moreover, two simulations can have an identical cumulative LULCC flux up to a given year, but because they might be associated with different disequilibrium pools, the subsequent evolution of fluxes can differ (see for example Fig. 2). This also applies to net LULCC flux caused by LULCC activities during the simulation but occurring after the end of the simulation (e.g. decay of long-lived harvested wood products), which is not tracked in the applied setup of the BLUE model. The first assumption, to only track LULCC activities subsequent to the start year, is specific to the model world. The second assumption, to only account for the net LULCC flux which already happened and not for the total net LULCC flux that a LULCC activity causes, is also common in policies.

... (annual and perennial C3 and C4 crops as well as C3 nitrogen fixers), managed pasture and rangeland, as well as natural vegetation (forested and non-forested, primary and secondary land). Rangelands are distinguished from managed pastures by an aridity index and population density from the HYDE dataset and can imply a land-cover change (e.g. in Brazil's Cerrado), but can also simply mean a different management of the original land-cover type (e.g. in the semi-dry regions of Australia).
In addition to five wood harvest transitions on primary and secondary, forested and non-forested land (secondary forested land is further divided by forest age), gross land-use transitions are available between the different land-use types. Wood harvest is characterised alternatively by the harvested area or the removed biomass. Land-use states and transitions are available for a baseline scenario and two additional scenarios which in this study are used to quantify the uncertainty of the LULCC dataset: a high scenario assumes more land-use activity at the start of the LULCC dataset in 850 than in the baseline, whereas the low scenario starts off with less land-use activity, and vice versa at the end of the dataset.

The uncertainty in agricultural area is estimated in the HYDE dataset and linked to population uncertainty. The latest version of the HYDE dataset, HYDE3.2, provides data every 100 years until 1700, every 10 years between 1700 and 2000, and every year after 2000. The LUH2 dataset uses agricultural data from the uncertainty range A of the HYDE product, an uncertainty range based on literature and expert judgement. The uncertainty in primary/secondary land is estimated in the LUH2 dataset, partly through application of three different wood harvest estimates based on two different datasets before 1920 (Zon and Sparhawk, 1923; Kaplan et al., 2017) and partly through the different gross transitions arising from the different LULCC time series. For the LUH2 dataset, HYDE data is interpolated and combined with annual wood harvest data from the Food and Agriculture Organization (FAO) to provide annual states and transitions.

Results from four future scenarios are also included in this analysis, namely two SSP4 scenarios, SSP4-3.4 and SSP4-6.0, by GCAM and two SSP5 scenarios, SSP5-8.5 and SSP5-3.4OS, by MAgPIE (Riahi et al., 2017; Popp et al., 2017; Calvin et al., 2017; Hurtt et al., 2020) for the period 2015-2100. SSP4 describes an inequality scenario with low challenges to mitigation and high challenges to adaptation. SSP5, on the other hand, is characterised by fossil-fuelled development with high challenges to mitigation and low challenges to adaptation. In the following, the scenarios are referred to by their Shared Socioeconomic Pathways (SSPs) and their Representative Concentration Pathways (RCPs) and not mainly by the Integrated Assessment Model (IAM) that produced them, i.e. GCAM or MAgPIE. Hurtt et al. (2020) give a more detailed summary of the properties of the different land-use scenarios. Of all available future scenarios these four were selected for this study because they are based on the same two SSP scenarios but describe a range of possible RCP scenarios. For each of the different scenarios no further uncertainty ranges are provided but the set of scenarios is used to explore the impact of past LULCC uncertainties on the future net LULCC flux. More information is given in Appendix A2.
It should be noted that the LUH2 dataset, as proposed by CMIP6, does not capture the full range of uncertainty but is an estimate based on the available data (Klein Goldewijk et al., 2017; Hurtt et al., 2020). Importantly, annual updates to the LUH2 data, for use in the GCB, are provided when further/new information becomes available, and customized versions of the LUH2 data have been produced for use in specific studies (e.g. Frieler et al., 2017). In particular, the last years of the baseline LUH2 scenario have been substantially revised for subsequent analyses related to the annual GCB. This includes updates in the underlying agricultural data from the FAO, but also revisions of regionally inconsistent data (e.g. erroneous data in Brazil in the GCB 2018 results, Le Quéré et al., 2018; Bastos et al., 2020). These corrections are not included in the current CMIP6 dataset.

The nine main experiments (Table 1) ...

It should be noted that the extent of the LULCC areas in BLUE sometimes differs from the LUH2 input dataset, even for the nine main experiments, mainly because of a mismatch in PFTs between the LUH2 (harvest) input and the BLUE model. In all cases the amount of primary land is larger in BLUE than in the original LUH2 dataset, at the cost of other land-cover types. Overall this means that the total amount of net LULCC flux will be underestimated in BLUE, the most in the HI850 experiment. More information is provided in Section A2.

Experimental setup and analysis
In addition to the nine main experiments, we conduct 30 sensitivity experiments (Tables 2 and 3) in order to (i) compare the sensitivity due to LULCC and StYr to other LULCC properties and (ii) assess how historical uncertainty propagates into future scenarios.

The three LUH2 LULCC estimates differ not only in the temporal evolution of the LULCC activities but also in their initial areas, especially when the simulation starts after 850. To disentangle these effects, we conduct additional BLUE simulations based on artificial LULCC information which is not proposed by LUH2. Instead, it uses the original (REG) area initial conditions ...

... Table 1) and additional sensitivity experiments (third to sixth row). The first column gives the abbreviation of the experiment type described in the second column and the last three columns provide reference simulations for the uncertainty analysis (more information around Fig. 3 ...

However, these deviations of primary land area from the LUH2 dataset are still smaller than those caused by not considering wood harvest (not shown).
By neglecting information on some of the LULCC activities from the input dataset, simulations without wood harvest and with net instead of gross transitions can be produced (see Table 2). Note that the net LULCC flux is an aggregate of all sources and sinks due to LULCC in one year and is not linked to net transitions, i.e. net and gross land-use transitions must not be confused with the net or gross LULCC flux.

Table 3. Overview of future sensitivity experiments, continued from simulations with starting year 1700 for all three LULCC scenarios (see Table 2).

Each of the three main simulations with starting year 1700 (Table 1) is continued following each of the four future land-use scenarios until 2100 (Table 3).
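The distinction between gross and net land-use transitions mentioned above can be made concrete with a small sketch. It assumes that net transitions are obtained by cancelling opposing gross transitions between the same pair of cover types within one grid cell and year. The function name and the areas are hypothetical and do not reproduce the LUH2 or BLUE processing.

```python
def net_transitions(gross):
    """Collapse gross transitions between cover types into net transitions.

    gross -- dict mapping (source, target) to transitioned area
    Returns a dict holding only the net area moved in the dominant direction.
    """
    net = {}
    for (src, dst), area in gross.items():
        reverse = gross.get((dst, src), 0.0)
        if area > reverse:
            net[(src, dst)] = area - reverse
    return net

# Hypothetical grid cell with shifting cultivation: clearing and
# abandonment happen in the same year (areas in arbitrary units).
gross = {("secondary", "crop"): 5.0, ("crop", "secondary"): 3.0}
net = net_transitions(gross)
gross_area = sum(gross.values())  # area affected when gross transitions are used
net_area = sum(net.values())      # area affected when only net change is used
```

With these illustrative numbers, 8 units of land change cover under gross transitions but only 2 units under net transitions, so a model driven by net transitions tracks fewer LULCC events.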

Results and discussion
The time series of all three historical uncertainty estimates (Fig. 1) show the known feature of a peak in 1960 (Hansis et al., 2015; Friedlingstein et al., 2019). Before around 1960, the net LULCC flux is almost continuously rising and levels decrease ... although HI produces the largest net LULCC flux initially, this is not true throughout and especially at the end of the simulation.
The increased land-use dynamics in LO in later times let LO exceed HI in terms of cumulative net LULCC flux at some point in time, which we will call a crossing point.

Feature (1) is not in conflict with a roughly symmetric uncertainty of harvest, which at first could be assumed to result in an equal difference in net LULCC flux between HI/REG and LO/REG. However, harvest on forested primary land, which is most important for the net LULCC flux, is similar between REG and LO (Fig. A2) and thus causes the similarity in net LULCC flux.
Harvest on secondary land does not produce a net flux to the atmosphere if considered over a long time period (total source is equivalent to total sink). From about 1800 onwards, less harvest on primary land can be observed in the HI LULCC estimate, slightly more in LO and the most in REG.
Feature (2) develops because the timescale of regrowth (sink of carbon flux, i.e. flux from atmosphere to land) is longer than that of clearing/harvest (source). The feature can be seen by comparing the orange and green crosses, representing the cumulative net LULCC flux for the period 1850-2014 in REG850 and REG1700 respectively, with the blue cross for REG1850 in Fig. 2.

Finally, feature (3) can be explained by the link between LULCC and the net LULCC flux. If one scenario has continuously more LULCC than another, it will continue to produce a larger net LULCC flux and therefore no crossing points will occur.
However, if the rate of LULCC varies differently with time in two scenarios, then the simulation with an initially larger amount of LULCC activities exhibits fewer transitions towards the end. More information on properties and origins of crossing points in our analysis is given in Appendix B1.

Comparison of components of uncertainty
As discussed in the previous section around Fig. 2, Fig. 3 similarly shows that the cumulative net LULCC flux in the LO scenario (filled circles) exceeds the values in the HI scenario (crosses), and that REG (horizontal dash) and LO produce more similar cumulative net LULCC fluxes. The main analysis is restricted to comparison of the net cumulative LULCC flux between 1850 and 2014 but a discussion of the comparison over the full respective time periods is given in Appendix B2.
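A minimal sketch of how a sensitivity relative to a reference simulation can be expressed is given here. It assumes that sensitivity is the percentage deviation of a cumulative flux from its reference, in line with the percentages quoted relative to the baseline simulations. The function name, the experiment labels and all flux values are hypothetical and are not results of this study.

```python
def relative_sensitivity(cumulative_flux, reference_flux):
    """Deviation of a cumulative flux from its reference simulation, in percent."""
    return 100.0 * (cumulative_flux - reference_flux) / reference_flux

# Hypothetical cumulative fluxes (PgC) for illustration only.
reference = 200.0
experiments = {"exp_a": 180.0, "exp_b": 204.0}
sensitivity = {name: relative_sensitivity(flux, reference)
               for name, flux in experiments.items()}
```

Here `exp_a` deviates by -10 % and `exp_b` by +2 % from the reference; the sign indicates whether an experiment produces a smaller or larger cumulative flux than its reference.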

The cumulative net LULCC flux exhibits a reduced sensitivity to LULCC uncertainty with starting year 1850 (compare vertical spread of blue markers in LULCC-column) since the input data has smaller uncertainty in more recent years (...

... The color of the connecting lines represents the reference simulations.

The artificial sensitivity experiments IC and Trans reveal that the sensitivity to ICs (visible as the spread across LULCC estimates) increases more the later the simulation starts (Fig. 3, second and ...

Sensitivity of the cumulative net LULCC flux to harvest is mainly found for HI setups (LULCC column) and any LULCC simulation started in 1850 (StYr column). As mentioned in Section 3.1.1, harvest primarily results in net fluxes associated with the primary-to-secondary land transitions. The difference in these fluxes when comparing HI vs. REG setups is much greater than the differences between the REG vs. LO setups (Fig. 2). Similarities in REG and LO harvest on primary land are thus in line with similar net LULCC flux estimates in those experiments. This also explains why REG850 and LO850 produce similar amounts of harvest emissions until 1700 (Fig. 2), although their total harvested area is different. Both harvest and pasture expansion exhibit larger cumulative net LULCC flux in LO than HI experiments (Fig. 4b and d), while the opposite is true for abandonment (Fig. 4a) and crop expansion shows minimal differences between the two experiments (Fig. 4c). The LULCC activity showing the best agreement between the three LULCC scenarios is crop expansion (Fig. 4c): results of HI and ...

The sensitivity of the net LULCC flux to the uncertainty from pasture expansion (Fig. 4d) is larger from transitions (Trans, fourth column) than from initial conditions (IC, third column). This can be explained by the fact that the agricultural area (...

... Table 2 for reference simulations). Only the net LULCC flux from simulations starting in 850 is slightly reduced. Note that in the experiments without harvest, the cumulative net LULCC flux from harvest is not zero because a small contribution of transitions from primary to secondary land due to rangeland expansion is counted as harvest.

The analysis of the contributions from the four LULCC activities to the total net LULCC flux sensitivity reveals: 1. LULCC uncertainty from harvest causes the largest sensitivity in the cumulative net LULCC flux, followed by equal contributions from abandonment and pasture and negligible sensitivity due to crop uncertainty. For harvest the sensitivity is asymmetric, i.e. the net LULCC flux due to harvest in the HI scenario deviates further from REG than in the LO scenario. 2. Uncertainties in wood harvest cause large sensitivity to starting year of the simulation (StYr), as well as to initial conditions (IC) and transitions (Trans) in the artificial LULCC experiments.

Regional variations of uncertainty
Europe, Asia and Africa exhibit the largest sensitivity of cumulative net LULCC flux to LULCC uncertainties in the REG, HI and LO simulations starting in 1700 (Fig. 5). In most regions, HI1700 produces a smaller cumulative net LULCC flux than REG1700 and the cumulative flux is generally larger in LO1700 than REG1700. However, there are large coherent areas over Central and North America and Northern Europe/Asia with reduced cumulative net LULCC flux in LO1700 compared to REG1700.
Some regions with reduced emissions in the HI scenario, like Poland and South-East Asia, correspond to regions where fewer transitions of the LUH2 input data are used (Fig. A5), which is further enhanced in the HI-REG comparison.
Further division by LULCC activity is discussed in the following and shown in the Supplementary Material (see Fig. C1).

... Oceania is relatively small. Interestingly, the cumulative net land-use change flux over Oceania is larger in HI1700 than in LO1700 because few transitions occur before 1700 so that basically all transitions are captured in the analysis period.
3.2 How does past uncertainty impact future scenarios?

The current state
Next, we want to analyse the magnitude of legacy emissions at the end of the historical simulations in 2014 and how much they are affected by past LULCC uncertainty. The magnitude of the annual net LULCC flux is determined by the size of the disequilibrium pools, which aggregate information of past LULCC events. If these disequilibrium pools are similar between two setups in a given year and the upcoming LULCC events are identical, then the annual net LULCC flux in the following years will be similar as well.
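The dependence of the annual flux on past LULCC events, and on which events a simulation tracks, can be sketched as follows. The sketch assumes an exponential response curve with a single e-folding time and treats each event as a pool that is emitted over time. The function name, the e-folding time and the event list are hypothetical and do not correspond to BLUE parameters or LUH2 data.

```python
import math

def annual_flux(events, year, start_year, tau=50.0):
    """Net flux in `year` from all tracked events since `start_year`.

    events     -- list of (event_year, carbon) pairs; positive carbon is a source
    start_year -- events before this year are not tracked by the simulation
    tau        -- assumed e-folding time of the response curve in years
    """
    flux = 0.0
    for event_year, carbon in events:
        if start_year <= event_year <= year:
            age = year - event_year
            # Carbon released during the year that begins `age` years after the event.
            flux += carbon * (math.exp(-age / tau) - math.exp(-(age + 1) / tau))
    return flux

# Hypothetical LULCC events (year, carbon in arbitrary units).
events = [(1750, 30.0), (1900, 20.0), (2000, 10.0)]
flux_from_1700 = annual_flux(events, 2014, start_year=1700)
flux_from_1850 = annual_flux(events, 2014, start_year=1850)
```

In this sketch the simulation started in 1850 misses the legacy flux of the 1750 event, so its flux in 2014 is slightly smaller, while recent events dominate the flux in both cases.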

In 2014 the annual net LULCC flux is 1.7 PgC yr−1 in REG1700 (Fig. 6). Neglecting wood harvest (NoH) or only using net transitions (net) leads to three times larger deviations from the reference (see Table 2) than LULCC uncertainties (first column) and reduces the net LULCC flux at most to about 1. ...

... on the net LULCC flux in 2014 is similar to the characteristics discussed for the cumulative net LULCC flux estimates (Fig. 3). LULCC differences still modulate annual net LULCC flux estimates throughout the 20th century (Fig. C2) and the largest variability of net LULCC flux, about ±0.1 to 0.3 PgC yr−1, is due to uncertainties in harvest and abandonment. In 2014, the largest impact of the remaining differences is due to harvest (about ±0.05-0.1 PgC yr−1).

Figure 6. Global annual net LULCC flux in 2014. Although the overall layout is as in Fig. 3, the y-axis is not scaled by a reference simulation but presents the total net emissions in 2014. Note that the experiment groups LULCC and StYr are now combined as the presented values show the absolute net LULCC flux.

The extensions of the twelve scenario simulations as a continuation of the three historical simulations with starting year 1700 are shown in Fig. 1. The underlying area changes are presented in Fig. A3 and the attribution of emissions to different land-use histories is shown in the supplementary material (Fig. C3). Table 4 ...

... peak is mainly caused by crop expansion and a reduced sink from abandonment connected to a reduction of secondary land area from about 2050 (Fig. C3a, c).
The baseline SSP5 scenario (SSP5-8.5) on the other hand starts off with a minor maximum of the net LULCC flux which is followed by a declining estimate. The initial peak in SSP5 is mainly caused by pasture expansion and wood harvest (Fig. A3); the evolution of secondary land and cropland is similar to that in the SSP4 baseline, but less area is used for pasture. Overall, the net LULCC flux in 2099 is lower than in SSP4-6.0 by about 0.6 PgC yr−1. In the alternative 3.4OS scenario, which differs from the SSP5 baseline mainly after 2040, a secondary peak after around 2050 is present, mainly caused by crop expansion over pasture.

Remaining sensitivities to LULCC uncertainties in future scenarios are due to harvest (Fig. C3) and decrease towards the end of the 21st century but do not reach zero in 2099. These uncertainties in harvest also explain why the remaining spread of net LULCC flux is larger in HI than LO, similar to the historical period.
The estimates of annual net LULCC flux in 2099 (Table 4) ... Table 4). The impact of the initial uncertainty is thus further reduced, relative to the magnitude of the net LULCC flux in 2099, if followed by a larger cumulative net LULCC flux. Scenarios with reduced radiative forcing due to increased mitigation action (RCP3.4) produce increased cumulative net LULCC fluxes over the 21st century, since fossil fuel emissions are substituted partly by energy from biofuel (Hurtt et al., 2020). This biofuel production causes additional cropland expansion and thus leads to net LULCC fluxes (see Fig. C3c). Still, the total carbon emissions are expected to be larger in the baseline than the RCP3.4 scenarios.

Wood harvest causes the largest sensitivity in the cumulative net LULCC flux (the flux in REG1700NoH is 175 PgC).

The sensitivity results presented here are limited by the fact that i) initial and final areas of land cover are not the same in the different experiments, ii) the disequilibrium pools are not the same in 2014 because timescales of harvest and regrowth differ (no committed emissions), iii) the uncertainty range of LUH2 is not exhaustive but represents known uncertainties (unintuitively the known uncertainty is larger in data-rich regions), and iv) BLUE does not use 100 % of the suggested transitions from the LUH2 input dataset.

Point iv) mostly affects usability of results from experiment HI850. Considering the whole time period, HI850 produces results between HI850NoH and the setup suggested by the LUH2 dataset, but closer to the latter. As differences of primary land area in 2014 between the LUH2 dataset and the BLUE experiments are otherwise uniform across LULCC scenario experiments, the qualitative properties of the results will be valid also if accurately using the whole dataset.
Over the period 1850-2014 the cumulative LULCC flux as determined by GCB2019 (Friedlingstein et al., 2019) is 195 ± 60 PgC, compared to 400 ± 20 PgC from fossil fuels. The baseline scenario is thus included in the GCB2019 uncertainty range; the sensitivity range of the cumulative net LULCC flux due to LULCC uncertainty is smaller than the uncertainty in GCB2019 but the sensitivity due to inclusion of wood harvest is of similar magnitude. However, towards the end of the historical time series, the sensitivity of the net LULCC flux to LULCC uncertainty and to all other parameters is somewhat smaller than the uncertainty presented in is not considered . The estimates found here with the bookkeeping model BLUE and the LUH2 dataset (a 13 % decrease by neglecting shifting cultivation and 28 % decrease by neglecting wood harvest) are thus comparable in magnitude to previous studies, despite using a different modelling approach.
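The percentages and uncertainty ranges quoted above follow from simple arithmetic, which can be sketched as below. This is a minimal illustration, not the analysis code of this study: the function names are hypothetical, and the two cumulative flux values are placeholder numbers chosen only to reproduce a 28 % decrease. Only the GCB2019 values (195 ± 60 PgC) are taken from the text.

```python
def relative_change(experiment: float, baseline: float) -> float:
    """Percentage change of an experiment relative to its baseline simulation."""
    if baseline == 0:
        raise ValueError("baseline flux must be non-zero")
    return 100.0 * (experiment - baseline) / baseline


def uncertainty_bounds(mean: float, half_width: float) -> tuple[float, float]:
    """Lower and upper bound of an estimate given as mean +/- half_width."""
    return (mean - half_width, mean + half_width)


# GCB2019 cumulative LULCC flux for 1850-2014 in PgC (from the text).
gcb_low, gcb_high = uncertainty_bounds(195.0, 60.0)

# Hypothetical cumulative fluxes in PgC, for illustration only.
baseline_flux = 200.0
no_harvest_flux = 144.0

change = relative_change(no_harvest_flux, baseline_flux)
inside = gcb_low <= baseline_flux <= gcb_high
print(f"GCB2019 range: {gcb_low}-{gcb_high} PgC, change: {change} %, inside: {inside}")
```

With these placeholder values the experiment without harvest is 28 % below the baseline, and the baseline lies within the range of 135 to 255 PgC spanned by the GCB2019 estimate.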
These results are also largely consistent with the findings of Hurtt et al. (2011), in which shifting cultivation and wood harvesting were the model factors to which the simulation output, in terms of the net LULCC flux, was most sensitive. In comparison with Hurtt et al. (2011) it can be noted that sensitivities might look different in other metrics like forest age or area. Although the spatial and temporal representation of these processes has been significantly improved in LUH2 (vs. LUH1), the choice of whether or not to include these processes in DGVM simulations is still a large contributor to the overall uncertainty in LULCC fluxes. However, assuming that the sensitivity in net LULCC flux from one LULCC dataset with uncertainties (based on the LUH2 dataset, presented here) is similar to the comparison of two LULCC datasets (Gasser et al., 2020), results in Gasser et al. (2020) point towards even larger contributions from e.g. uncertainties in carbon densities (both spatially and temporally).

Conclusion
This study investigates the impact of LULCC uncertainties compared to other common uncertainties in modelling of LULCC fluxes with the bookkeeping model BLUE, like the representation of wood harvest and shifting cultivation. We show that the sensitivity of the net LULCC flux to the uncertainty of LULCC based on the LUH2 dataset is not negligible and may explain part of the large uncertainty range of DGVMs as part of the GCB (Friedlingstein et al., 2019), since LULCC processes are captured with varying comprehensiveness (see Table A1 in Friedlingstein et al., 2019). Furthermore, the difference in net LULCC flux between high and low land-use scenarios is expected to be larger in DGVMs than in a bookkeeping model as they are influenced by a higher CO2 concentration exposure via the loss of additional sink capacity. In DGVM simulations, a higher CO2 exposure will most likely lead to larger vegetation and soil carbon stocks in the 20th century in low simulations as compared to high land-use simulations. The increasing number of transitions in the 20th century in the low land-use simulations will thus increase the difference in emissions between the two alternative scenarios.
Another difference that can influence results comparing bookkeeping models and DGVMs is that the former approach uses constant (present-day) carbon densities while DGVMs work with variable carbon densities which respond to environmental conditions. Nevertheless, the results presented here provide a reference for comparisons with the upcoming CMIP6 model  tion from rangeland to secondary land areas. BLUE pasture consists of managed pasture, the rangeland contribution and urban area. Finally, LUH2 C3/C4 annual/perennial crops are combined as cropland in BLUE. Bioenergy crops, which are mostly present in future scenarios of the LUH2 dataset, are not considered separately from crops. Since we focus on differences across simulations with the same assumptions and do not include crop harvest, we expect the impact of neglecting differences between regular crops and bioenergy crops to be small.
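The aggregation of LUH2 land-use types into BLUE cover types described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the preprocessing code used for BLUE: the short LUH2 state names, the mapping of forested and non-forested primary and secondary land, and in particular the parameter `range_to_pasture_share` (the share of rangeland assigned to pasture, with the remainder going to secondary land) are hypothetical placeholders.

```python
# Assumed mapping from LUH2 state names to BLUE cover types (illustrative only).
LUH2_TO_BLUE = {
    "primf": "primary", "primn": "primary",
    "secdf": "secondary", "secdn": "secondary",
    "c3ann": "crop", "c4ann": "crop", "c3per": "crop", "c4per": "crop",
    "pastr": "pasture",   # managed pasture
    "urban": "pasture",   # urban area is counted as BLUE pasture
}


def aggregate_to_blue(fractions: dict[str, float],
                      range_to_pasture_share: float) -> dict[str, float]:
    """Aggregate LUH2 gridcell fractions into the four BLUE cover types.

    Rangeland ("range") is split between pasture and secondary land using a
    hypothetical share; the actual rule is not reproduced here.
    """
    blue = {"primary": 0.0, "secondary": 0.0, "crop": 0.0, "pasture": 0.0}
    for name, fraction in fractions.items():
        if name == "range":
            blue["pasture"] += range_to_pasture_share * fraction
            blue["secondary"] += (1.0 - range_to_pasture_share) * fraction
        else:
            blue[LUH2_TO_BLUE[name]] += fraction
    return blue


# Hypothetical gridcell, fractions of the total cell area.
cell = {"primf": 0.5, "c3ann": 0.125, "c4ann": 0.125, "range": 0.25}
blue_cell = aggregate_to_blue(cell, range_to_pasture_share=0.5)
print(blue_cell)
```

Because every LUH2 fraction is assigned to exactly one BLUE type (or split completely between two), the total area fraction of the gridcell is conserved by the aggregation.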

Transitions between land-use types are aggregated in the same way as the land-use types themselves. In addition, wood harvest is used by means of harvested area (as opposed to the alternatively available harvested biomass). Transitions from primary to secondary land which are not associated with wood harvest are still accounted as part of wood harvest. Since areas and transitions from the LUH2 dataset refer to fractions of the total gridcell, we scale them down with a map of total vegetation cover in each gridcell (Pongratz et al., 2008).
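The scaling of gridcell fractions with the vegetation cover map can be illustrated with a short sketch. It assumes that scaling means a simple multiplication of each area or transition fraction by the vegetated fraction of the gridcell; the function name and the example values are hypothetical.

```python
def scale_by_vegetation(fractions: dict[str, float],
                        vegetation_cover: float) -> dict[str, float]:
    """Scale fractions of the total gridcell to fractions of its vegetated part.

    Assumes a multiplicative scaling with a vegetation cover fraction in [0, 1].
    """
    if not 0.0 <= vegetation_cover <= 1.0:
        raise ValueError("vegetation cover must be a fraction between 0 and 1")
    return {name: value * vegetation_cover for name, value in fractions.items()}


# Hypothetical gridcell in which half of the area is vegetated.
scaled = scale_by_vegetation({"crop": 0.5, "pasture": 0.25}, vegetation_cover=0.5)
print(scaled)
```

In this sketch the same factor applies to areas and transitions of a gridcell, so the relative shares of the land-use types within the cell are unchanged.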

Since harvest is provided in the LUH2 dataset based on the cover type (forest or non-forest), transitions are not used in BLUE when the cover type does not match. Fig. A5 shows the impact of these neglected transitions in terms of difference in primary land between the LUH2 input dataset and BLUE in 2014, i.e. at the end of the historical simulation period. Differences are reduced for later starting time (g) and then the spread between the three LULCC scenarios is also reduced (e.g. h and i).
Largest differences between BLUE and the LUH2 dataset occur in HI850 (b BLUE if the from-type is present. Due to this rule it is also not possible that the area fraction in a grid cell exceeds 100 % due to previously neglected transitions.

A2 Properties of the LULCC dataset
The properties of the LULCC LUH2 dataset (Hurtt et al., 2019a, b) are presented in Hurtt et al. (2020) and are briefly discussed here with modifications for the analysis with BLUE and to provide a basis for the following sensitivity analysis. Properties of land-use areas and LULCC activities are first discussed in the baseline scenario (here called REG) and then differences in the high (HI) and low (LO) LULCC scenarios are compared.
The amount of secondary and agricultural land in 850 is small compared to primary vegetation (Fig. A1a, b; less than about 1000 and 200 million ha, respectively, compared to more than 8000 million ha of primary land). From around 1700 the area of agricultural land expands more rapidly and from around 1850 the same is true for secondary land (Fig. A1c and d, respectively).

Abandonment and crop expansion (Fig. A2a) are of similar magnitude due to shifting cultivation dominating gross LULCC (not shown), especially until 1750. From 1300 onwards, and for most of the time series, these two LULCC activities affect roughly the same area as wood harvest, though wood harvest exhibits larger temporal variability. Pasture expansion and harvest on primary forested land are only relevant from around 1700 onwards and affect less area than the other LULCC activities.
The uncertainty of agricultural area is largest at the beginning of the time series (Fig. A1b) and decreases with time. In 850 the uncertainty around the baseline scenario is about 50 % for pasture and crop area, of which 1 % remains in 2014 (Fig. A1b).
The uncertainty in secondary land is about 50 % in 850 (Fig. A1a). This initial uncertainty of secondary land is due to the division of rangelands into secondary land and pasture for BLUE and is attributed to rangelands in the LUH2 data. Thus, the same total uncertainty is present in the LUH2 dataset and the data prepared for BLUE. It is important to note that the historical scenarios (HI, REG and LO) neither start nor end with the same area distribution. The transitions (Fig. A2b-e) show the largest uncertainty in wood harvest, with a small contribution from wood harvest on primary land. Compared to total wood harvest the contribution of harvest changing cover type from primary to secondary land is relatively small. Although total harvest biomass is designed to be equal across scenarios after 1920 (Hurtt et al., 2020), this is not true for harvested area, since harvested area is derived such that the demanded harvested biomass can be fulfilled. Since the other LULCC activities influence the available biomass, more or less area might be required in order to fulfill the harvested biomass demand. Increased uncertainties in crop and abandonment before 1850 are largely related to uncertainties about the magnitude of shifting cultivation and the extent of agricultural areas described in the HYDE dataset.
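Why an identical harvested biomass demand can translate into different harvested areas can be illustrated with a simplified sketch. It assumes that the required area is the biomass demand divided by a mean harvestable biomass density; this relation, the function name and all numbers are hypothetical and do not reproduce the LUH2 algorithm.

```python
def required_harvest_area(biomass_demand: float, biomass_density: float) -> float:
    """Area needed to meet a biomass demand at a given mean biomass density.

    Units are arbitrary but must be consistent, e.g. demand in MgC and
    density in MgC per ha, giving an area in ha.
    """
    if biomass_density <= 0:
        raise ValueError("biomass density must be positive")
    return biomass_demand / biomass_density


# Same hypothetical demand in two scenarios with different available biomass.
demand = 100.0
area_dense = required_harvest_area(demand, biomass_density=50.0)
area_sparse = required_harvest_area(demand, biomass_density=25.0)
print(area_dense, area_sparse)
```

In this illustration the scenario with lower available biomass per unit area requires twice the harvested area for the same demand, which is the mechanism by which harvested area can differ across scenarios even when harvested biomass is equal.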
The baseline SSP5 scenario (SSP5-8.5) captures conditions of high levels of fossil fuel use, increasing global food demand and therefore increasing cropland area (about 20 % increase from 2010 to 2100, Fig. A3). At the same time, primary land area is reduced to about 74 % of its original extent and secondary land area is steadily increasing at a total of about 22 %.

The alternative scenario SSP5-3.4OS is an overshoot scenario which mainly differs from the baseline scenario after 2040. The cropland area is increased by 50 % from 2010 to 2100, mainly by cultivating cropland which was previously used as pasture.
The evolution of primary and secondary land areas is similar to the baseline.
The baseline SSP4 scenario (SSP4-6.0) represents an evolution of progress with high agricultural productivity and environmental policies (reduced deforestation, re- and afforestation, ...) in high-income countries and the opposite in low-income

Figure A4. Differences in global total agricultural area in BLUE, also including results from initial conditions (IC) and transitions (Trans) sensitivity experiments (see Table 2).
The origin of the crossing points can, for example, be seen in Fig. A1c. Initially, and until about 1800, more natural land is converted to agricultural land in the HI scenario, but then a reversal of this trend relative to LO occurs (net transitions not shown). The temporal evolution of the cumulative net LULCC flux from IC and Trans experiments (not shown) confirms that crossing points originate from the variability of LULCC activities, because they only occur in Trans and not in IC experiments.
Furthermore, both harvest and pasture expansion exhibit larger cumulative net LULCC flux in LO than HI experiments (Fig. 4b and d). Global crossing points in total net LULCC flux (Fig. 2), corresponding to larger cumulative net LULCC flux in LO than HI experiments, are thus likely due to pasture expansion and harvest.

B2 Common reference period of full simulation analysis
Many of the properties discussed for the components of uncertainty in Section 3.1.2 are true as well if the analysis is not restricted to the common time period 1850-2014 but evaluated over the full respective simulation times. Some noticeable differences are not shown here but briefly discussed.
The cumulative net LULCC flux in the LO scenario exceeds the values in the HI scenario in both setups but the relative magnitude of the sensitivity (spread along the y-axis for points with same x-axis base) of the cumulative net LULCC flux to LULCC and starting year of a simulation depends on the period considered. Especially for StYr the range is larger, resulting in a larger sensitivity of StYr compared to the LULCC uncertainty. Larger fluxes occur in runs from year 850 if the total cumulative net LULCC flux is considered, simply because of the longer model simulation (see Fig. 2).
The artificial sensitivity experiments IC and Trans behave differently mainly in Trans, where for the period 1850-2014 the sensitivity of the cumulative net LULCC flux decreases with later starting year (Fig. 3) while over the full respective time periods the sensitivity increases with later starting year (not shown). Neglecting harvest and its uncertainty results in considerably reduced sensitivity to total LULCC uncertainty for simulations started in 1700 and 1850 (not shown). Interestingly, the reduction in cumulative net LULCC flux is largest in HI850NoH if considering the whole simulation (not shown) but from 1850 (Fig. 3), LO850NoH and REG850NoH show the largest reduction by omitting wood harvest.
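The difference between evaluating the cumulative net LULCC flux over the common period 1850-2014 and over the full simulation time can be sketched as follows. The function name is hypothetical and the annual flux series is a synthetic constant of 0.5 PgC yr⁻¹, used only to show the effect of the accumulation window; it is not model output.

```python
def cumulative_flux(annual_flux: dict[int, float], start: int, end: int) -> float:
    """Sum annual fluxes (PgC per year) over the years start..end, inclusive."""
    return sum(flux for year, flux in annual_flux.items() if start <= year <= end)


# Synthetic series for a run started in 850: constant 0.5 PgC per year.
annual = {year: 0.5 for year in range(850, 2015)}

common_period = cumulative_flux(annual, 1850, 2014)  # common reference period
full_period = cumulative_flux(annual, 850, 2014)     # full simulation time
print(common_period, full_period)
```

Even with an identical annual flux, the run evaluated from 850 accumulates a much larger total simply because it covers more years, which is why the starting year appears more influential when the full simulation time is considered.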