Evaluation of terrestrial pan-Arctic carbon cycling using a data-assimilation system

. There is a signiﬁcant knowledge gap in the current state of the terrestrial carbon (C) budget. Recent studies have highlighted a poor understanding particularly of C pool transit times and of whether productivity or biomass dominate these biases. The Arctic, accounting for approximately 50 % of the global soil organic C stocks, has an important role in the global C cycle. Here, we use the CARbon DAta MOdel (CARDAMOM) data-assimilation system to produce pan-Arctic terrestrial C cycle analyses for 2000–2015. This approach avoids using traditional plant functional type or steady-state assumptions. We integrate a range of data (soil organic C, leaf area index, biomass, and climate) to determine the most likely state of the high-latitude C cycle at a 1 ◦ × 1 ◦ resolution and also to provide general guidance about the controlling biases in transit times. On average, CARDAMOM estimates regional mean rates of photosynthesis of 565 g C m − 2 yr − 1 (90 % conﬁdence interval between the 5th and 95th percentiles: 428, 741), autotrophic respiration of 270 g C m − 2 yr − 1 (182, 397) and heterotrophic respiration of 219 g C m − 2 yr − 1 (31, 1458), suggest-ing a pan-Arctic sink of − 67 ( − 287, 1160) g Cm − 2 yr − 1 , weaker in tundra and stronger in taiga. However, our conﬁdence intervals remain large (and so the region could be a source of C), reﬂecting uncertainty assigned to the regional data products. We show a clear spatial and temporal agreement between CARDAMOM analyses and different sources of assimilated and independent data at both pan-Arctic and local scales but also identify consistent biases between CARDAMOM and validation data. The assimilation process requires clearer error quantiﬁcation for leaf area index (LAI) and biomass products to resolve these biases. Mapping of vegetation C stocks and change over time and soil C ages linked to soil C stocks is required for better analytical constraint. Comparing CARDAMOM analyses to global vegetation models (GVMs) for the same period, we conclude that transit times of vegetation C are inconsistently simulated in GVMs due to a combination of uncertainties from productivity and biomass calculations. Our ﬁndings highlight that GVMs need to focus on constraining both current vegetation C stocks and net primary production to improve a process-based understanding of C cycle dynamics in the Arctic.


Introduction
Arctic ecosystems play a significant role in the global carbon (C) cycle (Hobbie et al., 2000;McGuire et al., 2012). Slow organic matter decomposition rates due to cold and poorly drained soils in combination with cryogenic soil processes have led to an accumulation of large stocks of C stored in the soils, much of which is currently held in permafrost (Tarnocai et al., 2009). The permafrost region soil organic C (SOC) stock is more than twice the size of the atmospheric C stock and accounts for approximately half of the global SOC stock (Hugelius et al., 2014;Jackson et al., 2017). Highlatitude ecosystems are experiencing a temperature increase that is nearly twice the global average (AMAP, 2017). The expected future increase in temperature (IPCC, 2013) and precipitation (Bintanja and Andry, 2017) will likely have consequences for the Arctic net C balance. As high latitudes warm, C cycle dynamics may lead to an increase in carbon dioxide (CO 2 ) emissions through ecosystem respiration (R eco ) driven by, for example, larger heterotrophic respiration (Commane et al., 2017;Schuur et al., 2015;Zona et al., 2016), drought stress on plant productivity (Goetz et al., 2005), and episodic disturbances Mack et al., 2011). Alternatively, temperature-induced vegetation changes Graven et al., 2013;Lucht et al., 2002) may increase gross primary productivity (GPP) due to extended growing seasons , CO 2 fertilisation , and shifts in vegetation cover such as greening (Myneni et al., 1997;Zhu et al., 2016) and shrub expansion (Myers-Smith et al., 2011). Consequently, ecosystem responses may feed back on climate with unclear magnitude and sign Peñuelas et al., 2009). As a result of the significant changes that are already affecting the structure and function of Arctic ecosystems, it is critical to understand and quantify the historical C dynamics of the terrestrial tundra and taiga and their sensitivity to climate (McGuire et al., 2012).
Although the land surface is estimated to offset ∼ 30 % of anthropogenic emissions of CO 2 (Canadell et al., 2007;Le Quéré et al., 2018), the terrestrial C cycle is currently the least constrained component of the global C budget and large uncertainties remain (Bloom et al., 2016). Despite the importance of Arctic tundra and taiga biomes in the global land C cycle, our understanding of interactions between the allocation of C from net primary productivity (NPP), C stocks (C stock ), and transit times (TTs) is deficient Hobbie et al., 2000). The TT is a concept that represents the time it takes for a particle of C to persist in a specific C stock, and it is defined by the C stock and its outgoing flux, here addressed as TT = C stock /NPP. According to a recent study by Sierra et al. (2017), TT is an important diagnostic metric of the C cycle and a concept that is independent of model-internal structure and theoretical assumptions for its calculation. Terms such as residence time (Bloom et al., 2016;, turnover time , and turnover rate (Thurner et al., 2016; TT = 1/turnover rate) are used in the literature to represent the concept of TT (Sierra et al., 2017). Studies have focused more on the spatial variability with climate of ecosystem productivity rather than C transit times Nishina et al., 2015;Thurner et al., 2016Thurner et al., , 2017.  detailed that transit time dominates uncertainty in terrestrial vegetation responses to future climate and atmospheric CO 2 . They found a 30 % larger variation in modelled vegetation C change than response of NPP. Nishina et al. (2015) also suggested that longterm C dynamics within ecosystems (vegetation turnover and soil decomposition) are more critical factors than photosynthetic processes (i.e. GPP or NPP). The respective contribution of bias from biomass and NPP to biases in transit times remains unquantified. Without an appropriate understanding of the current state and dynamics of the C cycle, its feedbacks to climate change remains highly uncertain (Hobbie et al., 2000;. There are currently efforts to incorporate both in situ and satellite-based datasets to assess C cycle retrievals and to reduce their uncertainties. At a local scale, the net ecosystem exchange (NEE) of CO 2 between the land surface and the atmosphere is usually measured using eddy covariance (EC) techniques (Baldocchi, 2003). International efforts have led to the creation of global networks such as FLUXNET (http://fluxnet.fluxdata.org/, last access: 9 April 2019) and ICOS (https://www.icos-ri.eu/, last access: 9 April 2019), to harmonise data and support the reduction of uncertainties around the C cycle and its driving mechanisms. However, upscaling field observations to estimate the regional to global C budget presents important challenges due to insufficient spatial coverage of measurements and heterogeneous landscape mosaics (McGuire et al., 2012). Furthermore, harsh environmental conditions in high-latitude ecosystems and their remoteness complicates the collection of high-quality data . Given the lack of continuous, spatially distributed in situ observations of NEE in the Arctic, it remains a challenging task to calculate with certainty whether or not the Arctic is a net C sink or a net C source and how the net C balance will evolve in the future (Fisher et al., 2014). Over the past decade, regional to global products generated from in situ networks and/or satellite observations have improved our understanding of the terrestrial C dynamics. These range from machine-learning-based upscaling of FLUXNET data (Jung et al., 2017), remotely sensed biomass products Thurner et al., 2014), and the creation of a global soil database (FAO/IIASA/ISRIC/ISSCAS/JRC, 2012). Due to a reliance on interpolation and upscaling with other spatial data, it is challenging to evaluate these products for inherent biases.
Global vegetation models (GVMs) have been developed to determine global terrestrial C cycling, through representing vegetation and soil processes, including vegetation dynamics (i.e. growth, competition, and turnover) and bio-geochemical (i.e. water, carbon, and nutrients cycling) responses to climate variability (Koven et al., 2011;Sitch et al., 2003;Woodward et al., 1995). The advantage of using process-based models to characterise C dynamics is that processes which drive ecosystem-atmosphere interactions can be simulated and reconstructed when data are scarce. However, C cycle modelling in GVMs typically relies on parameters retrieved from literature, prescribed plant functional type (PFT), and a spin-up process ensuring C stocks (biomass and SOC) reach steady state. Further, inherent differences of model structure contribute more significantly to GVM uncertainties Nishina et al., 2014) than do differences in climate projections (Ahlström et al., 2012). Many model inter-comparison projects have demonstrated a lack of coherence in future projections of terrestrial C cycling (Ahlström et al., 2012;Friedlingstein et al., 2014). Recent studies have used simulations from the first phase of the Inter-Sectoral Impact Model Inter-comparison Project (ISI-MIP)  to evaluate the importance of key elements regulating vegetation C dynamics but also the estimated magnitude of their associated uncertainties Nishina et al., 2015;Thurner et al., 2017). An important insight is that TTs in GVMs are a key uncertain feature of the global C cycle simulation. Further, GVMs tend not to report uncertainties in their estimates of stocks and fluxes, which weakens their analytical value.
To address these issues we integrate model and data more formally. We apply data assimilation (DA), defined as a Bayesian calibration process for a model of a dynamic system. DA, through probabilistic parameterisation, supports robust model estimates of C stocks and fluxes consistent with multiple observations and their errors (Fox et al., 2009;Luo et al., 2009;Williams et al., 2005). By following Bayesian methods, the uncertainty in observations weights the degree of data constraint, and the outcome is a set of acceptable parameterisations for a given model structure linked to likelihoods. Overall, this approach determines whether model structure, observations, and forcing are (in)consistent and thus assesses the validity of model structure. By assimilating co-located climatic, ecological, and biogeochemical data from remote-sensing observations at a specific grid scale across landscapes and regions, DA can map parameter estimation and uncertainties.
Here, we use the CARbon DAta MOdel framework (CAR-DAMOM) (Bloom et al., 2016;Bloom and Williams, 2015;Smallman et al., 2017) to analyse the pan-Arctic terrestrial carbon cycle at 1 • resolution for the 2000-2015 period. We assimilate gridded observations of leaf area index (LAI), biomass, and SOC stocks at these spatio-temporal scales into an intermediate-complexity C model (DALEC2, which is less complex than GVMs). We compare analyses of C dynamics of Arctic tundra and taiga with (a) global products of GPP (Jung et al., 2017) and heterotrophic respiration (R h ) (Hashimoto et al., 2015); (b) NEE, GPP, and R eco field observations from eight high-latitude sites included in the FLUXNET2015 dataset; and (c) six GVMs from the ISI-MIP2a comparison project (Akihiko et al., 2017). Our objectives are to (1) present and evaluate the analyses and uncertainties of the current state of the pan-Arctic terrestrial C cycling using a DA system, (2) quantify the degree of agreement between the CARDAMOM product with localto global-scale sources of available data to assess analytical bias, and (3) use CARDAMOM as a benchmarking tool for the ISI-MIP2a models to provide general guidance towards GVM improvements in transit time simulation. Finally, we suggest future work to be done in the context of advancing pan-Arctic C cycle modelling.

Pan-Arctic region
The spatial domain we considered in this study (Fig. S1 in the Supplement) corresponds to the extent of the Northern Circumpolar Soil Carbon Database version 2 (NCSCDv2) dataset (Hugelius et al., 2013a, b), bounded by 42-80 • N and 180 • W-180 • E and at a spatial resolution of 1 • × 1 • . This area of study totals 18 000 000 km 2 of land area. We used the GlobCover vegetation map product developed by the European Space Agency (Bontemps et al., 2011) to separate regions dominated by non-forested (tundra) and forested (taiga) land cover types. A complete description of the classes included in each domain can be found in Fig. S1 and caption. The differentiation between tundra and taiga grid cells is in agreement with the tree line delimitated by Brown et al. (1997) together with the tundra domain defined from the Regional Carbon Cycle Assessment and Processes Activity reported by McGuire et al. (2012). The extensive grasslands without presence of trees in some areas such as in south Russia, Mongolia, and Kazakhstan were neglected to focus on higher latitudes. This classification of tundra and taiga totals 8 100 000 and 9 900 000 km 2 of land area, respectively.

The CARbon DAta MOdel framework
Here we use CARDAMOM (Bloom et al., 2016) (list of acronyms can be found in Table S1 in the Supplement) to retrieve terrestrial C cycle dynamics, including explicit confidence intervals, in the pan-Arctic region. CAR-DAMOM consist of two key components: (1) an ecosystem model, the Data Assimilation Linked Ecosystem Carbon version 2 (DALEC2) (Bloom and Williams, 2015;Williams et al., 2005), constrained by observations, and (2) a dataassimilation system (Bloom et al., 2016). This framework reconciles observational datasets as part of a representation of the terrestrial C cycle in agreement with ecological theory.

236
E. López-Blanco et al.: Evaluation of terrestrial pan-Arctic carbon cycling using a data-assimilation system

DALEC2
The DALEC2 ecosystem model simulates monthly landatmosphere C fluxes and the evolution of six C stocks (foliage, labile, wood, roots, soil organic matter -SOM -and surface litter) and corresponding fluxes. DALEC2 includes 17 parameters controlling the processes of plant phenology, photosynthesis (GPP), allocation of primary production to respiration and vegetation carbon stocks and plant and organic matter turnover rates, all established within specific prior ranges based on ecologically viable limits (Table S2; most priors are uniform with broad ranges). DALEC2 simulates canopy-level GPP via the Aggregated Canopy Model (ACM; Williams et al., 1997), and the most sensitive ACM parameter, related to canopy photosynthetic efficiency, is included in the CARDAMOM calibration. DALEC allocates net primary production to the four plant stocks (foliage, labile, wood, and roots) and autotrophic respiration (R a ) as time-invariant fractions of GPP. Plant C decays into litter and soil stocks where microbial decomposition generates heterotrophic respiration (R h ). Turnover of litter and soil stocks is simulated using temperature-dependent first-order kinetics. For practical purposes we aggregated the different C stocks into photosynthetic (C photo ; leaf and labile), vegetation (C veg ; leaf, labile, wood, and roots), soil (C dom ; litter and SOM), and total (C tot = C photo + C veg + C dom ) C stocks. The NEE is calculated as the difference between GPP and the sum of the respiration fluxes (R eco = R a + R h ), while NPP is the difference between GPP and R a . Only NEE follows the standard micrometeorological sign convection presenting the uptake of C as negative (sink) and the release of C as positive (source); both GPP and R eco are reported as positive fluxes. In this study, we addressed C turnover rates and decomposition processes as their inverse rates; this is the C transit time (TT photo , TT veg and TT dom ), represented as the ratio between the mean C stock and the mean C input into that stock during the simulation period.

Data-assimilation system
The intermediate complexity of the DALEC2 model compared to typical GVMs facilitates computationally intense Monte Carlo (MC) data-assimilation to optimise the initial stock conditions and the 17 process parameters that shape C dynamics. CARDAMOM is forced with climate data from the European Centre for Medium-Range Weather Forecast Reanalysis interim (ERA-interim) dataset (Dee et al., 2011) monthly for the 2000-2015 period. A Bayesian Metropolis-Hastings Markov chain MC (MHMCMC) algorithm is used to retrieve the posterior distributions of the process parameters according to observational constraints and ecological and dynamic constraints (EDCs; Bloom and Williams, 2015). EDCs ensure that DALEC2 simulations of the terrestrial carbon cycle are realistic and ecologically viable and help to reduce the uncertainty in the model parameters by rejecting estimations that do not satisfy different conditions applied to C allocation and turnover rates as well as trajectories of C stocks.
Observational constraints include monthly time series of LAI from the MOD15A2 product , estimates of vegetation biomass , and soil organic carbon content (Hugelius et al., 2013a, b) (Table S3). We aggregated ∼ 130 000 1 km resolution MODIS LAI data monthly within each 1 • × 1 • pixel. Biomass based on remotely sensed forest biomass  and upscaled GPP (Jung et al., 2011) covering the pan-Arctic domain was aggregated to 1 • resolution . We used the NCSCD spatially explicit product (Hugelius et al., 2013a, b), which was generated from 1778 soil sample locations interpolated to a 1 • grid.
We apply the set-up described above to 3304 1 • × 1 • pixels (1686 in tundra; 1618 in taiga) using a monthly time step. Each pixel is treated independently without assuming a prior land cover or plant functional type, and we assume no spatial correlation between uncertainties in all pixels. In each 1 • × 1 • pixel, we applied the MHMCMC algorithm to determine the probability distribution of the optimal parameter set and initial conditions (x i ; Table S2) given observational constraints (O i ; LAI, SOC, and biomass; Table S3) using the same Bayesian inference approach described in Bloom et al. (2016): (1) First, in the expression 1, p(x i ) represents the prior probability distribution of each DALEC2 parameter (x i ) and is expressed as where p EDC (x i ) is the prior parameter probability according to the EDCs included in Table S2 and described in Bloom and Williams (2015). In addition, prior values for two parameters and their uncertainties (canopy efficiency, C eff , and fraction of GPP respired, f auto ) are imposed with a log-normal distribution following Bloom et al. (2016)   . The reported uncertainty in biomass data from Thurner et al. (2014) was ±37 % at pixel scale. Because of undetermined errors related to tree cover thresholds used in the upscaling and to reflect unknown model structural error, we slightly inflate the error estimate and use a log-transform (1.5) of ×/÷1.5 (i.e. ×/÷1.5 spans 67 % of the expected error). We use the same proportional error for SOC. For MODIS LAI we inflate the proportional error further to log(2) based on well-reported biases in this product for evergreen forests (De Kauwe et al., 2011) and the estimated measurement and aggregation uncertainty for boreal forest LAI of 1 m 2 m −2 reported by Goulden et al. (2011). The uncertainty assumptions in expression 3 are chosen due to a lack of better knowledge about the combined uncertainties arising from model representation errors and observation errors: For each 1 • × 1 • pixel we run three MHMCMC chains with 10 7 accepted simulations each until convergence of at least two chains. We use 500 parameter sets sampled from the second half of each chain to describe the posterior distribution of parameter sets. We produce confidence intervals of terrestrial C fluxes and stocks from the selected parameter sets. In the following we report the highest confidence results (median; P 50 ) and the uncertainty represented by the 90 % confidence interval (5th percentile to 95th percentile -P 05 , P 95 ). We calculate the transit time for C pools using the approach for non-steady-state pools described in Bloom et al. (2016), Sect. S3.

Model evaluation against independent in situ and pan-Arctic datasets
At the pan-Arctic scale, we compared CARDAMOM GPP with the FLUXCOM dataset from Jung et al. (2017). We also compared our CARDAMOM R h with the global spatiotemporal distribution of soil respiration from Hashimoto et al. (2015) calculated by a climate-driven empirical model. To assess the degree of statistical agreement we calculated linear goodness of fit (slope, intercept, R 2 ) between CARDAMOM and the two independent datasets and determined RMSE and bias from direct comparison on model-data residuals. The mapping includes stipples representing locations where the independent datasets are within CARDAMOM's 90 % confidence interval. At a local scale, we compare CARDAMOM NEE and its partitioned components GPP and R eco estimates with monthly aggregated values from the FLUXNET2015 sites. We selected eight sites (Belelli Marchesini et al., 2007;Bond-Lamberty et al., 2004;Goulden et al., 1996;Ikawa et al., 2015;Kutzbach et al., 2007;Lund et al., 2012;Sari et al., 2017) located across sub-and high-Arctic latitudes, covering locations with different climatic conditions and dominating ecotypes (Table S4). For this evaluation, we compared the same years for both observations and CARDAMOM, and we selected data using the daytime method  due to the absence of a true night-time period during Arctic summers in some locations. Additionally, we selected a variable u * threshold to identify insufficient turbulence wind conditions from year to year similar to . In this datamodel comparison we included the median (P 50 ) ± the 90 % confidence interval (5th to 95th percentiles, (P 05 , P 95 )) including both random and u * filtering uncertainty following the method described in Papale et al. (2006). Some of the sites lack wintertime measurements, and we filtered out data for months with less than 10 % observations. We performed a point-to-grid-cell comparison to assess the degree of agreement between each flux magnitude and seasonality calculating the statistics of linear fit (slope, intercept, R 2 ) per flux and site between CARDAMOM and FLUXNET2015 datasets and determined RMSE and bias from model-data residuals comparison.

Benchmark of global vegetation models from ISI-MIP2a
We compared CARDAMOM analyses of pan-Arctic NPP, vegetation biomass carbon stocks (C veg ) and vegetation transit times (TT veg ) with six participating GVMs in the ISI-MIP2a comparison project (Akihiko et al., 2017). In this study we have considered DLEM (Tian et al., 2015), LPJmL (Schaphoff et al., 2013;Sitch et al., 2003), LPJ-GUESS (Smith et al., 2014), ORCHIDEE (Guimberteau et al., 2018), VEGAS (Zeng et al., 2005), and VISIT (Ito and Inatomi, 2012). The specific properties and degree of complexity of each ISI-MIP2a model are summarised in Table S5. The comparisons have been performed under the same spatial resolution as the CARDAMOM spatial resolution (1 • × 1 • ) for the 2000-2010 period. Also, the chosen GVMs from the ISI-MIP2a phase have their forcing based on ERA-Interim climate data, similar to the forcing used in CARDAMOM. We estimated the degree of agreement using the statistics of linear fit (slope, intercept, R 2 , RMSE, and bias) per variable and model between CARDAMOM and GVMs but also their spatial variability including stipples where the GVM datasets are within CARDAMOM's 90 % confidence interval.
To understand the sources of errors in TT veg calculations, we used CARDAMOM to calculate two hypothetical TT veg (i.e. EXPERIMENT A TT veg = ISI-MIP2a C veg / CARDAMOM NPP and EXPERIMENT B TT veg = CARDAMOM C veg / ISI-MIP2a NPP) and then assessed the largest difference with CARDAMOM's CON-TROL TT veg . We estimated the hypothetical TT veg for each pixel in each model and derived a pixel-wise measure of the contribution of biases in NPP and C veg to biases in TT veg by overlapping their distribution functions.

238
E. López-Blanco et al.: Evaluation of terrestrial pan-Arctic carbon cycling using a data-assimilation system 3 Results
The total size of the pan-Arctic soil C stock (C dom ) averaged 24.4 (10.3, 47.5) kg C m −2 , 16-fold greater than the vegetation C stock (C veg ), 1.5 (0.5, 5.8) kg C m −2 . The soil C stock (fresh litter and SOM) is dominated by C som , accounting for the 99 % which also dominates the total terrestrial C stock in the pan-Arctic. Among the living C stocks, 93 % of C (88 % in tundra and 90 % in taiga) is allocated to the structural stocks (wood and roots; 1.4 (0.4, 5.6) kg C m −2 ) compared to 7 % (12 % in tundra and 10 % in taiga) to the photosynthetic stock (leaves and labile; 0.1 (0.1, 0.2) kg C m −2 ). On average, the total ecosystem C stock is 26.3 (11.8, 51.0) kg C m −2 in the pan-Arctic region, with slightly lower stocks in tundra (24.6 (10.8, 50.6) kg C m −2 ) than taiga (27.7 (12.7, 51.2) kg C m −2 ). In general, the taiga region holds on average ∼ 100 % more photosynthetic tissues, ∼ 160 % more structural tissue, and ∼ 9 % more soil C stocks than tundra. In other words, taiga holds ∼ 12 % more total C than tundra. The greater living stock of C in taiga (2.1 (0.8, 5.1) kg C m −2 ) than tundra (0.8 (0.3, 6.8) kg C m −2 ) means that the relative size of R a and R h in the two regions differs. Thus in tundra R a accounts for 51 % of total ecosystem respiration, while in taiga this fraction is 57 %. R a is 4 % larger than R h in tundra but 24 % greater in taiga, reflecting the greater rates of C cycling in taiga. Uncertainties in estimates of soil C stock are notably higher than for living C stocks, highlighting the lack of Table 1. Multi-year (2000Multi-year ( -2015 annual average of main ecosystem C fluxes (NEE, GPP, NPP, R eco , R a , R h ; g C m −2 yr −1 ), C stocks (C photo , C veg , C dom , C tot ; kg C m −2 ) and transit times (TT photo , TT veg , TT dom , TT tot ; years) for the pan-Arctic, tundra (non-forested), and taiga (forested) domain. The averages contain the median in bold (50th percentile) and the uncertainty estimations across the 90 % confidence range between the 5th and 95th percentiles assuming no spatial correlation between uncertainties in all pixels. We assume spatial correlation between pixels: P 50 , P 05 , and P 95 represent the area-weighted aggregate of all pixels' median, P 05 , and P 95 .  confidence interval. C processes represented include flows for C fluxes in white (NEE, net ecosystem exchange; GPP, gross primary production; NPP, net primary production; R eco , ecosystem respiration; R a , autotrophic respiration; R h , heterotrophic respiration), C allocation in dark green (to labile, leaf, stem and root), and C turnover in cyan (from leaf, wood, roots, and litter). C stocks are represented in dark blue boxes (labile, leaf, stem, root, litter, and SOM, soil organic matter) and aggregated into photosynthetic (C photo = leaf + labile), vegetation (C veg = leaf + labile + wood + roots), soil (C dom = litter + SOM), and total (C tot = C photo + C veg + C dom ) C stocks in red boxes. Analogy, transit times (TT) are also aggregated into photosynthetic (TT photo = leaf + labile), vegetation (TT veg = leaf + labile + wood + roots), soil (TT dom = litter + SOM), and total (TT tot = TT photo + TT veg + TT dom ) C transit times. observational and mechanistic constraint on heterotrophic respiration.
The global mean C transit time is 1.3 (0.8, 2.1) years in leaves and labile plant tissue (TT photo ), 4.5 (1.7, 15.7) years in stems and roots (TT veg ), and 120.5 (9.8, 822.6) years in litter and SOM (TT dom ). The total C transit time (TT tot ) (133.1 (11.5, 1013.6) years) is clearly dominated by the soil C stock, highlighting the very long periods of times that C persists in Arctic soils. CARDAMOM calculated 62 % longer TT dom in tundra compared to taiga, likely linked to lower temperatures, but uncertainties are large due to the limitations of data constraints.

Data assimilation and uncertainty reduction
The CARDAMOM framework generated an analysis broadly consistent with the combination of SOC, biomass, and LAI in each grid cell (Fig. 2) and the errors assigned to these data products. The agreement for the SOC dataset by Hugelius et al. (2013a) is a 1 : 1 relationship (R 2 = 1.0; RMSE = 0.95 kg C m −2 ), reflecting a straightforward model parameterisation. The biomass product from Carvalhais et al. (2014) was well correlated (R 2 = 0.97; RMSE = 0.46 kg C m −2 ), but CARDAMOM was consistently biased ∼ 28 % low. MODIS LAI data were also well correlated (R 2 = 0.79; RMSE = 0.42 kg C m −2 ) but ∼ 28 % higher than CARDAMOM analyses. These biases (Fig. 2) likely arise due to a low estimate in the photosynthesis model (ACM) used in CARDAMOM which propagates through the C cycle. CARDAMOM balances uncertainty in data products and the models (ACM photosynthesis model and DALEC2) to generate a weighted analysis typical of Bayesian approaches. The CARDAMOM analysis 90 % con-240 E. López-Blanco et al.: Evaluation of terrestrial pan-Arctic carbon cycling using a data-assimilation system  , and leaf area index (LAI; Myneni et al., 2002) datasets used in the data-assimilation process within the CARDAMOM framework (a-c), assimilated SOC, biomass, and LAI integrated into CARDAMOM (d-f) and their respective goodness-of-fit statistics between original and assimilated datasets (g-i). The error bars represent the 90 % confidence interval of the assimilated variable in CARDAMOM. fidence interval (CI) includes the 1 : 1 line for biomass and LAI (Fig. 2), indicating that the likelihoods on C cycle analyses include the expected value of the observations. The degree to which posterior distributions were constrained from the prior distributions in each of the 17 model parameters and 6 initial stock sizes (Table S2) varied considerably depending on the parameters in question and their related processes (Table 2 and Fig. S2). The 90 % CI posterior range of foliar, wood, labile, and SOM C stocks (C foliar , C wood , C labile , and C som ) as well as parameters such as allocation to foliage (f fol ) and lifespan (L) were considerably reduced (> 80 % uncertainty reduction compared to priors) most likely controlled by the information on LAI, biomass, and SOC constraints. Contrarily, parameters that have not been regulated in any way in the MHMCMC algorithm, i.e. turnover processes such as litter mineralisation (MR litter ), root turnover (TOR roots ), wood turnover (TOR wood ), decomposition rates (D rate ), and initial C stock such as litter (C litter ) were found to be poorly constrained (< 20 % uncertainty reduction). Overall, the uncertainty reduction classified by processes and ranked from most to least constrained estimated a 71 % reduction for C stocks, a 67 % reduction for C allocation, 59 % for plant phenology, and 31 % for C turnover related parameters. Although there are no substantial differ- ences between tundra and taiga, C roots was better constrained in tundra regions (42 %), while leaf onset day (B day ), leaf fall day (F day ), and leaf fall duration (L f ) were better constrained in taiga regions (> 18 % or more).

Independent evaluation: from global to local scale
We compared our estimates of GPP and R h with independent datasets to evaluate the model performance (Fig. 3). We found GPP to be well correlated (R 2 = 0.81; RMSE = 0.43 kg C m −2 ) but biased lower (∼ 53 %) compared to Jung et al.'s (2017) GPP estimates. There are in general very few pixels where the FLUXCOM product falls within CARDAMOM's 90 % confidence interval. Additionally, the R h product from Hashimoto et al. (2015) is less consistent with our estimates (R 2 = 0.40; RMSE = 0.09 kg C m −2 ), presenting a tendency towards lower values in tundra pixels and higher values in taiga pixels. The spatial variability of R h is considerably smaller in Hashimoto et al. (2015) compared to our CARDAMOM estimates. R h falls within the 90 % confidence interval of CAR-DAMOM in most of the pan-Arctic region due to the fact that the R h uncertainties are significant (Fig. 3). This finding confirms the uncertainties previously noted in modelled respiratory processes (Table 1) where the upper P 95 in R h dominated NEE's uncertainties but also the soil C stocks and transit times.
For comparison with direct ground observations from the FLUXNET2015 dataset, we report here monthly aggregated P 50 ±P 05−95 estimates of NEE, GPP, and R eco to show timing and magnitudes but also to diagnose whether CARDAMOM is in general agreement with flux tower data. Overall, CAR-DAMOM performed well in simulating observed NEE (R 2 = 0.66; RMSE = 0.51 g C m −2 per month; bias = 0.16 g C m −2 per month), GPP (R 2 = 0.85; RMSE = 0.89 g C m −2 per month; bias = 0.5 g C m −2 per month) and R eco (R 2 = 0.82; RMSE = 0.63 g C m −2 per month; bias = 0.35 g C m −2 per month) across eight sub-Arctic and high-Arctic sites from the FLUXNET2015 dataset ( Fig. 4; Table S6). CARDAMOM NEE is ∼ 25 % lower than FLUXNET2015, while GPP and R eco are ∼30 % and ∼ 10 % higher, respectively. This mismatch is important in the context of the FLUXCOM GPP upscaling, 50 % higher than CARDAMOM GPP. At some sites such as Hakasia, Samoylov, Poker Flat, and Manitoba (NEE R 2 = 0.73; GPP R 2 = 0.92; R eco R 2 = 0.88), CAR-DAMOM better matches the seasonality and the magnitude of the C fluxes than the rest, i.e. Tiksi, Kobbefjord, Zackenberg, and UCI-1998 (NEE R 2 = 0.58; GPP R 2 = 0.67; R eco Table 2. Parameter uncertainty reduction in percentage ranked from least (red) to most (blue) constrained in the pan-Arctic, tundra, and taiga domains. The reduction percentage is calculated based on the difference between the 90 % CI prior range and the 90 % CI posterior range.  (Table S4). Uncertainties represent the 25th and 50th percentiles (darker shade) and the 5th and 95th percentiles (lighter shade) of both field observations and the CARDAMOM framework.
R 2 = 0.67). In general, CARDAMOM captured the beginning and the end of the growing season well (Fig. 4), although the assimilation system has some bias due to (1) a difference in timing (e.g. earlier shifts of peak of the growing season in Manitoba GPP and R eco and earlier end of the growing season in Poker Flat NEE) and (2) differences in flux magnitudes (such as in Hakasia GPP and R eco and Kobbefjord NEE).

Benchmarking ISI-MIP2a models with CARDAMOM
We used our highest confidence retrievals of NPP, C veg , and TT veg (i.e. retrievals including assimilated LAI, biomass, and SOC) to benchmark the performance of the GVMs from the ISI-MIP2a project. In this assessment we compared not only their spatial variability across the pan-Arctic, tundra, and taiga region (Fig. 5) but also the degree of agreement between their mean model ensemble within the 90 % confidence interval of our assimilation framework (Fig. 6, Table 3). NPP estimates (RMSE = 0.1 kg C m −2 yr −1 ; R 2 = 0.44) are in better agreement than C veg (RMSE = 1.8 kg C m −2 ; R 2 = 0.22) and TT veg (RMSE = 4.1 years; R 2 = 0.12). The assessed GVMs estimated on average 8 % lower NPP, 16 % higher C veg , and 22 % longer TT veg than CARDAMOM across the entire pan-Arctic domain (Figs. 5 and 6) on average. Thus, at regional aggregation CARDAMOM analyses agreed more closely with ISI-MIP2a models than with FLUXCOM (51 % difference) and with the Carvalhais et al. (2014) biomass data (28 % bias).
The poor spatial agreement regarding TT veg between CARDAMOM and ISI-MIP2a (Table 3) is indicative of uncertainties in the internal C dynamics of these models. For instance, the slopes in Table 3 are steep and the R 2 are poor -so there is substantial disagreement in the spatial pattern, not just a large bias. For ISI-MIP2a comparison R 2 values ranged from 0.03 to 0.52 for NPP, from 0.00 to 0.31 for C veg , and from 0.00 to 0.24 for TT veg . Spatially, the stippling in Fig. 6 indicates areas where the GVMs are within the 90 % CI of CARDAMOM; agreement is best over the taiga domain rather than in tundra for TT veg . The benchmark area of consistency (stippling) is more extensive for C veg and TT veg than for NPP. Thus, while there is a stronger spatial correlation for NPP between CARDAMOM and GVMs (Table 3), this is a clearer bias for NPP. Some models (LPJ-GUESS and OR-CHIDEE) systematically calculate lower values in all the assessed variables, while others (LPJmL and VISIT) calculate higher estimates. The models in closer agreement with CAR-DAMOM were DLEM (5 % difference) and LPJ-GUESS (17 %), while VEGAS (44 %) and ORCHIDEE (56 %) were the models with larger discrepancies (Table 3; Figs. 5 and 6).
The attribution analysis to identify the origin of bias from ISI-MIP2a models indicated a joint split between NPP and C veg for TT veg error simulated in GVMs (Fig. 7). The distribution of the differences relative to CARDAMOM revealed that the higher error (i.e. the lower overlapped area, and by extension the largest contributor to TT veg biases) comes from ISI-MIP2a NPP with a 69 % agreement in the distribution, while C veg agrees to 72 %. In fact, the TT veg R 2 for each model (Table 3) is very close to the product of the NPP R 2 and C veg R 2 for that model, i.e. the uncertainty in the TT veg is a direct interaction of NPP and C veg uncertainty (R 2 of the correlation: 0.71). This finding supports Fig. 6, which shows TT veg error derives equally from both NPP and C veg .

Pan-Arctic retrievals of C cycle
The CARDAMOM framework has been used to evaluate the terrestrial pan-Arctic C cycle in tundra and taiga at coarse spatio-temporal scale (at monthly and annual time steps for the 2000-2015 period and at 1 • × 1 • grid cells). Overall, we found that the pan-Arctic region was most likely a consistent sink of C (weaker in tundra and stronger in taiga), although the large uncertainties derived from respiratory processes (Table 1) strongly increase the 90 % confidence interval uncertainty. We estimate that tundra experienced 62 % longer transit times in litter and SOM C stocks than taiga ecosystems. Further, the contribution of R a and R h to total ecosystem respiration was similar in tundra (51 %, 49 %) but dominated by R a in taiga (57 % compared to 43 %).
CARDAMOM retrievals are consistent with outcomes from relevant papers such as the (i) C flux observations and model estimates reported in McGuire et al. (2012), (ii) C stocks and transit times described by Carvalhais et al. (2014), and (iii) NPP, C stocks, and turnover rates stated in Thurner et al. (2017).
i. The CARDAMOM NEE estimates reported in this study for the tundra domain are inside the variability comparison of values compiled by McGuire et al. (2012) considering field observation, regionalprocess-based models, global-process-based models, and inversion models. The authors reported that Arctic tundra was a sink of CO 2 of −150 Tg C yr −1 (SD = 45.9) across the 2000-2006 period over an area of 9.16 × 10 6 m 2 . Here, CARDAMOM NEE estimated −129 Tg C yr −1 over an area of 8.1 × 10 6 km 2 for the same period. This exhaustive assessment of the C balance in Arctic tundra included approximately 250 estimates using the chamber and eddy covariance method from 120 published papers (McGuire et al., 2012; Supplement 1) with an area-weighted mean of means of −202 Tg C yr −1 . The regional models, including runs from LPJ-WHyMe (Wania et al., 2009a, b), ORCHIDEE (Koven et al., 2011), TEM6 (McGuire et al., 2010, and the TCF model (Kimball et al., 2009), reported an NEE of −187 Tg C yr −1 and GPP, NPP, R a , and R h of 350, 199, 151, and 182 g C m −2 yr −1 , respectively. GVMs applications such as CLM4C (Lawrence Table 3. Statistics of linear fit between the CARDAMOM framework (independent) and the ISI-MIP2a models (dependent) per individual model and per NPP (net primary production; kg C m −2 yr −1 ), C veg (vegetation C stock; kg C m −2 ), and TT veg (vegetation transit time; years). The units for RMSE and bias are kg C m −2 yr −1 in NPP, kg C m −2 yr −1 in C veg , and years in TT veg .
For the same period, CARDAMOM has estimated 330, 167, 160, and 154 g C m −2 yr −1 , respectively, for the same gross C fluxes.
ii. Carvalhais et al. (2014) estimated a total ecosystem carbon (C tot ) of 20.5 (8.0, 52.5) kg C m −2 for tundra and 24.8 (15.2, 58.0) kg C m −2 for taiga, while values from CARDAMOM were 24.6 (10.8, 50.6) kg C m −2 for tundra and 27.7 (12.7, 51.2) kg C m −2 in taiga ( Fig. 5 Evaluation of terrestrial pan-Arctic carbon cycling using a data-assimilation system  bias from ISI-MIP2a models. The CONTROL TT (grey) includes both biomass (C veg ) and net primary production (NPP) estimated by CARDAMOM. EXPERIMENT A TT (dark red) incorporates C veg from ISI-MIP2a and NPP from CARDAMOM, while EX-PERIMENT B TT (dark green) includes NPP from ISI-MIP2a and C veg from CARDAMOM. The lower the overlapped area between control and experimental TT, the larger the contribution for TT biases. For readability purposes, the scale on the x axis is limited to 20 years. tundra and 48.2 (111.6, 24.9) years in taiga in Carvalhais et al. (2014). These numbers have been retrieved from the same biome classification and they include the 90 % confidence interval of the assessed spatial variability. Also, we applied a correction factor of TT gpp = TT npp × (1 − fraction of GPP respired) to be comparable with Carvalhais et al. (2014) TT. Both datasets agree on the fact that at high (cold) latitudes, first tundra and secondly taiga have the longest transit times on the entire globe (Bloom et al., 2016;Carvalhais et al., 2014).
iii. A recent study from Thurner et al. (2017) assessed temperate and taiga-related TTs, presenting a 5-year average NPP dataset applying both MODIS (Running et al., 2004;Zhao et al., 2005) and BETHY/DLR  products and an innovative biomass product  accounting for both forest and non-forest vegetation. Our estimate of TT veg for the exact same period is 5.3 (1.9, 18.2) years, compared to Thurner et al.'s (2017) TT, 8.2 (5.5, 11.5) years using MODIS, and 6.5 (4.2, 8.7) years using BETHY/DLR. A note of caution here: the number reported by the authors are turnover rates, which are converted to transit times by just applying the inverse of turnover rates (TT veg = 1/turnover rates). Additionally, their NPP estimates, 0.35 and 0.45 kg C m −2 yr −1 from both MODIS and BETHY/DLR, are only 5 % more productive on average than the CARDAMOM NPP estimate (0.4 (0.3, 0.6) kg C m −2 yr −1 ). The biomass derived from Thurner et al. (2014) (3.0 ± 1.1 kg C m −2 ) is ∼ 30 % lower than CARDAMOM C veg (2.2 (1.1, 5.0) kg C m −2 ), calculated for the same period and for the same taiga domain.
In general, we found a reasonable agreement between CARDAMOM and assimilated and independent data at pan-Arctic scale. CARDAMOM retrievals of assimilated data are in good agreement with the SOC (Fig. 2). The simulation of TT dom is weakly constrained (Table 1) -our analysis adjusts TT to match mapped stocks, hence the strong match of modelled to mapped SOC. So, independent data on TT dom data (e.g. 14 C) are required across the pan-Arctic region to provide a stronger constraint on process parameters and reduce the very broad confidence intervals of CARDAMOM analyses. The low bias in mean estimates of LAI and biomass ( Fig. 2) likely relates to the strong prior on photosynthesis estimates from the ACM model, which lacks a temperature acclimation for high latitudes in this implementation. However, the uncertainty in the biomass and LAI analyses spans the magnitude of the bias. So, CARDAMOM generates some parameters sets that are consistent with observations. CAR-DAMOM produces analyses that reproduce the pattern of LAI, GPP, biomass, and SOC (Figs. 2 and 3) -this demonstrates that the DALEC model structure can be calibrated to simulate the links between these variables as a function of mass balance constraints and realistic process interactions and climate sensitivities.
There are clear biases in CARDAMOM analyses compared to independent global upscaled GPP (Jung et al., 2017) and R h products (Hashimoto et al., 2015) (Fig. 3). However, CARDAMOM resolves the spatial pattern in GPP effectively, while the spatial mismatch in R h estimates is marked (Fig. 3), echoing the large uncertainty found in NEE (Fig. 1, Table 1). One difference with Hashimoto et al.'s (2015) R h model is the lack of moisture limitation on respiration in CAR-DAMOM. Conversely, GPP is relatively well-constrained in space through the assimilation of LAI and a prior for productivity (Bloom et al., 2016), although an important mismatch has been found: CARDAMOM GPP is 50 % lower than FLUXCOM but 30 % higher than FLUXNET2015 EC data.
The agreement between CARDAMOM analyses and EC data is high given the scale difference. A direct point-togrid cell comparison with local observations derived from the FLUXNET2015 dataset (Fig. 4, Table S6) is challenging and always difficult. CARDAMOM outputs covers 1 • × 1 • grid cells, whereas local eddy covariance flux measurements are of the order of 1-10 ha. Thus, for observational sites located in areas with complex terrain, such as Kobbefjord in coastal Greenland, the agreement can be expected to be low. For inland forest sites, such as Poker Flat in Alaska, there may be less differences in vegetation characteristics and local cli-248 E. López-Blanco et al.: Evaluation of terrestrial pan-Arctic carbon cycling using a data-assimilation system matology between the local-scale measurement footprint and the corresponding CARDAMOM grid cell. This scaling issue is likely to have a larger impact on flux magnitudes compared with seasonal dynamics. In general, CARDAMOM captured the seasonal dynamics in NEE, GPP, and R eco well (Fig. 4, Table S6), although the monthly model time step does reduce skill in shoulder seasons. There was a consistent timing mismatch in early season flux increase, where CARDAMOM predicts earlier growing season onset compared with observations. This is likely due to the impact of snow cover, which is not explicitly included in the CARDAMOM framework.
For a further independent evaluation of CARDAMOM outputs, we compare the tundra and boreal estimates to plot scale flux and stock information. For tundra, Street et al. (2012) calculate growing season GPP estimates of 263-380 g C m −2 for Empetrum nigrum communities and 295-386 g C m −2 for Betula nana communities, which is consistent with the ranges in Fig. 1 for tundra. Biomass stocks for Arctic tundra recorded in the Arctic LTER (Long Term Ecological Research) at Toolik Lake range from 105 to 1160 g C m −2 (Hobbie and Kling, 2014), which are consistent with the estimates from CARDAMOM, albeit at the lower end of the model estimates. For boreal forests, Goulden et al. (2011) report annual GPP estimates across a chronosequence of stands, and thus a variation across canopy densities, which varied from 450 to 720 g C m −2 yr −1 . These data are consistent with the span of GPP in CARDAMOM (Fig. 1), again best matching the lower end of the model estimates. For the same study, the vegetation C stock estimates varied from 100 to 5000 g C m −2 , consistent with CARDAMOM and with measurements of 10 to 40-year old boreal stands best matching the CARDAMOM median estimate of ∼ 1500 g C m −2 . We conclude from comparisons with site data that CARDAMOM analyses are broadly consistent, with some tendency for CARDAMOM to have a high bias. This comparison is similar to the FLUXNET2015 evaluation of CARDAMOM. But it conflicts with the estimation of low bias from the comparison of CARDAMOM with FLUXCOM GPP and Carvalhais et al. (2014) biomass stock maps. It is possible that the scale differences between sitelevel products and landscape estimates is confusing these comparisons, but there is clearly a need to understand these inconsistencies in C cycle estimates better.

CARDAMOM as a model benchmarking tool
An ideal benchmarking tool for GVMs would compare model state variables and fluxes with multiple, independent, unbiased, error-characterised measurements collected repeatedly at the same temporal and spatial resolution. Of course direct measurements of key C cycle variables like these are not available. Even at FLUXNET sites GPP and R eco must be inferred, and NEE data are often gap-filled. Satellite data can provide continuous fields but do not directly measure ecological variables like biomass or LAI, so calibrated models are required to generate ecological products. Atmospheric conditions can introduce biases and data gaps into optical data that are poorly quantified. Upscaling of FLUXNET data requires other spatial data, e.g. MODIS LAI, which challenge the characterisation of error and generates complex hybrid products. We suggest that CARDAMOM provides some of the requirements of the ideal benchmark system -an error-characterised, complete analysis of the C cycle that is based on a range of observational products. CARDAMOM includes its own C cycle model; this has the advantage of evaluating the observational data for consistency (e.g. with mass balance), propagating error across the C cycle, and generating internal model variables such as TT. Further, the model is of intermediate complexity and independent of the benchmarked models.
Using CARDAMOM as a benchmarking tool for six GVMs we found disagreements that varied among models for spatial estimates of NPP, C veg , and TT veg across the pan-Arctic (Fig. 6) in comparison with CARDAMOM confidence intervals. GVM NPP estimates had a higher correlation than TT veg and C veg with CARDAMOM analyses (Table 3), but because CARDAMOM confidence intervals on NPP were relatively narrow (Fig. 1), the benchmarking scores from GVM NPP were relatively poor (Fig. 6). Consequently, we used CARDAMOM to calculate the relative contribution of productivity and biomass to the transit times bias by applying a simple attribution analysis (Fig. 7). We concluded that the largest bias to transit times originated not from a deficient understanding of one single component, but from an equal combination of both productivity and biomass errors together. Therefore, this study partially agrees with previous studies Nishina et al., 2014;Thurner et al., 2017) highlighting the deficient representation of transit times or turnover dynamics, but we further suggest that global vegetation and Earth system modellers need to focus on the productivity and vegetation C stocks dynamics to improve inner C dynamics. A major challenge for GVMs is the spin-up problem (Exbrayat et al., 2014). GVMs need to find a way to ensure that the spin-up process produces biomass estimates consistent with the growing availability of biomass maps from Earth observations. CARDAMOM solves this problem by avoiding spin-up. Its fast run time allows the biomass maps to act as a constraint on the probability distribution of model parameters. There may be opportunities to use CARDAMOM-style approaches to assist the GVM community address this problem.

Outlook
Although CARDAMOM estimates for pan-Arctic C cycling are in moderately good agreement with observations and data constraints, we have not included important components controlling ecosystem processes that could potentially improve our understanding on C feedbacks, with an emphasis on highlatitude ecosystems. For example, thaw and the release of permafrost C is not represented in CARDAMOM, but the influence on vegetation dynamics, permafrost degradation, and soil respiration is critical at high latitudes Parazoo et al., 2018). Also, Koven et al. (2017) have shown that soil thermal regimes are key to resolving the long-term vulnerability of soil C. Moreover, we have not characterised snow dynamics nor the insulating effect of snow affecting respiratory losses across wintertime periods (López-Blanco et al., 2018). Further, methane emissions, another important contributor to total C budget (Mastepanov et al., 2008;Zona et al., 2016), were neglected from this modelling exercise since it is not easy to model due to its three complex transport mechanisms (Walter et al., 2001).
However, our approach to use an intermediate-complexity model has the strong advantage of allowing very large (10 7 ) model ensembles per pixel and thus a thorough exploration of model-parameter interactions that is not feasible with typical GVMs. Other viable options include using emulators (Fer et al., 2018) and particle filters (Arulampalam et al., 2002), but MCMC methods provide the most detailed description of error distributions. There remains a strong argument to utilise intermediate-complexity models like DALEC2 to evaluate the minimum level of detail required to represent ecosystem processes consistent with local observations and to allow testing of alternate model structures. Assimilating further data products, for instance patterns in soil hydrology and snow states across the pan-Arctic from Earth observation, could provide useful information on spatio-temporal controls on soil activity and microbial metabolism to constrain below-ground processes. This information would need to be tied to process-level information on SOM turnover generated from experimental studies and included in updated versions of DALEC. Thus, more field observations are crucial across the pan-Arctic, specifically on the decomposition and TT of SOC (He et al., 2016) and respiratory processes such as the partitioning of R eco into R a and R h (Hobbie et al., 2000;McGuire et al., 2000) across the growing season and also during wintertime (Commane et al., 2017;Zona et al., 2016).
Our approach has used estimated observation error and inflated this to include unknown errors associated with model process representation. We currently lack any better knowledge of the combined uncertainties arising from model representation errors and observation errors. We acknowledge that all models are an imperfect representation of C dynamics, which generates irreconcilable model-data errors due to the inherent assumptions in model structure. Future analyses should investigate model structural error, using, for example, error-explicit Bayesian approaches (Xu et al., 2017), or comparing the likelihoods of alternate model structures of varying complexity. Using multiple sources of data, we have highlighted systematic errors in the model at a landscape scale (Figs. 2 and 3) for LAI, GPP, and biomass. However, these biases are not consistent for site-scale evaluations. Thus, a next step would be to include explicitly both random and systematic process errors for C fluxes in the data assimilation. These errors could be determined from field-scale evaluation of model process representation (Table 2) using, e.g., FLUXNET2015 data. We also need to understand better the error associated with the landscape heterogeneity of C stocks and fluxes to upscale from flux tower observations or direct measurements of LAI to landscape pixel. This could be achieved by constructing robust observation error models (Dietze, 2017) from field to pixel scale for, e.g., GPP, LAI, and foliar N. The evaluation of the sensitivity of C cycling DA analyses to observation error has shown relatively low sensitivity to data gaps and random error on net ecosystem flux data , but further analyses of error sensitivity are required for multiple streams of stock data.

Conclusions
The Arctic is experiencing rapid environmental changes, which will influence the global C cycle. Using a dataassimilation framework we have evaluated the current state of key C flux, stocks, and transit times for the pan-Arctic region for 2000-2015. We found that the pan-Arctic was a likely sink of C, weaker in tundra and stronger in taiga, but uncertainties around the respiration losses are still large, and so the region could be a source of C. Comparisons with global-and local-scale datasets demonstrate the capabilities of CARDAMOM for analysing the C cycle in the Arctic domain. CARDAMOM is a data-constrained and dataintegrated analysis, evaluated for internal consistency, and is therefore a good candidate to benchmark performance of global vegetation models. We conclude that a GVM bias found in transit time of vegetation C is the result of a joint combination of uncertainties from productivity processes and biomass in GVMs, and thus these are a major component of error in their forecasts. While spatial patterns in GVM predictions of NPP are reasonable, particularly in taiga, they have significant biases against the CARDAMOM benchmark. Improved mapping of vegetation and soil C stocks and change over time is required for better analytical constraint. Moreover, future work is required on assimilating data on soil hydrology, permafrost, and snow dynamics to improve accuracy and decrease uncertainties in belowground processes. This work establishes the baseline for further process-based ecological analyses using the CARDAMOM data-assimilation system as a technique to constrain the pan-Arctic C cycle.
Code and data availability. CARDAMOM output used in this study is available from  from the University of Edinburgh's DataShare service at https://doi.org/10.7488/ds/2334. The DALEC2 code is also available on Edinburgh DataShare at https://doi.org/10.7488/ds/2504 (Williams, 2019). Contact Mathew Williams for access to the CARDAMOM software.