23rd Century surprises: Long-term dynamics of the climate and carbon cycle under both high and net negative emissions scenarios

Future climate projections from Earth system models (ESMs) typically focus on the timescale of this century. We use a set of four ESMs and one Earth system model of intermediate complexity (EMIC) to explore the dynamics of the Earth’s climate and carbon cycles under contrasting emissions trajectories beyond this century, to the year 2300. The trajectories include a very high emissions, unmitigated fossil-fuel driven scenario, as well as a second mitigation scenario that diverges 25 from the first scenario after 2040 and features an ‘overshoot’, followed by stabilization of atmospheric CO2 concentrations by means of large net-negative CO2 emissions. In both scenarios, and for all models considered here, the terrestrial system switches from being a net sink to either a neutral state or a net source of carbon, though for different reasons and centered in different geographic regions, depending on both the model and the scenario. The ocean carbon system remains a sink, albeit weakened by climate-carbon feedbacks, in all models under the high emissions scenario, and switches from sink to source in 30 the overshoot scenario. The global mean temperature anomaly generally follows the trajectories of cumulative carbon emissions, except that 23rd-century warming continues after the cessation of carbon emissions in several models, both in the high emissions scenario and in one model in the overshoot scenario. While ocean carbon cycle responses qualitatively agree both in globally integrated and zonal-mean dynamics in both scenarios, the land models qualitatively disagree in zonal-mean dynamics, in the relative roles of vegetation and soil in driving C fluxes, in the response of the sink to CO2, and in the timing 35 of the sink-source transition, particularly in the high emissions scenario. The lack of agreement among land models on the https://doi.org/10.5194/esd-2021-23 Preprint. Discussion started: 22 April 2021 c © Author(s) 2021. CC BY 4.0 License.

mechanisms and geographic patterns of carbon cycle feedbacks, alongside the potential for lagged physical climate dynamics to cause warming long after CO2 concentrations have stabilized, point to the possibility of surprises in the climate system beyond the 21st century time horizon, even under relatively mitigated global warming scenarios, which should be taken into consideration when setting global climate policy. 40

Introduction
Climate change is characterized by long timescales, associated with the accumulation of carbon in the atmosphere and other reservoirs of the Earth system due to emissions of CO2 by anthropogenic activities, and the response of the climate system to the accumulated atmospheric CO2 burden. The long lifetime of CO2 in the atmosphere and the proportionality between global warming and long-term cumulative CO2 emissions are central features of the dynamics of the climate system (Matthews et al., 45 2009;Allen et al., 2009). These features underlie the widely used policy framework that proposes a 'budget' of remaining carbon emissions that would enable the climate system to remain below a given temperature . Future transient climate change scenarios using comprehensive Earth system models (ESMs) have typically focused on the timescale to the end of the 21st century, in order to inform near-term policy actions that may mitigate climate change. This end date of 2100 for these simulations has remained fixed, even though over 30 years have elapsed since the first IPCC assessment (IPCC, 50 1990).The start date of future scenarios has accordingly progressed from 1990 to 2015, shortening the length of these scenarios.
Longer-term dynamics have been explored mainly using Earth system models of intermediate complexity (EMICs) (Zickfeld et al., 2013) and climate system emulators (Meinshausen et al., 2011(Meinshausen et al., , 2020Nicholls et al., 2020b), which allow exploring such dynamics without the computational costs of resolving the full physical and biogeochemical dynamics of an ESM. EMICs (and even more so emulators) typically represent land and ocean biogeochemical processes relevant to the long-term carbon 55 cycle with less detail than comprehensive Earth system models, and so risk missing critical interactions and feedbacks. In contrast, ESMs have prioritized representing processes relevant on timescales to 2100, and may exclude or simplify processes important on longer timescales, such as permafrost carbon feedbacks on land, or sediment biogeochemistry in the ocean.
Initial studies using ESMs on this longer time horizon suggest that the proportionality of warming to carbon emissions that is 60 both historically observed and projected on shorter timescales also holds on multi-century timescales, in unmitigated high-end warming scenarios (Randerson et al., 2015;Tokarska et al., 2016). It is less clear whether these relationships hold under mitigated or overshoot scenarios, where net negative carbon emissions are assumed later in the scenario, but the expectation is that the proportional relationship holds for negative carbon emission as well (Zickfeld et al., 2016). Simple models show that the existence of a cumulative emissions to warming proportionality in such scenarios is sensitive to the response timescales 65 of physical and biogeochemical feedbacks in the Earth system (Sanderson, 2020). Existing experiments using ESMs and EMICs suggest that during a positive emissions phase, marine and terrestrial carbon cycles tend to absorb some fraction of added CO2. During a removal phase, however, they tend to release CO2 and thus partially offset the decline in atmospheric CO2. As a result, we expect that under a scenario with positive emissions followed by net-negative emissions, warming remains approximately proportional to cumulative CO2 emissions, but with an additional delay possible due to lags in the carbon cycle 70 and thermal response to changing CO2 (Tokarska and Zickfeld, 2015;Jones et al., 2016;Zickfeld et al., 2016;Tokarska et al., 2019).
To better understand the long-term dynamics of the carbon and climate systems, here we compare a set of four ESMs and one EMIC under a pair of high emissions and overshoot future climate scenarios that diverge in emissions in the mid-21st century, 75 and explore the dynamics of carbon and climate under these contrasting trajectories. Further, because these models all report more detailed information that can allow some degree of process attribution to the dynamics. We thus also explore whether and where the predicted carbon and climate responses are relatively robust, both within any one model over time and between scenarios, as well as across models for any given scenario and time period. This allows us to explore where model agreement does and does not exist, both in the globally integrated response, as well as in the regional and process drivers of that response. 80 Where ESM or EMIC behavior shows either fundamental disagreement on geographic or process drivers for feedbacks, or shows global dynamics deviating significantly from the expected linearity between warming and cumulative emissions, we interpret it as showing a potential for surprises in the future dynamics of the Earth system.

Scenario Descriptions 85
All models were forced using the SSP5-8.5 and SSP5-3.4-overshoot scenarios (Riahi et al., 2017;Gidden et al., 2019;Meinshausen et al., 2020). These scenarios were constructed as part of the CMIP6 set of coordinated experiments for ESMs , and arose out of the ScenarioMIP and SSP design effort (O'Neill et al., 2014(O'Neill et al., , 2016 to cover a wide range of socioeconomic and policy scenarios and resulting trajectories of greenhouse gas forcings to the Earth system. The simple extensions beyond 2100 were adapted from those originally conceived in O'Neill (2016), as described in Meinshausen et al., 90 (2020). Both of these scenarios follow the SSP5 21st century 'storyline' (Kriegler et al., 2017), which is premised on strong economic growth, relying largely on fossil fuels in the no-climate-policy baseline. However, they diverge in the year 2040: the SSP5-8.5 scenario continues on to long-term emissions growth, an 8.5 W m -2 anthropogenic greenhouse gas radiative forcing by 2100, and CO2 increases until the mid-23rd century. The SSP5-3.4 scenario dramatically changes course in 2040 after emissions peak, and continues with sustained net-negative CO2 emissions until the mid-22nd century before stabilizing to near-95 zero emissions thereafter; the scale of negative emissions required for this overshoot (>5 PgC/yr for several decades) is much larger than in the RCP2.6 scenario reported by Jones et al. (2016).
All models were forced with specified atmospheric greenhouse gas concentrations (not greenhouse gas emissions) (Meinshausen et al., 2020), and land use change forcings (Ma et al., 2020;Hurtt et al., 2020). In a concentration-forced ESM 100 simulation, the land and ocean carbon cycles respond to the CO2 concentrations in the atmosphere, which are specified via a global-mean timeseries, but do not feed back on atmospheric CO2 (Fig. 1a). By integrating the total atmospheric CO2 reservoir, alongside the prognostic carbon reservoirs on land and ocean, compatible fossil fuel and industrial CO2 emissions can be calculated to satisfy the conservation of carbon within the Earth system, such that anthropogenic fossil fuel and industrial emissions equal the sum of total changes in land, atmosphere, and ocean carbon stocks . Land-use-105 driven carbon emissions are directly reflected in changes in the terrestrial carbon inventories and thus can be separately inferred based on terrestrial model dynamics themselves. Because of differences in carbon cycle feedbacks and in the representation of land use fluxes, the CO2 emissions inferred by ESMs to be consistent with a given CO2 concentration pathway will not generally be equal to the CO2 emissions that were provided by the IAM community for each scenario (Riahi et al., 2017;Gidden et al., 2019;Meinshausen et al., 2020), as shown below. The reason for this is that the representation of C cycle in 110 ESMs is different from the model (MAGICC7) used to convert the IAM emissions into atmospheric CO2 concentrations in the first place (Meinshausen et al., 2020).
We apply a seven-year running mean to all global timeseries in order to remove the short-term dynamics and focus on longerterm variability. 115

Model Descriptions
Here we use results from four ESMs and one EMIC to explore the responses of the Earth system to the two long term scenarios.
The models used here were the only models that had performed and archived the necessary experiments as of the time of writing. The four ESMs, all from the CMIP6 generation of models, are the Canadian Centre for Climate Modelling and Analysis fifth-generation Earth System Model, (CanESM5) (Swart et al., 2019d), Community Earth System Model, version 120 two, Whole Atmosphere Configuration (CESM2-WACCM6) (Danabasoglu et al., 2020), the Institut Pierre Simon Laplace (IPSL) IPSL-CM6A-LR (Boucher et al., 2020), and the U.K. Earth System Model (UKESM1) (Sellar et al., 2019). The EMIC is the University of Victoria Earth System Climate Model, version 2.10 (UVic-ESCM) (Mengis et al., 2020).
CanESM5 represents a major update since its predecessor CanESM2 (Arora et al., 2011), which was used in CMIP5, and is 125 described in detail in Swart et al. (2019d). The resolution of CanESM5 (T63 or ~2.8° in the atmosphere and ~1° in the ocean) remains similar to CanESM2, and is at the lower end of the spectrum of CMIP6 models. CanAM5, the atmospheric component of CanESM5, has several improvements relative to its predecessor including changes to clouds, aerosols, radiation, land surface, and lake processes.The land component in CanESM5 is represented using the Canadian Land Surface Scheme vertically-resolved soil biogeochemistry that includes permafrost carbon dynamics; acclimation of photosynthesis and plant respiration to changing temperature; and many others. Because of an artifact in model initialization procedure for soil carbon in CESM2, which left a set of grid cells in the High Arctic with unrealistically high values, here we apply a mask to exclude all the grid cells where vegetation productivity was equal to zero during a 100-year period of the preindustrial control simulation. This, alongside other model differences, including snow biases in the coupled model, also had the effect of 155 reducing the permafrost carbon pool in CESM. Thus, while permafrost dynamics are permitted in CESM2 and CLM5, their feedback to warming is weaker than in the earlier CLM4.5 model as described in Koven et al. (2015).
IPSL-CM6A-LR (Boucher et al., 2020) is the model which was used by the Institut Pierre-Simon Laplace (IPSL) to run most of the simulations needed in the context of the sixth phase of the Coupled Model Intercomparison Project. This coupled model 160 includes the atmospheric LMDZ model, version 6A-LR (Hourdin et al., 2020), the ocean circulation NEMO model, version 3.6, (Madec et al., 2017), including sea ice NEMO-LIM3 model and thermodynamics and ocean biogeochemistry PISCES-v2 (Aumont et al., 2015), and the carbon cycle ORCHIDEE model, version 2.0 (Krinner et al., 2005). ORCHIDEE and PISCES models are coupled to the atmospheric LMDZ model via the OASIS3-MCT coupler (Marti et al., 2010). ORCHIDEE and LMDZ share the same spatial resolution of 2.5° x 1.3°, with the vertical atmospheric resolution being composed of 79 vertical 165 levels up to 80km high. PISCES uses the eORCA1 quasi-isotropic global tripolar grid of 1°, with an additional refinement of 1/3° in the equatorial region and 75 levels in the vertical direction, with steps from 1 to 10m in the surface up to 200m at the bottom. ORCHIDEE land surface model (version 2.0) does not include full nutrient cycles but does include a downregulation of 170 maximum photosynthetic rates under high CO2 concentrations. Based on Sellers et al. (1996), a logarithmic function was used for modeling the downregulation, using a reference CO2 value of 380ppm. Moreover, the photosynthesis is calculated from radiation, soil moisture and temperature. The model includes fifteen PFTs that are grouped into three classes (tall vegetation, short vegetation, and bare ground) for the tiling of the land surface. These PFTs share the same leaf phenology but respond to different individual parameters. ORCHIDEE has an eleven-layer soil hydrology scheme, calculating its budget on a tile-basis 175 to keep the balance in soil moisture distribution. Autotrophic and heterotrophic respiration is finally computed for different pools. Plant, litter and soil carbon pools are estimated on a modelled daily basis, compared to all other budgets, that are calculated every 15min, based on the atmospheric dynamics.
PISCES models various plankton types (phytoplankton, micro-and mesozooplankton) as well as the biogeochemical cycles 180 of carbon, and main nutrients (P, N, Fe, and Si), where N, P and Si are the limiting nutrients in the phytoplankton's growth.
The model has a fixed C:N:P ratio. Oceanic carbon and nutrients input into the model come from atmospheric deposition, river discharge in coastal regions and sediment transport. Sellar et al (2019) and its configuration for CMIP6 simulations, including the ScenarioMIP 185 runs is described in Sellar et al (2020). Land and atmosphere share the same horizontal grid: a regular latitude-longitude grid with 1.25º × 1.875º resolution. There are 85 vertical levels extending to 85km in the stratosphere, and full stratospheretroposphere atmospheric chemistry is simulated using the UKCA model. The ocean component uses the NEMO dynamical ocean on a nominally 1• tripolar grid with 75 vertical levels and an explicit nonlinear free surface.
The terrestrial biogeochemistry in UKESM1 is based on the land-surface model JULES (Clark et al., 2011;Best et al., 2011), 190 but with some major enhancements developed for UKESM1. In particular the inclusion of a prognostic nitrogen cycle  allows representation of limitations to carbon storage due to availability of nutrients. Parameters related to photosynthesis, respiration, and leaf turnover have been updated (Harper et al., 2016). The number of natural PFTs was increased from five to nine to represent the distinction between evergreen and deciduous plants and between tropical and temperate evergreen trees. The new dynamic vegetation and PFTs yield a closer match to observed vegetation distribution, 195 with particular improvements to tropical and boreal forests and the high latitudes (Harper et al., 2018). The land use scheme designates a portion of each gridbox as cropland and a portion as pasture land, where only crops and pasture grasses can grow, respectively, to the exclusion of trees and shrubs. In the remainder of the gridbox, nine natural PFTs compete for space, which determines the distribution of forests, grasslands, shrublands, and bare soil. phytoplankton, zooplankton and detritus (Keller et al., 2012). The new ocean biogeochemistry module includes phytoplankton light limitation, a more realistic zooplankton growth and grazing model, and an iron limitation scheme to constrain phytoplankton growth. Sediment processes are represented using an oxic-only calcium-carbonate model. Decadal-average values of spatially-explicit variables are used for this study.

Climate Responses
In the historical and SSP5-8.5 scenario, global mean temperature change relative to the preindustrial (Fig. 1b) increases monotonically in all models, with a wide range of responses by 2300 from ~18 degrees C in the CanESM5 to ~8 degrees in the UVic-ESCM model. Here, a notable difference arises between the ESMs and the UVic-ESCM EMIC, with much higher transient warming in the ESMs than in the EMIC. This is at least in part due to a sampling bias related to the set of models that 225 have performed these long-term scenarios: all four ESMs used here report transient climate response (TCR) greater than 2.4 ºC and a transient climate response to emissions (TCRE) greater than 2 o C/EgC (mean of 2.16 o C/EgC) versus the CMIP6 mean of 2.0 ºC TCR and 1.8 o C/EgC TCRE ; whereas the specific version of UVic-ESCM used here reports a TCR of 1.8 ºC and a TCRE of 1.8 o C/EgC , closer to the CMIP6 mean. During the period of CO2 stabilization and decline in the 23rd century, three of the ESMs continue to warm substantially, while in one ESM (UKESM1) 230 and the EMIC, the global temperature stabilizes. This implies a substantial slow component in the models which continue to warm past the period of CO2 stabilization, beyond the effective transient values reported above (which reflect short-to-medium term processes that dominate the TCRE) (Proistosescu and Huybers, 2017).
In contrast, in the SSP5-3.4-overshoot scenario, global temperatures follow the CO2 concentration trajectory to first peak and 235 then cool during the 21st century in all models. Subsequent dynamics vary between the models: most stabilize at a cooler temperature than the peak 21st century value, while one model (CESM2-WACCM) reaches a minimum temperature at ~2200, and then resumes warming, albeit at a slower rate, during the 23rd century. As in the very high emissions scenario, there is a separation in the amount of warming between the relatively less sensitive EMIC and more sensitive ESMs both at the peak and in the subsequent overshoot and stabilization period. 240

Carbon Cycle Responses
Responses of the globally-integrated terrestrial and marine carbon cycle to the two scenarios for all models are shown in fig. 1c-d. Under both the SSP5-8.5 and SSP5-3.4-overshoot scenario, the terrestrial carbon cycle (Fig 1c) in all models shifts at some point from being a net sink of carbon from the atmosphere to a neutral or net source of carbon to the atmosphere. In the SSP5-8.5 scenario, the timing of this transition varies widely between models, from ~2100 in UVic-ESCM to ~2220 in 245 CESM2-WACCM. The magnitude of the carbon fluxes also varies widely between models, with CanESM5 showing strongest terrestrial uptake, peaking around 2100, and then reversing to become the strongest terrestrial source out of the models examined here during the 23rd century.
Overall, the pattern of terrestrial sink-to-source transition under long-term high emissions is qualitatively consistent with the 250 results of Tokarska et al. (2016), which show a similar transition in all of the models examined in the RCP8.5 extension experiment. This pattern follows from the dynamics described by Randerson et al. (2015) whereby terrestrial carbon-climate feedbacks strengthen over time, at the same time that the terrestrial carbon-concentration feedbacks weaken, although the experimental protocol followed here, which does not separate CO2 climate and physical effects as in , does not allow this feedback decomposition to be performed. 255 For the SSP5-3.4-overshoot scenario, model agreement of the terrestrial carbon cycle is much higher, with all models transitioning from sink to source during the late 21st or early 22nd centuries, which counteracts some of the net-negative anthropogenic emissions by that time in terms of their effect on lowering atmospheric CO2 concentrations, and consistent with the CMIP5 RCP2.6 results shown in Jones et al. (2016). The timing of the biospheric switch from sink to source follows by 260 decades the change in the sign of CO2 emissions from net positive to net negative. All of the models then revert to a roughly carbon-neutral terrestrial biosphere during the 23rd century. Notably, models across the ensemble show a reduced range of variation in the magnitude of carbon fluxes for the SSP5-3.4-overshoot scenario, relative to the SSP5-8.5 scenario.
Over ocean (Fig 1d), inter-model agreement is in general much higher than over land, although ensemble spread does increase 265 beyond 2100 in the SSP5-8.5 scenario. Peak carbon uptake for both scenarios occurs prior to 2100 in all models, with an earlier and smaller-magnitude peak in the SSP5-3.4-overshoot than the SSP5-8.5 scenario. In the models, the ocean carbon uptake then gradually weakens but remains positive through the 22nd and 23rd centuries in the SSP5-8.5 scenario, while in the SSP5-3.4-overshoot scenario, uptake rapidly reverses to become a source through most of the 22nd century (lagging behind the change in the sign of net CO2 emissions by decades), before then reversing again in the late 22nd century to become a weak 270 sink again through the remainder of the scenario.

Diagnosed CO2 Emissions
Annual ( In the SSP5-8.5 scenario, the model ensemble spread in compatible emissions is widest at the end of the 21st century when 280 emissions also peak, and declines during the 22nd century. In one model under SSP5-8.5 (CanESM5), negative emissions are required in the 23rd century to balance the strong and sustained terrestrial carbon source active at that time in that scenario, whereas in the rest of the models slightly positive or roughly zero emissions are inferred for the scenario. In the SSP5-3.4overshoot scenario, the ensemble spread in compatible emissions peaks first at the time of peak positive CO2 emissions, and then increases again during the period of strongest negative CO2 emissions, as models disagree on the magnitude of carbon 285 cycle responses to each of these phases.
The shape of the cumulative diagnosed CO2 emissions ( fig. 2b) roughly follows the trajectory of the atmospheric CO2 concentrations shown in fig. 1a. Ensemble spread in cumulative diagnosed CO2 emissions shows the relative responses of the carbon cycles in each model to positive and negative CO2 emissions, with, e.g. the IPSL-CM6A-LR model requiring higher 290 cumulative emissions to balance its stronger sink throughout the entirety of the SSP5-3.4-overshoot scenario, and the CanESM5 model requiring higher amounts of cumulative CO2 emissions to balance its high sink in the SSP5-8.5 scenario until ~2200, when that model's terrestrial system reverses from a strong sink to a strong source.
Each of the fluxes, averaged across the models, are shown together in figure 2c-d. Here, for each scenario, the pink line shows 295 the inferred emissions time series, the black line shows the change in atmospheric CO2, and the accumulation in land (green) and ocean (blue) are shown by the area of sinks (hatching) or sources (stippling). This shows lags in the land and ocean carbon fluxes in response to changes in emissions, particularly for the SSP5-3.4-overshoot scenario ( fig. 2d), where the terrestrial and ocean systems remain sinks for several decades during the period of declining and negative CO2 emissions, before they switch to become sources, which partially offset the negative emissions. In the SSP5-8.5 scenario, lags are less evident, but the net 300 behavior of the ocean is to at least partially offset the net carbon losses on land during the period after the mid-22nd century.

Temperature Response to Cumulative Emissions
Plotting global mean temperature change as a function of diagnosed cumulative CO2 emissions ( fig. 3) reproduces the near linear relationship between temperature change and cumulative CO2 emissions described in (Tokarska and Zickfeld, 2015;Jones et al., 2016;Zickfeld et al., 2016). Note, however, that the temperature change shown here includes the response to non-305 CO2 forcings, whereas the linear relationship is strictly defined only for CO2 (Matthews et al., 2009;Collins et al., 2013), and thus the relationship shown here represents an "effective TCRE" (Matthews et al., 2017) that includes these non-CO2 forcings.
There is some deviation from linearity in the cumulative carbon emissions to temperature relationship in SSP5-8.5 scenario.
Initially, up to approximately 2000 PgC, temperature increases less than linearly with cumulative CO2 emissions. There are two potential explanations for this curvature. The first potential explanation is the role of non-CO2 forcers, which contribute a 310 larger fraction of the total greenhouse gas forcing in the early than late part of this scenario (Meinshausen et al., 2020). The second potential explanation is lags in the carbon and climate systems relative to emissions. An analysis with CO2-only experiments up to 2100 (Nicholls et al., 2020a) found a similar slight negative curvature as observed here, suggesting that lags in the carbon and climate systems are dominant up to around 2000 PgC. The temperature vs cumulative emissions relationship is approximately linear for cumulative emissions between 2000 PgC and 4000 PgC, except for the UVic ESCM. The less-than-315 linear relationship in EMICs was noted before and attributed to the saturation of CO2 radiative forcing at high cumulative emissions (Herrington and Zickfeld, 2014;Tokarska et al., 2016). In the final half-century of the SSP5-8.5 scenario, temperature in the ESMs continues to increase in response to approximately stable radiative forcing. This continued warming reflects the lags in the carbon and thermal response to CO2 emissions and non-CO2 forcings. The one EMIC shows a more linear response in this part of the scenario than the ESMs. In the ESMs shown here for the SSP5-8.5 scenario, this lagged-320 warming tail is larger, particularly in the case of CanESM5, than the corresponding behavior shown in Tokarska et al., (2016).
By breaking the cumulative emissions plots into roughly centennial-length segments, fig 3 shows the dynamics for the two scenarios over time for all of the models. This underscores the continuity of the cumulative emissions curve through the 22nd century in the SSP5-8.5 scenario ( fig. 3a), and the break in that relationship for several of the models during the 23rd century. 325 For the SSP5-3.4-overshoot scenario ( fig. 3b), the separation by centuries further shows the slight nonlinearity evident in some-but not all-of the models during the peak and initial overshoot period. In the overshoot scenario, there is not a consistent deviation from linearity at the point of overshoot and negative CO2 emissions. Some models (UVic-ESCM, CanESM5) follow roughly the same trajectory in temperature vs cumulative emissions space in the initial period of negative emissions, while others (IPSL-CM6A-LR, CESM2) follow a lower-temperature trajectory after peak warming, and one model 330 (UKESM1) follows a higher-temperature curve in temperature vs cumulative space after peak emissions. This could be due to different roles of carbon versus thermal inertia in the declining CO2 phase (Zickfeld et al., 2016). CESM2-WACCM also shows a distinctly different 23rd-century response to the other models, with a significant increase in temperatures in response to nearconstant radiative forcing, and nearly zero inferred emissions, over this period.

Terrestrial carbon cycle
Aggregated globally, there is some commonality and a large degree of divergence between models across these two contrasting scenarios. While all models show some consistent patterns (e.g., a shift on land from sink to source), individual models also show differing dynamics both in the patterns, the timing, and the magnitudes of the carbon and temperature response. It is possible to disaggregate these dynamics regionally to better understand the mechanistic basis of the carbon and temperature 340 response, and to explore whether any qualitative similarity holds at the more regional scales. We thus focus on zonal-mean trajectories of carbon and temperature as a way to further understand the degree of similarity in results across models and within any model over time and scenarios. Fig. 4 shows zonal-mean terrestrial carbon flux dynamics for the five models and two scenarios over the full historical to future 345 period. The value for a given latitude is the average over all land cells in that latitude, regardless of the fractional coverage of land in grid cells. In the historical and near-future time period that are shared between the scenarios, the five models already show a strong divergence in behavior: CanESM5 projects 21st century carbon sinks in both the tropics and northern high latitudes; CESM2-WACCM has one main sink area in the tropics and a much weaker sink at northern mid and high latitudes; IPSL-CM6A-LR and UKESM1 also have one main sink area, but in the northern mid-and high latitudes, and UVic-ESCM 350 shows a weak sink in the tropics and growing carbon source in the higher latitudes. Over time in the SSP5-8.5 scenario ( fig.   4a), each of these models show further divergent results: in CanESM5, both the tropical and northern high latitude regions shift from sinks to sources at roughly the same time, becoming sources by the mid-22nd century; in CESM2-WACCM, the tropics remain a sink through the end of the 22nd century, while the northern high latitudes shift to become a source by the end of the 21st century, with the source peaking during the 22nd century and weakening thereafter; in IPSL-CM6A-LR the 355 northern sink weakens gradually over time to become neutral by the mid-22nd century, while the tropics become a strong source of carbon during the 22nd century; in UK-ESM1, the northern sink is sustained while the tropics shift to become a source, and in UVic-ESCM the northern high latitudes become a strong source and the tropics a weak source. Thus the regionally-disaggregated dynamics show even greater divergence than the global integral, with differing locations-and thus mechanisms-driving the overall shift from sink to source across models. Further, the areas of most active terrestrial carbon 360 cycle dynamics shift from one region to the next across centuries within any one model.
Zonal-mean dynamics are both more muted in magnitude and more similar between models for the SSP5-3.4-overshoot scenario ( fig. 4b). In both the CanESM5 and CESM2-WACCM models, the early sinks weaken in favor of a source of carbon in the tropics during the net negative CO2 emissions period from roughly 2050 to 2150. The IPSL-CM6A-LR and UKESM1 365 models show similar dynamics, but with a larger overlay of interannual variability. The UVic-ESCM model shows a relative brief but strong loss of carbon from northern high latitudes during the period of peak warming, and a slower and weaker loss of carbon from the tropics during the subsequent period of net negative CO2 emissions. delay relative to vegetation pools, through the mid 22nd century, but then shifts to lose carbon in soils from both belts by the end of the 23rd century; CESM2-WACCM gains soil carbon through most of the world but also projects substantial carbon losses from the northern high latitude soils beginning in the late 21st century; IPSL-CM6A-LR loses soil carbon, mainly from the tropics, starting mainly during the 22nd century; UKESM1 shows a stronger tropical soil carbon loss and a higher latitude soil carbon gain; and UVic-ESCM shows strong losses of carbon at northern high latitudes and gains of soil carbon in the 380 northern mid-latitudes. Thus, under the SSP5-8.5 scenario, both soil and vegetation dynamics differ markedly across the models, as well as regionally within each model.
For the SSP5-3.4-overshoot scenario, zonal-mean disaggregation of vegetation and soil carbon ( fig. 5c-d) shows some greater degree of similarity between model dynamics. All five models agree that northern mid-high latitudes will gain carbon in 385 vegetation. In the tropics, three models (CanESM5, CESM2-WACCM, and IPSL-CM6A-LR) predict that carbon gains in tropical vegetation peak by the end of the 21st century, while UKESM1 predicts sustained tropical vegetation carbon losses from the historical through the end of the scenario, and UVic-ESCM shows more neutral behavior of vegetation globally.
CESM2-WACCM and UKESM1 also show substantial losses of vegetation carbon in subtropical ecosystems. For soil carbon dynamics, the patterns are much more muted than in the high-emissions case, with weaker but sustained carbon gains in soils 390 of northern high latitudes in the CanESM5, IPSL-CM6A-LR, and UKESM1 models, and a weaker loss of carbon from northern high latitudes and gain of carbon in the northern mid-latitudes in the CESM2-WACCM model, and for the UVic-ESCM model, northern soil carbon losses are weaker than in the very high emissions case but still stronger than any of the other models.

Ocean carbon cycle
Zonal-mean breakdowns of the ocean carbon cycle are much more consistent between models (Fig. 6). All models show near-395 term sinks in the mid and high latitudes of both hemispheres, with sources in the tropics. Under the SSP5-8.5 scenario, all models show a poleward migration of the Southern Ocean sink, and a weakening followed by a strengthening of the tropical source. In four of the models, the northern mid-and high-latitude sinks weaken during the 22nd century, while they remain strong in the IPSL-CM6A-LR model. While the zonal-mean patterns of the ocean carbon flux are broadly consistent across the models, the magnitudes and meridional extents of the source and sink regions vary significantly, leading to the large spread 400 in net global fluxes across models seen under the SSP5-8.5 scenario (Fig 1d). In the SSP5-3.4-overshoot scenario, all models show roughly similar dynamics: the tropical source strengthens, the northern mid-and high-latitudes sinks weaken, and the southern ocean shifts from sink to source, although differences in the timing, strength and meridional extent of these transitions are again evident between models.

Temperature 405
Regional temperature dynamics are roughly similar between ESMs (Fig. 7). All models show polar amplification, and thus warming proceeds faster at high latitudes, particularly in the northern hemisphere. Under the high emissions scenario, global warming is overwhelming, with >10 degrees C warming at all latitudes and much higher warming at the poles, with warming reaching 10 degrees at the northern polar region within this century in all five models. Under the overshoot scenario, polar amplification is still present in all models, and global warming peaks and then declines and stabilizes after the peak CO2 period 410 in all models except CESM2-WACCM.
In CESM2-WACCM, for the overshoot scenario, the northern mid-to high latitudes return to almost the preindustrial temperature during the 22nd century, and then subsequently warm again in the 23rd century, despite no further CO2 concentration increases; this area is responsible for the vertical tail to the cumulative emissions-temperature change plot in the 415 23rd century in that model (Fig. 3b). Plotting the 100-year mean temperature difference between the 23rd and 22nd centuries ( fig. 8a), shows that the 23rd century warming in the model is centered on the Northern Atlantic, suggesting a control by Atlantic Meridional Overturning Circulation (AMOC). To explore this hypothesis, we calculate AMOC as the maximum value of the annual-mean meridional mass flow stream function in the Atlantic basin north of 20N, and plot time series of this for all ESMs and scenarios in fig. 8b. This shows that CESM2 AMOC starts out with stronger AMOC than the other ESMs, which 420 partially collapses during the period of warming to reach a minimum at around 2100 and recovers thereafter in the SSP5-3.4overshoot scenario, with a much stronger rebound than other ESMs considered here. This AMOC recovery response is consistent with earlier long-term overshoot scenarios (Nakashiki et al., 2006) as well as long-term constant 2xCO2 experiments (Manabe and Stouffer, 1994). Thus this supports the interpretation that the AMOC recovery in that model drives the 23rdcentury warming. An additional piece of evidence to support the transient shutdown and subsequent recovery of AMOC as 425 being the key driver of the CESM-WACCM temperature vs cumulative emissions nonlinearity shown here comes from the comparison of Hu et al. (2020) between CESM2 and a closely related model, E3SM. They show that both models have similar ECS but CESM has a substantially lower TCR, which they attribute to its higher sensitivity of AMOC strength to warming.
The 23rd century warming in CESM2-WACCM thus appears to reflect an AMOC that is transiently weakened during the 22nd century due to freshwater influx associated with warming, leading to relative cooling around the North Atlantic and throughout 430 the high latitudes, but which then recovers and removes that cooling anomaly that was present during the weakened-AMOC period.

Discussion
The concept of proportionality of global warming to cumulative emissions and the related metric of transient climate response to emissions (TCRE) are enormously valuable in understanding the expected response of global temperature change to 435 anthropogenic emissions. At the same time, the utility of this framework is limited by both the persistent spread in TCRE across model ensembles Jones et al., 2013), as well as the possibilities of behavior in the coupled climate and carbon cycle systems that may give rise to nonlinear trajectories of temperature as a function of cumulative emissions.
The IPCC and UNFCCC framework for climate change policy uses a framework described in . Within this abstraction of the climate system, these nonlinearities take two forms: a zero emission commitment (ZEC), which is any 440 warming which arises after the point that CO2 emissions reach net zero, and thus would lead to vertical tails (either positive or negative) in the cumulative emissions plots (although part of the tail warming shown here may be in response to near-constant non-CO2 forcing), and an allowance for Earth system feedbacks that are unrepresented or underrepresented in existing Earth system models and thus not included in the spread of TCRE from ESMs. Physical or biogeochemical lags in the Earth system, beyond those quantified by the ZEC or specifically enumerated as unrepresented feedbacks, are not accounted for in this 445 framework, though the updated framework of Nicholls et al., (2020a) allows for non-linearities between cumulative CO2 emissions and CO2-induced warming in remaining carbon budget calculations.
The pair of scenarios explored here bracket a wide range of possible dynamics in the Earth system over the next few centuries, from a high CO2 concentration world with continuous and overwhelming global warming over the coming centuries to one in 450 which CO2 is stabilized and reduced following a peak warming during this century (Figs. 1-2). In each of these scenarios, the models studied in general follow the expected linearity in TCRE, before 2200 (Fig. 3). At the same time their internal dynamics vary widely from each other, particularly in the terrestrial carbon cycle and under high levels of global warming, where there is little agreement on the geographic and mechanistic drivers of the terrestrial carbon cycle responses to the warming . Possibly this is due to some degree of tuning (either implicit or explicit) to capture the observed globally-integrated 20th 455 century carbon balance trajectory, a constraint whose influence weakens at regional levels and over time into the future (Hoffman et al., 2014). signature of large soil carbon losses at high latitudes under the high-warming scenario. Overall, the models differ widely in the aggregated magnitude of their responses to climate and elevated CO2 . The structural differences likely underlie the diversity of global and regional responses, although given the myriad structural and parametric differences between the models it is not possible to attribute the dynamics in a more rigorous way (Fisher and Koven, 2020), nor is it possible with the limited sample size considered here to assess whether model agreement in shorter term response arises due 470 to common representation of relevant processes or calibration constraints imposed by historical global carbon-climate dynamics. Nonetheless, the diverse potential for global and regional carbon cycle dynamics to change sign under these scenarios highlights the continued need for improved comprehension of the major drivers of terrestrial carbon cycle dynamics.
The ocean carbon cycles of the models, and the thermal response of the climate system to greenhouse gas forcing, in general 475 shows better qualitative agreement with each other (Figs. 1c, 6), but again ensemble spread increases after the 21st century in the high-emission scenario, and other surprises may be in store. In particular, two distinct types of lags in the physical system may lead to further warming beyond the time period in which greenhouse gases increase: if carbon emissions cease without overshoot, lags in the physical climate may lead to continued warming after cessation, while in the overshoot case, mechanisms such as AMOC slowdown may temporarily obscure some of the warming, but then upon recovery of AMOC this temporary 480 regional cooling may dissipate, leading to a resumption of warming long after the CO2 has stabilized ( fig. 8). If such dynamics are real features of the Earth system, this would be of critical concern -even if we deploy large negative emissions, we still would have to have a plan for a world in which all they do is stabilise, rather than reduce, temperatures. Of particular note is that CESM2 showed a negative zero emission commitment in , despite showing the large 23rdcentury warming with near-zero inferred emissions in the overshoot and stabilization scenario here, indicating that the ZEC 485 framework as currently defined by MacDougall et al. (2020) as the temperature change evaluated 50 years following net-zero emissions may be insufficient for quantifying such lagged effects of CO2 on climate. Further, we note that CESM2 shows the highest effective ECS of any of the models whose transient climate response is within the "likely range" as constrained by observed warming trends (Nijsse et al., 2020), and thus underscores the possibility for nonlinearities in temperature versus cumulative emissions unless behavior can also be constrained by paleoclimate evidence (Sanderson, 2020;Tierney et al., 490 2020 Given that the plant-physiological and other CO2 concentration-dependant processes represented in models are not routinely tested against observations from the highly out-of-sample conditions experienced under each of these scenarios (e.g. very high atmospheric CO2 concentrations under SSP5-8.5 or rapidly decreasing CO2 concentrations under SSP5-3.4-overshoot), it is to 495 be expected that model differences will be large. Despite this, it is important to note that the ensemble spread in compatible emissions, which include all these uncertain carbon-cycle feedbacks, is relatively small when compared to the mean magnitude of the emissions themselves, particularly under the overshoot scenario ( fig. 2a-b). Thus the uncertainty associated with carbon cycle feedbacks is relatively small compared to the anthropogenic emissions themselves. Further, the uncertainty does not conflict with the central result that warming is roughly proportional to cumulative emissions, at least for this century and the 500 following one, and these results support the need for rapid reductions in CO2 emissions to prevent the extreme impacts associated with warming.
Nonetheless, these results also suggest that there may be longer-term surprises in the coupled climate-carbon system to be encountered both in high-emissions and overshoot warming scenarios. The evident lack of consistent predictions in the 505 terrestrial models, combined with the known structural differences, and the fact that none of the models include a complete set of processes that may be considered likely to affect the terrestrial carbon cycle, support the approach of budgeting for feedbacks present in the Earth system but not included in ESMs, at least until greater convergence in terrestrial carbon cycle models can be shown. The ocean carbon cycle as well shows greater uncertainty on this time horizon than on the pre-2100 dynamics, for which there is greater agreement. The wide range of results shown here, despite a small number of models 510 analyzed, also underscore the need for further testing of model dynamics on these longer timescales, the inclusion of more models and more systematic exploration of parameter and structural uncertainty on these longer-term dynamics, as well as the identification and use of observational constraints that are relevant to these longer-term dynamics of the coupled carbon and climate systems. At the same time, unanticipated physical dynamics, such as the transient weakened-AMOC-driven cooling and its subsequent reversal may also be relevant on long timescales. Thus, we should continue to anticipate that surprises in 515 the long timescale climate response are possible, even under relatively mitigated global warming scenarios, and which should be taken into consideration when setting global climate policy.

Conclusions
We examine four CMIP6 ESMs, alongside a reduced-complexity EMIC, in a pair of experiments that extend to the year 2300, 520 to explore the dynamics of the coupled carbon and climate systems on this timescale, which is longer than those typically considered in ESM analysis. We show that under contrasting high-emissions and overshoot scenarios warming is approximately proportional to total cumulative CO2 emissions, but also that distinct deviations from linearity arise in some of the ESMs on post-2200 timescales. These deviations underscore the limits to our ability to coherently project the dynamics of the Earth system on these longer timescales. We note that, as on shorter timescales, the projections of terrestrial carbon 525 dynamics differ most strongly between models, and that on the longer timescales there is still enormous uncertainty in projected carbon dynamics. This uncertainty is evident in multiple ways: between the model projections of global carbon changes; between the model projections of the geographical regions contributing to feedbacks; between the pools responsible for the basic mechanisms of carbon cycle variability; and between one century to the next within models. We also show that lagged temperature effects leading to warming after cessation or reversal of emissions, beyond what has been shown in earlier or 530 simpler models, may be possible outcomes in these projections. These results show that a greater emphasis on identifying, attributing, and reducing uncertainty is needed on the wider range of possible futures that can be explored on these longer timescales, and that until such uncertainty can be reduced, we must anticipate and allow for surprises such as these in formulating global climate policy over these longer timescales.

Author Contributions
C.K. and K.Z. conceived of study, with contributions from D.L.. All authors contributed to the design and/or use of one of the