Comment on esd-2021-2

This manuscript presents possible scenarios for the Earth’s climate during the next million years. It brings interesting new material to a rather overlooked and under-researched area. Overall, I am in favour of publication, but I nevertheless have several important comments that the authors should consider in a revised version of the manuscript. Most importantly, the whole exercise rests on shaky hypotheses: I am aware that there are not many alternatives, but this makes it all the more important to present and discuss them thoroughly.

1 - Using Quaternary climate (and more precisely the last 800 kyr) to calibrate a conceptual model to be applied to the future is certainly not the ideal choice, since this period contains no very hot intervals, but mostly glacial ones. The (partial or complete) melting of Greenland or Antarctica is therefore not considered, and the effect of high CO2 levels cannot be calibrated. In other words, the whole exercise is more an extrapolation than an interpolation, which is known to be quite dangerous. I perfectly understand this choice, based on the availability of data, but it nonetheless remains unsatisfactory. This should be stated much more clearly in the paper.
2 - Our knowledge of the long-term carbon cycle and of the ultimate fate of fossil-fuel carbon is also very limited and uncertain. The manuscript uses model results (Lord et al. 2016; based on Lenton et al. 2006) that unfortunately have no "real world" tests. The imposed exponential decay of carbon is therefore based on the (mostly theoretical) idea that the carbon cycle is regulated solely through silicate weathering. This approach neglects many other important processes that are known to have played a critical role on these timescales in the past and even today, such as organic matter burial or kerogen weathering. Again, I do not contest the value of using a simple hypothesis, but this should be explained and discussed.
To summarize, sentences like "we produce a probabilistic forecast" (line 14 in the abstract) are not acceptable. This is obviously not a "forecast" but only a possible scenario, based on our very limited knowledge of the dynamics of geological transitions in the past.
We are clearly not able today to "forecast" what the Anthropocene era will be, and this should be stated much more clearly.
Other comments:
3 - It appears that one of the most critical parameters, K, is not well constrained by the conceptual model or by the chosen paleoclimatic dataset, as explained in §3.2.
« Our results indicate that with the model derived in this study the possible values of the coefficient K range between -1279 and -31 W m-2, with a median of -393 W m-2 »
Using results from an EMIC (CLIMBER-2, Ganopolski et al., 2016), the authors decided to select only a very small subset of solutions ("Accepted") that all lie in the tail of the distribution of "Valid" solutions, as shown on Fig.2. This appears to be a strong shift in the overall strategy and raises a few questions. How does the correlation to data vary across the histogram on Fig.2? Are the "Valid" solutions close to the 0.7 correlation limit and the center of the distribution farther away from this limit? The Ganopolski et al. (2016) insolation-CO2 threshold is also based on a parameter selection using a comparison to (basically) the same ice volume data. The problem is therefore not that the paleodata do not constrain the K parameter well, but that the chosen conceptual model and the CLIMBER-2 model do not represent the role of CO2 in the dynamics of ice sheets in the same way. Why do the authors choose to trust one model over the other, and to adjust one model to the other? More technically, how are inceptions defined in the conceptual model?
4 - Another strong limitation concerns the simple addition of "natural" and "anthropogenic" carbon, as presented at line 182:
« In addition, we assume that natural and anthropogenic CO2 anomalies can be simply summed up and that at the preindustrial time the global Carbon cycle was in equilibrium. This is, obviously, a very strong assumption since even a rather small imbalance in the global Carbon cycle which is impossible to detect at the millennial timescales can result in a very large "drift" of the Earth system from its preindustrial state at the million years timescale. »
Indeed. This is actually why the anthropogenic CO2 decreases through a small imbalance between silicate weathering and carbonate preservation.
The conceptual model assumes that there are NO natural dynamics in the carbon cycle besides glacial cycles. On Fig.8b the CO2 simply follows the imposed decrease in the absence of (northern hemisphere) ice sheets. But what about a possible role of Antarctica? What about some internal dynamics? Even over the calibration period (Fig.4b), the CO2 results are quite different from the data. In other words, the added value of a dynamic CO2 component in this conceptual model is not obvious.
5 - It seems to me that the 3 variables (v, CO2 and T) are almost identical (up to scaling) in the natural and in the no-anthropogenic cases. Are these 3 variables necessary at all to express the dynamics of the system? I believe a single variable could have produced almost the same results.

« the last 800 kyr (see below). This period was selected because it is dominated by the long glacial cycles which are expected to continue in the future »
This is not the case with anthropogenic forcing… I would prefer the authors to acknowledge that this is the only period for which we know both the ice-sheet and CO2 evolutions. Using another (much warmer) time period would be preferable for the next million years.
Line 99 : « This approach, obviously, is not applicable for a possible future Antarctic and Greenland melting under high CO2 concentrations. This is why we do not consider future sea level rise above the preindustrial level and it is required that v≥0 at any time »
This appears to be a strong limitation of the study and it should be acknowledged as such in the abstract and in the conclusion. A discussion of how to overcome this limitation would also be appreciated.

« Namely, we assume that on the relevant timescales (10^3-10^5 yrs), the natural component of CO2 concentration is in equilibrium with external conditions and can be expressed through a linear combination of global temperature and global ice volume »
This is probably why the "natural CO2" results are not so good (Fig.1b & 4b)…? They are mostly simple "mirrors" of the ice-volume and temperature ones.
« This is why we prescribe that before the MBT the minimum ice volume must be 0.05 in normalized units: »
This seems a very ad-hoc assumption: I do not understand the reason for this adjustment, beyond artificially improving the correlation.

Line 258 :
« the new glacial inception will not be met in the near future even in the absence of anthropogenic influence on climate. »
Again, this seems a very ad-hoc constraint: the physical explanation is to be found in the insolation forcing, and the tuned models should provide this mostly as a result, not as an a priori constraint.

« corr(x,y) denotes the linear Pearson correlation »
The correlation is not always the best metric, though it is simple to compute… Why use only the correlation with ice-volume data and not with the two other paleoclimatic datasets?
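
One reason the correlation can be an incomplete metric is that it is insensitive to amplitude and offset. The following minimal sketch uses purely synthetic series (not the manuscript's data or model; the names `observed` and `simulated` and the factor 3 are assumptions chosen for illustration) to show a perfect correlation coexisting with a large misfit:

```python
import numpy as np

# Synthetic "observed" series: a 100-kyr sinusoid over 800 kyr (illustrative only).
t = np.linspace(0.0, 800.0, 801)
observed = np.sin(2.0 * np.pi * t / 100.0)

# Hypothetical "simulated" series: same shape, three times the amplitude, plus an offset.
simulated = 3.0 * observed + 0.5

# Pearson correlation ignores the scaling and the offset...
corr = float(np.corrcoef(simulated, observed)[0, 1])

# ...whereas the root-mean-square error does not.
rmse = float(np.sqrt(np.mean((simulated - observed) ** 2)))

print(f"correlation = {corr:.3f}, RMSE = {rmse:.3f}")
```

A complementary amplitude-aware measure (RMSE here, as one possible choice) would penalize solutions whose range is much larger than the observed one.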

« For the selection of solutions, no conditions are imposed on the goodness of fit »
Well, it seems to me that Equation (16) is a condition on the goodness of fit! This also contradicts line 265:

« We wish to find P to maximize the optimization target function Cv »
The authors should probably clarify their language: they are only choosing parameters that satisfy all the constraints (including (16)). This is a feasibility problem, not an optimisation problem (though feasibility solvers are usually provided in optimisation packages).
Why choose a correlation > 0.7 (or why select 353 parameter sets)? Is there a need to have a large enough parameter set with a large enough dispersion? Or does this relate to the parameter K problem (see comment above)?
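
To make the distinction concrete, here is a minimal sketch contrasting the two formulations. Everything in it is an assumption for illustration: `toy_model` is a hypothetical stand-in for the conceptual model, the target is synthetic, and the parameter ranges are arbitrary. Only the 0.7 threshold is taken from the manuscript.

```python
import numpy as np

def pearson(x, y):
    """Linear Pearson correlation between two 1-D series."""
    return float(np.corrcoef(x, y)[0, 1])

def toy_model(params, t):
    """Hypothetical stand-in for the conceptual model: a single sinusoid."""
    amplitude, period = params
    return amplitude * np.sin(2.0 * np.pi * t / period)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 800.0, 801)          # time axis in kyr (illustrative)
target = toy_model((1.0, 100.0), t)       # synthetic "data", not real proxies

# Candidate parameter sets drawn at random (arbitrary, assumed ranges).
candidates = np.column_stack([
    rng.uniform(0.5, 2.0, size=2000),     # amplitude
    rng.uniform(80.0, 120.0, size=2000),  # period in kyr
])
scores = np.array([pearson(toy_model(p, t), target) for p in candidates])

# Feasibility problem: keep EVERY parameter set that satisfies the constraint.
feasible = candidates[scores > 0.7]

# Optimisation problem: keep only the single best-scoring parameter set.
best = candidates[np.argmax(scores)]

print(f"{len(feasible)} feasible sets out of {len(candidates)}; best score {scores.max():.3f}")
```

In the feasibility formulation the size and spread of the retained ensemble depend directly on the chosen threshold, which is why the choice of 0.7 deserves a justification.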
Line 293 : « For global mean surface temperature anomalies (with respect to preindustrial conditions) we use two reconstructions »
Some discussion of the nature and accuracy of these proxies would be useful. In particular, why use "ice volume" as the preferred target? Overall, the temperature does not have any dynamic role in the model (it can be replaced by v and CO2). So why use it?
Line 315 : « some solutions display an amplitude range significantly larger than the observed one, reaching the imposed lower limit of 150 ppm »
Actually, not "some" solutions, but "most" or even "all" solutions.
Lines 314-317 : Correlations of 0.5 or 0.56 do not appear very good to me. Why optimize only the correlation with ice volume?
Line 327 : « In general, we conclude that the model has a satisfactory ability also when used in predictive mode and, thus, we confidently venture to utilize it as a tool for the forecast of the next 1 Myr climatic evolution. »
This is overly optimistic: the climate system is very different in the anthropogenic case. The word "forecasting" is wholly inappropriate: the system is obviously non-stationary and a "statistical forecast" has no meaning at all here. At best, you can call this a possible scenario.
« Most of the solutions agree that the planet will remain in a long interglacial state for the next 50 kyr »
Not « most » but « all », since this was built into the assumptions (something questionable, see above). I do not understand this statement: the contrary would be problematic.
Line 532 : « the past does not perfectly constrain the future evolution of the climate - ice sheets - Carbon cycle system. »
In particular when using only the last 800 kyr of the Quaternary. The main question is the choice of the time window used in the past.
Line 534 : « The selected model versions exhibit a large sensitivity to fossil-fuel CO2 releases »
How does this relate to the choice of the K parameter (based on CLIMBER-2 results)? It seems to me that the "Valid" set is even more sensitive. This should certainly be discussed in much more detail, since it represents a large part of the manuscript.

Conclusions:
« this relationship is poorly constrained by the paleoclimatic data because during previous interglacials CO2 was close to or lower than the preindustrial level. »
« Reducing this uncertainty by performing experiments similar to those described in Ganopolski et al. (2016) but with more advanced Earth system models can help to reduce uncertainties in future projections. »
As explained above, I disagree with this conclusion. The Ganopolski et al. (2016) threshold was based on the same data, so the difficulty lies not so much in the data as in the models. Besides, enlarging the scope beyond the Quaternary would certainly help a lot. The authors should better highlight the key difficulties (my main comments 1 & 2).