The authors have gone to great lengths to take into account many of the four reviewers' comments and have substantially improved the manuscript in the process. However, despite this great effort, I feel that some of my major points have not been sufficiently addressed and strong disagreements remain. Most importantly, the manuscript still reads to me more like an opinion piece, a very interesting opinion piece indeed, but one with very little evidence to support the claims it makes.
As I said before, the manuscript is very well written and a pleasure to read. However, I would have liked the proposition by reviewer 2, to examine some ECs in much more detail, to have been taken into consideration. As it stands, the work remains quite speculative.
1) I feel that the authors' judgement of the three kinds of emergent constraints is strongly biased in favor of type 1. Type 1 constraints seem to be given more confidence than those of types 2 and 3, and I find little to no justification for why this should be the case. For example, the authors state that the EC relating past warming to TCR is likely robust. So far, I do not see why this should necessarily be true. What if a tipping point occurs: freshening of the Southern Ocean shuts down deep convection, no warm subsurface water comes to the surface, and heat uptake is altered? The very different sea-ice extents in the CMIP5 and CMIP6 models could shift this moment to the early 21st century, to the end of the 21st century, or even into the 22nd century, with strong consequences for TCR and historic warming. Or what if part of the historic warming was due to a change in albedo that cannot act as a factor later on? An example is the NorESM model, which has a huge Southern Ocean sea-ice mass, much larger than the other models, but a rather normal extent, suggesting that a very thick portion of sea ice exists. Whereas the thin sea ice disappears quickly, the thick part takes centuries to disappear. Or what if Arctic sea ice melted early in one model, producing a strong albedo change, while in others this happens later, perhaps even after the 70 years of the 1% run used to quantify TCR? Without a mechanistic explanation, a historic-versus-future trend relationship could potentially be pure 'luck'. I would argue that a type 1 constraint without a mechanistic explanation of the underlying processes is much less robust than an EC that identifies the driving process. What intrigues me in this example (Tokarska et al., 2020) even more is that the slope of the EC changes by around 100% from CMIP5 to CMIP6, so an overestimation of the historic warming by 0.1°C would have twice as large an effect on TCR in one ensemble as in the other.
This suggests that other processes are at play and merits more discussion: a relationship is found in both model ensembles, but apparently not the same one. This remains undiscussed in that paper and in this review, which assesses the other types of constraints much more critically. Having said all this, I find the claim that type 1 constraints are more robust unfounded.
2) On the other hand, type 2 constraints are criticized more, even though they identify the leading process. The manuscript states: “A plausible, robust, process-based EC is still conditional on the plausibility of the relevant process as it is represented in the class of models used in the ensemble.” In many cases, however, these processes have been demonstrated in observational studies. For example, Terhaar et al. (2020) found their EC after observational studies indicated that deep water formation in the Barents Sea is responsible for most of the anthropogenic carbon inventory change in the Arctic Ocean. From my perspective, this is more robust: identify with observations the dominant process for a projection, examine how this process is represented in the models, and test whether the observation-based hypothesis holds in the models. If it does, the EC should be considered very robust.
3) This leads me to my main criticism, which was in my opinion not sufficiently addressed in the responses. The authors ‘unload’ model shortcomings almost entirely onto the ECs and argue that model projections do not have uncertainties. However, the IPCC report and multiple studies use the standard deviation across a model ensemble as the uncertainty and the mean of the entire ensemble (or of a subsample, after excluding physically implausible models) as the best estimate. I hence strongly disagree with the response that MMEs make no statement about uncertainties. Like all scientific methods, the EC method is not perfect and never claims to be. It can, however, help to analyze a model ensemble and to learn about its strengths and weaknesses. I will try to make my point clear with the simple example in the paper. First, the authors show that neglecting the deep ocean can lead to a wrong constraint on the T280 warming when the T70 warming is used. This is indeed the case; however, it is not an issue with the emergent constraint. The problem lies in the models that do not include a deep ocean. If our knowledge did not include the deep ocean, we would expect the warming as simulated by model 1. The difference across the ensemble then lies only in the difference in lambda, which itself depends on how the feedback mechanisms are calculated. Within such a model ensemble with very different lambdas, knowledge of the ‘real’ lambda or of T70 would indeed improve the projection of that ensemble. The EC would give the most likely projection under the assumption that no deep ocean exists, and hence improve the projection of such a model ensemble. That this is not reality is due to the models, not to the way these models are analyzed. The example in this manuscript is hence rather an example of why model shortcomings are a problem, not of why emergent constraints are. I think a fundamental misunderstanding between the authors and me lies in the way we interpret emergent constraints.
To me, emergent constraints help to analyze existing model output. To that extent they cannot erase strong shortcomings such as missing processes (in most cases). They can, however, reduce uncertainties in the model ensemble that arise from how the existing knowledge is numerically represented (see the lambda example). They hence reduce the uncertainties in existing projections and can account for an identified bias, but they may miss biases that all models share.
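To make the T70/T280 argument concrete, here is a minimal numerical sketch comparing a one-box energy balance model (no deep ocean) with a two-box model. All parameter values are hypothetical and chosen purely for illustration; this is not the manuscript's actual example.

```python
# Minimal sketch (all parameter values hypothetical): a one-box energy
# balance model (no deep ocean) versus a two-box model, constant forcing.

F = 3.7                  # forcing, W m^-2 (assumed)
lam = 1.2                # feedback parameter lambda, W m^-2 K^-1 (assumed)
C_s, C_d = 8.0, 100.0    # surface / deep heat capacities, W yr m^-2 K^-1
gamma = 0.7              # surface-deep heat exchange, W m^-2 K^-1 (assumed)

def warming(years, deep_ocean=True, dt=0.1):
    """Surface temperature anomaly after `years`, via forward Euler."""
    T_s = T_d = 0.0
    for _ in range(int(years / dt)):
        exchange = gamma * (T_s - T_d) if deep_ocean else 0.0
        T_s += dt * (F - lam * T_s - exchange) / C_s
        if deep_ocean:
            T_d += dt * exchange / C_d
    return T_s

# Without a deep ocean, T70 has essentially reached the equilibrium F/lam,
# so T70 and T280 coincide; with a deep ocean, warming continues past year 70.
for label, deep in [("one-box", False), ("two-box", True)]:
    print(f"{label}: T70 = {warming(70, deep):.2f} K, "
          f"T280 = {warming(280, deep):.2f} K")
```

In the one-box ensemble T70 "constrains" T280 perfectly; in the two-box world the same T70 understates the later warming. The failure lies in the model structure, which is exactly my point: the shortcoming belongs to the models, not to the constraint.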
4) In the Conclusions, the authors argue that an EC is effectively model weighting. I disagree. No model is weighted when using emergent constraints. An important mechanism is identified for a projection, and since that mechanism relates the predictor and the predictand in the same way across all models, all models are weighted equally.
5) In general, I have the feeling that the authors and I agree, but that the assessment of ECs depends on the variable being constrained. A local, advection-driven process, such as Cant uptake in the Southern Ocean, can be linked through observations and models to the formation of mode and intermediate waters. ECS and TCR, however, depend on many different variables and are unlikely to be constrained by one single process. I think this difference should be emphasized more strongly, especially given the Conclusions about multi-variable metrics. Overall, I feel that the ECs that constrain a local process are being taken prisoner by the often-spurious ECS constraints.
6) Could you please add prediction intervals and r² values to Figure 1? That would help a lot.
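For concreteness, the requested quantities are the standard OLS ones. The sketch below uses synthetic data and a hypothetical observed predictor value `x_obs`, purely to illustrate how r² and the 95% prediction interval at the observation would be computed for such a figure.

```python
# Hypothetical sketch (synthetic data, not the manuscript's ensemble):
# OLS fit across an "ensemble", its r^2, and the 95% prediction interval
# for a new realization at an observed predictor value.
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)
x = rng.normal(1.0, 0.2, 15)             # predictor (e.g. historic warming)
y = 2.0 * x + rng.normal(0, 0.15, 15)    # predictand (e.g. TCR), synthetic

n = len(x)
b, a = np.polyfit(x, y, 1)               # slope, intercept
resid = y - (a + b * x)
s2 = resid @ resid / (n - 2)             # residual variance
r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

x_obs = 1.1                              # hypothetical observed value
y_hat = a + b * x_obs
# standard error of a *new* observation at x_obs (prediction, not confidence)
se_pred = np.sqrt(s2 * (1 + 1 / n
                        + (x_obs - x.mean())**2 / ((x - x.mean())**2).sum()))
half = t.ppf(0.975, n - 2) * se_pred
print(f"r^2 = {r2:.2f}; 95% PI at x_obs: {y_hat:.2f} +/- {half:.2f}")
```

Note the `1 +` term inside the square root: a prediction interval for a single new realization is wider than the confidence interval for the regression line itself, and it is the former that is relevant for a constrained projection.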