the Creative Commons Attribution 4.0 License.
A causality-based method for multi-model comparison: Application to relationships between atmospheric and marine biogeochemical variables
Abstract. We introduce a novel causality-based approach to compare Earth System Model outputs. The method is based on the PCMCI+ algorithm, which identifies causal relationships between multiple variables. We aim to investigate the causal relationships between atmospheric (North Atlantic Oscillation – NAO), oceanic (gyre strength, stratification, circulation), and biogeochemical variables (nitrate, iron, silicate, net primary production) in the North Atlantic subpolar gyre, a critical region for the global climate system with a well-characterised multi-year variability in physical and biogeochemical properties in response to the North Atlantic Oscillation. We test a specific multivariate conceptual scheme involving causal links between these variables. Applying the PCMCI+ method allows us to differentiate between the influence of vertical mixing and horizontal advection on nutrient concentrations and spring bloom intensity, and to highlight model-specific dynamics. The analysis of the causal links suggests a dominant contribution of vertical mixing to peak spring bloom intensity compared to transport. The strength of the links varies among models. Stratification is identified as an important factor controlling spring bloom NPP in some, but not all, models. Horizontal transport also significantly influences biogeochemistry, but generally contributes less than vertical mixing. Most of the links found are model-specific, hence likely contributing to inter-model spread. The limitations of the method are discussed and directions for future research are suggested.
Status: open (until 27 Nov 2024)
RC1: 'Comment on esd-2024-31', Anonymous Referee #1, 05 Nov 2024
The manuscript by Benard and colleagues describes an approach to identify causal relationships across a subset of state or diagnostic variables in Earth system models. The method is based on the PCMCI+ algorithm, which quantifies how previous values of one variable (A) enhance the prediction of another (B), while considering different time lags. It differs from a typical correlation analysis in that this method includes an intermediate ‘explanatory’ variable (C), which provides additional information (e.g., mechanistic understanding) linking variables A and B. As examples, the authors applied the method to analyse the relationships between surface nutrient concentrations and primary production in the eastern North Atlantic subpolar gyre as simulated in five ESMs. Using their preindustrial control simulations, they show that the simulated interannual spring bloom (primary production) intensity is predominantly modulated by the winter vertical mixing-induced nutrient variability. The role of lateral transport is less important.
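To illustrate the kind of conditional-independence reasoning PCMCI+ builds on (variable C screening off a spurious A–B link), here is a minimal numpy sketch of a partial-correlation test; the common-driver setup and all names are hypothetical, not taken from the manuscript:

```python
import numpy as np

def partial_corr(a, b, z):
    """Correlation between a and b after regressing out the
    conditioning variables z (columns of a 2-D array)."""
    zc = np.column_stack([z, np.ones(len(a))])  # add intercept column
    ra = a - zc @ np.linalg.lstsq(zc, a, rcond=None)[0]
    rb = b - zc @ np.linalg.lstsq(zc, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

rng = np.random.default_rng(0)
c = rng.normal(size=2000)            # hypothetical common driver (e.g. mixing)
a = c + 0.1 * rng.normal(size=2000)  # both a and b respond to c
b = c + 0.1 * rng.normal(size=2000)

r_plain = np.corrcoef(a, b)[0, 1]    # strong correlation, but spurious
r_part = partial_corr(a, b, c[:, None])  # near zero once c is conditioned on
```

The plain correlation is close to 1 even though a and b share no direct link; conditioning on the driver c removes it, which is the behaviour the referee's (A, B, C) description points at.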
Major comments
It is not clear who will most benefit from using PCMCI+. The method appears to be quite useful, especially for people with limited understanding of the non-linear and complex interactions between different ESM components. On the other hand, based on the example, the users ought to have a priori understanding of which variables are to be included in the analysis (performance goes down with an increasing number of variables). If one fails to include key variables, the outcome could be very different or misleading. I would like to see more information on how to select which variables to analyse and whether there is a way to validate the results. For instance, can you show that mixing is actually driving the primary production variability? Do we need to run additional sensitivity experiments with the models to confirm the results?
Applying the method to different ESMs across the preindustrial simulations is useful to characterise the ESMs, but this falls short of my expectations. It indicates that the models are different, which is not a surprise. When more models are involved, how can we synthesise the findings? A clustering approach could be an idea (Couespel et al., 2024).
Model ‘evaluations’ were mentioned several times and appear to be doable, as also stated in the conclusions. So why not do this? I suggest that the authors consider applying the method to observations/reanalysis and ESM historical simulations and thereby actually demonstrate its usefulness and added value beyond conventional model-data evaluation.
There are some inconsistencies in the manuscript's messages. For instance, statements such as L169: "NAO affects MLD" and again L289: "MLD partially driven by NAO", while the models indicate an insignificant relationship (Fig. S7). Does this mean the models are wrong?
As is, the manuscript presents a proof of concept for understanding model behaviour, and I wonder if it is more appropriate for journals such as Geoscientific Model Development. For ESD, I would expect new understanding of Earth system processes, i.e., beyond the fact that MLD affects PP intensity. I would recommend expanding the analysis to include future scenarios, which will considerably enhance the manuscript's value for the ESM community. As the authors hinted (Sect. 6), this additional analysis could reveal new understanding of how the non-linear mechanisms of primary production will evolve in the future and therefore potentially explain the spread in the projections, which is a key outstanding research question today.
There are numerous ‘hand-waving’ statements, without any clear demonstrations, that should be avoided such as: L407-408, L409-410, L447-448, L462-463.
Minor comments
L1: … a novel causality-based approach to compare Earth System model outputs.
Repetition: L9-10 and L11-12
L12-13: “Most of the links … contributing to inter-model spread.” This is not specifically shown. Unless you have evidence, remove it.
L25: “leading to projections with still scattered outcomes” is redundant, as you already mention differences in future climate states and dynamics.
Paragraph L27-36: This is not the full picture. A growing number of studies now apply emergent constraints linking biogeochemical variables through causal relationships to understand the reasons behind the projection spread, such as Fu et al. (2016), Kwiatkowski et al. (2017), Goris et al. (2023), and many others.
L42-45: previous studies should be cited here, e.g., Thomas et al. (2008), Tjiputra et al. (2012), Keller et al. (2012)
L57: … Pelegri et al. (1996) and Williams et al. (2011) highlight ….
L62-64: what graphs? These would also fit better in Sect. 2. Perhaps the authors can also add a figure illustrating this so-called graph.
L66: ‘model spread’ is mentioned again here, but it is never shown and analysed in the paper.
L67: remove “The next”
L79-85: The graph is mentioned again, but I find it difficult to visualise this.
L92: … irrelevance of event B to explain event A …
L103-105: could you clarify what happens if variables Y and Z are not independent, e.g., surface wind speed and MLD?
L111-115: this is not very clear to me. An example with actual model variables would be useful.
L154: remove first period and correct references format.
Fig.1: increase axes label sizes. Increase fonts size of NAC/IC/EGC
L164: remove first period
L1784: the surface nutrient stock
Fig. 2: why is there an arrow from NAO to nutrients? The only arrow to primary productivity should be from nutrients, as PP in models is usually driven by phytoplankton growth rate (temperature, light) and nutrient availability.
L189: extended
L202: Brody et al. (2013)
Table 2: what is T in the slp formulation?
L227, L235: Switch Figure S1 and S2 in supplementary. Use model names instead of modelling centres in figs. S1 and S2.
L245: via horizontal advection
L247: beginning
L252: which graph?
L266: …most and least similar links displayed in Fig. 4. The results …
L267: Figs. S3-S6.
L268: … not necessarily true … (?)
L269: IPSL-CM6A-LR
L271: Figs. S3-S6
Fig. 4: increase fontsize and color readability
Fig. 4 caption: …. introduced in Eq. 4.
L288: Patara et al. (2011)
L308-309: not true for IPSL-CM6A-LR (nitrate)
L324-326: this statement is true only if the relationship holds into the future, i.e. there is no emerging new limiting factor for PP. I really recommend the authors to consider analyzing the scenario simulations in order to back up such statements.
L343: +0.3 and +0.2
L346: +0.33
L350: -0.24
L358: +0.33
L380: sidered. The results …. the blue curve illustrates …
L391: … lower it. … number of variables
L404: previous study
L408: great if the authors can actually show this
L410: again, this is hand waving, and need to be shown or reformulate the sentence
L440: explore the interannual variability
L445: than horizontal transport … Also elsewhere, it is better to explicitly state ‘horizontal’ when referring to advection or transport.
References
Couespel et al. (2024): doi:10.1038/s43247-024-01257-2
Fu et al. (2016): doi:10.5194/bg-13-5151-2016
Goris et al. (2023): doi:10.5194/gmd-16-2095-2023
Keller et al. (2012): doi:10.3402/tellusb.v64i0.18738
Kwiatkowski et al. (2017): doi:10.1038/NCLIMATE3265
Thomas et al. (2008): doi:10.1029/2007GB003167
Tjiputra et al. (2012): doi:10.5194/bg-9-907-2012
Citation: https://doi.org/10.5194/esd-2024-31-RC1
RC2: 'Comment on esd-2024-31', Anonymous Referee #2, 09 Nov 2024
The authors present an approach to compare outputs from Earth System Models (ESMs) by identifying causal relationships between atmospheric and marine biogeochemical variables. The goal of this approach is to improve the understanding of how model differences affect projections, particularly for marine primary productivity in climate-sensitive regions like the North Atlantic subpolar gyre. They leverage a causality technique known as PCMCI+, which discovers causal graphs with link strengths and lagged interactions, allowing comparisons across models. The study introduces a custom dissimilarity metric to quantify differences between models' causal graphs, incorporating both the strength and lag of interactions.
Overall I found the work interesting. However, there are many segments lacking precise writing, making the work needlessly difficult to follow. I believe the paper could be greatly improved by providing explicit details and addressing a few other key concerns.
Major Comments:
- I found the description of PCMCI+ very confusing and had to turn to the source material to learn what exactly it was doing. As written, a lot of terms seemingly pop up out of nowhere (e.g. what is α_Yτ=ti, ϵXτ=0|Zτ=tj, ...) and are never defined. I believe the paper could be strengthened if the language in this subsection were tightened and more explicit details were provided. If the authors think the details are too distracting, moving them to an appendix would also be acceptable.
- I personally like the idea of the dissimilarity measure between causal structures. However, the current stated distance feels a bit ad hoc, and could use some further explanation.
- As I understand it, the authors use a convolution across time lags to dull the otherwise potentially sharp dissimilarity that might arise between models due to small time-lag differences. Is this an accurate summary of why the convolution is applied? If so, could the authors elaborate on why this is actually an ‘issue’?
- How is padding for the convolution being applied? It looks like the authors are applying full padding based on their example. They should explicitly state this, and perhaps show the full padding in their example. If they are not doing full padding, please explain what you are doing for your convolution at the edge.
- What was the justification for the width-3 Gaussian kernel? Did it just seem to cover the expected minor differences in time lags between models? How should one choose the width of the kernel?
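To make the convolution questions concrete, here is a minimal numpy sketch of lag smoothing; the kernel shape, link strengths, and lag positions are hypothetical, not taken from the manuscript:

```python
import numpy as np

# Two models agree on a link's strength (0.5) but disagree by one time
# lag: hypothetical link-strength vectors over lags 0..4.
m1 = np.array([0.0, 0.5, 0.0, 0.0, 0.0])   # link at lag 1
m2 = np.array([0.0, 0.0, 0.5, 0.0, 0.0])   # same link at lag 2

kernel = np.array([0.25, 0.5, 0.25])       # assumed width-3 smoothing kernel
s1 = np.convolve(m1, kernel, mode="same")  # "same" keeps the lag axis length
s2 = np.convolve(m2, kernel, mode="same")  # (one choice of edge handling)

raw_dist = np.linalg.norm(m1 - m2)     # sharp: treats adjacent lags as unrelated
smooth_dist = np.linalg.norm(s1 - s2)  # smaller: smeared peaks overlap
```

Without smoothing the two models look maximally different even though they disagree by only one lag; after smoothing the distance shrinks, which is presumably the behaviour the authors want. Note that `mode="same"` is only one possible padding choice, which is exactly why the edge-handling question above matters.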
- My primary concern with the dissimilarity measure however has to do with its lack of resilience to the number of variables. By definition, the number of entries in the tensor M[i,j,t] depends on the partial correlation between all the processes and the number of time lags. This can be extremely large, or moderately small. If I add more processes to consider, the size of M can grow rather quickly (in a worst case scenario, if you have L variables, I believe PCMCI+ can return N=L(L-1)^2 connections, so M has L^2(L-1)^4T entries). If you want to measure the distance between M and M', then the Euclidean distance has the tendency to crush "near" and "far" points in high dimensions. So the metric of model comparison is heavily dependent on the number of variables involved in the problem. In other words, two models might look very distant if you only choose a small number of variables, but actually look quite close when you choose a larger number of variables. How do the authors control for this to ensure that this is a consistent and reliable measure of similarity between models?
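One possible way to address this concern would be to normalise the distance by the number of tensor entries (an RMS, per-entry distance) so that it stays comparable as M grows. A sketch under that assumption, with purely synthetic values:

```python
import numpy as np

rng = np.random.default_rng(1)

def rms_distance(m1, m2):
    """Euclidean distance divided by sqrt(#entries): a per-entry RMS
    distance that does not grow just because the tensor M is larger."""
    return np.linalg.norm(m1 - m2) / np.sqrt(m1.size)

# Same per-link disagreement level, different numbers of variables.
small = (rng.normal(0, 0.1, (3, 3, 5)), rng.normal(0, 0.1, (3, 3, 5)))
large = (rng.normal(0, 0.1, (8, 8, 5)), rng.normal(0, 0.1, (8, 8, 5)))

d_small_raw = np.linalg.norm(small[0] - small[1])  # grows with tensor size
d_large_raw = np.linalg.norm(large[0] - large[1])
d_small_rms = rms_distance(*small)                 # roughly size-invariant
d_large_rms = rms_distance(*large)
```

The raw Euclidean distance inflates with the number of variables even when the typical per-link disagreement is identical, whereas the RMS version stays stable; whether such a normalisation is appropriate for the authors' metric is for them to justify.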
- Line 191: “Interactions between variables should not evolve during the 500 years of simulation.” I don't believe this to be true. The causal interactions should have bursts of strength due to the nonlinearity inherent in the system. For example, if you took the wavelet power spectrum of any of your time series, I would strongly bet that the strength of the signal at different frequencies is heavily time dependent. The choice of starting date is known to be important for causal interactions, even within a preindustrial control run (see Branstator and Teng, “Is AMOC More Predictable than North Atlantic Heat Content?”).
- Line 194: How are the variables normalized? Does each variable receive the same, say, standard normalization?
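For reference, a standard per-variable z-score normalisation, which is one plausible reading of L194 (an assumption on my part, not confirmed by the manuscript):

```python
import numpy as np

def standardize(x):
    """Scale each variable (column) to zero mean and unit variance;
    one common choice, assumed here for illustration."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

rng = np.random.default_rng(0)
# Two hypothetical variables with very different means and spreads.
data = rng.normal(loc=[5.0, -2.0], scale=[3.0, 0.5], size=(1000, 2))
z = standardize(data)  # each column now has mean 0 and std 1
```

If instead each variable received a different scaling (e.g. min-max for some, z-score for others), the partial correlations and hence the link strengths could shift, so the choice should be stated explicitly.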
- Section 4 needs some clarification, particularly with regards to the PCMCI+ algorithm and the metric chosen. In Section 4.1, the authors write “Focusing on sub-matrices provides insight into differences between models with respect to specific causal relationships. For instance, computing the dissimilarity on the sub-matrix corresponding to the drivers of one variable allows to explore specific dynamics shared by two models (similar impacts, similar lags).” Are you doing the following:
- For a single variable, you get a causal structure for each ESM (which can look quite different).
- You can then get a measure of distance between the causal graphs. This measure provides the "average" difference between the causal graph structures.
If that is the case, how are you getting the causal graph structure from a single variable? Doesn't PCMCI+ give the structure of the whole system? Or are you computing some sub-part of the network using a PCMCI+-type algorithm? For your dfe example, does this mean that you looked at parents starting at dfe for all models? This results in a causal graph for each ESM (with dfe as the base node), and then you computed distances between these causal graphs? This requires further clarification.
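To illustrate one reading of "dissimilarity on the sub-matrix" that the authors could confirm or correct, here is a sketch with a synthetic link tensor (shapes, indices, and the variable-to-column assignment are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical link tensors M[i, j, t]: strength of driver i -> target j
# at lag t, for two models (4 variables, lags 0..3).
M_a = rng.normal(size=(4, 4, 4))
M_b = rng.normal(size=(4, 4, 4))

target = 2                    # e.g. the column assumed to hold dfe
sub_a = M_a[:, target, :]     # all drivers of that single variable
sub_b = M_b[:, target, :]

d_sub = np.linalg.norm(sub_a - sub_b)   # dissimilarity restricted to dfe's drivers
d_full = np.linalg.norm(M_a - M_b)      # dissimilarity over the whole system
```

Under this reading, PCMCI+ is still run once on the whole system, and the "single-variable" comparison is just a slice of the full tensor; the sub-matrix distance is then necessarily bounded by the full-tensor distance. If that is not what the authors do, the text should say what is.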
- Figure 4: This information feels too compressed. How am I to evaluate the difference between "close" and "far" here? How much further is the far model from the close model? Can I compare results within a figure? How does MLD_no3 compare to Trsp_dfe? What is the point of the outermost circle, since you have a color bar? What does “no model assigned” mean? What do the _si, _no3, _dfe suffixes mean on the variables gyre, transport, MLD, etc.? Does MLD_no3 mean the effect of MLD on no3 (or the other way around)? Overall, I find this point rather confusing. You pick the IPSL model. Then for each variable, you build the causal graph (somehow; again, not clear from my previous comment). You then measure distances to the other models. The closest model is then determined based on your metric. Is that what is happening?
- Figure 5: How is model agreement measured? Is “model agreement” the same as model strength? Is that the same as the metric you proposed earlier? There is a “significance” thrown around. What is significance and how is that determined?
- Figure 7: Am I to read this as NAO has a larger impact on stratification than Gyre does on transport? Are these boxes comparable? If so, is this just a consequence of the time lag you chose?
- Line 387: This needs clarification. Do you mean the ground truth is the values of $\alpha_{i,j}$? Or, the non-null values of $\alpha_{i,j}$?
Minor Comments:
- Line 183 typo: wall->well
- Table 1: A very brief appendix which discusses the primary differences between the Ocean, atmosphere and BGC models would be nice.
- Line 255: Might be helpful to have a plot here of a causal graph structure.
- Line 382: Typo: “.A”
- Line 390: Typo: “..”
- “Thus, this experiment clearly illustrates that it is important to give a relatively low number of variables to the algorithm.” I suppose this comment is in alignment with my previous comment on the choice of the number of variables.
Citation: https://doi.org/10.5194/esd-2024-31-RC2