Detecting transitions and quantifying differences in two SST datasets using spatial permutation entropy

Gancio, Juan; Tirabassi, Giulio; Masoller, Cristina; Barreiro, Marcelo

doi:10.5194/esd-17-533-2026

Articles | Volume 17, issue 3

https://doi.org/10.5194/esd-17-533-2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.


Research article | 12 May 2026


Final revised paper (published on 12 May 2026)
Preprint (discussion started on 15 Oct 2025)

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-4879', Anonymous Referee #1, 26 Nov 2025

Find comments in the attached PDF.

Citation: https://doi.org/10.5194/egusphere-2025-4879-RC1
- AC1: 'Reply on RC1', Juan Gancio, 16 Dec 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-4879/egusphere-2025-4879-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-4879-AC1
RC2:
'Comment on egusphere-2025-4879', Anonymous Referee #2, 27 Nov 2025

This work presents the results of the computation of the Spatial Permutation Entropy (SPE) on two different sea-surface temperature datasets (ERA5 and NOAA OI v2), for two different regions (Nino3.4 and Gulf Stream regions). This tool has not yet been applied to climate data in this setting, and it allows some temporal transitions in the datasets to be detected. The results have two main parts: the first shows how to detect temporal transitions in the datasets from temporal transitions in the SPEs (Sec. 4.1), and the second compares the two datasets on the basis of their respective SPEs (Sec. 4.2).
Overall, it also seems that this tool can provide some interesting insights into the spatiotemporal structure of the datasets, but precise conclusions and concrete results about the efficiency of the method are lacking or, at least, not clearly presented. Because of this, I would not recommend publication without major revisions.
Major comments:
The authors claim that the proposed method (SPE + PELT) allows temporal transitions in datasets to be detected. There is an effort to estimate the robustness of the detected change points, but a general assessment of the success rate is missing: if the goal is to provide a method to detect transitions in a given dataset, there should be an estimate of the number of transitions that the method will actually detect. There is no estimate of the number of changes that the method did not detect (one transition is actually detected from the time series of SMI_{NS}, but not from H_{NS}, lines 216-217), even though the detected changes are all linked to some changes in the methodology used to produce the datasets.
The two datasets considered in this work are also compared, but I do not really understand what the conclusion of this comparison is. The detected change points (except one) are those detected from the H time series, so is there any additional conclusion with respect to Sec. 4.1? It is mentioned that there are small-scale differences between the two datasets (which can be expected, to some extent); can we conclude that the datasets are not reliable at these scales? If so, how can one identify a scale above which the datasets agree sufficiently? I also wonder how this technique compares with a Fourier or spectral analysis of the datasets.
Minor comments:
1. A smaller SPE is systematically interpreted as a more pronounced gradient because ordered patterns like ‘0123’ would be more frequent. This is probably correct in most cases, but is it always true ? The fact that non-ordered patterns like ‘2031’ become more frequent is also consistent with a smaller SPE, in contradiction with the more pronounced gradient interpretation. Did previous studies on SPE establish that we can confidently interpret a decrease in the SPE as an increase of the gradient ?
2. I feel that some context about the physics of the SST in the studied regions is missing, which makes some of the interpretations hard to follow. For example, why do asymmetries in the increase of the SST lead to a decrease of H_{WE} in the El Niño region on the one hand, and to an increase of H_{WE} in the Gulf Stream region on the other? Why is the behavior of H_{WE} expected to be different for delta = 1 and delta = 8 for El Niño (lines 194-200)? Why can H_{NS} with delta = 8 not capture the NS gradients appropriately (lines 198-200)?
3. There is no uncertainty quantification on the SPE, so that we do not know if some observed trends are really relevant. For example, are the trends in H_{WE} in Fig. 4a and 4d really significant ? In addition, these trends are qualitatively consistent with the mentioned asymmetries in the increase of the SST, but are these asymmetries really the cause of these trends ? (lines 177-184). It would be interesting to analyze which patterns become prevalent to give some sound basis for these explanations.
4. The PELT algorithm contains a penalty parameter P which must be chosen, and a method is proposed to test the robustness of the detected change points with respect to P. I find it difficult to understand precisely the different steps of the method, and therefore to assess its effectiveness. For example, how is the 99.5^th percentile of P* used (line 344)? How precisely are the surrogates used in this method? Change points are considered robust if their R is larger than the median value of R (lines 350-351); does that mean that half of the change points are robust?
5. It would improve readability if all details about the results presented (which value of delta is used, which region is considered) were indicated on the Figures themselves (in particular for Fig. 3, Fig. 4, Fig. 5, Fig. 6, Fig. 7 and Fig. 8), next to each subplot. Ideally, this would appear even if only one value of delta or one region is considered in the Figure. This information is available in the captions, but comparing Figures would be easier if they were more visually explicit.
Technical comments
Explicit formulas for H_{WE} and H_{NS} would be appreciated, since these are the main quantities in the paper.
Line 101: is the j index of p_j(t) the same index as the j index of X_{ij} ?
Lines 145-157: it would increase the readability if this paragraph on the details of PELT and its penalty parameter is moved to the appendix (so that everything about PELT is in the appendix)
Lines 185-186: which dataset is used here to compute the SST anomaly ? Could this influence the results ?
Line 191: I do not understand what ‘reflecting the north-south gradients that occur as the equatorial zone is warmer than the north-south edges of the region’ means. How can there be a uniform pattern across the region if the central part is warmer than the edges ?
Line 199: why is \delta = 8 equivalent to each word spanning 6°? I would say that each word spans 2°, in agreement with what is written in line 205.
Lines 185-200: has this analysis been done for the Gulf stream region ? Are the conclusions the same ?
Line 205: write → right
Lines 216-222: El Niño is written differently 3 times
Line 221: should Fig. 7e be Fig. 7i ?
Line 223: should ‘relative constant’ be ‘relatively constant’ ?
Line 238: there seems to be a verb missing in the sentence ‘Increased cloud coverage in this region during winters could difficult infrared measurement’
Line 243: the acronym CPD should be explained before the appendix
Lines 245-285 (Sec. 4.3): the discussion in this Section seems to be partly redundant with Appendix A, and difficult to understand without reading first this Appendix. Could it be moved to the Appendix ?
Line 291: what models are referred to ?
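
A possible reading of the two figures quoted in the comment on line 199 above, offered only as a worked check: assuming a word length L = 4 (consistent with the four-symbol patterns such as ‘0123’ mentioned in this report) and a grid spacing \Delta x = 0.25° (an assumption, not stated on this page), the spacing between consecutive points of a word and the total extent of a word would be

```latex
\[
\underbrace{\delta\,\Delta x}_{\text{spacing between consecutive points}}
  = 8 \times 0.25^{\circ} = 2^{\circ},
\qquad
\underbrace{(L-1)\,\delta\,\Delta x}_{\text{extent of one word}}
  = 3 \times 8 \times 0.25^{\circ} = 6^{\circ}.
\]
```

Under these assumptions both numbers can hold at once, depending on whether "span" refers to the lag between neighbouring points or to the full length of the word.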

Citation: https://doi.org/10.5194/egusphere-2025-4879-RC2
- AC2: 'Reply on RC2', Juan Gancio, 16 Dec 2025
  
  Find replies to comments in the attached PDF.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4879-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Reconsider after major revisions (18 Dec 2025) by Gabriele Messori

AR by Juan Gancio on behalf of the Authors (20 Jan 2026) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (22 Jan 2026) by Gabriele Messori

RR by Anonymous Referee #2 (02 Feb 2026)

Suggestions for revision or reasons for rejection

The manuscript presents the results of the computation of the Spatial Permutation Entropy (SPE) on two SST datasets (ERA5 and NOAA OI v2) over two regions (El Nino and Gulf Stream). The SPE is computed from “spatial patterns”, which encode how the values of neighbouring pixels compare with each other, and the SPE is defined as an entropy over the distribution of these patterns. The Spatial Mutual Information (SMI) on the distribution of patterns is also considered. I thank the authors for the clarifications added with respect to the previous version of the manuscript, which made some explanations clearer. I think in particular that the videos showing the evolution of the histogram of the patterns are useful to interpret the variations of the SPE.
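
To make the description above concrete, here is a minimal sketch of how an entropy over ordinal spatial patterns could be computed for a single SST snapshot. It is not the authors' implementation: the function names `ordinal_pattern` and `spatial_permutation_entropy`, the representation of the field as a list of west-to-east rows, the normalisation by ln(L!), and the tie-breaking by position are all assumptions made for illustration.

```python
import math
from collections import Counter


def ordinal_pattern(values):
    """Return the rank of each value within the word, e.g. (0, 1, 2, 3) if increasing.

    Ties are broken by position (assumption: the paper may treat ties differently).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return tuple(ranks)


def spatial_permutation_entropy(field, L=4, delta=1, direction="WE"):
    """Normalised entropy of ordinal patterns read along rows (WE) or columns (NS).

    field: list of rows, each row a list of values ordered west to east (assumed layout).
    L: word length; delta: lag, in grid points, between consecutive points of a word.
    """
    if direction == "NS":
        # Transpose so that words run along the original columns.
        field = [list(col) for col in zip(*field)]
    span = (L - 1) * delta  # distance, in grid points, from first to last point of a word
    counts = Counter()
    for row in field:
        for start in range(len(row) - span):
            word = [row[start + k * delta] for k in range(L)]
            counts[ordinal_pattern(word)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("field too small for this L and delta")
    entropy = sum((c / total) * math.log(total / c) for c in counts.values())
    return entropy / math.log(math.factorial(L))  # 0 = one pattern only, 1 = uniform
```

With this sketch, a field containing only a monotonic west-east gradient gives an entropy of 0 in the WE direction, while a field in which increasing and decreasing words are equally frequent gives ln 2 / ln 24 for L = 4. A low value only says that few patterns dominate, not which ones, which is the caveat raised later in this report.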

There are three ways in which the SPE can be used:
1. It can be used to detect transitions in a dataset, either through change points in H_{NS} and H_{WE} (Sec. 4.1), or through SMI_{NS} and SMI_{WE} (Sec. 4.2, 2 datasets required)
2. Variations of the SPE can be interpreted in terms of change of relative importance of spatial patterns, which the authors interpret as increase or decrease of the gradient patterns (Sec. 4.1)
3. Characterize the similarity of two datasets (how much the distribution of patterns is the same for the two datasets, Sec. 4.2)

This underlines the potential versatility of the tool, as emphasized by the authors in the manuscript. However, I think that concrete and pragmatic results (quantitative and qualitative) are lacking for the SPE to be used by other researchers in their work on different datasets.

Main comments:

About use 1. of the SPE: it is shown in Sec. 4.2 that the transitions detected by PELT on SPE time series are not all detected by PELT on SMI_{hist}, and none are detected in the time series of the Pearson’s spatial cross-correlation coefficient (r) and of the Average Absolute Difference (AAD). I think that this is the major clue of the manuscript in favour of the SPE for detecting transitions. The contrast of SMI_{hist}, the AAD and r with the SPE is otherwise quite poor, so that it is not clear whether these tools could be complementary to the SPE (see below). The comparison of the SPE with the AAD and r is done when using the two datasets to detect transitions, and no equivalent comparison is done in the much more common case where only one dataset is available for the variables of interest.

I also wonder whether the AAD and r are the best choices to provide a point of comparison. It seems from lines 338-339 that techniques to detect transitions already exist in the geophysical literature, why did the authors not use the methods of these papers to provide points of comparison ?

According to the remarks added by the authors in this new version of the manuscript, there is no guarantee that the SPE detects all change points (there are actually some change points which were detected in some configurations of the SPE but not in others), so the authors propose to use this tool in conjunction with other data analysis techniques. Unfortunately, the authors do not suggest which other tools could be used to complement the SPE. It is left to the user to further understand the weak points of the SPE in order to identify which tools should be used in addition to it. One way of addressing this issue would be to provide a quantification of the transitions detected. Such tests can be done by generating synthetic datasets with known transitions. I think that yet another possibility would be to count the fraction of transitions detected by the SPE in the ERA5 and in the NOAA datasets, after having listed all the changes in the methodology used to produce these datasets (it seems reasonable to me that an almost exhaustive listing is doable, since I expect datasets like ERA5 to be well documented). With synthetic datasets, one could also investigate which types of transitions are detected by the method and which are not.
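
The synthetic test suggested above could be sketched as follows. This is an illustration under strong assumptions, not the manuscript's procedure: it uses a simple least-squares detector for a single mean shift instead of PELT, white Gaussian noise instead of a realistic SPE time series (no seasonality or autocorrelation), and hypothetical names and defaults (`best_single_change_point`, `detection_rate`, `penalty=15.0`, `tolerance=3`).

```python
import random


def best_single_change_point(series):
    """Return (index, gain) for the single mean shift that most reduces squared error."""
    n = len(series)
    total = sum(series)
    total_sq = sum(x * x for x in series)
    cost_no_change = total_sq - total * total / n
    best_idx, best_cost = None, cost_no_change
    left_sum = left_sq = 0.0
    for t in range(1, n):  # candidate split: series[:t] and series[t:]
        left_sum += series[t - 1]
        left_sq += series[t - 1] ** 2
        right_sum, right_sq = total - left_sum, total_sq - left_sq
        cost = (left_sq - left_sum ** 2 / t) + (right_sq - right_sum ** 2 / (n - t))
        if cost < best_cost:
            best_idx, best_cost = t, cost
    return best_idx, cost_no_change - best_cost


def detection_rate(shift, n_trials=200, n=120, true_cp=60,
                   penalty=15.0, tolerance=3, seed=0):
    """Fraction of synthetic series whose known transition is recovered.

    A trial counts as a hit when the gain exceeds the penalty (hypothetical
    threshold) and the detected index lies within `tolerance` of `true_cp`.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Unit-variance noise with a step of size `shift` starting at true_cp.
        series = [rng.gauss(0.0, 1.0) + (shift if t >= true_cp else 0.0)
                  for t in range(n)]
        idx, gain = best_single_change_point(series)
        if gain > penalty and abs(idx - true_cp) <= tolerance:
            hits += 1
    return hits / n_trials
```

Sweeping `shift` (and, in a fuller test, the type of transition) would give a detection-rate curve of the kind requested here; replacing the toy detector with the SPE + PELT pipeline would be needed before drawing any conclusion about the method itself.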

About use 2. of the SPE: the authors interpret variations of the SPE as increases or decreases of gradients in the SST. I think that this is an interesting potential use, but I think that the caveat for such a use should be refined. Indeed, as explained at lines 221-230, some low values of the SPE with delta = 8 in the NS direction of the El Nino region are not explained by gradients of the SST. It can be checked explicitly on the videos showing the histograms of patterns for ERA5 on this region that there are a lot of histograms which are not dominated by the patterns 0123 and 3210 (for example on 1998-06, 2002-11, 2010-03, 2010-05, 2015-06, 2015-10, 2025-01). I think that this is a nice example showing that a low SPE does not automatically mean that the 0123 and 3210 patterns dominate, and that using the SPE in such a way would require more investigation to be able to confidently draw conclusions on datasets for which we do not know whether there are gradients or not.

About use 3.: the authors finally use the SPE to compare the two datasets. Again, I find the conclusions somewhat vague. There are two conclusions: a) the datasets become more similar with time, b) the datasets are more similar at large scales. I think that conclusion a) is to be expected, given the “significant advances in Earth observation systems due to the introduction of new satellite observations and new data processing methodologies” (lines 293-294). Stated in this way, conclusion b) should also be expected. It would be interesting to be able to define a scale (possibly depending on the user’s needs) where the datasets agree sufficiently (this scale would probably depend on time, since the datasets become more similar with time). I think that Fig. C10 provides an interesting starting point for this analysis, and I think that developing this in the main text would support this way of using the SPE. Characterizing qualitatively the differences in the datasets could also be interesting, so that users could choose one or the other depending on the processes they study.

Minor comments:

I do not understand the explanation of the sudden drops of H_{WE} for delta = 8 in lines 217-221: why does uneven cooling/warming explain sudden drops? What exactly does “at the smaller scale the variations of SST are more uniform” (lines 220-221) mean? Is there a reference for that? Or does it just mean that the SST values are very noisy at these scales?

What can we conclude about the fact that there are correlations between the SST anomaly and the SPE for the El Nino region (lines 207-216), especially given that no correlation is found in the Gulf Stream region (lines 231-233) ?

What can we conclude about the fact that a transition in 2016 is detected in SMI_{NS} with delta = 1 in the El Nino region (lines 251-252) ? Is this related to some change in the datasets ? Why was it not detected from Fig. 3 ?

Are the transitions reported in lines 259-264 already found in Sec. 4.1 ? If no, does that mean that it is better to have two datasets to compare to detect transitions in one of them, rather than computing the SPE on one dataset ?

I find Appendix A confusing:
1. If I understand correctly, the CPD algorithm used for the SMI is described in lines 352-361, while the one used for the SPE, the AAD and r is described at lines 349-350 and 362-376. Is that correct?
2. Are surrogates created for the SMI? I would think so from lines 341-347, but not from lines 352-361.
3. I do not really understand the point of the steps described in lines 368-376: from what I understand, the previous step allows to identify a value of P for which no false change points are detected, so why add another step ?
4. Why make a difference between the points mentioned in the main text and those reported in Table A1 ? If change points are considered robust (and therefore reported in Table A1), why not consider them in the text ?
5. The first two quartiles of P are the same as those of R (Eq. (A1)), so lines 368-376 seem to simply describe that half of the change points are considered robust; is that correct? Since change points are supposed to correspond to something real, it is quite arbitrary to consider only half of them robust.

Are there methods to choose a suitable penalty parameter other than the one described in Appendix A? The original paper about PELT seems to present some of them, and Rocha and de Souza Filho (2020) (cited at line 339) seem to discuss methods to choose penalty functions. Why did the authors not use these functions? To get a better idea of the performance of the SPE in detecting transitions, I think that it would be better not to use new methods to choose P.

Technical comments:

Appendix B seems to have redundant explanations with Appendix A, please merge the two.

I find the notation to report coefficients and p-values in the caption of Fig. 4 not very clear.

Fig. C10: is there a reason why the results displayed were computed with L = 3 instead of L = 4 as in the rest of the paper ?

Line 100: k is not an integer, so k \in [1, …, L!] is not really correct
Line 154: “we have also performed the analysis…” (“the” is missing)
Line 324: “Both the size…” (no comma after “both”)
Lines 416-417: the sentence “The first one corresponding to a change point with linear trend before and after, which survives detrending, and the second one to a trend/no-trend transition” is missing verbs.
Lines 189-198: The p-value of the coefficient from the fit in panel (a) of Fig. 4 implies that the coefficient can be considered to be 0. But this paragraph is a little misleading about that (it seems to say that all entropies follow the same kind of trend).
Line 385: “panel a” → “panel (a)”, same for “panel d”
Lines 221-230: it would maybe be easier to understand if it is said explicitly that the patterns with delta = 8 span more than half of the length in the NS region for the selected region.
Lines 254-255: “at long scales, warming signals are consistently identified in both, ERA5 and NOAA”: how exactly are the warming signals identified in Fig. 6? Is this related to what is discussed about Fig. 4?
Lines 264-265: it should be said that the transition here is in addition to the 2007 one reported just above.
Line 305: what is meant by “that are consistent with the two datasets”?
Line 350: “ADD” → “AAD”
Line 355: “non” → “none”
Line 362: “ADD” → “AAD”


RR by Anonymous Referee #1 (03 Feb 2026)

RR by Anonymous Referee #3 (22 Feb 2026)

ED: Reconsider after major revisions (23 Feb 2026) by Gabriele Messori

AR by Juan Gancio on behalf of the Authors (17 Mar 2026) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (31 Mar 2026) by Gabriele Messori

RR by Anonymous Referee #3 (14 Apr 2026)

ED: Publish subject to minor revisions (review by editor) (19 Apr 2026) by Gabriele Messori

AR by Juan Gancio on behalf of the Authors (23 Apr 2026) Author's response Author's tracked changes Manuscript

ED: Publish as is (28 Apr 2026) by Gabriele Messori

AR by Juan Gancio on behalf of the Authors (29 Apr 2026) Manuscript

Short summary

In this work, we apply a novel quantifier, the spatial permutation entropy, to sea surface temperatures obtained from two commonly used products: ERA5 and NOAA OI v2 (NOAA Optimal Interpolation version 2). We report small scale differences between these products, as well as persistent trends at the large scale, which could be a consequence of global warming. We also report sudden changes that were not uncovered before, which correlate with different changes in the methodology or data sources of the products.