This work is distributed under the Creative Commons Attribution 4.0 License.
Classification of synoptic circulation patterns with a two-stage clustering algorithm using the structural similarity index metric (SSIM)
Abstract. We develop a new classification method for synoptic circulation patterns with the aim of extending the evaluation routine for climate simulations. Given reference data, the classification is applicable to any region of the globe, of any size. Its key novelty is the use of the structural similarity index metric (SSIM) instead of traditional distance metrics for cluster building. The method combines two classical clustering algorithms used iteratively, hierarchical agglomerative clustering (HAC) and k-medoids, and has only one pre-set parameter: the threshold on the similarity between two synoptic patterns, expressed as the SSIM. The user sets this threshold to imitate the human perception of similarity between two images (similar structure, luminance and contrast), and the number of final classes is determined automatically.
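The SSIM mentioned above can be sketched in its simplest, single-window form, using the standard image-processing definition as the product of a luminance term and a combined contrast-structure term. This is a minimal sketch under stated assumptions: fields are flattened lists of floats, and the constants `c1` and `c2` are illustrative stabilisers, not values taken from the manuscript, which may instead average SSIM over local windows.

```python
import math


def ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM between two equally sized, flattened fields.

    c1 and c2 are small stabilising constants (illustrative defaults,
    not values from the manuscript).
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("fields must be non-empty and of equal size")
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Luminance: compares the mean levels of the two fields.
    luminance = (2 * mu_x * mu_y + c1) / (mu_x * mu_x + mu_y * mu_y + c1)
    # Contrast and structure combined: compares spread and co-variation.
    contrast_structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure


field = [1.0, 2.0, 3.0, 4.0]
print(ssim(field, field))                 # identical fields give 1.0
print(ssim(field, [4.0, 3.0, 2.0, 1.0]))  # reversed pattern gives a negative value
```

Both factors are bounded by one in magnitude, so this SSIM lies in [-1, 1] and a similarity threshold can be read on a fixed scale.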
We apply the SSIM-based classification method to the 500 hPa geopotential height from the ERA-Interim reanalysis (1979–2018) and demonstrate that the resulting classes are 1) consistent with changes in the input parameter, 2) well separated, 3) spatially and temporally stable, and 4) physically meaningful.
We use the synoptic circulation classes obtained with the new classification method to evaluate CMIP6 historical climate simulations and, for comparison, an alternative reanalysis. The output fields of the CMIP6 models (and of the alternative reanalysis) are assigned to the classes, and a quality index is computed, according to which we rank the CMIP6 simulations.
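Assigning model output to the reanalysis classes, as described above, can be sketched as a nearest-medoid rule under SSIM followed by relative class frequencies. This is an illustrative sketch: the function names are hypothetical, each field is assumed to join the class whose medoid it resembles most, and the manuscript's quality index itself is not reproduced here.

```python
from collections import Counter


def ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM of two flattened fields (illustrative constants)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    luminance = (2 * mx * my + c1) / (mx * mx + my * my + c1)
    contrast_structure = (2 * cxy + c2) / (vx + vy + c2)
    return luminance * contrast_structure


def assign_to_classes(fields, medoids):
    """Hypothetical rule: each field joins the class of its most similar medoid."""
    return [max(range(len(medoids)), key=lambda k: ssim(f, medoids[k])) for f in fields]


def class_frequencies(labels, n_classes):
    """Relative frequency of each class in a sequence of class labels."""
    counts = Counter(labels)
    return [counts.get(k, 0) / len(labels) for k in range(n_classes)]


medoids = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]  # toy reference classes
model_days = [[1.1, 2.0, 3.0, 4.2], [4.0, 3.1, 2.0, 1.0], [0.9, 2.1, 2.9, 4.0]]
labels = assign_to_classes(model_days, medoids)
print(labels, class_frequencies(labels, len(medoids)))
```

Comparing such frequencies with those of the reanalysis is one natural ingredient of a class-based quality index; how the manuscript combines them is not shown here.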
This preprint has been withdrawn.
-
Interactive discussion
Status: closed
-
RC1: 'Comment on esd-2022-29', Anonymous Referee #1, 04 Aug 2022
-
AC2: 'Reply on RC1', Kristina Winderlich, 07 Oct 2022
The comment was uploaded in the form of a supplement: https://esd.copernicus.org/preprints/esd-2022-29/esd-2022-29-AC2-supplement.pdf
-
AC1: 'Comment on esd-2022-29', Kristina Winderlich, 17 Aug 2022
The comment was uploaded in the form of a supplement: https://esd.copernicus.org/preprints/esd-2022-29/esd-2022-29-AC1-supplement.pdf
-
EC1: 'Reply on AC1', Gabriele Messori, 09 Sep 2022
Dear Authors,
>> We would like to ask the Editor to make the judgement, which options the manuscript may still have.
In reply to the above query, I would suggest that you provide replies to all reviewer comments. Based on the comments and your replies, I will then inform you of my decision regarding the manuscript.
Best Regards,
Gabriele Messori
Citation: https://doi.org/10.5194/esd-2022-29-EC1
-
RC2: 'Comment on esd-2022-29', Anonymous Referee #2, 23 Aug 2022
Review of the manuscript entitled: “Classification of synoptic circulation patterns with a two-stage clustering algorithm using the structural similarity index metric (SSIM)” by Kristina Winderlich, Clementine Dalelane and Andreas Walter
Summary
The authors develop a new classification method for synoptic circulation patterns with the aim of extending the evaluation routine for climate simulations. Its unique novelty is the use of the structural similarity index metric (SSIM) instead of traditional distance metrics for cluster building. This classification method combines two classical clustering algorithms used iteratively, hierarchical agglomerative clustering (HAC) and k-medoids. The authors apply the classification method to the ERA-Interim and NCEP1 reanalyses and to CMIP6 models. The authors wish to demonstrate that the built classes are consistent, well separated, spatially and temporally stable, and physically meaningful. Finally, the authors rank the CMIP6 models according to their ability to represent the weather types using different quality indices.
Dear authors,
Using synoptic circulation patterns to evaluate climate models is a welcome aim, but, contrary to what the text may suggest, this is not the first time it has been done. Indeed, the ability of models to capture the characteristics of synoptic patterns is an important aspect of improving climate model simulations. The SSIM is an interesting and seemingly promising approach for the classification of weather regimes. The article is generally well written; however, it should be extended to serve as a high-quality research article in ESD.
My comments and suggestions to improve the manuscript are as follows:
General comments
- Many classification algorithms attempt to categorize weather types/regimes over the Atlantic-European-Mediterranean region. If the authors suggest a new procedure, they should at least demonstrate why their classification is better than other classification procedures. Indeed, the authors try to explain their choices, but do not demonstrate how their procedure is superior to other classifications. Perhaps the authors could randomly select days and subjectively check for how many of them the classification does a decent job? A comparison with the original classification you mention in the text would then provide a semi-quantitative way of demonstrating the improvement from one classification to the other.
- Forty-three classes seem a rather large number of weather types, which could probably be reduced significantly by some form of EOF analysis. If not, the authors should at least explain why they do not use this approach, as it is very common. Furthermore, I would like to see more explanation of how these synoptic types relate to the four canonical weather regimes.
- The CMIP6 model evaluation section is, in its current form, rather short and does not provide very useful information for model developers. This section should probably be extended. It would be nice to have some discussion of why you think some models perform better or worse. Additional analysis is of course welcome, but should be balanced against the length of the article.
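The EOF reduction suggested in the general comments is normally done with a singular value decomposition from a numerical library, often with latitude-dependent area weighting, and regime analyses often cluster in the space of a few leading principal components. As a dependency-free sketch under simplifying assumptions (no area weighting, fields flattened to lists, hypothetical function names), the leading EOF can be found by power iteration on the sample covariance matrix:

```python
import math
import random


def leading_eof(samples, n_iter=200, seed=0):
    """Leading EOF (unit-norm spatial pattern) of a list of flattened fields,
    found by power iteration on the grid-point covariance matrix."""
    n_t, n_s = len(samples), len(samples[0])
    if n_t < 2:
        raise ValueError("need at least two fields")
    means = [sum(s[j] for s in samples) / n_t for j in range(n_s)]
    anoms = [[s[j] - means[j] for j in range(n_s)] for s in samples]
    cov = [[sum(a[i] * a[j] for a in anoms) / (n_t - 1) for j in range(n_s)]
           for i in range(n_s)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n_s)]  # random start vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(n_s)) for i in range(n_s)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            raise ValueError("fields have no variance")
        v = [c / norm for c in w]
    return v


def principal_components(samples, eof):
    """Projection of each field's anomaly onto the EOF (one PC value per field)."""
    n_t = len(samples)
    means = [sum(col) / n_t for col in zip(*samples)]
    return [sum((x - m) * e for x, m, e in zip(s, means, eof)) for s in samples]


# Toy data: one spatial pattern modulated in time, so EOF 1 should recover it.
pattern = [1.0, 2.0, -1.0, 0.0]
samples = [[t * p for p in pattern] for t in (-2.0, -1.0, 0.5, 1.0, 1.5)]
eof = leading_eof(samples)
pcs = principal_components(samples, eof)
print([round(e, 3) for e in eof], [round(p, 3) for p in pcs])
```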
Specific comments
Abstract
- What do you mean by ‘physically meaningful’? The word ‘physical’ can have different meanings, and you should probably clarify this in the text.
- Line 10: This sentence should be at the very end of the abstract.
- Do you think your classification would be useful for extended-range weather forecasts? If so, mention this in the abstract and discuss it in the conclusions.
Introduction
- Lines 43–47: From the introduction, it sounds as if you are the first and only group evaluating models based on weather regimes. However, there is a growing body of work in this direction. To name a few articles:
References
Dorrington, J., Strommen, K., & Fabiano, F. (2022). Quantifying climate model representation of the wintertime Euro-Atlantic circulation using geopotential-jet regimes. Weather and Climate Dynamics, 3, 505–533. https://doi.org/10.5194/wcd-3-505-2022
Fabiano, F., Christensen, H. M., Strommen, K., et al. (2020). Euro-Atlantic weather regimes in the PRIMAVERA coupled climate simulations: impact of resolution and mean state biases on model performance. Climate Dynamics, 54, 5031–5048. https://doi.org/10.1007/s00382-020-05271-w
Hochman, A., Alpert, P., Harpaz, T., Saaroni, H., & Messori, G. (2019). A new dynamical systems perspective on atmospheric predictability: Eastern Mediterranean weather regimes as a case study. Science Advances, 5, eaau0936. https://doi.org/10.1126/sciadv.aau0936
- Line 58: Please discuss the number of regimes in more detail. Several articles in the literature focus on this aspect. Some use two regimes (Wallace and Gutzler, 1981), others use four (Vautard 1990), six (Falkena et al., 2020) or seven (Grams et al., 2017) regimes. This is important because you use an unusually large number (43).
References
Falkena, S. K., de Wiljes, J., Weisheimer, A., & Shepherd, T. G. (2020). Revisiting the identification of wintertime atmospheric circulation regimes in the Euro-Atlantic sector. Quarterly Journal of the Royal Meteorological Society, 146, 2801–2814. https://doi.org/10.1002/qj.3818
Grams, C. M., Beerli, R., Pfenninger, S., Staffell, I., & Wernli, H. (2017). Balancing Europe’s wind-power output through spatial deployment informed by weather regimes. Nature Climate Change, 7, 557–562. https://doi.org/10.1038/nclimate3338
Vautard, R. (1990). Multiple weather regimes over the North Atlantic: Analysis of precursors and successors. Monthly Weather Review, 118, 2056–2081. https://doi.org/10.1175/1520-0493(1990)118<2056:MWROTN>2.0.CO;2
Wallace, J. M., & Gutzler, D. S. (1981). Teleconnections in the geopotential height field during the Northern Hemisphere winter. Monthly Weather Review, 109, 784–812. https://doi.org/10.1175/1520-0493(1981)109<0784:TITGHF>2.0.CO;2
- Lines 64–66: This is a very strong criticism of all prior classifications, and you should explain further why none of them fits your purpose. These classification procedures have all been used extensively in the literature. If you make this statement, you should at least demonstrate how your classification is superior.
Data and methods
- Line 80: If you use ERA-Interim rather than the ERA5 reanalysis, you should at least say why and mention some of the studies comparing the two data sets. I do not expect much difference for large-scale weather regimes, but this should at least be discussed.
- Line 82: Please justify why you use 12:00 UTC and not daily or all 6-hourly data.
- Line 82: How did you coarse-grain the data, and why to 2×3 degrees?
- You often use ‘synoptic scale’, but I think it is more accurate to consider these regimes as large-scale features. I would try to be more precise on this point, perhaps throughout the text.
- Line 95: Why 151 days of smoothing? Please justify this choice.
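The comments do not state exactly what the 151-day smoothing is applied to. Assuming it is a centred running mean over a periodic daily climatology (an assumption, as is the function name), the sketch below shows how strongly such a window damps the annual cycle, which is one quantity a justification of the window length could refer to.

```python
import math


def smooth_circular(clim, window=151):
    """Centred running mean over a periodic daily climatology (wraps at year end)."""
    n = len(clim)
    if window % 2 == 0 or window > n:
        raise ValueError("window must be odd and no longer than the series")
    half = window // 2
    return [sum(clim[(d + k) % n] for k in range(-half, half + 1)) / window
            for d in range(n)]


# Pure annual harmonic with unit amplitude on a 365-day calendar.
annual = [math.sin(2 * math.pi * d / 365) for d in range(365)]
smoothed = smooth_circular(annual)
# For a pure harmonic, the symmetric running mean keeps the phase and scales the
# amplitude by sin(pi*151/365) / (151*sin(pi/365)), roughly 0.74.
print(round(max(smoothed), 2))
```

A shorter window would retain more of the annual and higher harmonics; that trade-off is what the choice of 151 days would need to address.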
Results
- Lines 436-440: I do not completely understand how you obtained high resolution relative to coarse resolution in figure 9.
- Lines 454–456: Your stated motivation in the introduction and methods section was not to use centroids, but you then test your medoids and state that they are very similar to the centroids. Is this not a circular argument?
- Section 4.6: Perhaps provide some illustrations of the different classes in the CMIP6 models, in addition to the quality indices in the table.
- Table 3: I believe that there is not much difference between the models in the ‘transit’ and ‘persist’ values because there are so many classes. In addition, for the other indices the standard deviation is rather low, which is a bit surprising for more than 30 models. They all do pretty much the same job, which is again a bit surprising.
- Are the models’ evaluation criteria significantly different from one another? I think you should test this.
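The centroid-versus-medoid comparison questioned above can be made concrete with a short sketch. It assumes, hypothetically, that a cluster's medoid is the member with the largest summed SSIM to the other members and that the centroid is the grid-point mean; the manuscript's exact definitions may differ. A high SSIM between the two indicates agreement, but, as the reviewers note, it does not by itself show that the medoid represents every member, so the weakest member-to-medoid similarity is printed as well.

```python
def ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM of two flattened fields (illustrative constants)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    luminance = (2 * mx * my + c1) / (mx * mx + my * my + c1)
    contrast_structure = (2 * cxy + c2) / (vx + vy + c2)
    return luminance * contrast_structure


def centroid(members):
    """Grid-point mean of the member fields."""
    n = len(members)
    return [sum(vals) / n for vals in zip(*members)]


def medoid(members):
    """Member with the largest summed SSIM to all other members (hypothetical definition)."""
    return max(members, key=lambda m: sum(ssim(m, o) for o in members if o is not m))


cluster = [[1.0, 2.0, 3.0, 4.0], [1.2, 2.1, 2.9, 4.1], [0.8, 1.9, 3.2, 3.9]]
med, cen = medoid(cluster), centroid(cluster)
print(med, [round(v, 3) for v in cen], round(ssim(med, cen), 3))
# Weakest member-to-medoid similarity: a stricter check than medoid vs centroid.
print(min(ssim(med, m) for m in cluster if m is not med))
```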
Conclusions
- This section is very short and should include more discussion of other articles that evaluate models using a classification procedure. The article would also benefit from explaining in what respects the new classification is better than, or similar to, other methodologies used in the literature. The potential use of this methodology in climate projections or extended-range weather forecasts should probably also be discussed.
Technical comments:
- Line 82-84: Please rephrase, something is missing here.
- Line 307: This should be ‘Results’ and not ‘Method’ section.
- Line 318: Change ‘gives us an evidence that’ to ‘provides evidence that’.
- Line 357: Change ‘gives an evidence that’ to ‘provides evidence that’.
Figures:
- Figure 4: It is very hard to see anything with so many panels.
- Figure 10: I think you mixed up left and right in the caption. In addition, are there significant differences in the right panels?
- Table 3: It should probably be DJF for winter in the upper row and not ‘JDF’.
Citation: https://doi.org/10.5194/esd-2022-29-RC2
-
AC3: 'Reply on RC2', Kristina Winderlich, 07 Oct 2022
The comment was uploaded in the form of a supplement: https://esd.copernicus.org/preprints/esd-2022-29/esd-2022-29-AC3-supplement.pdf
-
RC3: 'Comment on esd-2022-29', Anonymous Referee #3, 09 Sep 2022
This paper describes a novel method of clustering circulation fields, and then applies this method to assess the ability of CMIP6 models to simulate realistic circulation patterns. The paper is generally clearly written and straightforward to understand, but I feel that the authors have not sufficiently justified the use of their method over something simpler like k-means. The analysis of circulation in the CMIP6 models is also rather brief. I therefore recommend major revisions.
Major comments
1. The bulk of the paper describes a new two-step classification method, arguing that previously used methods are 'suboptimal'. However, I don't think that the authors have sufficiently motivated their choice of method - my suspicion is that standard k-means clustering would give similar results.
The authors argue that k-means clustering has a number of drawbacks:
i) the number of clusters has to be pre-specified.
(But the authors' similarity threshold parameter seems to play a similar role, as it is subjectively chosen and also influences the number of clusters.)
ii) k-means centroids could be misleading and unrepresentative of the fields in the cluster.
(But does this not also apply to medoids, as a single field chosen to represent a set of fields? Surely any daily field will contain its own set of small scale features that don't resemble those of other fields. The authors appear to find that the cluster centroids and medoids are pretty similar anyway.)
iii) k-means clusters could be sensitive to outliers. (But does this actually happen in the case of the geopotential height fields?)
The authors quote image processing references to justify the similarity metric used here over (say) mean square error. It would be more convincing if the authors could show actual examples of deficiencies in k-means clusters constructed from their circulation data, and/or that clusters produced using their method were superior to those produced using k-means (for example, using the criteria set out in section 3.3).
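To set up the comparison with k-means that the reviewer requests, it helps to see the second clustering stage in isolation. The sketch below is a simple alternating (Voronoi-iteration) k-medoids that uses SSIM as the similarity and takes initial medoids as input, for example from an HAC stage. It is not PAM and is an illustrative stand-in, not the authors' implementation; function names are hypothetical. Swapping SSIM for Euclidean distance and medoids for means would turn it, roughly, into the k-means baseline.

```python
def ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM of two flattened fields (illustrative constants)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    luminance = (2 * mx * my + c1) / (mx * mx + my * my + c1)
    contrast_structure = (2 * cxy + c2) / (vx + vy + c2)
    return luminance * contrast_structure


def assign(fields, medoid_idx):
    """Label each field with the index of its most similar medoid."""
    return [max(range(len(medoid_idx)), key=lambda k: ssim(f, fields[medoid_idx[k]]))
            for f in fields]


def k_medoids_ssim(fields, init_medoid_idx, max_iter=20):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoid_idx = list(init_medoid_idx)
    for _ in range(max_iter):
        labels = assign(fields, medoid_idx)
        new_idx = []
        for k, old in enumerate(medoid_idx):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if not members:  # keep an empty cluster's medoid unchanged
                new_idx.append(old)
                continue
            # New medoid: the member most similar, in total, to its cluster.
            new_idx.append(max(members, key=lambda i: sum(ssim(fields[i], fields[j])
                                                          for j in members)))
        if new_idx == medoid_idx:
            break
        medoid_idx = new_idx
    return medoid_idx, assign(fields, medoid_idx)


fields = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.0, 3.0, 4.0], [1.0, 2.1, 3.0, 4.0],
          [4.0, 3.0, 2.0, 1.0], [4.0, 3.0, 2.1, 1.0], [4.1, 3.0, 2.0, 1.0]]
idx, labels = k_medoids_ssim(fields, [0, 3])
print(idx, labels)
```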
2. The analysis of the CMIP6 models is rather limited: there is a ranking of the models according to various metrics, but not much more. Why did the authors choose these particular metrics over the wide variety of other possibilities? Do the HIST statistics correspond to biases in the mean state of the models? Can the authors suggest any reasons why some models are better than others, e.g. resolution? Also, the transition statistics are likely to be very noisy with 43 different circulation types. How can we be confident that the transition results from ERA-Interim are a meaningful benchmark; is there enough reanalysis data to do this?
Again, it would be interesting to know whether the results of the model evaluation analysis are significantly different if k-means-derived clusters are used instead.
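The concern about noisy transition statistics can be quantified by simple counting. In the sketch below, transition counts and mean persistence are generic definitions that may differ from the manuscript's ‘transit’ and ‘persist’ indices, and the function names are hypothetical. One classified field per day is assumed (consistent with the 12:00 UTC sampling mentioned by Referee #2), over the ERA-Interim period 1979–2018 and with 43 classes.

```python
from collections import Counter
from datetime import date


def transition_counts(labels):
    """Counts of day-to-day class transitions i -> j (including i -> i)."""
    return Counter(zip(labels[:-1], labels[1:]))


def mean_persistence(labels):
    """Mean length, in days, of uninterrupted runs of the same class."""
    if not labels:
        raise ValueError("empty label sequence")
    runs, length = [], 1
    for prev, cur in zip(labels[:-1], labels[1:]):
        if cur == prev:
            length += 1
        else:
            runs.append(length)
            length = 1
    runs.append(length)
    return sum(runs) / len(runs)


# Sample-size arithmetic: one field per day over 1979-2018, 43 classes.
n_days = (date(2018, 12, 31) - date(1979, 1, 1)).days + 1
n_transitions = n_days - 1
n_cells = 43 * 43  # entries of the full transition matrix
print(n_days, n_cells, round(n_transitions / n_cells, 1))  # 14610 1849 7.9
```

With about 7.9 transitions per matrix entry on average, and far fewer for rare classes, many entries of a 43 × 43 transition matrix rest on a handful of events.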
Minor comments
Line 49 - "Hochman et al proved" - I think 'proved' is only an appropriate word when discussing mathematical proofs. I suggest something like 'argued' or 'demonstrated'. Also, people arguing that clusters represent genuine low-frequency weather regimes tend to find relatively few of them (four in winter seems a popular choice). Presumably the authors are not arguing that the 43 types they analyse here each represent a physical weather regime in this sense?
Line 58 - 'the moving atmosphere' - I'm not sure what this means.
Line 90 onwards - standardising the height fields means that information about the amplitude of the circulation anomalies is lost. But different amplitude anomaly patterns could produce quite different responses in eg surface air temperature and precipitation, so I'm not sure the standardisation step is beneficial.
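Whether amplitude information is lost depends on how the standardisation is done. Assuming a per-field (spatial) standardisation, which the manuscript may or may not use, the loss described above is easy to demonstrate: two patterns that differ only in amplitude become identical. A per-grid-point standardisation over time would instead preserve day-to-day amplitude differences. The function name is hypothetical.

```python
import math


def standardise(field):
    """Spatial standardisation of one field: subtract its mean, divide by its std."""
    n = len(field)
    mu = sum(field) / n
    sd = math.sqrt(sum((v - mu) * (v - mu) for v in field) / n)
    if sd == 0.0:
        raise ValueError("constant field cannot be standardised")
    return [(v - mu) / sd for v in field]


weak = [-1.0, 0.0, 1.0, 0.0]
strong = [3.0 * v for v in weak]  # same pattern, three times the amplitude
same = all(math.isclose(a, b) for a, b in zip(standardise(weak), standardise(strong)))
print(same)  # True: the amplitude difference is gone after standardisation
```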
line 111 - "The k-means clustering assigns every data element to the cluster center that is closest to it, if only by a small margin." Isn't this true of any method that assigns each field to one of a set of classes?
line 112 - "This makes the method sensitive to noise in the data and may lead to an assignment of a data element to a structurally dissimilar cluster center." - what does "structurally dissimilar" mean here? How can we distinguish the noise from the structure in any given field? Can the authors show examples of fields that are far apart under the Euclidean distance metric but close together under the similarity metric, or vice versa?
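The kind of example requested here can be constructed on toy data. The sketch below (four-point toy ‘fields’, single-window SSIM with illustrative constants) shows two candidates that Euclidean distance and SSIM rank in opposite order: a uniformly offset copy and a copy with two values exchanged. It does not by itself show which ranking is preferable for circulation fields; note also that for near-zero-mean anomaly fields the SSIM luminance term becomes sensitive to small differences in the means relative to the constant `c1`.

```python
import math


def ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM of two flattened fields (illustrative constants)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    luminance = (2 * mx * my + c1) / (mx * mx + my * my + c1)
    contrast_structure = (2 * cxy + c2) / (vx + vy + c2)
    return luminance * contrast_structure


x = [1.0, 2.0, 3.0, 4.0]
shifted = [2.0, 3.0, 4.0, 5.0]  # same structure, uniformly offset by +1
swapped = [1.0, 2.0, 4.0, 3.0]  # same mean and variance, two values exchanged

# Euclidean distance ranks `swapped` closer to x (about 1.41 vs 2.0) ...
print(math.dist(x, shifted), math.dist(x, swapped))
# ... whereas SSIM rates `shifted` as more similar (about 0.95 vs 0.80).
print(ssim(x, shifted), ssim(x, swapped))
```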
line 116 - Doesn't using medoids also risk inflating the significance of small-scale noise in the daily field chosen as the medoid?
line 137 - "Wang and Bovik (2009) demonstrated that the MSE has serious disadvantages when applied on data with temporal and spatial dependencies" - dependencies on what? Does this mean temporal and spatial correlations?
line 194 - is the similarity between two clusters measured using their medoid fields?
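How cluster-to-cluster similarity is measured determines the behaviour of the HAC stage. One plausible variant, sketched below as an assumption rather than the manuscript's algorithm, measures it as the SSIM between the clusters' medoids and stops merging once no pair reaches the user's threshold, so that the number of clusters follows from the threshold. Function names are hypothetical.

```python
import math


def ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM of two flattened fields (illustrative constants)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    luminance = (2 * mx * my + c1) / (mx * mx + my * my + c1)
    contrast_structure = (2 * cxy + c2) / (vx + vy + c2)
    return luminance * contrast_structure


def hac_ssim(fields, threshold):
    """Greedy agglomeration: merge the pair of clusters whose medoids are most
    similar, as long as that similarity reaches the threshold."""
    clusters = [[i] for i in range(len(fields))]
    medoids = list(range(len(fields)))  # each singleton is its own medoid

    def medoid_of(members):
        return max(members, key=lambda i: sum(ssim(fields[i], fields[j]) for j in members))

    while len(clusters) > 1:
        best, pair = -math.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = ssim(fields[medoids[a]], fields[medoids[b]])
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break  # no remaining pair is similar enough: stop merging
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        medoids[a] = medoid_of(clusters[a])
        del clusters[b], medoids[b]  # b > a, so index a stays valid
    return clusters, medoids


fields = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.0, 3.0, 4.0], [1.0, 2.1, 3.0, 4.0],
          [4.0, 3.0, 2.0, 1.0], [4.0, 3.0, 2.1, 1.0], [4.1, 3.0, 2.0, 1.0]]
clusters, meds = hac_ssim(fields, 0.9)
print(sorted(sorted(c) for c in clusters))
```

Raising the threshold yields more, tighter clusters, which is the sense in which, as noted in major comment i), the threshold plays a role comparable to choosing k.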
line 267 - Is the algorithm stable if applied to slightly different initial subsets of the data? The number of patterns may be stable, but do the same patterns emerge from the clustering?
Figure 3 - it would make more sense to have the transition between the blues and reds in the colour bar at zero, not +0.25.
Line 245 - should there be a reference to figure 6 here?
Line 282 - "However, it is necessary to demand that a cluster medoid represents all cluster elements and their whole entity as a group." Does comparing the medoid and centroid really guarantee this?
Line 307 - is section 4 meant to be labelled 'Method', the same as section 3?
Figure 4 - Can the colour bar be included in the figure? There's room in the bottom row of panels.
Line 320 - "This correspondence gives us an evidence that, albeit not tuned to and not required to mimic semi-manual classifications, the new classification method determines not just arbitrary synoptic patterns but those described by experts in semi-manual classifications."
I'm not convinced - given that there are 43 different types, it seems quite likely that some of them could resemble Grosswetterlagen patterns by chance.
Figure 7 - the text in the figure labels could be much larger for legibility.
line 447 - again, I don't think one can infer that this is an inherent advantage of the SSIM method without making a comparison with other cluster methods.
Citation: https://doi.org/10.5194/esd-2022-29-RC3
-
AC4: 'Reply on RC3', Kristina Winderlich, 07 Oct 2022
The comment was uploaded in the form of a supplement: https://esd.copernicus.org/preprints/esd-2022-29/esd-2022-29-AC4-supplement.pdf
-
AC5: 'Comment on esd-2022-29', Kristina Winderlich, 17 Jan 2023
The comment was uploaded in the form of a supplement: https://esd.copernicus.org/preprints/esd-2022-29/esd-2022-29-AC5-supplement.pdf
Interactive discussion
Status: closed
-
RC1: 'Comment on esd-2022-29', Anonymous Referee #1, 04 Aug 2022
-
AC2: 'Reply on RC1', Kristina Winderlich, 07 Oct 2022
The comment was uploaded in the form of a supplement: https://esd.copernicus.org/preprints/esd-2022-29/esd-2022-29-AC2-supplement.pdf
-
AC2: 'Reply on RC1', Kristina Winderlich, 07 Oct 2022
-
AC1: 'Comment on esd-2022-29', Kristina Winderlich, 17 Aug 2022
The comment was uploaded in the form of a supplement: https://esd.copernicus.org/preprints/esd-2022-29/esd-2022-29-AC1-supplement.pdf
-
EC1: 'Reply on AC1', Gabriele Messori, 09 Sep 2022
Dear Authors,
>> We would like to ask the Editor to make the judgement, which options the manuscript may
>> still have.In reply to the above query, I would suggest that you provide replies to all reviewer comments. Based on the comments and your replies I will then inform you of my decision regarding the manuscript.
Best Regards,
Gabriele Messori
Citation: https://doi.org/10.5194/esd-2022-29-EC1
-
EC1: 'Reply on AC1', Gabriele Messori, 09 Sep 2022
-
RC2: 'Comment on esd-2022-29', Anonymous Referee #2, 23 Aug 2022
Review of the manuscript entitled: “Classification of synoptic circulation patterns with a two-stage clustering algorithm using the structural similarity index metric (SSIM)” by Kristina Winderlich, Clementine Dalelane and Andreas Walter
Summary
The authors develop a new classification method for synoptic circulation patterns with the aim to extend the evaluation routine for climate simulations. Its unique novelty is the use of the structural similarity index metric (SSIM) instead of traditional distance metrics for
cluster building. This classification method combines two classical clustering algorithms used iteratively, hierarchical agglomerative clustering (HAC) and k-medoids. The authors apply the classification method to ERA-interim and NCEP1 reanalysis, and CMIP6 models. The authors wish to demonstrate that the built classes are consistent, well separated, spatially and temporally stable, and physically meaningful. Finally, the authors rank the CMIP6 models according to their ability to represent the weather types using different quality indices.Dear authors,
The purpose of using synoptic circulation patterns to evaluate climate models is a welcomed aim, but is not the first time this is done, as it may seem from the text. Indeed, the ability of models to capture the characteristics of synoptic patterns is an important aspect of improving climate model simulations. The SSIM is generally an interesting and seems to be promising approach for the classification of weather regimes. The article is generally well written, however it should be extended to serve as a high quality research article in ESD.
My comments and suggestions to improve the manuscript are as follows:
General comments
- Many classification algorithms attempt to categorize weather types/regimes over the Atlantic-European-Mediterranean region. If the authors suggest a new procedure, they should at least demonstrate why their classification is better than other classification procedures. Indeed, the authors try to explain their choices, but do not demonstrate how their procedure is superior in comparison to other classifications. Perhaps the authors can randomly select days and subjectively see for how many of them the classification does a decent job? Comparing to the original classification you mention in the text would then provide a semi-quantitative way of demonstrating the improvement from one classification to the other.
- Forty-three classes seems a rather large number of weather types and can probably be significantly reduced by some sort of EOF analysis. If not, it should at least be explained why the authors do not use this approach as it is very common. Furthermore, I would like to see some further explanation on how do these synoptic types relate to the four canonical weather regimes.
- The CMIP6 model evaluation section in its current form is rather short and does not provide very useful information for model developers. This section should probably be extended. It would be nice to have some discussion as to why you think some models are better or worse. Additional analysis is of course welcomed, but should probably be balanced with the length of the article.
Specific comments
Abstract
- What do you mean with physically meaningful? There may be different meanings to physical, and you should probably clarify this in the text.
- Line 10: This sentence should be at the very end of the abstract.
- Do you think your classification would be useful for extended-range weather forecasts? If so, mention this and in the abstract and discuss in the conclusions.
Introduction
- Line 43 – 47: From the introduction, it sounds as if you are the first and only group evaluating models based on weather regimes. However, there is an increasing body of knowledge working in this direction. To name a few articles:
References
Dorrington, J., Strommen, K., and Fabiano, F.: Quantifying climate model representation of the wintertime Euro-Atlantic circulation using geopotential-jet regimes, Weather Clim. Dynam., 3, 505–533, https://doi.org/10.5194/wcd-3-505-2022, 2022.
Fabiano, F., Christensen, H.M., Strommen, K. et al. Euro-Atlantic weather Regimes in the PRIMAVERA coupled climate simulations: impact of resolution and mean state biases on model performance. Clim Dyn 54, 5031–5048 (2020). https://doi.org/10.1007/s00382-020-05271-w
Hochman A, Alpert P, Harpaz T, Saaroni H, Messori G. 2019. A new dynamical systems perspective on atmospheric predictability: eastern Mediterranean weather regimes as a case study. Science Advances 5: eaau0936. https://doi.org/10.1126/sciadv.aau0936
- Line 58: Please discuss the number of regimes some more. There are a few articles focusing on this aspect in the literature. Some use two regimes (Wallace and Gutzler, 1981), others use four (Vautard 1990), six (Falkena et al., 2020) or seven (Grams et al., 2017) regimes. This is important as you use an outstanding number of 43.
References
Falkena, S. K., de Wiljes, J., Weisheimer, A., & Shepherd, T. G. (2020). Revisiting the identification of wintertime atmospheric circula-tion regimes in the Euro-Atlantic sector. Quarterly Journal of the Royal Meteorological Society, 146, 2801–2814. https://doi.org/10.1002/qj.3818
Grams, C. M., Beerli, R., Pfenninger, S., Staffell, I., & Wernli, H. (2017). Balancing Europe’s wind-power output through spatial deployment informed by weather regimes. Nature Climate Change, 7, 557–562. https://doi.org/10.1038/nclimate3338
Vautard, R. (1990). Multiple weather regimes over the North Atlantic: Analysis of precursors and successors. Monthly Weather Review, 118,2056–2081. https://doi.org/10.1175/1520-0493(1990)118<2056:MWROTN>2.0.CO;2
Wallace, J. M., & Gutzler, D. S. (1981). Teleconnections in the geopotential height field during the Northern Hemisphere winter. MonthlyWeather Review, 109, 784–812. https://doi.org/10.1175/1520-0493(1981)109<0784:TITGHF>2.0.CO;2
- Line 64-66: This is a very strong critic on all prior classifications and should be further explained why none fit your purpose. These classification procedures were all used extensively in the literature. If you state this, you should at least demonstrate how your classification is superior.
Data and methods
- Line 80: If you use ERA-interim and not ERA5 reanalysis, you should at least say why, and mention some of the studies comparing the two data sets. I do not expect much difference for large-scale weather regimes, but this should be at least discussed.
- Line 82: Please justify why you use 12:00UTC and not daily or all 6-hourly data.
- Line 82: How did you coarse grain the data and why to 2×3 degrees?
- You often use ‘synoptic scale’, but I think it is more accurate to consider these regimes as large-scale features. I would try being more accurate on this. Perhaps change throughout the text.
- Line 95: Why 151 days of smoothing? Please justify this choice.
Results
- Lines 436-440: I do not completely understand how you obtained high resolution relative to coarse resolution in figure 9.
- Line 454-456: Your motivation was not to use centroids in the introduction and methods section, but then you test your medoids and say that they are very similar to the centroids. Is this not a circular argument?
- Section 4.6: Perhaps provide some illustrations of the different classes in the CMIP6 models, in addition the quality indices in the table.
- Table 3: I believe that there is not much difference between the models in the ‘transit’ and ‘persist’ values because there are so many classes. In addition, for the other indices the standard deviation is rather low, which is a bit surprising for more than 30 models. They all do pretty much the same job, which is again a bit surprising.
- Are the models evaluation criteria significantly different from one another? I think you should test this.
Conclusions
- This section is rather very short and should have a bit more discussion with respect to other articles evaluating models using a classification procedure. The article would also benefit from explaining what is better or similar in the new classification with respect to other methodologies used in the literature. The potential use of this methodology in climate projections or extended-range weather forecasts should probably also be discussed.
Technical comments:
- Line 82-84: Please rephrase, something is missing here.
- Line 307: This should be ‘Results’ and not ‘Method’ section.
- Line 318: Change ‘gives us an evidence that’ to ‘provides evidence that’.
- Line 357: Change ‘gives an evidence that’ to ‘provides evidence that’.
Figures:
- Figure 4: It is very hard to see anything with so many panels.
- Figure 10: I think you mixed up between left and right in the caption. In addition, are there significant difference in the right panels?
- Table 3: It should probably be DJF for winter in the upper row and not ‘JDF’.
Citation: https://doi.org/10.5194/esd-2022-29-RC2 -
AC3: 'Reply on RC2', Kristina Winderlich, 07 Oct 2022
The comment was uploaded in the form of a supplement: https://esd.copernicus.org/preprints/esd-2022-29/esd-2022-29-AC3-supplement.pdf
-
RC3: 'Comment on esd-2022-29', Anonymous Referee #3, 09 Sep 2022
This paper describes a novel method of clustering circulation fields, and then applies this method to assess the ability of CMIP6 models to simulate realistic circulation patterns. The paper is generally clearly written and straightforward to understand, but I feel that the authors have not sufficiently justified the use of their method over something simpler like k-means. The analysis of circulation in the CMIP6 models is also rather brief. I therefore recommend major revisions.
Major comments
The bulk of the paper describes a new two-step classification method, arguing that previously used methods are 'suboptimal'. However, I don't think that the authors have sufficiently motivated their choice of method - my suspicion is that standard k-means clustering would give similar results.
The authors argue that k-means clustering has a number of drawbacks:
i) the number of clusters has to be pre-specified.
(But the authors' similarity threshold parameter seems to play a similar role, as it is subjectively chosen and also influences the number of clusters.)
ii) k-means centroids could be misleading and unrepresentative of the fields in the cluster.
(But does this not also apply to medoids, as a single field chosen to represent a set of fields? Surely any daily field will contain its own set of small scale features that don't resemble those of other fields. The authors appear to find that the cluster centroids and medoids are pretty similar anyway.)
iii) k-means clusters could be sensitive to outliers. (But does this actually happen in the case of the geopotential height fields?)
The authors quote image processing references to justify the similarity metric used here over (say) mean square error. It would be more convincing if the authors could show actual examples of deficiencies in k-means clusters constructed from their circulation data, and/or that clusters produced using their method were superior to those produced using k-means (for example, using the criteria set out in section 3.3).
2. The analysis of the CMIP6 models is rather limited - there's a ranking of the models according to various metrics, but not much more. Why did the authors choose these particular metrics over the wide variety of other possibilities? Do the HIST statistics correspond to
biases in the mean state of the models? Can the authors suggest any reasons why some models are better than others - eg resolution?Also, the transition statistics are likely to be very noisy with 43 different circulation types. How can we be confident that the transition results from ERA-Interim are a meaningful benchmark - is there enough reanalysis data to do this?
Again, it would be interesting to know if the results of the model evaluation analysis are signficantly different if k-means derived clusters are used instead.
Minor comments
Line 49 - "Hochman et al proved" - I think 'proved' is only an appropriate word when discussing mathematical proofs. I suggest something like 'argued' or 'demonstrated'. Also, people arguing that clusters represent genuine low-frequency weather regimes tend to find relatively few of them (four in winter seems a popular choice). Presumably the authors are not arguing that the 43 types they analyse here each represent a physical weather regime in this sense?
Line 58 - 'the moving atmosphere' - I'm not sure what this means.
Line 90 onwards - standardising the height fields means that information about the amplitude of the circulation anomalies is lost. But different amplitude anomaly patterns could produce quite different responses in e.g. surface air temperature and precipitation, so I'm not sure the standardisation step is beneficial.
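A minimal illustration of the amplitude point, assuming the standardisation is a per-field spatial z-score (subtract the field mean, divide by the field standard deviation); the exact procedure in the preprint is not restated in this comment. The four-point height values are made up.

```python
import math
import statistics

def standardise(field):
    """Spatial z-score of one flattened field (assumed form of the standardisation)."""
    mu = statistics.fmean(field)
    sd = statistics.pstdev(field)
    return [(v - mu) / sd for v in field]

# Hypothetical 500 hPa heights (m) at four grid points: same pattern, different amplitude.
weak = [5500.0, 5520.0, 5540.0, 5520.0]     # anomalies of +/-20 m
strong = [5460.0, 5520.0, 5580.0, 5520.0]   # anomalies of +/-60 m

same = all(math.isclose(a, b) for a, b in zip(standardise(weak), standardise(strong)))
print(same)
```

After standardisation the two fields are indistinguishable, even though the anomaly amplitude differs by a factor of three.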
line 111 - "The k-means clustering assigns every data element to the cluster center that is closest to it, if only by a small margin." Isn't this true of any method that assigns each field to one of a set of a classes?
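To illustrate that winner-takes-all assignment is generic, the sketch below assigns a field to the most similar of several class centres for any similarity function, and also returns the margin between the best and second-best scores, which could be used to flag ambiguous days. Pattern correlation is used here purely as a stand-in similarity; the function names, centres and fields are hypothetical.

```python
import math

def pearson(a, b):
    """Pattern (Pearson) correlation between two flattened fields."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def assign_with_margin(field, centres, similarity):
    """Assign to the most similar centre; also return best minus second-best score."""
    scores = [similarity(field, c) for c in centres]
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return ranked[0], scores[ranked[0]] - scores[ranked[1]]

centres = [[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]]
print(assign_with_margin([1.0, 0.0, -1.0], centres, pearson))   # clear-cut case
print(assign_with_margin([0.5, 0.5, -1.0], centres, pearson))   # equally close to both
```

The second field is still assigned to a class, with zero margin, exactly as k-means would do with Euclidean distance.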
line 112 - "This makes the method sensitive to noise in the data and may lead to an assignment of a data element to a structurally dissimilar cluster center." - what does "structurally dissimilar" mean here? How can we distinguish the noise from the structure in any given field? Can the authors show examples of fields that are far apart under the Euclidean distance metric but close together under the similarity metric, or vice versa?
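A constructed example of the kind of pair requested here. It uses a single-window ("global") SSIM with the usual constants K1 = 0.01 and K2 = 0.03, which is a simplification of the windowed SSIM normally used for images; the data range of 100 m and the four-point fields are assumptions for illustration only.

```python
import math
import statistics

def rmse(x, y):
    """Root-mean-square difference (scaled Euclidean distance) between two fields."""
    return math.sqrt(statistics.fmean([(a - b) ** 2 for a, b in zip(x, y)]))

def ssim_global(x, y, data_range, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole field (luminance x contrast-structure terms)."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mx, my = statistics.fmean(x), statistics.fmean(y)
    vx = statistics.fmean([(a - mx) ** 2 for a in x])
    vy = statistics.fmean([(b - my) ** 2 for b in y])
    cxy = statistics.fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

field = [5500.0, 5520.0, 5540.0, 5520.0]
shifted = [v + 50.0 for v in field]                          # same pattern, 50 m higher
flipped = [2 * statistics.fmean(field) - v for v in field]   # anomaly pattern reversed

for name, other in [("shifted", shifted), ("flipped", flipped)]:
    print(name, "RMSE:", round(rmse(field, other), 1),
          "SSIM:", round(ssim_global(field, other, 100.0), 3))
```

The shifted field is farther away in Euclidean terms but almost identical by SSIM, while the reversed pattern is closer in Euclidean terms but strongly dissimilar by SSIM. For spatially standardised fields the uniform shift disappears, so the relevant question is whether real pairs of standardised fields show such disagreements, e.g. using the windowed `skimage.metrics.structural_similarity`.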
line 116 - Doesn't using medoids also risk inflating the significance of small-scale noise in the daily field chosen as the medoid?
line 137 - "Wang and Bovik (2009) demonstrated that the MSE has serious disadvantages when applied on data with temporal and spatial dependencies" - dependencies on what? Does this mean temporal and spatial correlations?
line 194 - is the similarity between two clusters measured using their medoid fields?
line 267 - Is the algorithm stable if applied to slightly different initial subsets of the data? The number of patterns may be stable, but do the same patterns emerge from the clustering?
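One possible check, sketched under stated assumptions: run the clustering from two different initial subsets, assign all days to each run's classes, and compare the two day-by-day classifications with the adjusted Rand index (1 means identical up to relabelling; values near 0 mean chance agreement). The implementation below is a hypothetical stdlib version; `sklearn.metrics.adjusted_rand_score` computes the same quantity.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two classifications of the same days."""
    n = len(labels_a)
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single class in both runs
        return 1.0
    return (index - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same grouping, different labels
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # unrelated groupings
```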
Figure 3 - it would make more sense to have the transition between the blues and reds in the colour bar at zero, not +0.25.
Line 245 - should there be a reference to figure 6 here?
Line 282 - "However, it is necessary to demand that a cluster medoid represents all cluster elements and their whole entity as a group." Does comparing the medoid and centroid really guarantee this?
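A deliberately extreme toy cluster showing why medoid-centroid agreement alone is not sufficient: the medoid and centroid coincide exactly, yet two of the three members are mirror images of each other. Euclidean distance and the field values are assumptions for illustration; the helper names are hypothetical.

```python
import math

def euclid(a, b):
    """Euclidean distance between two flattened fields."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(fields):
    """Grid-point-wise mean of the member fields."""
    return [sum(col) / len(fields) for col in zip(*fields)]

def medoid(fields):
    """Member field with the smallest summed distance to all members."""
    return min(fields, key=lambda f: sum(euclid(f, g) for g in fields))

# A pattern, its mirror image, and a flat field.
pattern = [1.0, -1.0, 0.5, -0.5]
members = [pattern, [-v for v in pattern], [0.0, 0.0, 0.0, 0.0]]

m = medoid(members)
print("medoid-centroid distance:", euclid(m, centroid(members)))
print("largest member-medoid distance:", max(euclid(f, m) for f in members))
print("distance between the two non-flat members:", euclid(members[0], members[1]))
```

Reporting the within-cluster spread directly (e.g. the lowest member-to-medoid SSIM in each class) would be a more convincing test of representativeness.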
Line 307 - is section 4 meant to be labelled 'Method', the same as section 3?
Figure 4 - Can the colour bar be included in the figure? There's room in the bottom row of panels.
Line 320 - "This correspondence gives us an evidence that, albeit not tuned to and not required to mimic semi-manual classifications, the new classification method determines not just arbitrary synoptic patterns but those described by experts in semi-manual classifications."
I'm not convinced - given that there are 43 different types, it seems quite likely that some of them could resemble Grosswetterlagen patterns by chance.
Figure 7 - the text in the figure labels could be much larger for legibility.
line 447 - again, I don't think one can infer that this is an inherent advantage of the SSIM method without making a comparison with other cluster methods.
Citation: https://doi.org/10.5194/esd-2022-29-RC3
AC4: 'Reply on RC3', Kristina Winderlich, 07 Oct 2022
The comment was uploaded in the form of a supplement: https://esd.copernicus.org/preprints/esd-2022-29/esd-2022-29-AC4-supplement.pdf
AC5: 'Comment on esd-2022-29', Kristina Winderlich, 17 Jan 2023
The comment was uploaded in the form of a supplement: https://esd.copernicus.org/preprints/esd-2022-29/esd-2022-29-AC5-supplement.pdf