the Creative Commons Attribution 4.0 License.
Exploring the coupled ocean and atmosphere system with a data science approach applied to observations from the Antarctic Circumnavigation Expedition
Sebastian Landwehr
Michele Volpi
F. Alexander Haumann
Charlotte M. Robinson
Iris Thurnherr
Valerio Ferracci
Andrea Baccarini
Jenny Thomas
Irina Gorodetskaya
Christian Tatzelt
Silvia Henning
Rob L. Modini
Heather J. Forrer
Yajuan Lin
Nicolas Cassar
Rafel Simó
Christel Hassler
Alireza Moallemi
Sarah E. Fawcett
Neil Harris
Ruth Airs
Marzieh H. Derkani
Alberto Alberello
Alessandro Toffoli
Gang Chen
Pablo Rodríguez-Ros
Marina Zamanillo
Pau Cortés-Greus
Lei Xue
Conor G. Bolas
Katherine C. Leonard
Fernando Perez-Cruz
David Walton
Download
- Final revised paper (published on 30 Nov 2021)
- Supplement to the final revised paper
- Preprint (discussion started on 21 Apr 2021)
- Supplement to the preprint
Interactive discussion
Status: closed
- RC1: 'Comment on esd-2021-16', Anonymous Referee #1, 28 May 2021
Summary and overall impression
This manuscript makes use of a large interdisciplinary dataset from the Antarctic Circumnavigation Expedition, a 90-day cruise from December 2016 to March 2017, in combination with the sparse PCA (sPCA) method to extract process understanding from this comprehensive dataset. The study has a very broad scope, aiming to obtain a holistic understanding of the biogeochemical and physical processes in the Southern Ocean and atmosphere. The method (sPCA) goes beyond standard PCAs, which are commonly used in oceanography and meteorology. sPCAs aim to increase interpretability when dealing with many variables and processes. In addition, the authors apply a bootstrapping approach in order to quantify the uncertainty of their sPCA results.
I find this a very exciting study and it has the potential to be relevant and valuable to the community. I see three main strengths of the manuscript. First, it presents a method (sPCA) that is relatively new in Earth System Science and may be useful for further studies analyzing ship data. Second, the method allows the authors to conduct an extremely multidisciplinary analysis including a broad range of observed variables and to extract an understanding of the dominant processes in the study region. Third, the study is based on a new comprehensive observational dataset from a historically under-sampled region (the Southern Ocean), and includes measurements in the ocean, atmosphere, and cryosphere, covering all sectors, different interfrontal zones, both open ocean and near islands and continents, and covering a broad range of physical and biogeochemical variables.
At the same time, I have several major comments that I believe need to be addressed before publication. My main concern is the description of the method. I have to admit that I am not too familiar with standard PCAs, and sPCAs are completely new to me. Assuming that this may be the same for many readers, I believe the manuscript can gain considerable clarity by improving the description of the methods (see general comments for more specific details on this and other major comments).
General comments
sPCA method: I suggest expanding Section 3.1. I would appreciate a discussion on why setting some weights to zero is ok and why this does not lose crucial information. In a standard PCA, we say e.g., 80% of the variability is linked to OV1, 5% each to OV2 and OV3. We then know that there is a remaining 10% of variability due to other processes. With sPCA (the way I understand it from the manuscript) we reduce the complexity, ignoring some variables, to explain all of the remaining variability. Here, we get to 100%, but we actually know that we weighted many variables with 0 in order to do so. Isn’t the standard approach more complete in its interpretation? What are the pros and cons of each? It should also be mentioned if the user chooses which weights are set to zero, or if the algorithm does that. (My apologies if I have misunderstood the sPCA method. If that is the case, I suggest you clarify it).
LVs: I find the current explanation of what an LV is quite confusing (L85-87), which led to further confusion later in the document. I recommend making it really clear here what an LV is in an sPCA and how it is different to the OVs. I recommend explicitly stating that the LVs are the processes we want to understand (i.e., the output from the sPCA) with the help of OVs (i.e., the input to the sPCA). (it becomes clearer later in the document, but is needed early on).
Please add a section that summarizes what happens during the sPCA to add clarity on the method for people unfamiliar with it. The way I understand how the sPCA works from your manuscript, the user chooses a set number of processes they want to know about (here: 14) and feeds all OVs (here: 111) into the algorithm. Some of the OVs are weighted 0 to reduce the number of OVs for each LV. (→ This should be discussed and mentioned if this happens randomly.) The algorithm then identifies 14 different sets of OVs. The users then see which OVs have non-zero weights in each LV to determine which process each LV represents. I.e., the user has to make a choice: if sea surface temperature, salinity, and MLD are OVs in an LV, then the LV might represent a process linked to ocean circulation. (→ For each LV, it would be good to know which OVs are in it so that the reader can understand how the label for each LV was chosen.) We can then also see the percentage of the variability in all of the 111 variables that is attributed to that process.
→ Is this correct? If yes, it might give you hints about which pieces of information the reader might want to hear about. If not, my understanding as described above might give you hints about which parts were confusing.
Unimportant variables for an LV “are forced to be zero”: could we accidentally lose information here? Is this a subjective choice by the authors or done by the algorithm? This should be discussed further.
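The question raised in the comments above, whether zero weights are chosen by the user or by the algorithm, can be illustrated with a minimal sketch. It is not the algorithm used in the manuscript: it swaps in a soft-thresholded power iteration, one simple way to obtain a single sparse component, and runs it on synthetic data. The function names, the penalty value `lam`, and the four synthetic variables are assumptions made for illustration only.

```python
import math
import random


def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values with |x| <= lam become exactly 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def covariance(rows):
    """Sample covariance matrix of a list of equally long observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_component(cov, lam=0.0, iters=200):
    """Power iteration for the first component's weights.

    lam = 0 gives the ordinary (dense) leading principal component;
    lam > 0 soft-thresholds the weights in every step, so small weights
    are driven to exactly zero by the penalty rather than by the user.
    """
    p = len(cov)
    w = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        v = [sum(cov[i][j] * w[j] for j in range(p)) for i in range(p)]
        v = [soft_threshold(x, lam) for x in v]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break  # penalty too strong: every weight was removed
        w = [x / norm for x in v]
    return w


# Synthetic data (assumption): variables 0 and 1 share one hidden driver,
# variables 2 and 3 are unrelated, weaker noise.
rng = random.Random(0)
rows = []
for _ in range(200):
    z = rng.gauss(0.0, 1.0)
    rows.append([z + 0.1 * rng.gauss(0.0, 1.0),
                 z + 0.1 * rng.gauss(0.0, 1.0),
                 0.5 * rng.gauss(0.0, 1.0),
                 0.5 * rng.gauss(0.0, 1.0)])

cov = covariance(rows)
dense = leading_component(cov, lam=0.0)   # all weights generally non-zero
sparse = leading_component(cov, lam=0.3)  # noise variables get weight 0
print("dense weights: ", [round(w, 3) for w in dense])
print("sparse weights:", [round(w, 3) for w in sparse])
```

In this sketch the user picks only the penalty strength; which weights end up at exactly zero follows from the data. With `lam=0` the iteration reduces to ordinary power iteration for the leading principal component, so the two noise variables keep small but non-zero weights.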
Research Question(s): Another concern is linked to the research question(s) the article wants to answer. It is such a broad study, touching on so many topics, that it becomes a bit blurry in the introduction where this is all going. The way it is currently presented, it appears as a data mining approach of plugging in all the data and seeing what happens. Were there any hypotheses beforehand that you wanted to test? I would find it helpful to add a (couple of) specific research question(s) and build on that in the introduction to explain why we want to know about that. E.g., is it about the processes? Is it about showing that sPCAs are a good tool (or both)? Are there some processes we are unsure about, which the sPCA might shed light on?
Linked to my previous comment: it is not clear to me which findings are confirmations of processes we already knew, and which findings are new insights. This should be clarified.
Eddies: One process that does not seem to be covered in this study, but is a known driver of variability in the Southern Ocean, is mesoscale eddies. This should be discussed.
Seasonality: Please add a discussion on the fact that the cruise is only 90 days long (i.e., during one season) and that the ship is moving during that time, making it difficult (or impossible?) to conduct a seasonal analysis. The discussion should include why it is possible (or not possible?) to robustly conclude on any seasonal signals with this data.
Specific and minor comments to the text:
L. 131: In this section, I would have liked to also find out a bit more about the measurements, e.g., whether the ocean measurements are at the sea surface only (same for the atmosphere). I recommend adding a sentence or two stating the nature of the measurements (sensors, air/water/ice samples… were some data collected by platforms other than the ship, such as satellites/planes…?).
L. 145: It should also be mentioned here (and possibly in the abstract/introduction) that this is an unsupervised machine learning approach (as stated in the Conclusion).
Citation: https://doi.org/10.5194/esd-2021-16-RC1
- AC1: 'Reply on RC1', Julia Schmale, 25 Jun 2021
- RC2: 'Comment on esd-2021-16', Anonymous Referee #2, 01 Jun 2021
Comments on "Biogeochemistry and Physics of the Southern Ocean-Atmosphere System Explored With Data Science" by Landwehr et al.
This manuscript presents a detailed exploration of a very large ensemble of measurements of in-situ variables from the Southern Ocean and from the Southern Atmosphere. It emphasizes the technique of sparse principal component analysis, which indicates possible causal relationships and tries to identify underlying processes explaining the variations of the observed variables. As it is now, the manuscript is well written, but it could benefit from incorporating the minor remarks I have below. I also propose to more clearly delineate the advantages of the sPCA method to guide the reader about the choice of analysis made here.
Major comments:
The following sentence at the end of the discussion (Page 60, lines 1102-1103) would need to be better backed up by the authors:
“In summary, we find that the sPCA is not only capable of resolving many of the complex connections between the OVs (Observed Variables) but also to provide estimates of their relative importance for the observed variability of each OV.”
I would welcome a paragraph stating, with possible examples from the results and the discussion, the strengths of the sPCA method. The weaknesses are well described but the reader would also like to have the view of the authors on what guided them to select this method for an analysis.
The distance to the continent (Latent Variable 5, LV5) is not the best indicator of land influences as the authors seem to suggest. A much better indicator would be a 222Rn concentration measurement. Radon-222 is a radiogenic gas whose emission flux is 100 times larger over land than over ocean. As such, you can use the concentration of 222Rn to trace how long ago an air parcel was over a continent. Several authors have used this property as a measure of the continental influence of an air parcel travelling over the ocean (Heimann et al., 1990; Balkanski and Jacob, 1990).
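The tracer argument above can be made concrete with a short worked example. It is a simplified sketch, assuming that radioactive decay (half-life of about 3.8 days) is the only process lowering the 222Rn concentration after the air parcel leaves land; mixing with marine air and the small oceanic radon source are ignored. The function name and the example concentrations are hypothetical.

```python
import math

RN222_HALF_LIFE_DAYS = 3.82  # radioactive half-life of radon-222


def days_since_land_contact(c_observed, c_continental):
    """Decay-only estimate of the time since an air parcel left the continent.

    Solves c_observed = c_continental * exp(-decay_constant * t) for t.
    Both concentrations must be positive and in the same units.
    Assumption: no dilution and no oceanic radon flux.
    """
    if c_observed <= 0 or c_continental <= 0:
        raise ValueError("concentrations must be positive")
    decay_constant = math.log(2) / RN222_HALF_LIFE_DAYS
    return math.log(c_continental / c_observed) / decay_constant


# Hypothetical example: the observed concentration is a quarter of the
# value assumed for air over the continent.
example_age = days_since_land_contact(c_observed=0.25, c_continental=1.0)
print(f"decay-only age: {example_age:.2f} days")
```

A quarter of the continental concentration corresponds to two half-lives, i.e. roughly 7.6 days under these assumptions. Because dilution also lowers the concentration, such a number should be read as a rough indication of continental influence rather than as a travel time.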
Minor comments:
Caption of Figure 1: do you really mean “microbial gases”, or is it rather “biogenic gases”? If you use the term “microbial gases”, you imply that these gases are exclusively emitted by microbial organisms.
Was there any attempt made to tag the air masses or use back-trajectories to know how long ago this air mass was above continents? It could (for example) explain why certain air masses have a higher O3 content as discussed in lines 457-458 page 26.
Lines 480-482: did you check the values of RH for these warm air masses? Could the values of RH be an indicator for prior precipitation?
Page 31, lines 531-534, the following sentence comes a bit out of nowhere:
“There is no apparent explanation for the inclusion of carbon monoxide (CO), the mass concentration of sulfate in nonrefractory particulate matter (SO₄²⁻), and the atmospheric isoprene concentration (Isoprene_air), and further analysis is beyond the scope of this work.”
You might be missing something important here regarding isoprene. It would be worth investigating, or asking other groups to think about, this positive correlation between extratropical cyclone activity and isoprene in air. Isn’t it simply that isoprene sources are abundant in the subtropical regions and the cyclones rapidly channel air from lower latitudes to the latitudes at which you are making these measurements?
With regards to the results described for LV2: Drivers of the cloud condensation nuclei population.
You do not mention that small particles in the nucleation mode will eventually end up in the accumulation mode upon growth and coagulation. Condensation nuclei (CN) that are not activated will join the accumulation mode aerosol.
A very noteworthy reference concerning CCN is the one from Lee et al. (2013). The authors studied twenty-eight parameters that cover all important aerosol processes to understand the cause of uncertainty for CCN.
Lines 685-687: why is your hypothesis limited to rainout, and why does it not include washout? “To check our hypothesis concerning rainout, we investigated the precipitation rate along the backward trajectories for the previous three days (see Figure 14)”
Paragraph 5.5: why is LV12 not related to Nccn,0.15, Nccn,0.30 and Nccn,1.0?
The Monahan et al. (1986) parametrization of sea-salt emission predicts that these small sea-salt aerosols would be abundantly produced at high wind speeds.
Page 41, line 713: please be more specific than “The relatively large size of airborne SSA droplets”, since particles much larger than 2 or 3 µm do not scatter as efficiently at visible wavelengths as particles between 0.2 and 2 µm.
FVFM is defined at line 1664 (“FVFM is the maximum photochemical efficiency of photosystem II”) but is already used at line 738 without definition.
Lines 762-764: explain for the non-specialist what to look for in Figure 5: “Bacterial abundance has a relatively high negative contribution to LV11 (see Figure 5), as bacterial concentrations are linked to the availability of dissolved organic matter (a product of particulate organic matter including POC and PON) and nutrients (Church et al., 2000; Kirchman et al., 2009).”
Page 54, line 989: You wrote “strong precipitation even”; did you mean “strong precipitation event”?
References
Balkanski, Y. J. and Jacob, D. J.: Transport of continental air to the subantarctic Indian Ocean, Tellus B, 42, 62–75, https://doi.org/10.3402/tellusb.v42i1.15192, 1990.
Heimann, M., Monfray, P., and Polian, G.: Modelling the long-range transport of 222Rn to subantarctic and antarctic areas, Tellus B, 42, 83–99, https://doi.org/10.3402/tellusb.v42i1.15194, 1990.
Lee, L. A., Pringle, K. J., Reddington, C. L., Mann, G. W., Stier, P., Spracklen, D. V., Pierce, J. R., and Carslaw, K. S.: The magnitude and causes of uncertainty in global model simulations of cloud condensation nuclei, Atmos. Chem. Phys., 13, 8879–8914, https://doi.org/10.5194/acp-13-8879-2013, 2013.
Monahan, E. C., Spiel, D. E., and Davidson, K. L.: A model of marine aerosol generation via whitecaps and wave disruption, in: Oceanic Whitecaps and Their Role in Air-Sea Exchange, edited by: Monahan, E. C. and Mac Niocaill, G., D. Reidel, Norwell, Mass., 167–174, 1986.
Citation: https://doi.org/10.5194/esd-2021-16-RC2
- AC2: 'Reply on RC2', Julia Schmale, 25 Jun 2021
- RC3: 'Review of Landwehr et al., 2021, ESD', Anonymous Referee #3, 02 Jun 2021
General comments:
I find this paper uses an interesting approach that has a potentially high value and high impact for the ocean-atmosphere interdisciplinary research community. The paper takes the observations from the Antarctic Circumnavigation Expedition (ACE, austral summer 2016/2017) cruise and combines them with a sparse Principal Component Analysis (sPCA) to understand how different observed variables are linked together and to the general context (e.g. distance from land, cyclone activity, etc.). The paper is also very long, which makes reading and understanding the entire content of the paper and really getting into the new conclusions that result from this study extremely difficult.
I support this paper as a proof of concept for this approach, but I find the science questions posed (or hypotheses) and conclusions in the study are very weak. This paper should be published after the comments from the other reviewers and the comments below are addressed.
Major comments:
Most of the conclusions made using this very complex analysis are simplified statements of well known phenomena. So, I’m not sure what is the added value of this approach compared to what is already known. This is seen in the various “In summary” statements that come at the end of each section that focuses on the latent variables (LVs). This is seen most clearly in the summary for LV7 and LV10, which mostly put things into a seasonal and diurnal cycle context. I do not see what we have learned by using this “data science” approach. One way to address this would be to acknowledge in the abstract and very early in the study that there are no main scientific conclusions using data science in this study, but that this sets up the methodology that can be used in the future for this purpose.
The paper should be re-titled to more clearly reflect the paper content. The paper focuses on all of the aspects of the ACE cruise, not just biogeochemistry and physics. I would recommend something more general like “Understanding processes observed in the southern ocean-atmosphere system using ACE observations combined with data science”.
I recommend that the authors work on shortening the paper by moving some of the very lengthy discussion into supplementary materials or into an annex to make this paper more readable. I would like the authors to get to the point of what was learned in addition to what is already known more quickly.
The authors should discuss how the different timescales of the natural processes that control the observed variables affect the analysis, given that these variables were seen as a snapshot in space and time on the ship. Is it fair to group into one data science approach variables observed in the atmosphere, ice, and ocean that have very different lifetimes and controlling factors that may not be co-located (i.e. relating them in the same space and time may give the wrong correlations/dependencies compared to what happens in nature)?
How do non-local processes get integrated into this approach? This is not currently clear to me.
The authors should expand their discussion of missing data and the influence this has on their analysis (as noted by reviewer 1).
Minor comments:
There are a few small typos as noted by reviewer 2. I suggest a careful re-reading before publication.
Citation: https://doi.org/10.5194/esd-2021-16-RC3
- AC3: 'Reply on RC3', Julia Schmale, 25 Jun 2021
Peer review completion
hotspots of interaction. Code and data are open access.