Identifying the control cities of O₃ Pollution using Complex networks

Zhao, Zhi-Dan; Xue, Demei; Sun, Haojun; Wang, Weiping; Ying, Na

doi:https://doi.org/10.5194/esd-2024-4

Preprints

25 Apr 2024


Status: this preprint was under review for the journal ESD. A final paper is not foreseen.

Abstract. In recent years, ozone (O₃) pollution has been spreading rapidly, restricting further improvement of air quality in China. Investigating the interaction of O₃ concentrations and identifying their driven cities are important for the prevention and control of O₃ pollution in China. However, the interaction of O₃ pollution between cities and their driven cities has not yet been revealed. In this study, we fill this gap by integrating complex network methods, the Louvain community partitioning algorithm and the maximum matching network control theory. The O₃ network model exhibits a structured cluster framework, with clusters such as Northeast, North China, Sichuan and Chongqing, and the Southeast coastal areas. The driver nodes are mainly concentrated in the central region, while the non-driver nodes are mainly located in the coastal periphery. We also found that the proportion of driven nodes exhibits a positive relation with the threshold. In addition, the coincidence degree of the driven nodes is related to the choice of threshold: closer threshold values correspond to a higher coincidence ratio of the driven nodes. The correlation of driven nodes predicting non-driven nodes is stronger than that of non-driven nodes predicting driven nodes, suggesting that driven nodes have more influence in the O₃ network than non-driven nodes. The results provide scientific guidance for national O₃ pollution prevention and regional synergy formatting. Furthermore, the introduced network-based approaches offer a methodological framework for the study of air pollution in key cities and clusters.

This preprint has been withdrawn.

Received: 25 Jan 2024 – Discussion started: 25 Apr 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Interactive discussion

Status: closed

RC1: 'Comment on esd-2024-4', Anonymous Referee #1, 08 Aug 2024

The manuscript presents an analysis of the time series of O3 concentrations from different Chinese cities using concepts and tools from network theory. Correlation networks are constructed in which nodes are the cities and links between two of them are set if Pearson correlation between O3 fluctuations in these two cities exceeds a given threshold. The obtained networks are analysed with methods of structural network controllability, community detection, and long short-term memory (LSTM) predictability.
In my opinion, the quality of the paper is well below the standards expected for publication in Earth System Dynamics, and therefore I cannot recommend publication. There are two reasons for my negative assessment: the lack of any discussion of the implications of the mathematical results, and the lack of clarity in the explanation of the methodology and the presentation of the results. In addition, in most cases there is no clear justification for the use of the particular methodologies employed. In the following I provide a non-exhaustive list of weak points, substantiating my general criticisms.
- The authors correctly identify key places where the results of their mathematical exercises need understanding and explanation. Examples are ‘Future research should focus more on the explainable practical role of these network structures’ (line 208). ‘Future work should therefore focus on understanding the practical significance and use of driven nodes in O3 networks’ (line 245). ‘Future research should focus more on understanding the practical implications of seasonal changes in O3 network nodes over time’ (line 271). ‘Future research should pay more attention to the significance of these nonlinear relationships in practical O3 application scenarios’ (line 304). ‘Future research should focus more on exploring the practical implications of these results’ (line 349). ‘Future research should pay more attention to the mechanism behind the above phenomenon’ (line 372). ‘Future research should focus on the underlying mechanisms leading to these phenomena’ (line 398) ... But not a single attempt to interpret these or other results is included in this paper. Are the O3 fluctuations dominated by industrial processes, urban emissions or by climatic/meteorological effects? What is the meaning of the communities found? Just geographical proximity or some type of common behaviour? In which way can the network be controlled? By acting on industrial or urban sources of O3 precursors? Are the nodes identified as ‘driver’ nodes sources of pollution or sites sensitive to climatic fluctuations? I recommend that the authors concentrate on one or a few of these relevant questions and try to answer them using their network results. Until this is done, the present paper remains a mathematical exercise in applying network tools to some time series, and is therefore not particularly relevant for publication in Earth System Dynamics.
- The description of the network controllability is particularly confusing. The authors talk about nodes that are ‘driver’, ‘driven’, ‘non-driven’, ‘drive’, ‘drivens’, or ‘driving’, in a way that does not always seem to be consistent. As the paper advances, the nomenclature seems to stabilize on ‘driven nodes’ and ‘non-driven nodes’. This is particularly inconvenient, since in the paper by Liu et al. 2011 (to which the authors refer for the methodology) the nomenclature used is ‘driver nodes’ (= ‘unmatched’) and ‘matched nodes’. It is not clear from the manuscript, but it seems to me that what the authors call here ‘driven nodes’ are in fact the ‘driver nodes’ of Liu et al. Since the authors do not offer any interpretation or implication of their identification of ‘driven nodes’, there is no way to check whether my interpretation of the nomenclature is correct, although it helps to understand figures 7-9 (panels b and d).
- Continuing with the issue of controllability, note that the methodology in Liu et al. 2011 refers to ‘structural controllability’ and is based on the use of structural (physical) links between the different nodes. Here, the links are of a statistical nature, so a given statistical link can originate from many different sets of structural connections. The authors do not discuss at all why the methodology used (maximum matching algorithm, etc. from Liu et al.) is applicable here. Note the difference, for example, with the network constructed by Tian et al. 2014: there the links represent transport by winds, so that they can be considered ‘structural’ and not just statistical as in the present manuscript.
- The authors give no argument to justify the use of the Louvain community detection method instead of the many others available (some of them free of the resolution limit that affects the Louvain method). Since the resulting communities are not interpreted, it is difficult to assess whether the method has achieved a good partitioning of the system.
- The role of the LSTM forecasting (which is not mentioned in the abstract) is difficult to assess: the authors do not give any hint as to why the forecasting is done from driven to non-driven nodes and vice versa. Would it not be more natural to do the forecasting exercise between all pairs of nodes? And the final result, presented in Fig. 13, gives some kind of correlation as a function of distance. In what sense is this different from or better than just plotting the correlation between all pairs of cities as a function of their distance? As with the rest of the approaches, there is no explanation to justify what is being done.
- In general, the description of the different procedures is very confusing. For example, in line 92 it is said that tau is positive, whereas in the next line tau takes positive and negative values. At the end it seems (but I am not completely sure by reading the description) that really two networks are constructed, one for tau>0 and another for tau<0. There is no reason given to associate positive values of tau to positive correlations, and negative tau to negative correlations. In equation (6) the quantities Sigma_xx, k_x, or m are not defined. Perhaps ‘i’ is ‘a’ and ‘m’ is ‘n’ …?
- Finally, there are many typos in the paper (e.g. line 22: ‘the choose of threshold’; line 25: ‘mythological’; line 37: ‘are one of the main precursor …’; line 75: ‘The The’, …). And in general the paper has not been re-read carefully: for example the sentence in lines 104-106 is clearly copied from an unrelated paper: ‘ Interventionary studies involving animals or humans, and other studies that require ethical approval, must list the authority that provided approval and the corresponding ethical approval code.’
In summary, the paper is not of sufficient quality to merit publication in Earth System Dynamics.
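
For readers unfamiliar with the terminology discussed in this comment, the following is a minimal sketch, not the manuscript's implementation, of the two ingredients the referee refers to: a thresholded correlation network and the maximum-matching definition of driver nodes from Liu et al. 2011 (driver nodes are the nodes left unmatched by a maximum matching, with at least one driver needed when the matching is perfect). The function names `pearson`, `lagged_edges` and `driver_nodes`, the toy series, and the convention that a positive lag gives a directed link from the leading to the lagging series are all assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def lagged_edges(series, threshold, max_lag):
    """Directed edges i -> j when series i, leading by 1..max_lag steps,
    correlates with series j above the threshold (assumed convention)."""
    edges = set()
    for i in series:
        for j in series:
            if i == j:
                continue
            for lag in range(1, max_lag + 1):
                if pearson(series[i][:-lag], series[j][lag:]) > threshold:
                    edges.add((i, j))
                    break
    return edges

def driver_nodes(nodes, edges):
    """Nodes left unmatched by a maximum matching of the directed graph."""
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    matched_by = {}  # target node -> source node whose out-edge matches it

    def augment(u, seen):
        # Standard augmenting-path search for bipartite matching.
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_by or augment(matched_by[v], seen):
                matched_by[v] = u
                return True
        return False

    for u in nodes:
        augment(u, set())
    unmatched = [v for v in nodes if v not in matched_by]
    # With a perfect matching, one driver node is still required;
    # picking the first node is an arbitrary choice.
    return unmatched or [nodes[0]]

# Toy data: series "b" repeats series "a" one step later.
series = {
    "a": [1, 3, 2, 5, 4, 6, 8, 7],
    "b": [0, 1, 3, 2, 5, 4, 6, 8],
}
edges = lagged_edges(series, threshold=0.9, max_lag=1)
print(edges)                              # {('a', 'b')}
print(driver_nodes(list(series), edges))  # ['a']
```

Maximum matchings are generally not unique, so the identity of the unmatched nodes can change between equally valid matchings even though their number does not; any comparison of driver-node sets across thresholds has to account for this.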

Citation: https://doi.org/10.5194/esd-2024-4-RC1

RC2: 'Comment on esd-2024-4', Anonymous Referee #2, 13 May 2025

This study applies a Louvain algorithm for community detection to ozone concentration data from Chinese cities over a period of five years and analyses this data with a maximum matching algorithm, then applies a range of evaluation procedures to assess the structure of the identified networks. The research question could be more clearly formulated. The approach to analyse ozone (or other pollutant) transport with network theory is already well established (e.g. Tian & Gunes 2014 Complex Networks V, Xiao & Lu 2020 Pollution Study). Using control network theory to identify cities with the strongest influence on the pollution dynamics of the whole network could allow for policy-relevant implications with regard to potential interventions, as has for example been done with a weighted network approach for air pollution in general in Guo et al. 2022, Int. J. Environ. Res. Public Health.
There are, however, significant issues with the manuscript in its current form and I cannot recommend publication in Earth System Dynamics at this point. In summary, the manuscript lacks consistency and thoroughness in the presentation of the methodology as well as the results. Most importantly, the interpretation of the results is missing and the manuscript does not allow conclusions to be drawn from the statistical findings about actual physical processes. Below is a selection of identified issues with the manuscript that hopefully render my conclusion comprehensible.
The manuscript seems to be using non-standard nomenclature. It is not explained how "driven cities" are defined and understood across the manuscript. The terms are also used inconsistently, e.g. "driving cities" (line 50) or "O3 driven cities" (line 54), "drive nodes" (line 134). Later, the term "hub nodes" is also introduced but not defined. I am assuming that all of them refer to what is typically called "driver nodes" in network control theory (see e.g. Liu et al. 2011 Nature), where driver nodes require external input in order to control the network and allow conclusions about the controllability of the network. On several occasions abbreviations are not introduced (e.g. line 158) and internal references are incorrect (e.g. "Section II B" referenced in line 163 and "Section Ⅲ C" referenced in line 166 do not exist).
Choices for thresholds are not clearly motivated across the manuscript and seem somewhat arbitrary (Figures 1-6). Different thresholds can yield very different network topologies and community structures. While necessary for the chosen methodology, splitting into positive and negative threshold networks may oversimplify the system and I am missing a discussion of how this impacts the interpretation of the results.
The existing sections in the manuscript are not cleanly separated. In particular, the results section contains parts that rather belong to the methods (see e.g. lines 183-184), introduction (lines 201-202), or conclusion (lines 209-210, lines 246-247) sections. Lines 361-365 should be introduced in the methods rather than the results.
Further, the manuscript contains frequent grammatical and spelling errors that decrease its accessibility. For example, "mythological" (line 25) is likely meant to be "methodological". Several formulations are rather informal for a scientific paper, such as "It's glad" (line 33), "quickly" (line 108, 114), "Of course" (line 170). There are repetitions, sometimes of nearly the same sentence (see e.g. lines 108-111 and lines 112-113).
The color-coding between Figures 1, 2 and 3 is inconsistent with no good reason, making the results more difficult to interpret. They are also missing a legend for the colors, and as the colors denote regions this should be made consistent with Figure 13. The slopes in Figure 13 should be labelled within the Figure rather than described in the caption.
Overall, there is a lack of consistency in nomenclature as well as in the presentation of the results, and the frequent errors and inconsistencies give an impression of a lack of effort. The manuscript would benefit from substantial revision, including thorough editing and copy-editing. Most importantly, the technical findings on network structure are not sufficiently interpreted in the context of ozone transport between cities. Without a clear connection to the physical processes and a systematic discussion of limitations, the results risk being interpreted as statistical artifacts rather than meaningful insights on the underlying physical processes and it is accordingly not possible for me as a reviewer to judge whether meaningful insights have been found. Given these substantial issues, I do not believe the manuscript meets the standards expected of an ESD paper in its current form.
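
The threshold sensitivity raised in this comment can be illustrated with a small sketch. It assumes a symmetric set of pairwise correlations between four hypothetical cities (the values below are invented for illustration and are not taken from the manuscript) and reports, for each threshold, how many links survive and into how many connected components the network falls. The names `components` and `sweep` are likewise hypothetical.

```python
def components(nodes, edges):
    """Number of connected components, via union-find with path halving."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(n) for n in nodes})

def sweep(nodes, corr, thresholds):
    """For each threshold, return (threshold, link count, component count)."""
    rows = []
    for t in thresholds:
        edges = [pair for pair, r in corr.items() if r > t]
        rows.append((t, len(edges), components(nodes, edges)))
    return rows

# Invented correlations between four hypothetical cities.
nodes = ["A", "B", "C", "D"]
corr = {
    ("A", "B"): 0.90, ("B", "C"): 0.70, ("C", "D"): 0.85,
    ("A", "C"): 0.50, ("A", "D"): 0.20, ("B", "D"): 0.30,
}
for row in sweep(nodes, corr, [0.4, 0.6, 0.8, 0.95]):
    print(row)
# (0.4, 4, 1)
# (0.6, 3, 1)
# (0.8, 2, 2)
# (0.95, 0, 4)
```

Even in this toy case the network moves from one connected cluster to two pairs to four isolated nodes as the threshold rises, which is why any community or controllability result derived from such a network depends on how the threshold is chosen.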

Citation: https://doi.org/10.5194/esd-2024-4-RC2

Viewed

Total article views: 851 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
750	65	36	851	59	90

Views and downloads (calculated since 25 Apr 2024)

Month	HTML	PDF	XML	Total
Apr 2024	40	5	7	52
May 2024	85	6	3	94
Jun 2024	46	4	2	52
Jul 2024	22	4	4	30
Aug 2024	37	3	3	43
Sep 2024	18	2	0	20
Oct 2024	9	2	0	11
Nov 2024	9	3	2	14
Dec 2024	14	3	0	17
Jan 2025	17	2	2	21
Feb 2025	7	2	2	11
Mar 2025	22	4	3	29
Apr 2025	16	3	1	20
May 2025	27	10	3	40
Jun 2025	20	2	2	24
Jul 2025	31	4	1	36
Aug 2025	47	5	1	53
Sep 2025	263	0	0	263
Oct 2025	20	1	0	21

Viewed (geographical distribution)

Total article views: 833 (including HTML, PDF, and XML), thereof 833 with geography defined and 0 with unknown origin.

Latest update: 30 Oct 2025

Short summary

Understanding the dynamic characteristics of O₃ pollution is crucial for the joint prevention and control of O₃ pollution but remains a major challenge due to insufficient understanding of its driving cities. Here, using a complex network model, we identified the national O₃ pollution driving nodes and their reliability. We also demonstrated their relationship with the threshold and distance. Our work has implications for developing collaborative control policies for O₃ pollution areas.

