Despite the great success of machine learning, its application in climate dynamics has not been well developed. One concern is how well trained neural networks can learn a dynamical system and what the potential applications of this kind of learning are. In this paper, three machine-learning methods are used: reservoir computer (RC), backpropagation-based (BP) artificial neural network, and long short-term memory (LSTM) neural network. We show that the coupling relations or dynamics among variables in linear or nonlinear systems can be inferred by RC and LSTM, which can be further applied to reconstruct one time series from the other. Specifically, we analyzed climatic toy models to address two questions: (i) what factors significantly influence machine-learning reconstruction and (ii) how do we select suitable explanatory variables for machine-learning reconstruction. The results reveal that both linear and nonlinear coupling relations between variables influence the reconstruction quality of machine learning. If there is a strong linear coupling between two variables, then the reconstruction can be bidirectional, and either variable can serve as an explanatory variable for reconstructing the other. When linear coupling among variables is absent but significant nonlinear coupling is present, the machine-learning reconstruction between two variables is direction dependent, and it may be only unidirectional. The convergent cross mapping (CCM) causality index is then proposed to determine which variable can be taken as the reconstructed one and which as the explanatory variable. In a real-world example, the Pearson correlation between the average tropical surface air temperature (TSAT) and the average Northern Hemisphere SAT (NHSAT) is weak (0.08), but the CCM index of NHSAT cross mapped with TSAT is large (0.70). This indicates that TSAT can be well reconstructed from NHSAT through machine learning.

The results shown in this study could provide insights into machine-learning
approaches for paleoclimate reconstruction, parameterization schemes, and
prediction in related climate research.

The coupling dynamics learned by machine learning can be used to reconstruct time series.

Reconstruction quality is direction dependent and variable dependent for nonlinear systems.

The CCM index is a potential indicator to choose reconstructed and explanatory variables.

The tropical average SAT can be well reconstructed from the average Northern Hemisphere SAT.

Applying neural-network-based machine learning in climate fields has attracted great attention (Reichstein et al., 2019). A machine-learning approach can be applied to downscaling and data mining analyses (Mattingly et al., 2016; Racah et al., 2017) and can also be used to predict the time series of climate variables, such as temperature, humidity, runoff, and air pollution (Zaytar and Amrani, 2016; Biancofiore et al., 2017; Kratzert et al., 2019; Feng et al., 2019). In addition, previous studies found that some temporal dynamics of the underlying complex systems can be encoded in these climatic time series. For example, chaos is a crucial property of climatic time series (Lorenz, 1963; Patil et al., 2001). Thus, there is significant interest in the ability of machine-learning algorithms to reconstruct the temporal dynamics of the underlying complex systems (Pathak et al., 2017; Du et al., 2017; Lu et al., 2018; Carroll, 2018; Watson, 2019). The chaotic attractors in the Lorenz system and the Rössler system can be reconstructed by machine learning (Pathak et al., 2017; Lu et al., 2018; Carroll, 2018), and the Poincaré return map and Lyapunov exponent of the attractor can be recovered as well (Pathak et al., 2017; Lu et al., 2017). These results are important for understanding the applicability of machine learning in climate fields.

Although applying machine learning to climate fields has been attracting much attention, there are still open questions about what can be learned by machine learning during the training process and what the key factor is that determines the performance of the machine-learning approach to climatic time series. These issues are crucial for investigating why machine learning cannot perform well with some datasets and how to improve the performance for them. One possible key factor is the coupling between different variables. Because different climate variables are coupled with one another in different ways (Donner and Large, 2008), the coupled variables will share their information content with one another through information transfer (Takens, 1981; Schreiber, 2000; Sugihara et al., 2012). Furthermore, coupling often results in the observational time series being statistically correlated (Brown, 1994). Correlation is a crucial property of the climate system, and it often influences the analysis of climatic time series. The Pearson coefficient is often used to detect the correlation, but it can only detect linear correlation. It is known that when the Pearson correlation coefficient is weak, most tasks based on traditional regression methods, such as fitting, reconstruction, and prediction, will fail when dealing with climatic data (Brown, 1994; Sugihara et al., 2012; Emile-Geay and Tingley, 2016). However, a weak linear correlation does not mean that there is no coupling relation between the variables. Previous studies (Sugihara et al., 2012; Emile-Geay and Tingley, 2016) have suggested that although the linear correlation between two variables is potentially absent, the variables might still be nonlinearly coupled.
For instance, the linear cross-correlations of sea air temperature series observed in different tropical areas are overall weak, but they can be strong locally and vary with time (Ludescher et al., 2014); such a time-varying correlation is an indicator of nonlinear correlation (Sugihara et al., 2012). These nonlinear correlations of the sea air temperature series have been found to be conducive to better El Niño predictions (Ludescher et al., 2014; Conti et al., 2017). The linear correlations between the ENSO/PDO index (El Niño–Southern Oscillation and Pacific Decadal Oscillation) and some proxy variables are also overall weak, but nonlinear coupling relations between them can be detected and contribute greatly to reconstructing longer paleoclimate time series (Mukhin et al., 2018). These studies indicate that nonlinear coupling relations can contribute to better analysis, reconstruction, and prediction (Hsieh et al., 2006; Donner, 2012; Schurer et al., 2013; Badin et al., 2014; Drótos et al., 2015; Van Nes et al., 2015; Comeau et al., 2017; Vannitsem and Ekelmans, 2018). Accordingly, when applying machine learning to climatic series, is it necessary to pay attention to the linear or nonlinear relationships induced by the physical couplings? This is the question we want to address in this study.

In a recent study (Lu et al., 2017), a machine-learning method called
reservoir computer was used to reconstruct the unmeasured time series in the
Lorenz 63 model (Lorenz, 1963). It was found that the

In this paper, we apply machine-learning approaches to learn the coupling relation between climatic time series (training period) and then reconstruct the series (testing period). Specifically, we aim to make progress on how the machine-learning approach is influenced by the physical couplings of climatic series, and the abovementioned questions are addressed. There are several variants of machine-learning methods (Reichstein et al., 2019), and recent studies (Lu et al., 2017; Reichstein et al., 2019; Chattopadhyay et al., 2020) suggest that three of them are more applicable to sequential data (like time series): reservoir computer (RC), backpropagation-based (BP) artificial neural network, and long short-term memory (LSTM) neural network. Here we adopt these three methods to carry out our study and provide a performance comparison among them. We first investigate their performance dependence on different coupling dynamics by analyzing a hierarchy of climatic conceptual models. Then we use a novel method to select explanatory variables for machine learning, and this can further detect the nonlinear observability (Hermann and Krener, 1977; Lu et al., 2017) for a complex system without any known explicit equations.

Finally, we will discuss a real-world example from the climate system. It is known that there exist atmospheric energy transportation systems between the tropics and the Northern Hemisphere, and this can result in coupling between the climate systems in these two regions (Farneti and Vallis, 2013). Due to the underlying complicated processes, it is difficult to use a set of formulas to cover the coupling relation between the average tropical surface air temperature (TSAT) series and the Northern Hemisphere surface air temperature (NHSAT) series. We employ machine-learning methods to investigate whether the NHSAT time series can be reconstructed from the TSAT time series and whether the TSAT time series can also be reconstructed from the NHSAT time series. In this way, the conclusions from our model simulations can be further tested and generalized.

Our paper is organized as follows. In Sect. 2, the methods for reconstructing time series and detecting coupling relations are introduced. The analyzed data and climate conceptual models are described in Sect. 3. In Sect. 4, we investigate the association between the coupling relation and reconstruction quality by machine learning and present an application to real-world climate series. Finally, the summary is given in Sect. 5.

Firstly, we introduce our workflow for learning couplings of dynamical
systems by machine learning and reconstructing the coupled time series. The
total time series can be divided into two parts: the training series (time
lasting denoted as

During the training period,

The second step is accomplished with the testing series to apply the
inferred coupling relation

The first objective of this study is to answer the question of whether the coupling
relation

If machine learning can infer the intrinsic coupling relation
between

Diagram illustration for reconstructing time series by machine
learning. (1) The available part of the dataset

A newly developed neural network called RC (Du et al., 2017; Lu et al.,
2017; Pathak et al., 2018) has three layers: the input layer, the reservoir
layer, and the output layer (see Fig. 2). If

After this reservoir neural network is trained, we can use it to estimate

Schematic of the RC neural network: the three layers are the input
layer, reservoir layer, and output layer. The input layer consists of a matrix

Here, the used BP neural network is a traditional neural computing
framework, and it has been widely used in climate research (Watson, 2019;
Reichstein et al., 2019; Chattopadhyay et al., 2020). There are six layers
in the BP neural network: the input layer has 8 neurons and four hidden layers
with 100 neurons each; the output layer has 8 neurons. In each layer, the
connectivity weights of the neurons need to be computed during the training
process, where the backpropagation optimization with the complicated
gradient descent algorithm is used (Dueben and Bauer, 2018). A crucial
difference between the BP and the RC neural networks is as follows: unlike
RC, all neuron states of the BP neural network are independent of the
temporal variation of time series (Reichstein et al., 2019; Chattopadhyay et
al., 2020), while the neurons of RC can track temporal evolution (such as
the neuron state

The LSTM neural network is an improved recurrent neural network to deal with
time series (Reichstein et al., 2019; Chattopadhyay et al., 2020). As Fig. 3
shows, LSTM has a series of components: a memory cell, an input gate, an
output gate, and a forget gate. When a time series

Schematic of the LSTM architecture. LSTM has a memory cell, an input gate, an output gate, and a forget gate to control the information of the previous time to flow into the neural network.

The crucial improvement of LSTM over the traditional recurrent neural network (Reichstein et al., 2019) is that LSTM has a forget gate which controls how much information from the previous time flows into the neural network. This enables the neuron states of LSTM to track the temporal evolution of time series (Kratzert et al., 2019; Reichstein et al., 2019; Chattopadhyay et al., 2020), and this is the crucial difference between the LSTM and the BP neural networks.
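The role of the gates can be illustrated with a minimal sketch of one step of a textbook LSTM cell. This is not necessarily the variant used in this study: the function names, the parameter dictionary, and the weight initialization are hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, n_hidden, seed=0):
    """Small random weights for the four gate blocks (illustrative only)."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in "fiog":  # forget, input, output, candidate
        params[f"W_{gate}"] = 0.1 * rng.standard_normal((n_hidden, n_hidden + n_in))
        params[f"b_{gate}"] = np.zeros(n_hidden)
    return params


def lstm_step(x_t, h_prev, c_prev, params):
    """Advance a standard LSTM cell by one time step."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(params["W_f"] @ z + params["b_f"])  # forget gate
    i = sigmoid(params["W_i"] @ z + params["b_i"])  # input gate
    o = sigmoid(params["W_o"] @ z + params["b_o"])  # output gate
    g = np.tanh(params["W_g"] @ z + params["b_g"])  # candidate memory
    c_t = f * c_prev + i * g   # forget gate scales the previous memory
    h_t = o * np.tanh(c_t)     # output gate scales the exposed state
    return h_t, c_t
```

In this sketch the previous memory `c_prev` enters the new memory only through the factor `f`, so a forget gate close to zero discards the past, while a value close to one carries it forward.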

Here, we also test the LSTM neural network without the forget gate and call
it LSTM

The root-mean-square error (RMSE) of residuals is used here to evaluate the
quality of reconstruction (Hyndman and Koehler, 2006). The residual
represents the difference between the real series

As mentioned in the introduction, the linear Pearson correlation is a
commonly used method to quantify the linear relationship between two
observational variables. The Pearson correlation between two series,

The terms “mean” and “SD” denote the average and standard deviation for the series, respectively.
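As a brief illustration, the Pearson correlation can be computed directly from the mean and SD of the two series. The sketch below assumes the population SD (normalization by the series length); the function name `pearson` is hypothetical.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation from means and (population) standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    covariance = np.mean((x - x.mean()) * (y - y.mean()))
    return covariance / (x.std() * y.std())


# A purely nonlinear dependence can give a vanishing linear correlation:
x = np.linspace(-1.0, 1.0, 201)
r_linear = pearson(x, 2.0 * x + 1.0)   # perfectly linear relation
r_quadratic = pearson(x, x ** 2)       # y is fully determined by x, yet r is ~0
```

The second case shows why a weak Pearson coefficient alone does not rule out a coupling between two variables.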

To measure the nonlinear coupling relation between two observational
variables, we choose the convergent cross mapping (CCM) method that has been
demonstrated to be useful for many complex nonlinear systems (Sugihara et
al., 2012; Tsonis et al., 2018; Zhang et al., 2019). Considering

Embedding

Estimating the weight parameter

Cross mapping the value of

The cross mapping skill from

According to previous studies (Sugihara et al., 2012; Ye et al., 2015), the
CCM index is related to the ability of using one variable to reconstruct
another variable: if

For a linearly coupled model, the autoregressive fractionally integrated moving
average (ARFIMA) model (Granger and Joyeux, 1980) maps a Gaussian white
noise

For a nonlinearly coupled model, the Lorenz 63 chaotic system (Lorenz, 1963)
depicts the nonlinear coupling relation in a low-dimensional chaotic system.
The system reads as

For a high-dimensional model, the two-layer Lorenz 96 model (Lorenz, 1996) is a
high-dimensional chaotic system, and it is commonly used to mimic
midlatitude atmospheric dynamics (Chorin and Lu, 2015; Hu and Franzke,
2017; Vissio and Lucarini, 2018; Chen and Kalnay, 2019; Watson, 2019). It
reads as

TSAT, NHSAT, and the Nino 3.4 index are chosen as the example from real-world
climatic time series used for reconstruction analysis. The original data were
obtained from the National Centers for Environmental Prediction (NCEP)
(

For the training and testing datasets before analysis, all the used time series are standardized to take zero mean and unit variance so that any possible impact of mean and variance on the statistical analysis is avoided (Brown, 1994; Hyndman and Koehler, 2006; Chattopadhyay et al., 2020). The total series were divided into two parts: 60 % of the time series for training the neural network and 40 % for the testing series. Specific data lengths of the training series and testing series will also be listed in the results section.
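The preprocessing described above can be sketched as follows. The sketch assumes that the whole series is standardized once and then split chronologically, which is one reading of the text; the function names are hypothetical.

```python
import numpy as np


def standardize(series):
    """Rescale a series to zero mean and unit variance."""
    series = np.asarray(series, dtype=float)
    return (series - series.mean()) / series.std()


def split_train_test(series, train_fraction=0.6):
    """Chronological split: first 60 % for training, remaining 40 % for testing."""
    n_train = int(round(train_fraction * len(series)))
    return series[:n_train], series[n_train:]


series = standardize(np.arange(100.0))
train, test = split_train_test(series)
```

A chronological split, rather than a random one, keeps the temporal order that RC and LSTM rely on.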

We first consider the simplest case: the linear coupling relation between
two variables. Here, two time series

Details of reconstructing ARFIMA (3, 0.2, 3).

Detailed comparisons between the real and reconstructed series are shown in
Fig. 4c and d. When

It is known that a strong linear correlation is useful for training neural
networks and reconstructing time series. When the linear correlation between
variables is very weak, could these machine-learning methods be applied to
learn the underlying coupling dynamics? To address this question, two
nonlinearly coupled time series,

There is a very weak overall linear correlation between variables

Details of Lorenz 63 system reconstruction.

Details of reconstructing the Lorenz 96 model.

In a nonlinearly coupled system, it is known that the coupling strength
between two variables cannot be estimated by the linear Pearson correlation
(Brown, 1994; Sugihara et al., 2012). Here, we use CCM to estimate the
coupling strength between

Figure 5b shows the results of reconstructing

As mentioned in Sect. 2.2, a BP neural network does not track the temporal
evolution, since its neuron states are independent of the temporal variation
of time series. For LSTM

From the above results, it is revealed that RC and LSTM are able to learn both linear and nonlinear coupling relations, and the coupled time series can be well reconstructed. In this section, we further investigate what factors can influence the reconstruction quality.

When reconstructing time series of the linear model of Eq. (11), it can be
found that the reconstruction is bidirectional (see Fig. 4d and Table 1):
one variable can be taken as an explanatory variable to reconstruct another
variable well; oppositely, it can also be well reconstructed by another
variable. Furthermore, when the linear correlation is weak but the nonlinear
coupling is strong, will the bidirectional reconstruction be still allowed?
The answer is usually no! For example, when comparing the reconstruction
quality of reconstructing

Therefore, we further discuss how to select the suitable explanatory
variable or the reconstruction direction. Tables 1 and 2 show that the
reconstruction quality in a linear coupled system highly depends on the
Pearson correlation; however, it is different for a nonlinear system. For the
Lorenz 63 system, the bidirectional CCM coefficients between the variables

Choosing direction and variable is important for the application of neural networks in reconstructing nonlinear time series, but this is derived from the low-dimensional Lorenz 63 system. In this subsection, we present the results from a high-dimensional chaotic system of the Lorenz 96 model. Furthermore, we will investigate the association between the CCM index and reconstruction quality in the machine-learning frameworks.

Firstly, we use variables

Scatter plot of nRMSE values and the CCM index values. Blue and gray dashed lines are the fitted linear trends for the scatters.

The reconstruction between

Influence of strong nonlinear coupling on linear Pearson
correlation and machine-learning performance.

Besides, the number of the chosen explanatory variables also influences the
reconstruction quality. If more than one explanatory variable in the same
layer is used, the reconstruction of

In the above results, the CCM index is used to select explanatory variable
for RC and LSTM. Now we employ more variables to test the association
between the CCM index of the data and the performance of RC and LSTM. The
values of the CCM index are calculated between

In nonlinear systems, the performance of reconstruction through BP and LSTM

The experiment is set up as follows: in Eq. (13), the value of

Details of the temperature record reconstruction.

However, RC and LSTM are not restricted to the Pearson correlation in this
nonlinear system. When

The natural climate series are usually nonstationary and are encoded with the information of many physical processes in the earth system. In the following, we illustrate the utility of the above methods and conclusions by investigating a real-world example.

The daily NHSAT and TSAT time series are shown in Fig. 10a. There are quite different temporal patterns in the NHSAT and TSAT series, with a weak linear correlation (0.08; see Table 4) between them. In the scatter plot for the NHSAT and TSAT (Fig. 10b), a marked nonlinear structure is observed between NHSAT and TSAT. Such a weak linear correlation will make the linear regression method fail to reconstruct one series from the other. Meanwhile, there is no explicit physical expression that can transform TSAT and NHSAT into each other. Now we try to use machine learning to learn the coupling between them and then to reconstruct these climate series. The CCM index when NHSAT cross maps TSAT is 0.70, and the CCM index when TSAT cross maps NHSAT is 0.24 (Table 4). The CCM index means that the information content of TSAT is well encoded in the records of NHSAT, and the information transfer might be mainly from TSAT to NHSAT. This finding is consistent with previous studies (Vallis and Farneti, 2009; Farneti and Vallis, 2013). Further, the CCM analysis indicates that the reconstruction from NHSAT to TSAT might obtain a better quality than that in the opposite direction.
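A directional cross-mapping index of this kind can be illustrated with a minimal sketch of the general CCM procedure (delay embedding, nearest-neighbor weighting, cross mapping, and correlation skill) described by Sugihara et al. (2012). The function names, the embedding dimension `dim`, the lag, and the use of the full library length are assumptions made for illustration; they are not the settings behind Table 4, and the convergence check over increasing library length is omitted.

```python
import numpy as np


def delay_embed(x, dim, lag):
    """Rows are delay vectors [x(t), x(t-lag), ..., x(t-(dim-1)*lag)]."""
    x = np.asarray(x, dtype=float)
    start = (dim - 1) * lag
    return np.column_stack(
        [x[start - k * lag: len(x) - k * lag] for k in range(dim)]
    )


def ccm_index(x, y, dim=3, lag=1):
    """Skill of estimating y from the delay embedding of x ("x cross maps y")."""
    y = np.asarray(y, dtype=float)
    manifold = delay_embed(x, dim, lag)
    target = y[(dim - 1) * lag:]          # y values aligned with the delay vectors
    # Pairwise Euclidean distances between delay vectors; a point is not
    # allowed to be its own neighbor.
    diff = manifold[:, None, :] - manifold[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    # dim + 1 nearest neighbors and exponential weights relative to the nearest.
    nn = np.argsort(dist, axis=1)[:, :dim + 1]
    nn_dist = np.take_along_axis(dist, nn, axis=1)
    weights = np.exp(-nn_dist / np.maximum(nn_dist[:, :1], 1e-12))
    weights /= weights.sum(axis=1, keepdims=True)
    estimate = (weights * target[nn]).sum(axis=1)
    return np.corrcoef(target, estimate)[0, 1]
```

Under this convention, `ccm_index(nhsat, tsat)` would correspond to "NHSAT cross maps TSAT", so the index is not symmetric in its two arguments. The pairwise distance matrix grows quadratically with the series length, so long daily records would need a neighbor search that scales better.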

The results validate our conjecture that the nRMSE of reconstruction from NHSAT to TSAT is lower than that from TSAT to NHSAT (Table 4). By using RC, the TSAT time series can be relatively well described by the reconstructed ones (Fig. 11a), with nRMSE equal to 0.13. This nRMSE is a bit high because some extremes of the TSAT time series have not been well described (Fig. 11b). When using TSAT to reconstruct the time series of NHSAT, the reconstructed time series cannot describe the real time series of NHSAT (Fig. 11c), and the corresponding nRMSE is equal to 0.21. In addition, we also use LSTM and BP to reconstruct these natural climate series; the performances of these two neural networks are worse than that of RC (Table 4). For BP, this worse performance may be due to its inability to deal with nonlinear coupling. LSTM performs worse than RC in this real-world case, which might be due to the simple variant of the LSTM architecture used here.

We can further improve the reconstruction quality of TSAT. Considering that the tropical climate system interacts not just with the Northern Hemisphere climate system, we can use the information of other systems to improve the reconstruction. Looking at the time series of the Nino 3.4 index (Fig. 10a), some of its extremes occur at the same time intervals as the extremes of TSAT. Moreover, when the Nino 3.4 index is included in the scatter plot (Fig. 10c), a nonlinear attractor structure is revealed. We combine NHSAT with the Nino 3.4 index to reconstruct the time series of TSAT through RC. The reconstructed TSAT (Fig. 11e) is much closer to the real TSAT series, and the corresponding nRMSE has been reduced to 0.08.
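Using more than one explanatory series with RC can be sketched with a generic echo-state-style reservoir: the explanatory series are stacked as input columns, the reservoir states are driven by these inputs, and a linear readout is fitted by ridge regression on the training part. The reservoir size, spectral radius, input scaling, washout length, and ridge parameter below are assumptions for illustration, not the settings of this study, and all function names are hypothetical.

```python
import numpy as np


def make_reservoir(n_inputs, n_nodes=100, spectral_radius=0.9,
                   input_scale=0.5, seed=0):
    """Random recurrent matrix A (rescaled) and input matrix W_in."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, (n_nodes, n_nodes))
    A *= spectral_radius / np.max(np.abs(np.linalg.eigvals(A)))
    W_in = rng.uniform(-input_scale, input_scale, (n_nodes, n_inputs))
    return A, W_in


def run_reservoir(A, W_in, inputs):
    """Drive the reservoir with the input rows and collect its states."""
    states = np.zeros((len(inputs), A.shape[0]))
    r = np.zeros(A.shape[0])
    for t, u in enumerate(inputs):
        r = np.tanh(A @ r + W_in @ u)
        states[t] = r
    return states


def fit_readout(states, targets, ridge=1e-4):
    """Ridge-regression readout mapping reservoir states to the target."""
    gram = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, states.T @ targets)


def reconstruct(inputs, target, train_fraction=0.6, washout=50,
                ridge=1e-4, seed=0):
    """Train on the first part of the record, reconstruct the remainder."""
    inputs = np.asarray(inputs, dtype=float)
    target = np.asarray(target, dtype=float)
    n_train = int(round(train_fraction * len(target)))
    A, W_in = make_reservoir(inputs.shape[1], seed=seed)
    states = run_reservoir(A, W_in, inputs)
    W_out = fit_readout(states[washout:n_train], target[washout:n_train], ridge)
    prediction = states[n_train:] @ W_out
    rmse = np.sqrt(np.mean((prediction - target[n_train:]) ** 2))
    return prediction, rmse
```

With two explanatory series of equal length, a call such as `reconstruct(np.column_stack([series_a, series_b]), target)` returns the reconstructed testing segment and its RMSE; adding or removing a column is the only change needed to compare different sets of explanatory variables.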

Finally, we make a further comparison between the real TSAT and the reconstructed TSAT. (i) The annual variations of the reconstructed TSAT are close to those of the real TSAT (Fig. 12a). (ii) The power spectra of TSAT and the reconstructed TSAT are compared in Fig. 12b, and the main deviation occurs at the frequency bands of 0–15 d. The reason might be that the local weather processes are not input into this RC reconstruction. This conjecture can be further confirmed through a red-noise test with response time of 15 d for the residual series (this red-noise test is the same as the method used in Roe, 2009). All data points of the residual series lie within the confidence intervals (Fig. 12c). This means that the residual is possibly induced by local weather processes, and this information is not input into RC for the reconstruction.

In this study, three kinds of machine-learning methods are used to reconstruct the time series of toy models and real-world climate systems. One series can be reconstructed from the other series by machine learning when they are governed by the common coupling relation. For the linear system, variables are coupled through the linear mechanism, and a large Pearson coefficient can benefit machine learning with bidirectional reconstruction. For a nonlinear system, the coupled time series often have a small Pearson coefficient, but machine learning can still reconstruct the time series when the CCM index is strong; moreover, the reconstruction quality is direction dependent and variable dependent, which is determined by the coupling strength and causality between the dynamical variables.

Choosing suitable explanatory variables is crucial for obtaining a good reconstruction quality. However, the results show that machine-learning performance cannot be explained by the linear correlation alone. In this study, we suggest using the CCM index to select explanatory variables. Especially for the time series of nonlinear systems, a strong CCM index can be taken as a benchmark to select an explanatory variable. When the CCM index is higher than 0.5 in this study, the nRMSE is often smaller than 0.1, with the reconstructed series very close to the real series in the presented results. Thus, a CCM index higher than 0.5 may be considered a criterion for choosing appropriate explanatory variables. It is well known that atmospheric or oceanic motions are nonlinearly coupled over most timescales; therefore, in the natural climate series, there would be nonlinear coupling relations similar to those found in the Lorenz 63 and the Lorenz 96 systems (weak Pearson correlation but high CCM coefficient). If only the Pearson coefficient is used to select the explanatory variable, then some useful nonlinearly correlated variables may be left out.

Finally, it is worth noting the potential application for machine learning
in climate studies. For instance, when a series

If

The code and data used in this paper are available on request from the authors.

YH, LY, and ZF contributed to the design of this study and the preparation of the article.

The authors declare that they have no conflict of interest.

The authors thank the editor, the two anonymous reviewers, and Zhixin Lu for their constructive suggestions. We also thank Christian L. E. Franzke and Naiming Yuan for their in-depth and helpful discussions. We acknowledge support from the National Natural Science Foundation of China (through grant nos. 41675049 and 41975059).

This research has been supported by the National Natural Science Foundation of China (grant nos. 41475048, 41675049, and 41975059).

This paper was edited by C. T. Dhanya and reviewed by two anonymous referees.