Reconstructing coupled time series in climate systems by machine learning

https://doi.org/10.5194/esd-2019-63 | Preprint. Discussion started: 4 November 2019 | © Author(s) 2019. CC BY 4.0 License.

Despite the great success of machine learning, its applications in climate dynamics have not been well developed. One concern is how well trained neural networks can learn a dynamical system, and what the potential applications of this kind of learning might be. Detailed studies show that the coupling relations, or dynamics, among variables in linear or nonlinear systems can be well learnt by reservoir computer (RC) and long short-term memory (LSTM) machine learning, and these learnt coupling relations can be further applied to reconstruct one series from another governed by the same coupling dynamics. To validate these conclusions, toy models are applied to address the following three questions: (i) what can be learnt from different dynamical time series by machine learning; (ii) what factors significantly influence the quality of machine learning reconstruction; and (iii) how to select suitable explanatory (input) variables for the reconstructed variable. The results from these toy models show that both RC and LSTM can indeed learn coupling relations among variables, and the learnt implicit coupling relation can be applied to accurately reconstruct one series from the other. Both linear and nonlinear coupling relations between variables can influence the quality of the reconstructed series. If there is a strong linear coupling between variables, all of the variables can be taken as explanatory variables for the reconstructed variable, and the

reconstruction can be bi-directional. However, when the linear coupling among variables is much weaker but there is strong nonlinear causality among the variables, the reconstruction quality is direction-dependent and the reconstruction may be only uni-directional. We propose using convergent cross mapping (CCM) causality to select suitable explanatory variables for machine learning reconstruction.

To address this question, we first discuss a real-world example from the climate system. As we know, there exists coupling in the atmospheric motions between the tropics and the Northern Hemisphere, which is realized through the transfer of atmospheric energy (Farneti and Vallis, 2013). Due to the underlying complicated processes, it is difficult to use a formula to describe this coupling between the tropical average surface air temperature (TSAT) series and the Northern Hemispheric surface air temperature (NHSAT) series. In the results part, it will be shown that the linear Pearson correlation between TSAT and NHSAT is very weak, but machine learning can still be used to well reconstruct the TSAT series from the NHSAT series. However, the NHSAT series cannot be reconstructed from the TSAT series.

In this paper, to make progress towards understanding how the machine learning approach is influenced by the physical couplings of climatic series, we investigate machine learning performances and their dependence on different coupling dynamics in climatic conceptual models. It will be demonstrated that machine learning is able to "learn" these couplings, and the "learnt" coupling relations are very useful for reconstructing climatic series. Moreover, our study also analyzes the features of linear and nonlinear relations in climate systems, and a method to select explanatory variables for machine learning is proposed. Hence, the underlying mechanism of the abovementioned NHSAT-TSAT reconstruction will also be explained.

Our paper is organized as follows. In section 2, the methods for reconstructing time series and the data are introduced. In section 3, the ability of machine learning to "learn" different coupling relations of the climate system, and the potential application of machine learning to reconstructing climate series, are shown.
where the input series a1(t), a2(t), …, an(t) and the learnt coupling relation F̂ have been taken into account.

The first objective of this study is to answer whether the coupling relation can be learnt by machine learning, i.e., whether the learnt coupling relation F̂ can well approximate the real coupling relation F. Since we do not intend to derive an explicit formula for the learnt coupling relation F̂, we answer this question indirectly by comparing the reconstructed series with the original one. After training, a new series {a1(t'), …, an(t')} is available and is input into the trained neural network, and the unknown series b(t') can then be reconstructed.

Inspired by several recent studies (as discussed in the introduction), we focus on employing the machine learning frameworks commonly used for time series: reservoir computing (RC) and long short-term memory (LSTM) networks.

The first layer is the listening reservoir. A time series a(t) (here, for brevity, only a one-dimensional series is considered) is first input into the listening reservoir. The function "f" governing the listening reservoir maps the current reservoir state r(t) and the input a(t) to the updated state r*(t).

The second layer trains the output function "Ψ" by ridge regression. Here the symbol ‖·‖ denotes the L2-norm of a vector, and α is the L2 regularization (ridge regression) coefficient.

The third layer is the prediction reservoir. As Fig. 2 shows, the prediction reservoir consists of the functions "f" and "Ψ". This is actually a feedback loop to the former two layers. In this reservoir layer, the input is r(t) rather than a(t). Finally, the output of this layer is the reconstruction of the target series b(t).
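The pipeline described above can be sketched in a few lines of Python. This is a generic reservoir-computing recipe (a leaky tanh reservoir driven by the input series, plus a ridge-regression readout mapping the reservoir state to the target series), not the authors' exact configuration: the reservoir size, spectral radius, leak rate, and regularization coefficient α below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class Reservoir:
    """Minimal reservoir computer for series-to-series reconstruction.

    Hyperparameters are illustrative assumptions, not the paper's values.
    """

    def __init__(self, n_res=200, rho=0.9, leak=0.5, alpha=1e-6):
        self.leak, self.alpha = leak, alpha
        A = rng.uniform(-1, 1, (n_res, n_res))
        # rescale the internal coupling matrix to the desired spectral radius
        self.A = A * (rho / np.max(np.abs(np.linalg.eigvals(A))))
        self.w_in = rng.uniform(-1, 1, n_res)

    def _listen(self, a):
        """Listening reservoir: drive the state r(t) with the input a(t)."""
        r = np.zeros(len(self.w_in))
        states = np.empty((len(a), len(r)))
        for t, a_t in enumerate(a):
            r = (1 - self.leak) * r + self.leak * np.tanh(
                self.A @ r + self.w_in * a_t)
            states[t] = r
        return states

    def fit(self, a, b):
        """Training layer: ridge regression (coefficient alpha) from r(t) to b(t)."""
        R = self._listen(a)
        self.w_out = np.linalg.solve(
            R.T @ R + self.alpha * np.eye(R.shape[1]),
            R.T @ np.asarray(b, dtype=float))
        return self

    def reconstruct(self, a_new):
        """Apply the learnt readout to a new input series a(t')."""
        return self._listen(a_new) @ self.w_out
```

For instance, training on a pair of coupled series a(t) and b(t) and then calling `reconstruct` on a fresh segment of a(t') yields an estimate of the unobserved b(t').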

Evaluation of reconstruction quality
To evaluate the quality of reconstruction by machine learning, the root mean squared error (RMSE) of the residual series (Hyndman and Koehler, 2006) is adopted, which represents the difference between the reconstructed and the original series; its normalized form (nRMSE) is used below. In the results section, it can be observed that good reconstruction quality corresponds to nRMSE values smaller than 0.1.

A low-dimensional model: The Lorenz 63 system (Lorenz, 1963) depicts the nonlinear coupling relation in a low-dimensional chaotic system.
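The error measure can be sketched as follows. The text does not spell out the normalization, so normalizing the residual RMSE by the standard deviation of the original series is an assumption; only the 0.1 quality threshold is stated in the paper.

```python
import numpy as np

def nrmse(original, reconstructed):
    """Normalized RMSE of the residual series.

    Normalization by the standard deviation of the original series
    is an assumed convention; values below 0.1 are read as a good
    reconstruction in the text.
    """
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    rmse = np.sqrt(np.mean((original - reconstructed) ** 2))
    return rmse / np.std(original)
```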

The system reads:

dx/dt = σ(y − x),
dy/dt = x(ρ − z) − y,
dz/dt = xy − βz.

When the parameters are fixed at (σ, ρ, β) = (10, 28, 8/3), the state of the system is chaotic. We employed the fourth-order Runge-Kutta integration to acquire the series output from Lorenz63.

The time step was 0.01.
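The integration above can be reproduced directly; the initial condition below is an arbitrary illustrative choice, and in practice an initial transient would be discarded before analysis.

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0  # (sigma, rho, beta) = (10, 28, 8/3)

def lorenz63(state):
    """Right-hand side of the Lorenz 63 equations."""
    x, y, z = state
    return np.array([SIGMA * (y - x),
                     x * (RHO - z) - y,
                     x * y - BETA * z])

def rk4_step(f, state, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(n_steps, dt=0.01, state=(1.0, 1.0, 1.0)):
    """Integrate Lorenz63 with the paper's time step dt = 0.01."""
    state = np.array(state, dtype=float)
    traj = np.empty((n_steps, 3))
    for i in range(n_steps):
        state = rk4_step(lorenz63, state, dt)
        traj[i] = state
    return traj
```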

A high-dimensional model: The two-layer Lorenz 96 model (in the following referred to as Lorenz96) (Lorenz, 1996) is a high-dimensional chaotic system which is generally employed to mimic multi-scale atmospheric dynamics. In the first layer of the Lorenz96 there are 18 variables, marked as Xk (k is an integer in the interval [1, 18]), and each Xk is coupled with Yk,j (from the second layer). The parameters are set as follows: J = 20, h1 = 1, h2 = 1, and F = 10. The scale parameter controls the scale separation of the two layers.

When the scale parameter is larger than 1, processes in the second layer are slower than processes in the first layer.

First of all, we consider the simplest case: the linear coupling relation between two variables.

Quantifying the linear coupling strength: To quantify the linear correlation between two series a(t) and b(t), the Pearson correlation coefficient is adopted, defined as

r(a, b) = mean[(a(t) − mean(a)) · (b(t) − mean(b))] / (std(a) · std(b)).   (9)

In Eq. (9), the symbols "mean" and "std" denote the average and the standard deviation of the series a(t) and b(t), respectively.
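Eq. (9) translates directly into code:

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient following Eq. (9): the mean-removed
    product of the two series, normalized by the product of their
    standard deviations."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.mean((a - a.mean()) * (b - b.mean())) / (a.std() * b.std())
```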

Here, the ARFIMA (3, 0.2, 3) model is taken as an example of "learning" a strong linear coupling between two variables. Obviously, there are different temporal structures in x(t) and ε(t), especially in their large-scale trends (Fig. 3a) and power spectra (Fig. 3b). The marked difference between x(t) and ε(t) lies in their low-frequency variations: x(t) contains more low-frequency and larger-scale variability. Machine learning is used to learn the coupling relation between x(t) and ε(t); then, the learnt coupling together with ε(t') is used to reconstruct x(t'). The reconstruction results and the performance of the different machine learning frameworks are listed in Table 1.
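A toy generator for such a pair of series can be sketched as follows: x(t) is obtained by fractionally integrating (with d = 0.2) an ARMA(3, 3) process driven by the noise ε(t). Only the orders (3, 0.2, 3) come from the text; the AR and MA coefficients below are illustrative assumptions, as the excerpt does not give the paper's values.

```python
import numpy as np

rng = np.random.default_rng(42)

def frac_int_weights(d, n):
    """Weights of the fractional-integration filter (1 - B)^(-d):
    psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 + d) / k
    return w

def arfima(n, d=0.2, ar=(0.3, -0.2, 0.1), ma=(0.4, 0.3, 0.2)):
    """Generate an ARFIMA(3, d, 3) series x(t) and its driving noise eps(t).

    AR/MA coefficients are illustrative, not the paper's values.
    """
    burn = 500
    eps = rng.standard_normal(n + burn)
    # ARMA(3, 3) recursion driven by eps
    u = np.zeros(n + burn)
    for t in range(3, n + burn):
        u[t] = (sum(ar[i] * u[t - 1 - i] for i in range(3))
                + eps[t]
                + sum(ma[i] * eps[t - 1 - i] for i in range(3)))
    # fractional integration (1 - B)^(-d) as a causal convolution
    w = frac_int_weights(d, n + burn)
    x = np.convolve(u, w)[:n + burn]
    return x[burn:], eps[burn:]
```

The returned pair (x, eps) plays the role of x(t) and ε(t): two series with clearly different temporal structure, linked by a linear coupling that a network can be trained to learn.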
The reconstruction quality is also good. The best performance of LSTM among the given machine learning frameworks benefits from its network being updated all the time. When updating is stopped, the reconstruction error of LSTM* is no longer better than that of RC, and is only a little better than that of BP (see Table 1). Therefore, if the training of a machine learning framework is carried out only on the training series, with no further updating, RC is the best machine learning framework.

Quantifying the nonlinear coupling strength: To quantify the nonlinear coupling between two series, the convergent cross mapping (CCM) index is adopted (Sugihara, 1994); the detailed algorithm for CCM can be found in a previously reported study (Tsonis et al., 2018).
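The CCM idea can be illustrated with a compact sketch: time-delay embed the candidate effect series, then use nearest neighbours on its shadow manifold to cross-estimate the candidate cause; a high correlation between the estimate and the true cause series indicates causality from cause to effect. This is a simplified, single-library-length version (the full algorithm examines the convergence of skill with library length), and the embedding dimension and delay below are illustrative choices.

```python
import numpy as np

def embed(x, E, tau):
    """Time-delay embedding: row t is (x_t, x_{t-tau}, ..., x_{t-(E-1)tau})."""
    x = np.asarray(x, dtype=float)
    start = (E - 1) * tau
    return np.column_stack([x[start - j * tau: len(x) - j * tau]
                            for j in range(E)])

def ccm_skill(cause, effect, E=3, tau=1):
    """Cross-map 'cause' from the shadow manifold of 'effect'.

    If 'cause' drives 'effect', the effect's attractor encodes the cause,
    so the cross-mapped estimate correlates strongly with the true cause
    series; the returned correlation is the CCM skill."""
    M = embed(effect, E, tau)
    target = np.asarray(cause, dtype=float)[(E - 1) * tau:]
    est = np.empty(len(M))
    for i in range(len(M)):
        d = np.linalg.norm(M - M[i], axis=1)
        d[i] = np.inf                       # exclude the point itself
        nn = np.argsort(d)[:E + 1]          # E + 1 nearest neighbours
        w = np.exp(-d[nn] / max(d[nn[0]], 1e-12))
        est[i] = np.sum(w * target[nn]) / np.sum(w)
    return np.corrcoef(est, target)[0, 1]
```

On a pair of unidirectionally coupled chaotic maps where x drives y, cross-mapping x from y's shadow manifold yields a much higher skill than the reverse direction, which is the asymmetry exploited later in the paper.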

When the linear correlation between variables is very weak, can machine learning frameworks still be applied to learn the underlying coupling dynamics? To address this question, we turn to the Lorenz63 system (Lorenz, 1963). There is a very weak linear correlation between the variables X and Z (with a Pearson correlation coefficient of 0.002) in the Lorenz63 model (see Table 2), and such a weak linear correlation is induced by the unstable local correlation between X and Z.

Different from the case of the linear system with strong linear coupling, the reconstruction of the time series of the Lorenz63 system depends on the machine learning framework used (Fig. 4b).

From the above results, it is found that a machine learning framework is able to learn different coupling relations between variables.

When reconstructing the time series of the linear model of Eq. (6), it is found that the reconstruction is invertible (see Fig. 3d), consistent with the invertibility of linear systems (Brown, 1994). Hence, when the linear coupling is weak but the nonlinear coupling is strong, is bi-directional reconstruction still possible? The answer is usually no. For example, comparing the quality of reconstructing variable Z from variable X (Fig. 4b) with that of reconstructing variable X from variable Z (Fig. 4c) reveals a clear asymmetry between X and Z (Fig. 4b and 4c). The CCM index can be taken as a potential precursor to determine the explanatory variable and the reconstructed variable for this nonlinear system.

Here the asymmetric reconstruction quality is simply caused by the asymmetric causality between the variables.

The cases of a high-dimensional system: It is noted that the above conclusions and results are derived from the low-dimensional chaotic system (Eq. (7)). It is also interesting to know whether they are applicable to the high-dimensional chaotic Lorenz96 model. Since BP and LSTM* fully fail in reconstruction for the Lorenz63 system (not shown here), which suggests that they are not suitable for handling nonlinear systems, only the results from LSTM and RC are shown next.

Firstly, we use the variables X1 and Y1,1 in Eq. (8) to illustrate the direction dependence in the high-dimensional system. Details of X1 and Y1,1 are shown in Fig. 6a, and the Pearson correlation between X1 and Y1,1 is weak (only −0.11, see Table 3). For the given value of each parameter in Eq. (8), the forcing from X1 to Y1,1 is much stronger than the forcing from Y1,1 to X1. Hence, the CCM index shows asymmetric causality. Reconstructing X1 from Y1,1 yields nRMSE = 0.01, and the opposite direction yields nRMSE = 0.06 (Table 3), which is consistent with the indication of the CCM index (similar results can be found in Figs. 6b and 6c), even though the Pearson correlation is very weak (see Table 3). These results indicate that LSTM performs worse than RC in some cases, and that the best choice of machine learning framework for data reconstruction is RC.

Secondly, it is also found that the reconstruction quality is influenced by the chosen explanatory variables: the quality of reconstructing X1 from Y1,1 is better than that of reconstructing X1 from X2 (see Fig. 6b and 6c). Furthermore, if more than one explanatory variable in the same layer is considered, the reconstruction reaches a lower error than single-variable reconstruction.

The setting of Eq. (8) is as follows: the value of h1 is set to 0, and the value of the scale parameter is decreased from 0.7 to 0.3. When the scale parameter is equal to 0.7, the forcing from X1 to Y1,1 is weak. In that case, the Pearson correlation between X1 and Y1,1 is only 0.48, and the performances of BP and LSTM* are not good.

When the scale parameter is equal to 0.3, the forcing is dramatically magnified (see the second panel of Fig. 7a). However, RC is not so much restrained by the Pearson correlation. When the scale parameter is equal to 0.7 or 0.3, the values of the CCM index are both higher than 0.9; that is to say, the nonlinear coupling strength is not changed by the scale parameter. It is then found that the quality of X1 reconstructed by RC is always good.

As Fig. 7b shows, the green dots (RC output) follow the trend of the blue circles, and this dependency trend is significant because the p-value is much smaller than 0.05.

Natural climate series are usually nonstationary and encode the information of many physical processes in the Earth system. In the following, we illustrate the utility of the above findings with real-world climate series.

The daily NHSAT and TSAT time series are shown in Fig. 9a. The oscillation shapes of the NHSAT and TSAT series are quite different, and there is only a weak linear correlation (0.08, see Table 4) between them. In the scatter plot of NHSAT against TSAT (Fig. 9b), a marked nonlinear structure is observed. Such a weak linear correlation makes linear approaches ineffective (Table 4). The CCM index indicates that the reconstruction of TSAT from NHSAT will reach a better quality than the opposite direction.
When using NHSAT to reconstruct TSAT, the reconstructed series captures the main variations of the original TSAT series (Fig. 10a). However, the corresponding nRMSE is equal to 0.13, because some extremes of the TSAT time series are not captured (Fig. 10b). When using TSAT to reconstruct the time series of NHSAT, the reconstructed series cannot describe the original NHSAT series (Fig. 10c), and the corresponding nRMSE is equal to 0.21. Besides, we also use LSTM and BP to reconstruct these natural climate series; the performances of these two neural networks are worse than that of RC (Table 4). For BP, this might be due to its inability to deal with nonlinear coupling. As for LSTM, its performance might be degraded by the unstable variance of the series.

We can also improve the reconstruction quality of TSAT. Considering that the tropical climate is strongly modulated by ENSO, the Nino 3.4 index is included. When Nino 3.4 is added to the scatter plot (Fig. 9c), a nonlinear attractor structure is revealed. We therefore combine NHSAT with the Nino 3.4 index to reconstruct the time series of TSAT by means of RC machine learning. The reconstructed TSAT (Fig. 10e) is clearly improved. A red-noise test is applied to the residual series (the red-noise test is the same as the method used in Roe, 2009). The gray shadow in Fig. 11c gives the 99% confidence interval of a red-noise process, and the black line is the residual series. All data points of the residual series lie within the confidence interval, which means the residual is possibly induced by local weather processes that are not input into the RC.

In this study, three kinds of machine learning frameworks are used to reconstruct the time series of toy models and real-world climate systems. One series can be reconstructed from the other by machine learning when they are governed by a common coupling relation. For a linear system, variables are coupled by a linear mechanism, and a strong Pearson correlation benefits machine learning with bi-directional reconstruction.
For a nonlinear system, the time series often have a weak Pearson correlation, but machine learning can still reconstruct the time series well when two variables share common information through their interactions; moreover, the reconstruction quality is direction-dependent and variable-dependent, which is determined by the coupling strength and the causality between the dynamical variables.

Considering this dependency of the reconstruction quality, selecting suitable explanatory variables is crucial for obtaining a good reconstruction. Hence, we propose using the CCM index to select the explanatory variables and the reconstruction direction. For real-world chaotic series, noise and nonstationarity make Lyapunov exponent estimation difficult (Kantz and Schreiber, 2005). In this study, it has been shown that the correct direction for reconstruction can be selected by the CCM index without estimating the Lyapunov exponent.