An interaction network perspective on the relation between patterns of sea surface temperature variability and global mean surface temperature

Abstract. On interannual- to multidecadal timescales variability in sea surface temperature appears to be organized in large-scale spatiotemporal patterns. In this paper, we investigate these patterns by studying the community structure of interaction networks constructed from sea surface temperature observations. Much of the community structure can be interpreted using known dominant patterns of variability, such as the El Nino/Southern Oscillation and the Atlantic Multidecadal Oscillation. The community detection method allows us to bypass some shortcomings of Empirical Orthogonal Function analysis or composite analysis and can provide additional information with respect to these classical analysis tools. In addition, the study of the relationship between the communities and indices of global surface temperature shows that, while El Nino–Southern Oscillation is most dominant on interannual timescales, the Indian West Pacific and North Atlantic may also play a key role on decadal timescales. Finally, we show that the comparison of the community structure from simulations and observations can help detect model biases.


Introduction
An important issue in climate research is to understand the behavior of the global mean surface temperature (GMST) over the last century (Sutton et al., 2007).Both internal variability and changes in radiative forcing, in particular by anthropogenic emissions of greenhouse gases (GHG), contribute to changes in GMST.The impact of GHG forcing has been extensively studied, being highly relevant to cli-mate sensitivity and hence for projections on future changes of the GMST.The important role of internal variability of ocean heat storage on GMST has been highlighted in the recent study by Balmaseda et al. (2013).
On interannual-to multidecadal timescales, climate variability appears to be organized into large-scale patterns with the El Niño/Southern Oscillation (ENSO) and the Atlantic Multidecadal Oscillation (AMO) as prominent examples.These phenomena are characterized by well-defined spatiotemporal patterns in sea surface temperature (SST).The ENSO variability in the equatorial Pacific is the dominant pattern of variability on interannual timescales (Mcphaden et al., 1998;Wang et al., 2004).ENSO influences the climate of many regions over the globe (Alexander and Bladé, 2002;Deser et al., 2010) and the El Niño 1997-1998 event was estimated to have caused a GMST increase of about 0.6 • C (Trenberth, 1997).The AMO is the dominant pattern of SST variability in the North Atlantic on decadal-to multidecadal timescales (Enfield, 2001).The AMO is thought to be strongly related to variations in the Atlantic Meridional Overturning Circulation (Srokosz et al., 2012), which affect the oceanic meridional heat transport.Sutton and Hodson (2005) show that the AMO has an influence on European Summer temperatures and relations of the AMO with US rainfall were suggested in Enfield (2001).Recently, it was suggested that the variability of global land surface temperatures (GLST) is strongly connected to the AMO (Canty et al., 2013), with a correlation coefficient of 0.65 ± 0.04 (Muller et al., 2013).
In most studies so far on the connection between patterns of variability and GMST, variants of Empirical Orthogonal A. Tantet and H. A. Dijkstra: Network view of SST variability Function analysis (EOF, von Storch and Zwiers, 1999a) have been used to identify spatial patterns of variability.The disadvantage of EOFs, when computed from the global SST field, is that only the dominant mode can be clearly associated to a spatiotemporal pattern of variability.Higherorder modes, which are required to be orthogonal to the first mode, are usually difficult to relate to the patterns of variability as known from regional EOF analyses (Monahan et al., 2009).It is therefore still quite unclear how and how much the different patterns and their interaction contribute to GMST variability.
Here we address the problem on the connection between patterns of SST variability and GMST using complex networks theory.Although this theory has already been successfully applied to many different technological and scientific problems (e.g. in computer sciences, neurosciences, social sciences) it has only recently that it been used in climate research.So-called interaction networks, where links are based on a correlation measurement between variables at specific locations, have been reconstructed to analyze the connectivity of the climate system (Tsonis and Roebber, 2004;Tsonis et al., 2010;Donges et al., 2009b, a), teleconnections (Tsonis et al., 2008), the behavior of El Niño (Gozolchiani et al., 2008(Gozolchiani et al., , 2011;;Tsonis and Swanson, 2008;Yamasaki et al., 2008), synchronization between different spatiotemporal patterns (Tsonis et al., 2007;Wyatt et al., 2011) and the connections between the variability in different climate variables (Donges et al., 2011).
One of the many interesting properties of a network is its possible partition into communities or groups of highly connected nodes which are only weakly connected to the rest of the network.Community detection has recently been applied to climate interaction networks of different atmospheric variables from observations and simulations by (Tsonis et al., 2010) where some of the known patterns of atmospheric variability, such as the North Atlantic Oscillation, could be identified.
We focus in this study on communities in interaction networks reconstructed from global SST observations.The data, their preprocessing and the network reconstruction and analysis tools are presented in Sect. 2. In Sect.3, results on the SST communities in the reconstructed networks are presented and interpreted, and connected to results in the literature based on EOF and composite analysis.Then in Sect.4, we address the connection between the SST communities and the GMST and also compare the results from observations with those from Earth system model (ESM) simulations.In the Sect.5, the main results are summarized and discussed with a focus on the benefits of the interaction network approach.
2 Data and methods

Data and network reconstruction
Monthly mean SST observations over the period 1870 to 2011 compiled in the HadISST data set (Rayner, 2003) were used.The seasonal cycle was removed from the linearly detrended monthly data.Even though the leading order effect of the annual cycle is removed by producing anomaly values, boreal winter anomalies are still larger than in summer.In order to avoid spurious high values of correlations, Tsonis and Roebber ( 2004) and Tsonis et al. (2010) only used the December-January-February months of each year.Since our results were not significantly affected by the selection of the winter months, we decided to use complete years.The detrended monthly anomalies were then low-pass filtered via a Lanczos filter (Duchon, 1979) with a cutoff frequency of 1/13 month −1 and order 144.
Because more grid points are located towards the poles on a regular longitude-latitude grid, which could lead to biases in our network measures, the data was linearly interpolated on a sinusoidal grid in order to conserve the arc length with latitude.In this study, we used a resolution at the equator of 2 • but the results presented below are qualitatively similar when using grid resolutions ranging from 1 • to 5 • .Because of the poor sampling of the region south of 50 • S (Deser et al., 2010, Fig. 3), only the grid points located between 50 • S and 80 • N were kept, resulting in a total of N = 6280 grid points (land surface points excluded) for L = 142 × 12 = 1704 months.
Each ocean grid point is considered as a node in the network and we indicate the time series at node i, i = 1, . . ., N by p i (t k ), k = 1, . . ., L. Contrary to a network as derived from a power grid or the world-wide web, the links of a climate network are not directly tangible because the observables derive from continuous fields.In an interaction network, a link between two nodes is based on a measure of the correlation between the time series of these two nodes.In this study, we used the Pearson correlation coefficient at lag zero, say R ij , between the time series of nodes i and j , defined by as the measure defining the links in the network (Tsonis and Roebber, 2004).A pair of nodes is then considered to be connected if their correlation R ij is over a chosen threshold τ .
The elements a ij of the N × N adjacency matrix A of the undirected and unweighted network are thus given by where and δ are the Heaviside function and Kronecker symbol, respectively.Even though the links in the network Earth Syst.Dynam., 5, 1-14, 2014 www.earth-syst-dynam.net/5/1/2014/ are based on SST correlations at zero lag, a connection should not be considered instantaneous, since the time series were low-pass filtered for 13 months.However, Foster and Rahmstorf (2011) found that lags ranging from 0 to 7 months are relevant to the study of interannual climate variability.Our results have been tested for lagged correlations in this range.Since no qualitative differences were found and correlations were generally weaker for longer lags, only the results for 0 lag correlation are presented in this study.We based our choice of the threshold τ on parametric and non-parametric significance tests.Considering a decorrelation time of 13 months (corresponding to the cut-off frequency of the filter), we found that all correlations over 0.17 are statistically significant at the 5 % level according to a ttest with L/13 = 131 degrees of freedom.
In order to better account for the persistence in the time series, a moving block bootstrap (MBB) test was also applied to the time series (Mudelsee, 2010).The bootstrap works with artificially produced resamples of the time series.In an MBB, blocks of length L B are randomly picked out of the original time series to create a set size N S of surrogate time series.Correlations are then calculated between surrogate time series for each pair of nodes.The choice of the block length is a trade-off between conserving the memory of the original time series and producing maximally independent surrogate time series.For the data set used in this study, we found that the estimated 5 % significance level was robust for block lengths ranging from 15 to 50 yr.Consequently, we chose a block length of L B of 20 yr and a sample set size N S of 4000 for all our MBB tests.Because a grid of 6280 nodes results in ∼ 2 × 10 7 pairs, the MBB tests were realized on a coarser grid of 5 • × 5 • resulting in 1144 nodes.We found that in the worst case the p values of the 5 % significance level was 0.39.
Additionally, an approximate test was used, where correlations were calculated between pairs of surrogate time series associated with the same node for the 6280 nodes of the 2 • grid.The p values found in this case were close to the maximum p values found for each grid point of the first test, suggesting that the approximated version of the MBB test is a good alternative for large gridded data sets.
The results of both parametric and non-parametric tests indicate that a threshold τ = 0.4 guarantees that a link between a pair of nodes of the HadISST data represents a statistically significant correlation at the 5 % level.In addition, thresholds ranging from 0.4 to 0.6 were used to build the network and we found that our results were not qualitatively sensitive to the threshold value over this range, although teleconnections appear weaker as the threshold increases.Subsequently, a threshold of 0.4 will be used and the resulting interaction network built from the HadISST data set will be referred to as the H-SST network.

Analysis techniques
Extended reviews of commonly used network measures can be found in Costa and Rodrigues (2007) and Barthélemy (2011).We here focus on degree centrality, first neighbours maps and communities.The degree centrality d i of a node i counts its number of connections with other nodes of the network.This measure is directly accessible from the adjacency matrix through and allows one to reveal nodes sharing a common variability with other nodes of the network.The edge density ρ of a network is the total number of links in the network normalized by the maximum number of possible links and is an indicator of the sparseness of the network.First neighbours maps describe the fraction of nodes belonging to a given group any node in the network is connected to.They are defined as where "G" is the selected group of N G nodes.A node of a first neighbours map reaches a maximum (minimum) value of 100 % (0 %) if it is connected to all (none) of the nodes of the selected group of nodes.Much focus has recently been given to the partitioning of networks into communities (Newman and Girvan, 2004).Communities are groups of nodes tightly connected together and weakly connected to the rest of the network.As such, they can be regarded as subsystems which operate relatively independently of the other communities (Arenas et al., 2006).Considerable improvements have been made during the past decade regarding the speed and efficiency of the community detection algorithms.In Tsonis et al. (2010), the algorithm of Newman and Girvan (2004) based on the progressive removal of dominant links (in terms of information flow) was applied to determine the community structure of several fields (500 hPa height, sea level pressure and surface temperature) derived from the NCEP/NCAR reanalysis (Kistler et al., 2001) and simulations from the Geophysical Fluid Dynamics Laboratory (GFDL) climate model CM2.1 (Delworth, 2006).
We tested the Multilevel algorithm of Blondel et al. (2008) and the Leading Eigenvector algorithm of Newman (2006) which, as the algorithm used by Tsonis et al. (2010), is based on the optimization of the modularity (cf.Sect.3.1) but has a faster implementation.However, results presented in Sect. 3 below are generated using the Infomap algorithm by Rosvall and Bergstrom (2007) which is based on the compression of the paths of random walkers traveling along the network.This algorithm was the most efficient in the LFR benchmark (Lancichinetti and Fortunato, 2009) among the different algorithms tested.
3 Spatial structure of SST variability

Communities in the H-SST network: detection
The map of degree centrality of the H-SST networks (Fig. 1) indicates that nodes located in the tropics tend to have a higher degree centrality than those located in the extratropics, in agreement with atmospheric surface temperature analyses in Tsonis and Roebber (2004).The tropical Atlantic shows a rather low degree centrality compared to the Indian and Pacific Oceans except over its northwestern part.The eastern tropical Indian Ocean also shows a low degree centrality.Surrounding the tropics, patches of high degree centrality are found in the mid-latitude North Pacific, South Pacific and along the western coast of the American continent.
In order to assess whether the variability in high-degree regions arises from distinct spatial patterns, the community structure of the H-SST network is determined (Fig. 2a).The communities are ordered by the total "PageRank" of their nodes (Brin and Page, 1998).The PageRank corresponds to the fraction of random walkers which would flow through the nodes of a community out of a population of random walkers traveling around the network by the links.In a directed network, a large flow of random walkers can arise from the large inward-degree of the nodes they go through but also from the large inward-degree of the nodes pointing to them and so on.However, in our case, the network is undirected so that the flow of random walkers through a node is equal to its degree centrality divided by the edge density.The total PageRank of a community is thus given by where C i is the set of nodes belonging to community i.Consequently, the PageRank of a community is a measure of the co-variability of its nodes.
A common measure of the quality of a partition of a network into communities is the modularity of this partition (Newman and Girvan, 2004).For a particular division into m communities an m × m symmetric matrix e is defined, for which an element e ij is the fraction of all connections in the network that link nodes in community i to nodes in community j .As such, i e ii gives the fraction of edges in the network that connect nodes in the same community and f i = j e ij represents the fraction of all edges in the network that connect to nodes in community i.In a (random) network, for which edges connect nodes independent of the communities they belong to, we would have e ij = f i f j so that i f 2 i corresponds to the fraction of all edges that connect to the same community in such a randomly wired network.In a modular network, the (e ii ) must be high comparatively to the (f 2 i ), thus, the modularity M is defined as in Newman and Girvan (2004): The modularity of the Infomap partition of the H-SST network into 11 communities (Fig. 2a) is M = 0.23, which is smaller than the modularities of 0.29 and 0.28 of the partitions into 5 and 7 communities of the Multilevel and Leading Eigenvector algorithms, for the same network.We will now justify our choice of the Infomap algorithm in spite of the lower modularity of its partition of the H-SST network.
The main problematic issue we encountered when applying community detection algorithms to SST networks arises from the spatial heterogeneity of the modularity of these networks.Regions where one spatial pattern of variability dominates (e.g.ENSO) are obviously highly modular.However, regions where the variability is dominated by the effect of atmospheric noise can also be weakly modular, so that the existence of communities in such regions is questionable.It is often more optimal in terms of modularity to associate the nodes of such weakly modular regions to one broad but weakly interconnected community.Such a community cannot be considered as a coherent physical pattern of variability, so that it would be preferable not to associate nodes of weakly modular regions to any community.The community detection algorithms we know must distribute every node to a community no matter how modular the network is.Thus, it would be preferable to distribute nodes of weakly modular regions into several small but densely interconnected communities (even if small means one node, in the case of very weakly modular regions).For our networks, the Infomap algorithm was the most capable one to divide weakly modular regions into small communities.Because these small communities are more interconnected, they are more likely to represent physical spatial patterns of variability than the broad sparsely interconnected communities detected by the modularity based algorithms.However, we focus in this study on the dominant patterns.For these reasons, we decided to filter out nodes connected to a fraction of their community smaller than twice the density of the network (i.e.connected to less than ρ(N i −1) nodes of the community, where N i is the number of nodes in the community i) and to remove communities including less than 2% of the total number of nodes in the network.
In the case of the Infomap partition of the H-SST network, 20% of the nodes are removed from their community and 1 community is filtered (Fig. 2b) out of the initial 11 communities (Fig. 2a).When applying the same filtering process to the partitions found by the Multilevel and Leading Eigenvector algorithms, more nodes are removed and fewer communities remain.For the multilevel algorithm, 32% of the nodes are removed and the same initial number of 5 communities remain, while for the Leading Eigenvector algorithm, 28% of the nodes are removed and the same initial number of 7 communities remain.
From these results, we conclude that the ability of the Infomap algorithm to distribute nodes of weakly modular regions into several small but dense communities penalizes its modularity score.On the contrary, the modularity-based algorithms tend to distribute the nodes of these weakly modular regions in a few large but sparsely interconnected communities which are less likely representative of any physical pattern variability.In the study of a climate network, it is preferable to filter out such weakly modular regions because partitioning them into communities could be misleading.Because (i) the detected Infomap partition is less affected by this filtration process than the modularity-based partitions and (ii) the fact that the random walkers in the Infomap algorithm are closely related to the flow in a dynamical networks, the Infomap algorithm appears to be the best choice for the study of patterns of climate variability.
As representations of spatial patterns of variability, the communities of the network can be compared to EOFs.To assess the potential similarities and improvements brought by the detection of communities compared to EOFs, an EOF analysis was also conducted on the same data as used to build the H-SST network.EOF and rotated EOFs (R-EOFs) analysis of SST fields have been used by, e.g.Weare (1976) and Kawamura (1994), respectively.The first 6 EOFs of the SST field are shown in Fig. 3 and the R-EOFs using these 6 EOFs in Fig. 4. The R-EOFs are linear combinations of the 6 initial EOFs maximizing a given simplicity function or criterion, here the normalized varimax criterion of Kaiser (1958).Note that the eigenvector decomposition in the EOF analysis is based on the same correlations used to build the H-SST network.When the correlation matrix is used (and not the covariance matrix), every node is given the same weight without regard to their variance.We can see (Fig. 3) that only the first EOF can be associated to a community.The higherorder EOF modes cannot be associated to any communities and they do not appear to represent any physical pattern of variability.On the contrary, the first 6 rotated EOFs can be associated with the dominant communities (Fig. 4).However, even higher-order rotated-EOFs are very noisy and cannot be associated with any communities or known physical pattern of variability.

Communities in the H-SST network: interpretation
In this section, a physical interpretation of the communities of the H-SST network (Fig. 2b) is given.Community #1 is by far the dominant community in terms of PageRank (69%) and size.Most of the nodes are located in the tropical Pacific but remote patches -located in the Pacific extra-tropics, tropical Indian ocean and northwestern tropical Atlantic -are also part of the community.This result suggests that teleconnections exist between the tropical Pacific and remote regions over the globe.These teleconnections can be deduced from the first neighbours map which is plotted in Fig. 5a for community #1.For each node in the network, this map shows to which percentage of the nodes in the community it is connected to.For example, a node in the equatorial Atlantic is connected to about 25% of the nodes in the community.
From Fig. 5a, it can be seen that community #1 is well defined (nodes inside the community are connected to most of the other nodes of this community and are only sparsely connected to nodes outside the community) and that the different remote patches of the community are connected to each other.We have verified that the equatorial Pacific nodes are anti-correlated with those of the two small patches located around 30 • in the North and South Pacific and to those near New Zealand, while they are positively correlated with the other nodes of the community.The spatial pattern of the community #1 also coincides with the EOF and R-EOF (Fig. 3a and Fig. 4a, respectively), explaining most of the variance.It also corresponds with the (December-February) El Niño/La Niña composites of HadISST SST determined in Alexander and Bladé (2002) (their Fig. 6a).Thus, these results suggest that community #1 is representative of the dominant pattern of variability on interannual timescales, i.e.ENSO.
The teleconnections shown in Fig. 5a have been described extensively in the literature.Perhaps the most well-known re- mote impacts of ENSO are the changed atmospheric circulation over the northern Pacific and North America region and the associated SST anomalies in the North Pacific (Wallace and Gutzler, 1980;Deser and Blackmon, 1995;Lau, 1997).A similar relationship between the tropical and the south Pacific (Liu et al., 2002;Ciasto and Thompson, 2008) as well as the warming of the tropical Indian ocean (Lanzante, 1996;Klein et al., 1999) and tropical Atlantic (Curtis and Hastenrath, 1995;Lanzante, 1996;Enfield and Mayer, 1997;Klein et al., 1999) basins have also been reported.Previous studies suggest that the main mechanism involved in these teleconnections on interannual timescales is the "atmospheric bridge" (Lau and Nath, 1996).This bridge occurs through changes in the Hadley and Walker cells and through the interaction of Rossby waves with the quasi-stationary flow and storm tracks (Trenberth et al., 1998;Alexander and Bladé, 2002).
Community #2, in terms of PageRank, is located in the northern Atlantic.None of the EOFs in Fig. 3 resemble this community but the third R-EOF (Fig. 3c) exhibits clear similarities with it.This spatial pattern bears the signature of the AMO (Guan and Nigam, 2009).
The next community, #3, covers the maritime continent and the Philippine and East China seas, a region below referred to as the Indian Ocean-West Pacific (IWP).The second R-EOF (Fig. 3b) has a similar pattern as this community although it also shows strong variance in the Indian ocean.The IWP is located at the confluence of the Pacific and Indian Oceans and connects the two oceans by the Indonesian Throughflow (ITF).Ramage (1968) has shown that the IWP is one of the greatest sources of energy for the extratropical circulation.Deep convection takes place in this region and the overlying atmosphere is highly sensitive to changes in SST (see Qu et al., 2005 for a review).Interannual variability in this region can be largely understood in terms of Kelvin and Rossby waves generated by remote zonal winds along the Indian and Pacific equatorial regions as well as by changes in the volume transport of the Indonesian Throughflow.
Although the other communities (#4-#10) also exhibit similarities with known spatial patterns of variability, we will not discuss this as their PageRank is much lower than the first three communities.For example, communities #8 and #9 are located in the region of the pathways of the northern Western Boundary Currents (Qiu, 2000;Frankignoul, 2001) and community #4 appears to connect the southern wind-driven gyres (Speich et al., 2002).The first 6 communities can be associated with the first 6 R-EOFs, however, when more EOFs are rotated, no additional pattern of variability could be identified (not shown here).Hence, the community analysis al-

Observations
We have seen that the community structure of the H-SST network is dominated by the first one and related to ENSO variability on interannual timescales.In order to focus on decadal variability we build a new network following the same methodology as before but using low-pass filtered time series of SST with a cutoff period of 8 yr to filter out the 2 to 7 yr band which is influenced strongest by ENSO.The threshold τ was set to 0.6 in order to only keep absolute correlations above the maximum p value of the 5% significance levels calculated over the globe using the MBB method (see Sect. 2).The resulting network, below referred to as the H-SST-LP8y network, has an edge density of 0.087 which is about 30 % smaller than the edge density of the H-SST network.
www.earth-syst-dynam.net/5/1/2014/Earth Syst.Dynam., 5, 1-14, 2014 The degree centrality and the community structure of the H-SST-LP8y network are plotted in Fig. 6a and b, respectively.Overall, the spatial patterns appear similar to those of the H-SST network but important differences exist in the tropics.First of all, the degree centralities of the nodes in the H-SST-LP8y network located in the tropical Pacific and Indian ocean have decreased significantly, compared to those in the H-SST network; in the extra-tropics the opposite effect occurs (Fig. 6a).Community #1 (Fig. 6b) of the H-SST-LP8y network (which exhibits a similar pattern as the ENSO community in the H-SST network) has a much lower PageRank (31% versus 69% in the H-SST network).This result is expected when interannual variability and, a fortiori, variability related to ENSO, is filtered out.A second important difference is that the nodes lying in the Indian Ocean are no longer part of the same community as the nodes of the tropical Pacific.They are, instead, part of the same community as the nodes in the IWP region (community #2 in the H-SST-LP8y network).Furthermore, this IWP community has a PageRank of 26 % which is comparable to the PageRank of the tropical Pacific community #1 (31%).The community of the North Atlantic (NA, community #3 in the H-SST-LP8y network) and the community of the southern wind-driven gyres (community #4 in the H-SST-LP8y network) also exhibit higher PageRanks (15 and 11%, respectively) than the corresponding communities in the H-SST network.These high PageRanks (or high number of links) are representative of the strong co-variability of the nodes in these regions on decadal timescales, indicating that important components of decadal variability are present in the IWP region, the NA and the southern wind-driven gyres communities.In order to investigate the relationship between the spatial patterns of variability represented by the communities and the global mean state of the climate system, correlations were calculated between mean SST time series associated with each community (in both H-SST and H-SST-LP8y data set) and three indices of global mean surface temperature (GMST).The time series representing the SST of a specific community was calculated by spatially averaging the time series of the nodes of the community.This spatial averaging is equivalent to projecting the data set on the community vectors C j of size N where C ij equals 1 if node i belongs to community j , 0 otherwise.Thus, these community time series are to communities what expansion coefficients are to EOFs.
The Global Surface Temperature Anomalies of the NOAA (Smith and Reynolds, 2005) averaged over land and ocean (GMST), land only (GLST) and ocean only (GOST) were used as indices of the mean state of the climate system.The indices were processed using the same methodology as for the H-SST and H-SST-LP8y data sets (1 and 8 yr, respectively, low-pass filtered detrended anomalies) and cover the period 1880-2011.The correlations between the time series of the communities and the indices for the H-SST and H-SST-LP8y networks are presented in Tables 1 and 2, respectively.For each correlation, a 95% confidence interval is given in brackets and correlations significant at the 5% level in bold face.The confidence intervals and significance levels were estimated using the MBB method as described in Sect. 2.
From both tables Tables 1 and 2, it is found that each of the communities is more correlated with the GOST than with the  GLST.Only communities #1-#3 and #5 in the H-SST network (Table 1) are significantly correlated with the GLST.On decadal timescales (Table 2), the first three communities also show significant correlations with the GLST but the ENSO community (#1) does not display anymore the largest correlation to the GLST and to the GOST (as in the H-SST network).Now the Indian Ocean-West Pacific (IWP) community (#2) and the NA community (#3) are best correlated to the GOST with the IWP community displaying largest correlations to the GLST.The correlations of 0.77 and 0.84 between the IWP time series of SST and GLST and GMST, respectively, is striking and the IWP region appears to play a major role in decadal climate variability.
Figure 7 represents the linear regressions against the GMST and GLST indices of the time series of the IWP community (#2), the NA community (#3) and of the bivariate time series of the IWP and NA communities (multipleregression) in a least-square sense.The bivariate fits to the GMST and GLST result in coefficients of multipledetermination (a measure of how much the fitted time series determine the original time series, von Storch and Zwiers, 1999b) of 0.87 and 0.66, respectively.This shows that the patterns of both communities can, together, statistically explain most of the decadal variability of the GMST and the largest part of the decadal variability of the GLST.Using the ENSO time series does not significantly improve the fits.However, this analysis is only statistical and no causality between the time series of the communities and the temperature Table 1.Correlations between the mean time series of the communities of the H-SST network and the indices of mean, land and ocean surface temperatures (all time series 1y low-pass filtered).The 95 % confidence intervals are given in brackets and correlations significant at 5 % in italic face.indices can be stated.Also, the increase in GMST since the 1970s may be explained by the phase synchronization of the time series of the IWP and NA communities, although, once again, the increase of both this index and the time series of the communities could also arise from other factors such as an increased radiative forcing (Fig. 7).

ESM simulations
As communities are able to distinguish patterns of climate variability in the HadISST data, the network methods potentially offer also a more detailed tool to assess the quality of state-of-the-art Earth System Models (ESMs) in simulating these patterns.To investigate this, SST fields from a threemember ensemble of historical simulations (1870-2005), obtained with the MPI-ESM-LR model (Jungclaus et al., 2013;Stevens and Giorgetta, 2013) for the CMIP5 project (Taylor et al., 2012), were used to reconstruct two networks following the same methodology as for the H-SST and H-SST-LP8y networks.The time series of the three members were concatenated giving time series three times as long as the time  series of one simulation (thus increasing statistical significance).These model networks will be referred to below as the ESM-SST and ESM-SST-LP8y networks, respectively.The degree centrality distribution and community structure of the ESM-SST network are plotted in Fig. 8a and b, respectively.Similar to the results for the H-SST network, the ENSO community (#1) and the IWP community (#3) are visible (Fig. 8b) and the ENSO community also dominates with a PageRank of 81%.However, only the southern half of the NA community is present (community #5 in Fig. 8b), likely indicative of an under-representation of the AMOrelated variability in the model.Also, fewer connections between the ENSO community and the extratropical Pacific are visible (Fig. 8a and b) than in the H-SST network.A weaker atmospheric bridge may be responsible for the absence of such teleconnections.Finally, we can see in Table 3 that the ENSO, IWP and NA communities are, as in the observations, strongly correlated to the GOST and GLST.
For the ESM-SST-LP8y network, the degree distribution and community structure (shown in Fig. 9a and b, respectively) are quite similar to those of the ESM-SST network.However, most of the nodes located in the extra-tropics are unassigned, indicative of a weak spatial coherence of the extratropical SST variability.Interestingly, the PageRank of the ENSO community (community #1) is still very high (81%) so that decadal variability is also dominated by the ENSO community.This too strong decadal tropical variability has already been reported by Jungclaus and Keenlyside (2006) for a previous configuration of this model (ECHAM5/MPI-OM).Zanchettin et al. (2012) also showed that the tropical Indian SSTs are over-sensitive to tropical Pacific fluctuations in this model.Decadal variability in the GLST within this ESM can be largely explained (statistically) by the ENSO and IWP communities (Fig. 10), but misses the connection to the NA as found in the observations (Table 4).Contrary to Fig. 7, the fits in Fig. 10 were made without the time series  .46 [0.13, 0.52] 0.48 [0.21, 0.53] 0.48 [0.18, 0.54] of the NA community, the regressions were not significantly improved using this time series.
To conclude, the main aspects of the SST variability found in the observations are also visible in the MPI-ESM-LR model but the variability related to ENSO (NA) is overrepresented (under-represented).These model biases could be efficiently identified using the network approach.

Summary and discussion
In order to study the global climate variability, we have analyzed SST observations and ESM simulations using a climate network approach focusing on the detection of communities.The network techniques provide an innovative and efficient complementary approach to more traditional methods like EOF analysis, composite analysis and correlation maps.
An EOF analysis is usually efficient in detecting the dominant mode of variability but the secondary modes, being constrained to be orthogonal to the first one, are less likely to represent regional patterns of variability (Monahan et al., 2009).Thus, when applying an EOF analysis on the HadISST SST data over the globe, only the first EOF can be attributed to a physical pattern of climate variability, namely ENSO, while the higher EOF modes could not be connected to any known patterns of variability.This shortcoming can be by-passed by rotating the EOFs like was done by Kawamura (1994).However, one of the drawbacks of this method is that the results are sensitive to the choice of the rotation criterion, to the choice of the number of loadings and to the normalization of the EOFs (von Storch and Zwiers, 1999a).
On the contrary, when calculating the degree centrality of the nodes of a network, the connections between nodes are evaluated without respect to any specific mode or direction in the space of the nodes, allowing for the coexistence of different modes of variability.This explains why patterns of variability not, or weakly, related to ENSO are also apparent in the degree centrality distribution of the H-SST network (Fig. 1).However, contrary to an EOF analysis, the degree centrality does not allow one to distinguish different patterns of variability from one another.Thus, a community analysis must subsequently be applied to distribute the nodes into distinct groups of co-varying nodes.
Detecting communities in the HadISST and MPI-ESM networks was more efficient in revealing non-overlapping spatial patterns of SST variability than EOF analysis.We have indeed shown in Sect. 3 that most of the communities in the H-SST network (Fig. 2) can be associated to known physical patterns of variability, which could not have been done using a regular EOF analysis on the global oceans.More patterns could be found using 6 rotated EOFs but the choice of the number of loadings could only be done by evaluating the correspondence of the rotated EOFs with the communities.However, the main drawback of the Infomap community detection algorithm is that it is not able to detect overlapping spatial patterns of variability.For example, we can see from Fig. 5 that the nodes in the Caribbean Sea associated with the second community (NA) could also be associated with the first community (ENSO) which was also suggested by Guan and Nigam (2009).Overlapping community detection algorithms exist (e.g.Shen et al., 2009;Lancichinetti et al., 2009), however, because of their complex implementation and slower execution application of these techniques are outside the scope of this study.
By augmenting the community analysis with first neighbours maps (Fig. 5), teleconnections and links between communities can be revealed.The study of these maps offers several advantages over correlation maps or composites.The main interest of the network approach is that neither a statistical event nor an index has to be defined a priori.For example, the results obtained using the first neighbours of the ENSO community in the H-SST network (Fig. 5a) are similar to those of Alexander and Bladé (2002) using composites, but we did not need to define what El Niño and La Niña events represent statistically.Correlation maps obtained by Klein et al. (1999) and Alexander and Bladé (2002) are also similar to Fig. 5a, but once again, we did not need to define over which domain to spatially average to obtain the relevant time series.In short, using community detection algorithms neither prior information nor a choice of a pattern of variability to be studied is necessary, making pioneering studies easier and maybe also less biased.
The community structure of the H-SST network (Fig. 2) proved to be in good agreement with known spatial patterns of climate variability, further supporting the use of this method to identify climate patterns of variability from large gridded data sets.On interannual timescales, the main community could be associated with the dominant pattern of variability, ENSO and its remote influences were visible as part of the community or in the first neighbours maps.These teleconnections could in part be interpreted using the "atmospheric bridge" concept.Other patterns of variability, such as the AMO, were also visible as independent components of the system.
One of the key differences between the H-SST and the H-SST-LP8y networks is that the Indian Ocean-West Pa-cific (IWP) region appears an independent and strongly intraconnected community in the H-SST-LP8y network whereas its western part is connected to the tropical Pacific in the H-SST network.This suggests that decadal variability in the IWP region is not driven by tropical Pacific variability through the atmospheric bridge as on interannual timescales.Furthermore, a large component of the observed global land surface temperature (GLST) variability can be explained by the time series associated with the IWP community.
To infer whether the IWP SST drives or responds to decadal variability of the overlying atmosphere is a difficult problem, since little is known about the sources of such lowfrequency variability in the climate system.Several arguments support the idea that SST decadal variability can drive land surface temperature (LST) changes.First, the maximum cross-correlation of the time series associated to the IWP community in the H-SST-LP8y network and the GLST index is found for the IWP time series leading the GLST of 18 months, although significant correlations are also found for lags ranging from −240 months (IWP leading) to 77 months (GLST leading).Secondly, Dommenget (2009) suggests, using observations and GCM simulations, that the "ocean's variability is leading to variability with enhanced magnitude over the continents, causing much of the longer timescale (decadal) global-scale continental climate variability".In particular, several studies show that decadal SST fluctuations in the IWP region may influence the East Asian Monsoon (Hu, 1997;Zhou et al., 2009) through changes in the West Pacific subtropical high.Zhang et al. (2004) suggest that the warming of the IWP region can lead to enhanced precipitation over the Tibetan Plateau and that the resulting increased snow cover over the plateau in spring can further induce changes in circulation and precipitation over East Asia during the subsequent summer.Consequently, the Asian continent appears to be very sensitive to SST fluctuations in the IWP region.
If SST decadal variability in the IWP region can indeed account for most of the GLST variability, the next question would be whether SST decadal variability in this region is due to internal variability of the ocean or to the integration of atmospheric stochastic forcing (Hasselmann, 1976).Decadal variations of the Indian Ocean's shallow overturning (Lee, 2004), thermocline (McDonagh and Bryden, 2005) and subtropical gyre (Bindoff et al., 2000) waters and pathways of the Indonesian Throughflow (Valsala et al., 2010) have been reported, which could be related to Indian Ocean internal variability.To prove or discredit these hypothesis is difficult by the lack of long-term vertical profile measurements in the Indian ocean.However, a comparison of GCM and mixed-layer model simulations as in Dommenget and Latif (2008) could help to assess whether the atmospheric stochastic forcing alone can explain SST decadal variability in the IWP region.Anyway, it is crucial to investigate the reasons of the apparent link of the IWP and NA regions to the www.earth-syst-dynam.net/5/1/2014/Earth Syst.Dynam., 5, 1-14, 2014 GLST on decadal timescales as well as the sources of this low-frequency variability.Finally, we showed that networks techniques can also be efficiently used to identify model biases.Using one specific ESM (the MPI-ESM-LR model), we found that decadal variability in the tropical Pacific is too strong compared to observations as well as its influence on the tropical Indian Ocean basin.On the other hand, other patterns such as the NA are much weaker than in observations.Most of the extratropical nodes of the network are unassigned, indicative of a low coherence in the simulated SSTs.However, we also found a strong relationship between the IWP region and the GLST, which is in agreement with that found the observations.How the IWP region and the GLST interact in the model needs further investigation but is outside the scope of this paper.

Fig. 1 :Fig. 1 .
Fig. 1: Degree centrality, as defined by Eq. (3), of the nodes of the H-SST network.A threshold τ = 0.4 was used to construct the network giving an edge density ρ = 0.14.

Fig. 2 :
Fig. 2: Top panel (a): Infomap communities in the H-SST network.Each color represents one community and each node is assigned the color of the community it belongs to.The communities are ordered by decreasing total PageRank of their nodes which is written to the right of the scale.Bottom panel (b): Infomap communities after filtration of nodes which are connected to a fraction of their community smaller than twice the density of the network or belong to communities with nodes smaller than 2% of the network.Nodes in white are unassigned, they do not belong to any community.

Fig. 3 :Fig. 3 .
Fig. 3: First six EOFs determined for the same dataset as used to build the H-SST network.

Fig. 4 : 25 Fig. 4 .
Fig. 4: First six varimax-rotated EOFs estimated from the same dataset as used to build the H-SST network.

Fig. 5 :Fig. 5 .
Fig. 5: First neighbours map of communities (a) #1, (b) #2, and (c) #3 in the H-SST network.For each node, the color scale indicates the percentage of the nodes in the community to which that node is connected.
Fig. 6: (a) Degree centrality of the H-SST-LP8y network.The network was built using the same methodology as the H-SST network except that the time series were 8 years low-pass filtered and a threshold τ = 0.6 was used for significance; this results in an edge density ρ = 0.087.(b) Filtered communities in the H-SST-LP8y network.

Fig. 7 :Fig. 7 .
Fig. 7: Regression of the mean time series of the IWP and NA communities of the H-SST-LP8y network to the (a) GMST and (b) GLST.
Fig. 8: (a) Degree centrality of the ESM-SST network.The network was built using the same methodology as for the H-SST network but using SST from the MPI-ESM-LR historical simulations.A threshold τ = 0.4 was used as for the H-SST network leading to an edge density ρ = 0.073.(b) Filtered communities in the ESM-SST network.
Fig. 9: (a) Degree centrality of the ESM-SST-LP8y network.The network was built using the same methodology as for the H-SST-LP8y network but using SST from the MPI-ESM-LR historical simulations.A threshold of τ = 0.6 was used as for the H-SST network leading to an edge density ρ = 0.077.(b) Filtered communities in the ESM-SST-LP8y network.

Fig. 10 :
Fig. 10: Regression of the mean time series of the ENSO and IWP communities of the ESMP-LP8y network to the (a) GMST and (b) GLST.

Fig. 10 .
Fig. 10.Regression of the mean time series of the ENSO and IWP communities of the ESMP-LP8y network to the (a) GMST and (b) GLST.

Table 3 .
Correlations between the mean time series of the communities of the ESM-SST network and the indices of mean, land and ocean surface temperatures (all time series 1y low-pass filtered).The 95 % confidence intervals are given in brackets and correlations significant at 5 % are italic face.

Table 4 .
Correlations between the mean time series of the communities of the ESM-SST-LP8y network and the indices of mean, land and ocean surface temperatures (all time series 8y low-pass filtered).The 95 % confidence intervals are given in brackets and correlations significant at 5 % in italic face.