We used principal component analysis (PCA) to derive climate indices that describe the main spatial features of the climate in the Baltic states (Estonia, Latvia, and Lithuania). Monthly mean temperature and total precipitation values derived from the ensemble of bias-corrected regional climate models (RCMs) were used. Principal components were derived for the years 1961–1990. The first three components explain 92 % of the variance in the initial data and were chosen as climate indices in further analysis. Spatial patterns of these indices and their correlation with the initial variables were analyzed. The correlation coefficients between the principal components and the initial variables showed that higher values in each index corresponded to locations with (1) less distinct seasonality, (2) warmer climate, and (3) wetter climate. In addition, the pattern of the first index showed the impact of the Baltic Sea (distance to coast), the second showed the influence of latitude and elevation, and the third showed the influence of elevation. The loadings from the chosen principal components were further used to calculate the values of the climate indices for the years 2071–2100. An overall increase was found for all three indices with minimal changes in their spatial pattern.

Spatial representation of the climate, e.g., the mapping of climatic zones, is a useful tool in climate analysis. First, it can be used to better convey information about the climate features of the region for applications in climate change adaptation and mitigation. Second, the spatial patterns can give insight into both the possible relationship between and the impact of the climate on other fields, e.g., phenological processes and vegetation distribution (Feng et al., 2012). Third, they illustrate geographical features that influence climate, such as hillsides and coastal zones. There is a wide variety of approaches for creating spatial representations of climate, but most of them are either rule-driven or data-driven. Rule-driven methods are used more often, the most popular being the Köppen–Geiger classification (Peel et al., 2007). These methods are based on certain predefined rules, for example, thresholds of meteorological variables or frequency of events. Climate zones derived from classifications of this type usually correspond to vegetation distributions in the sense that each climate type is dominated by one vegetation zone or eco-region (Belda et al., 2014). However, predefined rules make these methods subjective. Alternatively, the spatial pattern can be derived from data-driven or analytical methods. These include principal component analysis (PCA; Benzi et al., 1997; Estrada et al., 2009), cluster analysis (Bieniek et al., 2012), or a combination of both methods (Briggs and Lemin, 1992; Fovell and Fovell, 1993; Baeriswyl and Rebetez, 1997; Malmgren et al., 1999; Fan et al., 2014; Forsythe et al., 2015). Analytical methods, depending on the chosen variables, can give results that are similar to those of rule-driven methods, but the results are more homogeneous (Netzel and Stepinski, 2016). Analytical methods provide a spatial pattern that must be interpreted before it can be linked with possible applications.

Principal component analysis or empirical orthogonal function analysis has two important applications. First, it can reduce the number of variables that are used to describe regional climate while still retaining most of the variation seen in the initial data. Second, principal components provide new indices that are a linear combination of the chosen variables. The loadings of the chosen principal components are the coefficients that define the newly created indices, which then describe the main features of climate. Variables for PCA can be chosen and indices calculated with a specific purpose in mind; for example, indices for the classification of different types of winters (Hagen and Feistel, 2005) or estimation of crop yield based on the climate (Cai et al., 2013). Indices can also be chosen to describe the climate of the region in general (Estrada et al., 2009). However, the problem with the indices that are derived using analytical methods is that their meaning is not known beforehand, so their interpretation may require further analysis.

For many practical applications, temperature and precipitation are the two main variables of interest for a certain region. They are usually sufficient for representing vegetation types in corresponding climate zones (Zhang and Yan, 2014). Vegetative production, organic matter decomposition, and the cycling of nutrients are strongly influenced by temperature and moisture (Briggs and Lemin, 1992). Distinct changes in temperature and precipitation are to be expected in the future (BACC II, 2015). Thus, any climate patterns based on these two variables will consequently be affected, leaving a significant impact on living organisms. For instance, plant species inhabiting regions subjected to climate change might have too little time to adapt (Mahlstein et al., 2013).

The Baltic state region exhibits significant spatial and temporal
climatic variability, with an influence from air masses of arctic to
subtropical origin (Jaagus and Ahas, 2000; Rutgersson
et al., 2014). The terrain is mostly flat, with the highest elevations
extending slightly above 300

To study the effects of climate change on climate patterns, regional climate model (RCM) data can be used (Castro et al., 2007; Mahlstein and Knutti, 2010; Tapiador et al., 2011; Fan et al., 2014). RCMs are continuously improving and correspond rather well to climate observations (Tapiador et al., 2011). Other advantages of using RCM data are that (a) their data are regularly spaced, while PCA applied to irregularly spaced data can produce distorted loading patterns (Karl et al., 1982), and (b) RCM data are also available as future projections, giving insight into the manifestation of climate change. Additionally, the spatial representativeness of the network of observation stations in the Baltic states has been reported to be problematic (Remm and Jaagus, 2011).

The aim of this work is to define climate indices that represent the main features of Baltic state climate in a compact form. The study consists of several parts. First, RCM data for temperature and precipitation were bias corrected. Second, monthly average values for the reference period 1961–1990 were calculated and standardized. Third, PCA was performed and the main principal components were identified. The acquired principal components and their spatial patterns were analyzed. Fourth, the loadings of chosen principal components were used to calculate indices for the years 2071–2100 and compared to reference data.

The source of the RCM ensemble data is the ENSEMBLES project (van der Linden and Mitchell, 2009). Model data sets for the A1B scenario are given for the time period 1961–2100, and 22 model runs were considered (shown in Table 1).

Table 1. List of the regional climate model (RCM) ensemble members used (ENSEMBLES) showing the originating institution, the name of the RCM, and the driving general circulation model (GCM). For an explanation of abbreviations, see van der Linden and Mitchell (2009).

Figure 1. Monthly precipitation 1961–1990; bias-corrected median of RCM ensemble.

We used time series of daily average air temperature at 2

Figure 2. Monthly average temperature 1961–1990; bias-corrected median of RCM ensemble.

Two time periods were chosen: 1961–1990 (as a reference climate) and
2071–2100 (as future climate projections). For each time period, monthly
average temperature and precipitation were calculated for each grid point. In
total 24 climatic variables were used for each time period: 12 monthly
precipitation and 12 monthly average temperatures. This is an “R-mode”
analysis according to Cattell (1952). The spatial distribution of these
variables for the reference period is shown in Figs. 1 and 2. Figure 1 shows
a north–south gradient of monthly precipitation during April–June and
an east–west gradient of monthly precipitation during October–January.
Figure 2 shows an east–west gradient of monthly temperatures during
October–February and a north–south gradient of monthly temperatures during
April–June. This implies that some of the variables can be combined in
seasons (as done by Malmgren et al., 1999, and Forsythe et al., 2015)
and that for some months temperature and precipitation are correlated.
A better understanding of variables with similar patterns can be gained by
examining the correlation matrix in Fig. 3. The matrix areas that represent
strongly correlated variables are marked in this figure, and they show the
following relationships.

Figure 3 shows that the 24 monthly variables contain redundant information, and through PCA we can summarize the information and create new variables.
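The redundancy argument can be illustrated with a minimal sketch. It assumes a data matrix with one row per grid point and 24 columns (12 monthly precipitation totals followed by 12 monthly mean temperatures). The data here are synthetic stand-ins generated from two hidden gradients, and the 0.8 threshold for a "strong" correlation is an arbitrary illustrative choice. Neither is taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 500  # hypothetical number of grid points

# Synthetic stand-in for the real data: rows are grid points, columns are
# the 24 monthly variables. Two hidden spatial gradients drive all columns,
# so many columns end up carrying overlapping information.
shared = rng.normal(size=(n_points, 2))
mixing = rng.normal(size=(2, 24))
data = shared @ mixing + 0.3 * rng.normal(size=(n_points, 24))

# 24 x 24 correlation matrix between variables (columns).
corr = np.corrcoef(data, rowvar=False)

# Count distinct variable pairs whose absolute correlation exceeds the
# illustrative threshold.
upper = np.triu_indices(24, k=1)
strong_pairs = int(np.sum(np.abs(corr[upper]) > 0.8))
print(corr.shape, strong_pairs)
```

A large number of strongly correlated pairs is the signal that a smaller set of uncorrelated variables may describe the same spatial variation.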

The aim of PCA is to create a new set of uncorrelated variables that are a linear combination of the initial variables and explain as much of the initial variation as possible. An extensive description of PCA can be found in Jolliffe (2002), and its applications to climate are described in Preisendorfer (1988).
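As a minimal sketch of this idea, PCA can be computed from the eigendecomposition of the covariance matrix. The function name `pca` and the synthetic input are illustrative assumptions. This is a generic textbook formulation, not necessarily the exact procedure used in the study.

```python
import numpy as np

def pca(data):
    """PCA of an (n_samples, n_variables) array via the covariance matrix.

    Returns loadings (columns are eigenvectors), scores (the principal
    components), and the fraction of variance explained by each component.
    """
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    eigvals = eigvals[order]
    loadings = eigvecs[:, order]
    scores = centered @ loadings             # linear combinations of inputs
    explained = eigvals / eigvals.sum()
    return loadings, scores, explained

# Synthetic, correlated input data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
data = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))

loadings, scores, explained = pca(data)
score_cov = np.cov(scores, rowvar=False)     # off-diagonal terms are ~0
```

The covariance matrix of the scores is diagonal up to rounding error, which is the sense in which the new variables are uncorrelated.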

Although PCA is a widely used methodology, the terminology in the literature can vary (Wilks, 2011). We will briefly describe the terminology used in this article.

Figure 3. Temperature–precipitation correlation matrix; bias-corrected data. Marked and numbered features show especially high absolute correlation: (1) strong correlation between precipitation levels in winter months; (2) strong correlation between precipitation and temperature in spring months; (3) strong negative correlation between precipitation in autumn and spring temperature; (4) strong correlation between temperatures in autumn and winter months.

Suppose that

An important choice must be made when applying PCA: whether to use
a correlation matrix or covariance matrix in the calculation of
loadings. If the covariance matrix is used then a second choice must
be made: whether to use standardization and what type. The scaling process has
a significant impact on the PCA process. When performing data
standardization, the following issues should be taken into account.

Variables should be of a similar scale; otherwise, variables with considerably larger variance will dominate the principal components. Different scales are usually a consequence of different units of measurement. In our case the variance for precipitation measured in millimeters is considerably larger than that for temperature measured in degrees Celsius.

In the case of variables measured in the same units, variances contain useful information and can improve the interpretation of PCA (Overland and Preisendorfer, 1982). Therefore, for variables that are measured in the same units (for example, average temperature in different months) we wish to keep the ratio between the variances of different months. This means that the correlation matrix, in which each variable is divided by its standard deviation (the square root of its variance), should not be used, as it would bring the variances of all 24 variables to 1.

As we are planning to use the acquired loadings as coefficients for the calculation of climate indices for the future time period and compare them with the reference climate, it is necessary that the same standardization process be used for the data of the future time period.

It is important to note that subtraction of the mean (or a similar constant) for each variable does not impact the result of PCA as it does not impact the covariance between variables. However, if the initial values have a zero mean (the mean is subtracted from each variable) then the resulting principal components have a similar scale, and spatial patterns are more convenient to review.
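Two of the points above can be checked numerically. In this minimal sketch, subtracting the mean leaves the covariance matrix unchanged, and dividing each variable by its own standard deviation sets every variance to 1. The three "monthly temperature" columns are synthetic and their spreads and offsets are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical example: three temperature-like variables in the same units
# but with different spreads (1, 2, 4) and different mean levels.
temps = rng.normal(size=(300, 3)) * np.array([1.0, 2.0, 4.0]) \
    + np.array([-5.0, 5.0, 17.0])

# Subtracting the mean does not change the covariance matrix.
cov_raw = np.cov(temps, rowvar=False)
cov_centered = np.cov(temps - temps.mean(axis=0), rowvar=False)
same_cov = np.allclose(cov_raw, cov_centered)

# Correlation-matrix scaling: each variable divided by its own standard
# deviation. All variances become 1, so the ratio between months is lost.
z = (temps - temps.mean(axis=0)) / temps.std(axis=0, ddof=1)
var_z = z.var(axis=0, ddof=1)

print(same_cov, var_z)
```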

Figure 4. Scree plot (explained variance of each principal component) calculated for the reference (1961–1990) climate.

Table 2. Variances of climate variables before and after standardization for the years 1961–1990.

Figure 5. Spatial pattern of the first three principal components based on monthly temperature and precipitation data for the years 1961–1990.

Taking into account the issues described above we propose using
standardization as defined by Eq. (

The variances before and after such standardization for the reference period are shown in Table 2. The ratio of variances for different months is retained. For data representing the future time period, the standardization is performed by using the mean values and average variances from the reference period. The results of data standardization for the future time period are shown in Table 3. It can be seen that in the future the variance in precipitation data will increase and the variance in temperature data will decrease. However, the distribution of variances over the year is similar.
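The following is a sketch of one standardization that is consistent with this description. The exact form is an assumption, because the equation itself is not reproduced in this text. Each variable has its reference-period mean subtracted and is divided by the square root of the average reference-period variance of its group (the 12 precipitation variables or the 12 temperature variables). The function names, the group encoding, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_standardization(reference, groups):
    """Derive means and scales from reference-period data only.

    Assumed form: the scale of a variable is the square root of the average
    variance of all variables in the same group (same physical unit).
    """
    means = reference.mean(axis=0)
    variances = reference.var(axis=0, ddof=1)
    scale = np.empty_like(variances)
    for g in np.unique(groups):
        mask = groups == g
        scale[mask] = np.sqrt(variances[mask].mean())
    return means, scale

def standardize(data, means, scale):
    """Apply reference-period statistics to any data set (past or future)."""
    return (data - means) / scale

rng = np.random.default_rng(3)
groups = np.array([0] * 12 + [1] * 12)  # 0 = precipitation, 1 = temperature

# Synthetic data: precipitation-like columns have much larger spread.
spread = np.concatenate([rng.uniform(5, 15, 12), rng.uniform(0.5, 1.5, 12)])
reference = rng.normal(size=(400, 24)) * spread
future = rng.normal(size=(400, 24)) * spread * 1.1 + 2.0

means, scale = fit_standardization(reference, groups)
ref_std = standardize(reference, means, scale)
fut_std = standardize(future, means, scale)  # same statistics as reference

ref_var = ref_std.var(axis=0, ddof=1)
raw_var = reference.var(axis=0, ddof=1)
```

Under this assumed form, the average variance within each group becomes 1 for the reference period, so precipitation and temperature are on comparable scales, while the ratio of variances between months of the same variable is unchanged.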

Another detail that must be considered when using PCA is the choice of method for determining the number of principal components that describe data variation sufficiently well and can be used in further analysis. There are multiple methods to choose from (Preisendorfer, 1988); however, in our case one of the most common methods, the scree plot, gives clear results. A scree plot is a graph of the variance explained by each principal component, ordered from largest to smallest, and the number of principal components is decided based on the break point in this graph. Components to the left of the break point are retained.

The explained variance and loadings of the first three principal components are shown in Table 4. The scree plot of all principal components is shown in Fig. 4. The first two components already describe 78 % of the variance in the initial variables, while the first three components describe 92 % of the variance. According to Jolliffe (2002) the cutoff point should be between 70 and 90 % of the explained variance. However, the scree plot clearly shows that the first three principal components can be retained, so we chose to further analyze the first three components.
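The quantities behind a scree plot can be sketched as follows. The data are synthetic, built from three dominant patterns plus weak noise, so the numbers do not reproduce Table 4 or Fig. 4. The 90 % threshold is a hypothetical stand-in for the visual identification of the break point that the text describes.

```python
import numpy as np

rng = np.random.default_rng(4)
n_points, n_vars = 500, 24

# Synthetic data: three orthogonal patterns with variances 16, 9, and 4,
# plus weak uncorrelated noise in every variable.
q, _ = np.linalg.qr(rng.normal(size=(n_vars, 3)))   # orthonormal columns
factors = rng.normal(size=(n_points, 3)) * np.array([4.0, 3.0, 2.0])
data = factors @ q.T + 0.1 * rng.normal(size=(n_points, n_vars))

# Eigenvalues of the covariance matrix, largest first. Plotting `explained`
# against the component number gives the scree plot.
eigvals = np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1]
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)

# Hypothetical numeric rule: smallest number of components whose cumulative
# explained variance reaches the threshold.
threshold = 0.90
n_keep = int(np.searchsorted(cumulative, threshold) + 1)
print(n_keep, np.round(cumulative[:4], 3))
```

A cumulative threshold and a visual break point do not always agree, so the numeric rule is only a convenience for this illustration.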

Table 3. Variances of climate variables before and after standardization for the years 2071–2100.

Figure 5 shows the spatial pattern of the first three principal components for the reference climate. They should be analyzed together with the correlation coefficients between the new variables and initial variables shown in Table 5, in which the bright red or blue colors mark high positive or negative correlation. One can see that variables that were initially highly correlated (positively or negatively; Fig. 3) show similar (or in the case of negative correlation, the opposite) values in Table 5.
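A minimal sketch of how such correlation coefficients between principal components and initial variables can be computed is given below. The data are synthetic and the variable names are illustrative. The sketch also checks the standard identity that, for covariance-based PCA, the correlation between variable j and component k equals the loading times the square root of the eigenvalue, divided by the standard deviation of variable j.

```python
import numpy as np

rng = np.random.default_rng(5)
n_points, n_vars, n_keep = 400, 8, 3

# Synthetic, correlated stand-in for the standardized climate variables.
data = rng.normal(size=(n_points, 4)) @ rng.normal(size=(4, n_vars)) \
    + 0.2 * rng.normal(size=(n_points, n_vars))
centered = data - data.mean(axis=0)

# Covariance-based PCA with components sorted by explained variance.
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = centered @ eigvecs

# Direct computation: correlate every variable with every retained component.
corr = np.array([
    [np.corrcoef(centered[:, j], scores[:, k])[0, 1] for k in range(n_keep)]
    for j in range(n_vars)
])

# Closed form: loading * sqrt(eigenvalue) / std of the variable.
std = centered.std(axis=0, ddof=1)
expected = eigvecs[:, :n_keep] * np.sqrt(eigvals[:n_keep]) / std[:, None]
```

Because the sign of each eigenvector is arbitrary, the sign of a whole column of `corr` can flip between implementations. Only the relative signs within a component carry meaning.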

Correlation coefficient values (Table 5) show that the first principal component (PC1) has a high positive correlation with the autumn–winter temperature and precipitation and a high negative correlation with temperature and precipitation in late spring and early summer months. This means that higher values of PC1 correspond to places with warmer winters with more precipitation (snow or rain) and colder summers with less precipitation. However, it is also important to note that the total sum of the loadings is above 1, which implies that a constant increase in all variables would also result in higher values of PC1. From the spatial distribution (Fig. 5) we can see that PC1 has an east–west gradient implying less distinction between seasons at the seaside. It can be concluded that PC1 reflects the continentality of climate, and it represents the influence of the Baltic Sea.

Figure 6. Correlation coefficients between indices (principal components) and initial variables for the reference and future climates.

The second principal component (PC2) is positively correlated with all monthly temperatures and negatively correlated with precipitation in autumn. This means that high PC2 values correspond to regions that are generally warmer than others and have low precipitation in autumn. For PC2 a north–south gradient is evident with the warmer climate in the south. This means that PC2 represents the influence of latitude. This pattern is also slightly influenced by geographical features (elevation) and the shape of the coast.

Table 4. Explained variance and loadings of the first three principal components calculated from temperature and precipitation data for the years 1961–1990.

Table 5. Correlation coefficients between principal components and standardized initial data for the years 1961–1990. High positive correlation corresponds to darker red color and high negative correlation corresponds to darker blue color.

PC3 is mainly positively correlated with precipitation for most of the year (December–August) and spring temperature (April–May). This means that high PC3 values correspond to places with overall high precipitation or, in other words, an overall wetter year. PC3 mainly reflects the terrain, i.e., the distribution of elevation.

When the spatial patterns of PC2 and PC3 are analyzed the effect of orography can be seen. The location of the highlands is especially visible, while for PC1 the terrain seems to have little impact.

Loadings (linear weights) acquired through PCA from the reference data (Table 4)
can be used as coefficients that define new climate indices. We can use these
coefficients to calculate climate indices from different data (other time periods
or other geographical locations). It is also important to note that the statistics
(mean values and variances) from the reference data used in data standardization
must also be applied to the other data for a comparison to be possible. In our
case we calculated such climate indices for the future climate (corresponding to
the period 2071–2100) and analyzed the change in climate patterns.
The standardization of the variables is shown by Eq. (

It is important to note that

In Fig. 6 the correlation coefficients between the indices and the initial variables are shown; they are similar to those for the past climate. The indices therefore have the same interpretation, and it is possible to analyze the change in spatial patterns between the past and future climate. The spatial distributions of future indices are shown in Fig. 7. Statistical descriptors of past and future indices (minimum, maximum, and mean values) are summarized in Table 6. In addition, as we have used the same standardization (subtraction of the reference period mean) and the same climate index calculation process (loadings from the reference period), we can draw conclusions about increases or decreases in these climate indices. However, it is important to note that no conclusions can be drawn about the amount by which the indices have increased or decreased.
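The procedure of reusing reference-period loadings and statistics for another period can be sketched as follows. Everything here is an illustrative assumption: the data are synthetic, the "future" is the reference field plus a per-variable shift, and the standardization uses the group-average variance form, which is one reading of the description in the text rather than a reproduction of the original equation.

```python
import numpy as np

rng = np.random.default_rng(6)
n_points, n_vars, n_keep = 300, 24, 3
groups = np.array([0] * 12 + [1] * 12)  # 0 = precipitation, 1 = temperature

# Synthetic stand-ins (not the ENSEMBLES data).
reference = rng.normal(size=(n_points, 4)) @ rng.normal(size=(4, n_vars))
future = reference + rng.normal(loc=1.0, scale=0.2, size=n_vars)

# Means and scales come from the reference period only (assumed form).
means = reference.mean(axis=0)
variances = reference.var(axis=0, ddof=1)
scale = np.sqrt(np.where(groups == 0,
                         variances[groups == 0].mean(),
                         variances[groups == 1].mean()))

ref_std = (reference - means) / scale
fut_std = (future - means) / scale      # same statistics as the reference

# Loadings are derived once, from the reference period.
eigvals, eigvecs = np.linalg.eigh(np.cov(ref_std, rowvar=False))
loadings = eigvecs[:, np.argsort(eigvals)[::-1][:n_keep]]

# Index values for both periods use the same loadings.
ref_index = ref_std @ loadings
fut_index = fut_std @ loadings
change = fut_index.mean(axis=0) - ref_index.mean(axis=0)
print(np.round(change, 3))
```

Because both periods share the same means, scales, and loadings, a difference in mean index values reflects a change in the underlying variables and not a change in the definition of the index. The sign of each index depends on the arbitrary sign of its eigenvector, so it has to be fixed by interpretation, as is done with the correlation coefficients in the text.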

Figure 7. Climate indices (based on principal components from 1961–1990) for the years 2071–2100.

Table 6. Statistics of climate indices (based on PCA) for past and future data.

All indices have higher values in future climate. This can be
interpreted as an overall warmer climate (increase in PC2) and wetter
climate (increase in PC3). The interpretation of PC1 is more complicated
as the coefficients (Table 4) are positive for some variables and
negative for others. An increase in PC1 would be observed in the case of a constant
increase in all variables. However, an increase would also be observed
in the case of a temperature and precipitation decrease in spring and summer. An
average increase of “standardized” (by Eq.

Table 7. Description and interpretation of climate indices based on PCA.

For PC1 it is shown that the values corresponding to coastal regions in the reference climate will “move” to the eastern part of the Baltic states in the future projections. The expected changes in PC2 are the largest, and the maximum values of PC2 for the reference climate (in southern Lithuania) are lower than the minimum values for the future climate (in central Estonia). The statistics in Table 6 show that the reference range of this index does not overlap with the range of future values. The climate corresponding to the reference values of PC3 in western Lithuania (the Zemaiciai Highland) will in the future be observable on plateaus in the central and northeastern parts of the Baltic states.

The methodology used in this study has been able to reduce 24 climate variables to three new indices that more efficiently and compactly represent the main features of the climate in the Baltic countries. The methodology can also be applied to future climate data and therefore the impacts of climate change can be analyzed. Additional analysis is needed for the interpretation of the acquired indices.

Some insight into the possible interpretation of the acquired climate indices can be gained from the literature. The spatial distribution of PC1 is similar to the spatial patterns of the mean start date of winter (see results for Estonia in Jaagus and Ahas, 2000) with higher PC1 values corresponding to later winters.

As PC2 is mainly linked to temperature, the patterns exhibited by PC2 can be expected to be similar to the spatial distribution of phenological events for which temperature is the main driving factor. For example, the spatial pattern of PC2 shows similarities to spring and summer start dates in the Baltic Sea region and to more specific phenological events, such as apple tree blossoming and the beginning of the rye vegetation period (Jaagus and Ahas, 2000) or strawberry blooming and harvest (Bethere et al., 2016). In general, higher values of PC2 correspond to places with earlier phenological processes.

High values of winter precipitation and high temperatures in spring can be interpreted in the context of spring floods; however, additional analysis is needed to account for the snow cover. The spatial distribution of PC3 is similar to the map of average annual precipitation in the study region (Jaagus et al., 2010). Interestingly, the precipitation in autumn months (September–October) has a small contribution to PC3 (Table 5).

Conclusions based on spatial pattern and correlation coefficient analysis are summarized in Table 7.

The methodology could be further improved to better link the acquired indices with phenological processes or seasons by either rotating the acquired principal components (Jolliffe, 2002) or performing correlation or regression analysis with other variables, such as crop yield (Cai et al., 2013). This approach would be especially useful in the case of PC1, for which analysis is currently complicated due to both changes in seasonality and the constant increase affecting PC1 values. Another approach that could be used to describe the spatial variability of the climate in the Baltic states is clustering based on the chosen principal component values (Fovell and Fovell, 1993; Forsythe et al., 2015).

If variables other than temperature or precipitation are used for the principal component analysis, in some cases the standardization procedure should be modified. However, it should be taken into account that when more than one data set is used, e.g., when past and future climate is compared, the same values used for standardization should be applied to all of them.

Most of the spatial variability in monthly average temperature and precipitation over the Baltic countries can be represented by three principal components for both past and future climate. These components can be considered climate indices, in which higher values correspond to locations with (1) climate with less distinct seasons, (2) warmer climate, and (3) climate with more precipitation. Each component has a distinct spatial pattern. The index related to seasonality exhibits a clear east–west (or inland) gradient with less distinct seasonality at the seaside (west). The second index (warmer climate) shows a north–south gradient with a warmer climate in the south. This index also reflects orography with colder climate in hilly regions. The third index reflects the overall precipitation. Its spatial distribution is mainly dominated by elevation, with maxima at the highlands and less precipitation in the plains and at the seaside. A specific standardization of the data also allows for the calculation of such indices for the future climate. Change in the climate indices in the future implies less distinct seasons and a warmer and wetter climate.

Although there is significant change in the magnitude of the indices between the future and reference periods, the change in spatial distribution is relatively small. For the first and third components, regions can be identified in which the future climate will be similar to the current climate in other regions.

We used publicly accessible data (before bias correction;
the bias correction is described in detail in Sennikovs and Bethers, 2009).
RCMs are from the ENSEMBLES project:

The authors declare that they have no conflict of interest.

This article is part of the special issue “Multiple drivers for Earth system changes in the Baltic Sea region”. It is a result of the 1st Baltic Earth Conference, Nida, Lithuania, 13–17 June 2016.

The research was supported by the Latvian state research program “The value and dynamic of Latvia's ecosystems under changing climate” (EVIDEnT).

The ENSEMBLES data used in this work were funded by the EU FP6 Integrated Project ENSEMBLES (contract number 505539), and support is gratefully acknowledged.

Edited by: Anna Rutgersson
Reviewed by: two anonymous referees