Today, many processes at the Earth's surface are constantly
monitored by multiple data streams. These observations have become central to
advancing our understanding of vegetation dynamics in response to climate
or land use change. Another set of important applications is monitoring
effects of extreme climatic events, other disturbances such as fires, or
abrupt land transitions. One important methodological question is how to
reliably detect anomalies in an automated and generic way within multivariate
data streams, which typically vary seasonally and are interconnected across
variables. Although many algorithms have been proposed for detecting
anomalies in multivariate data, only a few have been investigated in the
context of Earth system science applications. In this study, we
systematically combine and compare feature extraction and anomaly detection
algorithms for detecting anomalous events. Our aim is to identify suitable
workflows for automatically detecting anomalous patterns in multivariate
Earth system data streams. We rely on artificial data that mimic typical
properties and anomalies in multivariate spatiotemporal Earth observations
such as sudden changes in basic characteristics of time series: the sample
mean, the variance, the amplitude of the seasonal cycle, or trends. This
artificial experiment is needed as there is no “gold standard” for the
identification of anomalies in real Earth observations. Our results show that
a well-chosen feature extraction step (e.g., subtracting seasonal cycles, or
dimensionality reduction) is more important than the choice of a particular
anomaly detection algorithm. Nevertheless, we identify three detection algorithms
(KDE, REC, and KNN-Gamma) that, on average, outperform both the univariate control and the other multivariate approaches.

The Earth system can be conceptualized as a system of highly interconnected
subsystems (e.g., atmosphere, biosphere, hydrosphere, lithosphere). Each of
these subsystems can be monitored and characterized by multiple variables.
Technological progress over the past decades has led to a boost in satellite
technologies

Of particular importance is the analysis of extreme events like droughts,
fires, heat waves, or floods, which are expected to change in a future climate

The flood of observational data is accompanied by a similar increase in data
from Earth system models

In observations, anomalous events are often detected using extreme event
detection methods suitable for univariate data streams

Multivariate approaches in geoscience make use of anomalies occurring
simultaneously in multiple data streams, often referred to as coincidences or
co-exceedances

Interestingly, there are multiple industrial applications that likewise
require anomaly detection. In this context, anomaly detection has become a
standard procedure in the wake of Harold Hotelling's publication of the

The objective of this study is to provide an overview and comparison of
anomaly detection algorithms and their combination with feature extraction
techniques for identifying multivariate anomalies in Earth observations (EOs). Spatiotemporal EOs
are therefore stored in the Earth system data cube, which is a
four-dimensional array of latitudes, longitudes, time, and different measurement
variables. To detect multivariate anomalies in EOs, we define an anomaly to
be any contiguous spatiotemporal part of the data cube that differs with
respect to the mean, the variance, the amplitude of the seasonal cycle, or
trends from the normal rest of the data cube. We adapt algorithms from
statistical process control (SPC) and novelty detection. The study is
structured as follows: first, we create a series of artificial Earth system
data cubes that try to mimic a series of real world features (in terms of
multiple variables, seasonal cycles, and correlation structure, etc.). We are
aware that these artificial data cubes are not real simulations of Earth
system data cubes. However, relying on artificial data in this paper is
motivated by the fact that a meaningful quantitative evaluation of
unsupervised anomaly detection algorithms and feature extraction techniques
in real Earth observation data is difficult due to the lack of
ground-truth data

Ground truth for detecting anomalies in multivariate data is rare, in
particular for detecting anomalies in real EOs. Thus, we generate
artificial data that represent common properties of EOs, including anomalies.
In particular, we focus on the existence of seasonality, correlations among
variables, and non-Gaussian distributions. Data generation assumes that each
subsystem of the Earth has uncorrelated intrinsic properties, i.e., it is
dominated by a few independent components. Consequently, generating these
independent components (which cannot directly be monitored) is the first
step. We then derive variables that contain elements of all independent
components and correspond to the observable measurements as a set of
correlated variables (Fig.

Combination of three independent component cubes to derive 10
correlated variables

More precisely, as a basic version we create three independent components for the
artificial data, each consisting of a signal (Gaussian, SD

Visualization of the four different event types

Our standard data cube

Anomalous events are introduced in the independent components only and then propagated from the independent component to some of the variables in the data cube with random weights. The anomalies are contiguous in space and time. The center of the anomaly is assigned randomly. The challenge is to detect the propagated anomaly through the unsupervised algorithms, i.e., without using the information about the spatiotemporal location of the anomaly. With this data cube generation scheme, we can generate anomalies by controlling the type of the anomalous event (event type), the magnitude of the anomalous event, and the spatiotemporal location.
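This generation scheme can be illustrated with a minimal Python sketch (the study's implementation is in Julia; the dimensions, magnitudes, and 1-D time axis here are illustrative assumptions, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(42)
n_time, n_ic, n_var = 300, 3, 10

# Independent components: Gaussian signals (not directly observable).
ics = rng.normal(size=(n_time, n_ic))

# Inject a contiguous BaseShift anomaly into one independent component.
event_start, event_len, magnitude = 150, 30, 3.0
ics[event_start:event_start + event_len, 0] += magnitude

# Propagate the components to 10 observable variables with random weights,
# yielding a set of correlated data streams.
weights = rng.uniform(-1.0, 1.0, size=(n_ic, n_var))
variables = ics @ weights

# Ground-truth labels of the anomalous span, used only for evaluation.
labels = np.zeros(n_time, dtype=bool)
labels[event_start:event_start + event_len] = True
```

The detection algorithms only ever see `variables`; `labels` is reserved for scoring the detections afterwards.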

We create four data cubes using the following temporary event types:

a shift in the baseline, i.e., shift of the running mean of a time series
(BaseShift)
(Fig.

an onset of a trend in the time series (TrendOnset) (Fig.

a change in the amplitude of the mean seasonal cycle of a time series (MSCChange)
(Fig.

a change in the variance of the time series (VarianceChange) (Fig.

Apart from the basic data cubes, we want to test the influence of certain data properties on the anomaly detection algorithms. To do so, we create additional data cubes, each with one added data property: we increase the number of independent components (MoreIndepComponents) or use a squared dependency among independent components (NonLinearDep) instead of a linear one. Furthermore, typical EO variables are often driven by extrinsic forcings, i.e., the Earth's orbit around the Sun, rotation, and axis tilt. Thus, we add a seasonal cycle modifying the signal (SeasonalCycle). In a global context, the mean is rarely constant; we therefore introduce a linear latitudinal trend into the baseline (LatitudinalGradient). In the basic case, the signal of our independent components follows a Gaussian distribution. In the more complicated versions, we also implement alternative scenarios with Laplacian (doubly exponential) distributed signals (LaplacianNoise) and signals that exhibit spatiotemporal correlation with red noise (CorrelatedNoise). The noise-to-signal ratio is 0.3 in the basic version; one additional data property increases it to 1.0 (NoiseIncrease). The shape and duration of anomalous events also differ: we double (LongExtremes) or reduce (ShortExtremes) the temporal duration of the anomalous events and change the spatial shape from rectangular to randomly affecting neighboring grid cells (RandomWalkExtreme).
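Temporally correlated "red" noise of the kind used for CorrelatedNoise can be produced by a first-order autoregressive process. The sketch below is a Python illustration with an assumed persistence parameter (phi = 0.8), not the paper's exact configuration:

```python
import numpy as np

def red_noise(n, phi=0.8, sigma=1.0, seed=0):
    """AR(1) 'red noise': x_t = phi * x_{t-1} + eps_t, eps_t ~ N(0, sigma^2).
    phi controls the temporal correlation (phi = 0 gives white noise)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(scale=sigma)
    return x

series = red_noise(2000)
# The lag-1 autocorrelation of the realization is close to phi.
lag1 = np.corrcoef(series[:-1], series[1:])[0, 1]
```

Laplacian-distributed signals (LaplacianNoise) can be drawn analogously with `rng.laplace` in place of `rng.normal`.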

Each data cube with a specific type of the event is generated 20 times, each
time with a different magnitude of the anomalous event
(Appendix

Our experiment comprises 36 different event-type combinations of data
properties, each repeated 20 times with varying event magnitudes
(Appendix

Code to reproduce the data farm is provided in the Data Availability section.

The idea of this study is to elaborate workflows that contain both data
preprocessing via feature extraction and algorithms for the detection of
anomalous events (Fig.

Data processing for detecting multivariate anomalies. We extract relevant features from each artificial data cube before applying the detection algorithms. The detection algorithms output some anomaly score, which we evaluate against the known extent of the event using the area under the curve (AUC). Feature extraction elements on the right-hand side are understood as options and can be combined with each other.

Feature extraction is a process to derive information from the data and
condense it into nonredundant characteristic patterns. This may facilitate
data interpretation

Subtracting the median seasonal cycle (sMSC) is one way to
deseasonalize time series. Deseasonalization may be instrumental in detecting
anomalous events across different seasons. The remaining part of the time
series is often referred to as anomalies in the climatological sense.
These anomalies are used here as an input feature. Please note that the
climatological anomalies are simply deviations from the mean behavior
and thus must not be confused with anomalies (strange or rare regions in
the data, closely related to extreme events) as detected by the
(multivariate) anomaly detection algorithms
(Sect.
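For a periodic series, the sMSC step can be sketched in a few lines of Python (a simplified 1-D illustration; the function name and the toy period are our own):

```python
import statistics

def subtract_msc(x, period):
    """Median seasonal cycle (MSC): for each position in the cycle, take the
    median over all time steps sharing that position; the residuals are the
    climatological anomalies used as input features."""
    msc = [statistics.median(x[k::period]) for k in range(period)]
    return [v - msc[t % period] for t, v in enumerate(x)]
```

A purely seasonal series maps to zero residuals, while values that depart from the typical cycle remain as anomalies.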

Computing the moving window variance (mwVAR) is a popular technique
for detecting trends in the variance in univariate time series
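A minimal Python sketch of the mwVAR feature (centered window, truncated at the series boundaries; the window length is an illustrative choice):

```python
def moving_window_variance(x, window=9):
    """Population variance within a centered moving window; windows are
    truncated at the boundaries of the series."""
    half = window // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half):i + half + 1]
        m = sum(seg) / len(seg)
        out.append(sum((v - m) ** 2 for v in seg) / len(seg))
    return out
```

Applied to a series with a high-variance segment, the feature rises within that segment and stays near zero elsewhere.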

Time delay embedding (TDE) increases the feature vector
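The TDE construction can be sketched as follows (Python illustration; the embedding dimension and delay are hypothetical values):

```python
def time_delay_embedding(x, dim=3, delay=1):
    """Augment each time step with `dim - 1` lagged copies of the series:
    row t -> (x[t], x[t - delay], ..., x[t - (dim - 1) * delay]).
    The first (dim - 1) * delay time steps are dropped."""
    start = (dim - 1) * delay
    return [[x[t - k * delay] for k in range(dim)]
            for t in range(start, len(x))]
```

Each feature vector then carries a short history of the series, so that changes in temporal structure (e.g., in the cycle amplitude) become visible to otherwise memoryless detection algorithms.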

Principal component analysis (PCA) is a data rotation used to find
an orthogonal (uncorrelated) subspace of the data of
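A PCA rotation can be sketched via the singular value decomposition of the centered data (a Python illustration with synthetic data; the study uses Julia implementations):

```python
import numpy as np

rng = np.random.default_rng(0)
# Six observed variables driven by two latent factors plus small noise,
# mimicking correlated data streams.
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(500, 6))

Xc = X - X.mean(axis=0)                # center each variable
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                     # rotated, mutually uncorrelated components
explained = S**2 / np.sum(S**2)        # variance fraction per component
```

Keeping only the leading components acts as the dimensionality reduction step referred to in the workflows.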

Independent component analysis (ICA) can be regarded as a nonlinear
alternative to PCA; it has become a standard technique of data-based process
monitoring. We use one ICA variant that tries to separate different
sources of data by maximizing the negentropy, a measure of non-Gaussianity of
the data

We use the fastICA
algorithm implemented in the Julia package MultivariateStats.jl
(

Exponentially weighted moving average (EWMA) is one way of reducing
the noise of the time series and taking temporal information into account. It
is common in the context of classical multivariate SPC to detect only
“significant” outliers
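The EWMA recursion is short enough to sketch directly (Python illustration; the smoothing parameter lam is an assumed value):

```python
def ewma(x, lam=0.2):
    """Exponentially weighted moving average:
    z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = x_0.
    Smaller lam -> stronger smoothing and longer memory."""
    z = [x[0]]
    for v in x[1:]:
        z.append(lam * v + (1 - lam) * z[-1])
    return z
```

Because each output mixes the current value with the smoothed history, isolated noise spikes are damped while persistent shifts accumulate.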

There is of course a multitude of alternative approaches available in the
literature, but we focus on the previously summarized ones as they are widely
used and efficiently implemented. Furthermore, different feature extraction
methods can also be combined (Fig.

We use several detection algorithms that we implemented in the Julia package
MultivariateAnomalies (

The univariate approach (UNIV) is a simple way to define extremes in
univariate data by identifying all points above (or below) a certain
quantile. This so-called “peak-over-threshold” approach can be transferred
to multiple univariate data streams: a data point is considered extreme if
one or several of the univariate variables are above (or below) a certain
quantile threshold of the marginal distribution of each single variable
(here computed globally).
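This per-stream peak-over-threshold rule can be sketched as follows (Python illustration; the simple empirical quantile and the function name are our own):

```python
def univ_extremes(streams, q=0.95):
    """Flag time step t as extreme if ANY variable exceeds its own
    empirical marginal quantile (peak-over-threshold per stream)."""
    flags = [False] * len(streams[0])
    for x in streams:
        thr = sorted(x)[int(q * (len(x) - 1))]  # simple empirical quantile
        for t, v in enumerate(x):
            if v > thr:
                flags[t] = True
    return flags
```

The detection of values below a lower quantile works analogously with the comparison reversed.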

Hotelling's

Apart from computing weighted distances to the mean (like

Recurrences (REC). Within the framework of the theory of nonlinear
dynamical systems, each state of a dynamical system will revisit a particular
region in its phase space if one waits sufficiently long

The distance matrix
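A simple recurrence-based score can be sketched from the pairwise distance matrix (Python illustration; the scoring convention, counting the fraction of non-recurrences, is an assumption for this sketch):

```python
import math

def recurrence_scores(X, eps):
    """Anomaly score per state: the fraction of all other states that are
    NOT within distance eps -- anomalous states recur rarely in phase space."""
    n = len(X)
    scores = []
    for i in range(n):
        rec = sum(1 for j in range(n)
                  if j != i and math.dist(X[i], X[j]) <= eps)
        scores.append(1.0 - rec / (n - 1))
    return scores
```

States in densely revisited regions receive low scores, while states that the system rarely returns to stand out with scores near one.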

Kernel density estimation (KDE) is a standard technique for
estimating densities based on column means of the kernel matrix
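The column-mean construction can be sketched with a Gaussian kernel (Python illustration; the bandwidth sigma and the sign convention for the score are assumptions of this sketch):

```python
import math

def kde_scores(X, sigma=1.0):
    """Density estimate per point = column mean of the Gaussian kernel
    matrix; points in low-density regions get high anomaly scores."""
    n = len(X)
    scores = []
    for i in range(n):
        dens = sum(math.exp(-math.dist(X[i], X[j]) ** 2 / (2 * sigma ** 2))
                   for j in range(n)) / n
        scores.append(-dens)  # low density -> high anomaly score
    return scores
```

A point far from the bulk of the data contributes little kernel mass and therefore receives a higher score than points inside dense regions.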

Support vector data description (SVDD) models the distribution of
the training data with an enclosing hypersphere in a high-dimensional kernel
feature space

The kernel null Foley–Sammon
transform (KNFST) maps the training data
into a so-called null space, in which the training samples have zero
variance, i.e., all training samples are mapped to the same point called the
target value

Given the large number of potential combinations of feature extraction and
anomaly detection algorithms, we need an objective criterion to compare the
performances of the numerous possible workflows. We use the area under the
receiver operating characteristic curve (AUC) as our measure of
detection skill for a specific event type
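The AUC has an equivalent rank formulation that is easy to sketch: it is the probability that a randomly chosen anomalous point receives a higher anomaly score than a randomly chosen normal point (Python illustration):

```python
def auc(scores, labels):
    """Area under the ROC curve: probability that a randomly chosen
    anomalous point scores higher than a randomly chosen normal one
    (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means the scores separate the anomaly perfectly, 0.5 corresponds to random guessing, and values below 0.5 indicate inverted scores.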

For each data cube with a given event magnitude and event type we compute the
AUC for each data property, feature extraction, and algorithm
combination. This leads to an entire catalogue of possible combinations,
namely

AUC difference with respect to the UNIV control in
the experimental factors feature extraction and detection algorithm
for the event types

One way of summarizing the results of such a large number of combinations is
treating the AUC values as the outcomes of an experiment in which
the different design decisions (e.g., feature extraction techniques, anomaly
detection algorithms) are the experimental factors. As a control treatment we
introduce the simplest possible approach to detecting the anomaly: the UNIV
approach applied to the selected event type, without any further data
properties (e.g., short extremes or increased measurement noise) and
without prior feature extraction. In order to assess the (averaged) effect of
each experimental factor, we fit a linear mixed-effect model

Additionally, we compute the resampling variation in parameter estimation of
the anomaly detection algorithms (RVP) as the mean difference between the
maximum and minimum AUC for each resampling

Summarizing the output of several anomaly detection algorithms is one way to
create more robust results
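One common combination scheme, sketched here in Python, averages rank-transformed scores across algorithms; the rank transform makes differently scaled scores comparable. (This is an illustrative scheme; the paper's exact combination rule may differ.)

```python
def ensemble_score(score_lists):
    """Average rank-transformed anomaly scores across algorithms.
    Each algorithm's scores are replaced by normalized ranks in [0, 1]
    before averaging."""
    n = len(score_lists[0])
    rank_lists = []
    for scores in score_lists:
        order = sorted(range(n), key=lambda i: scores[i])
        ranks = [0.0] * n
        for r, i in enumerate(order):
            ranks[i] = r / (n - 1)
        rank_lists.append(ranks)
    return [sum(rl[i] for rl in rank_lists) / len(rank_lists)
            for i in range(n)]
```

A point only receives a high ensemble score if several algorithms rank it as anomalous, which suppresses spurious detections of any single member.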

In the following, we present the performance of the workflows in subsections
corresponding to feature extraction techniques
(Sect.

Feature extraction techniques are often more important than the detection
algorithm itself (Fig.

Shifts in the baseline are simulated to mimic
extreme events. Increasing the magnitude (in terms of standard deviations) of
a BaseShift makes it easier to detect the event
(Fig.

Results look very similar to those of
BaseShift, except that temporal smoothing with EWMA has a
stronger positive effect than for BaseShift. This may be related to
the fact that events for TrendOnset are longer than those for
BaseShift. Since the algorithms used in this work are not
specifically designed to detect the onset of linear trends, we speculate that
their capability to detect such anomalies may be related to their ability to
detect base shifts. While algorithms specifically designed to detect changes
in trends

In the detection of MSCChange, most feature
extraction algorithms showed some skill in the detection of an amplitude
increase, while only a subset of these also succeeded in detecting decreases
in amplitude (Fig.

The algorithms used are hardly able to detect any
decrease in variance (Fig.

Seasonality occurs in most EOs. Not accounting
for the seasonal cycle has a negative impact on the AUC
(Appendix

In contrast to the investigated combinations of feature extraction methods,
we can identify three of the tested algorithms performing on average almost
equally well for most event types given a suitable feature extraction as
discussed before (Sect.

Average AUC difference of the anomaly detection algorithms from the UNIV control for each event type.

These techniques exhibit overall the highest AUC and
lowest RVP (Table

In most of the cases, KNN-Gamma performance is better
than the UNIV control, but only as good as the UNIV control
for detecting TrendOnset. This may be due to the fact that for
TrendOnset, the mean distance to the KNN does not change,
unless a very large number of KNN values is considered or a large fraction
of temporally nearby data points is excluded from the KNN. When
excluding TrendOnset, the mean performance increases to 0.019, which
is comparable to KDE and REC. In addition, we observe even
superior performance of KNN-Gamma compared to KDE and
REC for difficult data properties (e.g.,
MoreIndepComponents, CorrelatedNoise; Fig. S2). In contrast,
KNN-Delta does not yield high AUC, probably because we do
not construct anomalies in the data cube explicitly with a direction that is
accounted for by KNN-Delta (mean length of the vectors to its
KNN). The finding that simple algorithms like KNN-Gamma (or
KDE,

These techniques perform on average worse than or equally
as well as UNIV. Also, the
RVP is highest among the algorithms (Table

We explicitly do not claim that KNFST and SVDD are
generally inferior algorithms; they are simply not built for such massive
amounts of data. KNFST and SVDD outperform other algorithms in very
different settings (e.g., novelty detection in images)

exhibits good performance for detecting starting trends and shifts in
the mean. However, it also exhibits the third largest RVP
(Table

The residual time series obtained by subtracting the median seasonal
cycle from

The selection of algorithms for computing the ensemble is a compromise
between accurate detection of and diversity amongst the selected algorithms

Overall, ensemble building improves the anomaly detection rate. The mean
AUC of each of the ensemble members (3d:

AUC difference of the ensembles of anomaly detection
algorithms to the UNIV control. Ensembles are computed out of the
four
best algorithms (4b, KDE, REC, KNN-Gamma,

The utility of distance-based outlier detection
algorithms as used in this paper is often questioned in the context of high-dimensional data

Within the parameterization process, several
heuristic choices are made. We exclude five time steps to be counted as
recurrences or

Our version of the artificial data farm was generated to test different algorithms for their capability to deal with typical properties of EO data. The workflows were chosen to be as generic as possible so that their application to real data with slightly different properties is as easy as possible. Nevertheless, several points have to be considered when applying the algorithms to real EOs.

A typical preprocessing of EOs is to center variables to
zero mean and standardize to unit variance (also known as
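This z-transformation is simple enough to sketch (Python illustration using the population standard deviation; real EO preprocessing would typically apply it per grid cell and variable):

```python
import statistics

def zscore(x):
    """z-transformation: center to zero mean and scale to unit
    (population) standard deviation."""
    m = statistics.mean(x)
    s = statistics.pstdev(x)
    return [(v - m) / s for v in x]
```

Note that a single global scale factor is exactly what becomes problematic under heteroscedasticity, as discussed next.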

Variables representing a signal from the biosphere in particular are known
to exhibit heteroscedasticity; e.g., the variance during the growing season
is substantially larger than during the rest of the year
(Fig.

Furthermore, anomalies are also overestimated when using a reference period
for the estimation of the variance

Regarding the parameterization process of the algorithms, we use fixed
parameters for

Our aim is to identify suitable methods for detecting anomalies in highly multivariate, correlated, and seasonally varying data streams as they are common in Earth system science. In particular, we are interested in detecting shifts in mean (extremes), changes in the amplitude of the seasonal cycle, temporal changes in the variance, and onsets of trends. We test a wide range of workflows (i.e., combining feature extraction techniques and anomaly detection algorithms). All experiments are based on artificial data, designed to mimic real world Earth observations.

We can show that, on average over different anomaly types and data
properties, three multivariate anomaly detection algorithms (KDE,
REC, KNN-Gamma) outperform univariate extreme event
detection as well as other multivariate approaches (mean AUC
compared to univariate control:

However, we also find for the considered type of events that including a
suitable feature extraction technique in the detection workflow is often more
important than the choice of the event detection algorithm itself. Moreover,
we find that the feature extraction has to be explicitly designed for the
event type of interest: time delay embedding (for detecting changes in the
cycle amplitude) and the exponentially weighted moving average (for detecting
trends and long extremes and removing uncorrelated noise in the signal)
increase the detection rate of the anomalous events. Including features of
the variance within a moving window works partly for detecting increases in
the variance but fails to detect a decrease in the variance due to the
relatively high observational noise level. In general, if the data exhibit
seasonality, subtracting the seasonal cycle and using the remaining time
series as the input
feature is essential. Furthermore, we improve the detection rate of
multivariate anomalies in highly correlated data streams by adding a
dimensionality reduction method to the workflow (in line with results of

The proposed workflows are capable of dealing with common properties of Earth observations like seasonality, nonlinear dependencies, and (to a certain degree) non-Gaussian distributions and noise. Nevertheless, they have to be applied with care to Earth observations, i.e., standardization issues along with strong heteroscedastic patterns (e.g., in biosphere variables of northern latitudes) may lead to an overestimation of anomalies. Future work will explore the potential of the identified workflows in rediscovering known and potentially unknown extremes as well as other anomalies in a set of real Earth system science data streams. We anticipate that an automated application of our workflows might enable the establishment of automated Earth system process control in a very generic manner.

The artificial data farm can be created after cloning in

Within the generation process, we assume that the signal

For a basic version, three independent components

Parameter settings for the generation of the artificial data farm. Details are given for each event type and data property (in brackets).

Using this data generation scheme, a standard data cube

In the basic version we create four data cubes, each with a different temporary event type, including

shift in the baseline, i.e., shift of the running mean of a time series (BaseShift)
(Fig.

change in the variance of the time series (VarianceChange)
(Fig.

change in the amplitude of the mean seasonal cycle of a time series
(MSCChange) (Fig.

onset of a trend in the time series (TrendOnset)
(Fig.

Regarding the data properties, some of the event type data property
combinations are excluded (Table

AUC versus event magnitude for all combinations (grey) and
the univariate control (red). Columns of the matrix represent different event
types; rows represent data properties. Additional colored workflows represent
the workflows with the five highest mean values for the magnitudes

Effect of the data properties on the three best detection algorithms
(KDE, REC, KNN-Gamma) presented as AUC
difference of the UNIV control for the event types

MF and MDM designed the study in collaboration with FG, AB, JD, MR, and ER; MF implemented the algorithms, including contributions from FG, PB, and ER; MF wrote the paper with contributions from all co-authors.

The authors declare that they have no conflict of interest.

This research has received funding from the International Max Planck Research School for Global Biogeochemical Cycles (IMPRS), the European Space Agency via the STSE project CAB-LAB, and the BACI project, a European Union's Horizon 2020 research and innovation programme under grant agreement no. 64176. We thank Simone Girst for her kind language check. Reik Donner and one anonymous referee provided valuable suggestions for improvement. The article processing charges for this open-access publication were covered by the Max Planck Society. Edited by: Sagnik Dey Reviewed by: Reik Donner and one anonymous referee