The PCA Trick with Time-Series
By John Mark Agosta, Dec 2020
PCA can be used to reject cyclic time-series behavior, and this works for anomaly detection.
Detecting an anomaly typically means thresholding a signal, to raise an alarm when the signal goes out of range. For something like an assembly line, where tolerances are precise, the difference between normal and abnormal is clear. But network traffic has a troublesome characteristic that makes this hard: it varies with large daily cycles as customers’ activity peaks and wanes. One could try to tease out this daily variation by modelling it explicitly. However, the trick described here, using Principal Component Analysis (PCA), avoids that hard work.
The periodic components embedded in a set of concurrent time-series can be isolated by Principal Component Analysis (PCA), to uncover any abnormal activity hidden in them.¹ This is putting the same math commonly used to reduce feature sets to a different purpose. PCA and similar dimension reduction methods may be part of your every-day data science toolkit, but I bet you never thought of using PCA for this.
Here’s the problem it solves.
Say we’ve got a cluster of networked machines running a distributed web service. Customer traffic flows through various machines, depending on their function, and each records a set of performance variables, such as memory and cpu usage. Because of the common external origin of the traffic (to wit, the customer load), these variables share periodic fluctuations on top of whatever innate source of variation one may seek to detect. That variation could be due to software or hardware faults among the machines: the stuff that you’re monitoring for.
Any stationary time-series can be expressed as a sum of sine and cosine functions, in what is called a Fourier expansion. These periodic functions are a natural way to analyze stationary time-series, for a fundamental reason that will become clear. My first foray at solving the problem was to extract the Fourier expansion explicitly, much as a tool like Facebook’s Prophet forecaster does. With the extracted components in hand, I then attempted to weave together the subset responsible for the external traffic, as a predictor of normal traffic, deviations from which could be classified as abnormal. But all this effort turns out to be unnecessary.
The linear algebra of PCA.
Recall from linear algebra that one may construct a basis for any vector space, meaning a set of independent vectors that span the space, of which any other vector in the space is a unique linear combination. All bases for the space have the same size: this size defines the dimension of the space. PCA discovers a basis with two desirable properties. First, among the many possible bases, the subspace spanned by the first d PCA components (the eigenvectors with the d largest eigenvalues) minimizes the reconstruction error of the original signals. More precisely, for any d less than the full dimension, this subspace gives the best d-dimensional approximation of the feature vectors. As a bonus, since the covariance matrix whose eigenvectors are recovered is symmetric, the PCA eigenvectors are orthogonal. This property is well known, previously as the Karhunen-Loève expansion, originally published in the 1940s.
Second, note that the Fourier expansion also forms an orthonormal basis of eigenvectors invariant to translation. This means that the components don’t change as the time index is shifted. This is important because, besides removing any concern about how the starting point of the time-series affects the results (it doesn’t), it means the Fourier components are the eigenvectors of stationary time-series. In fact, any basis of a stationary time-series can arguably be expressed as a combination of Fourier components.
A bit of math shows how the translation invariance arises. Any exponential function has the desired property

exp(x + h) = exp(h) · exp(x),

where x can be a complex number, and h is a translation. In words, this states that the exponential function is unchanged, up to a constant factor, by a displacement of h. Of course, for our purposes, this property carries over to the familiar pair of trigonometric functions because, by Euler’s formula,

exp(ix) = cos(x) + i sin(x).
Thus trigonometric functions are eigenvectors of translation, meaning any periodic signal has the same eigen-decomposition when shifted in time. Consequently (and one might say, magically), if there is a common periodic component in the set of time-series variables, PCA will find it, and the Fourier components will appear in the PCA results.
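The shift invariance is easy to check numerically. A minimal sketch (the signal frequency and shift amount are arbitrary choices for illustration): a sampled sinusoid and a time-shifted copy of it have identical Fourier magnitude spectra; only the phases move.

```r
t <- 0:99
x <- sin(2 * pi * 3 * t / 100)                 # 3 cycles over 100 samples
x_shifted <- sin(2 * pi * 3 * (t + 17) / 100)  # the same signal, shifted in time
# A time shift rotates the phase of each Fourier coefficient but leaves
# its magnitude untouched, so the two magnitude spectra agree
max(abs(Mod(fft(x)) - Mod(fft(x_shifted))))    # ~ 0, up to rounding error
```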
How this works.
In our case, we are interested in the PCA maximum-variation subspace as a way to identify the components of the periodic signal. In short, instead of throwing away the least-variation basis vectors as we do for dimension reduction, we will ignore the largest (in the sense of variance) basis vectors, until we are left with those that are normally “quiet”, where the anomalies appear.
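A minimal sketch of this scheme in R (the data, names, and parameters here are illustrative, not the article’s actual monitoring code): keep the trailing, normally quiet components as a residual subspace, and score each time step by the energy left in it.

```r
# Simulate p machines sharing one periodic load, then inject a fault
set.seed(1)
n <- 300; p <- 5
t <- seq(0, 6 * pi, length.out = n)
signals <- sapply(1:p, function(i) sin(t + i) + rnorm(n, sd = 0.05))
signals[250, 3] <- signals[250, 3] + 1   # a fault on machine 3 at step 250

pca <- prcomp(signals, center = TRUE, scale. = TRUE)
k <- 2                                   # components carrying the shared cycle
residual <- pca$x[, (k + 1):p]           # the normally quiet subspace
score <- rowSums(residual^2)             # residual energy per time step
threshold <- mean(score) + 4 * sd(score)
which(score > threshold)                 # the injected fault stands out
```

Choosing k, the number of loud components to discard, can be done by eye from the variances pca$sdev^2, since the shared cycle concentrates in the first few components.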
Our original signals are considered as vectors, one for each time-series, with length equal to n, the number of time steps. This n is not to be confused with the dimension of the set of signal features: think of the signals matrix as a table of p features observed over n samples.
Here’s the R code to convert the signals time-series matrix into its PCA components. prcomp() originated in the mva R package, which has since been merged into the built-in stats package.
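The snippet itself did not survive extraction; a plausible reconstruction (the variable name signals comes from the text, while the placeholder data is mine) looks like this:

```r
# signals: an n x p matrix, one column of performance readings per machine
signals <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)  # placeholder data

# prcomp() centers (and here also scales) each column, then finds the
# orthogonal component time-series via singular value decomposition
pca <- prcomp(signals, center = TRUE, scale. = TRUE)
components <- pca$x   # n x p matrix: one column per PCA component
summary(pca)          # proportion of variance explained by each component
```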
Let’s have a look.
We can see how PCA rejects periodic noise with a simulation. We create a combination of a few sine and cosine features, with a pinch of noise, and run it through a PCA decomposition. The result is another set of orthogonal time-series that picks up the periodicities of the input set in the first two components, leaving the residual in the rest. Here are five noisy features with different phases, and their five PCA components:
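A sketch of that simulation (the phases, noise level, and series length are illustrative choices):

```r
set.seed(7)
n <- 400
t <- seq(0, 8 * pi, length.out = n)
phases <- c(0, 0.7, 1.9, 3.1, 4.5)
# Five noisy sinusoids sharing one frequency, each with its own phase
signals <- sapply(phases, function(ph) sin(t + ph) + rnorm(n, sd = 0.2))

pca <- prcomp(signals, center = TRUE, scale. = TRUE)
var_share <- pca$sdev^2 / sum(pca$sdev^2)
round(var_share, 3)
# The first two components soak up the shared sine/cosine pair;
# the remaining three hold only the residual noise
```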