In many real-world problems, large outliers and pervasive noise are commonplace, and one may not have access to clean training data. Anomaly detection is therefore useful for finding and removing anomalies so that noise-free data is available for further analysis. Robust Principal Component Analysis (Robust PCA) is one such method: it splits the data into a sparse anomaly part and a remaining part that can be projected onto a linear low-dimensional manifold.
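The splitting described above can be illustrated with a minimal sketch. It assumes the common principal component pursuit formulation of Robust PCA, solved with a basic ADMM iteration; the function names, the default weight `lam = 1/sqrt(max(n, d))`, the step size `mu`, and the synthetic data are illustrative assumptions, not necessarily the solver or settings used in this work.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise shrinkage: proximal operator of tau * (l1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svd_threshold(x, tau):
    """Shrink singular values: proximal operator of tau * (nuclear norm)."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt

def robust_pca(m, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split m into low_rank + sparse (hypothetical helper, ADMM sketch)."""
    m = np.asarray(m, dtype=float)
    if lam is None:
        lam = 1.0 / np.sqrt(max(m.shape))      # assumed default weight
    if mu is None:
        mu = m.size / (4.0 * np.abs(m).sum())  # assumed step size heuristic
    low_rank = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    norm_m = np.linalg.norm(m)
    for _ in range(max_iter):
        # Update the low-rank part, then the sparse part, then the multiplier.
        low_rank = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low_rank + dual / mu, lam / mu)
        residual = m - low_rank - sparse
        dual = dual + mu * residual
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return low_rank, sparse

# Synthetic example: a rank-2 matrix corrupted by a few large spikes.
rng = np.random.default_rng(0)
true_low_rank = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 40))
true_sparse = np.zeros((40, 40))
idx = rng.choice(40 * 40, size=80, replace=False)
true_sparse.flat[idx] = 10.0 * rng.choice([-1.0, 1.0], size=80)
data = true_low_rank + true_sparse

low_rank, sparse = robust_pca(data)
```

In this sketch the entries of `sparse` with large magnitude play the role of detected anomalies, while `low_rank` is the cleaned data lying near a low-dimensional linear subspace.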
Our work consists of both real-world applications and methodology development. We applied Robust PCA in the cyber security domain to analyze the dimensionality of data and to detect anomalies in data arising from BBN/Raytheon’s high-fidelity network simulations. Robust PCA provides unique features for identifying different DDoS attacks without requiring attack labels for the data.
In addition, we generalize Robust PCA from discovering linear manifolds to discovering non-linear relationships in the data. In the recent literature, deep autoencoders and other deep neural networks have demonstrated their effectiveness in exploring non-linear features across many problem domains. Our extension combines deep autoencoders with Robust PCA, which not only preserves a deep autoencoder’s ability to discover non-linear features but also eliminates noise. We also generalize our results to grouped sparsity norms, which distinguish anomalies from structured corruptions, such as a collection of instances having more corruptions than their fellows. Leveraging grouped norms allows our method to detect row-wise outliers. Because both denoising and outlier detection increase the robustness of standard deep autoencoders, we name our novel method the “Robust Deep Autoencoder (RDA)”. This work has been published as a full paper in the research track of the KDD’17 conference.
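To make the role of a grouped norm concrete, the following minimal sketch assumes the row-wise l2,1 norm (the sum of the Euclidean norms of the rows) as the grouped sparsity penalty, together with its standard proximal operator. The helper names `l21_norm` and `prox_l21_rows` are hypothetical and are not taken from this work.

```python
import numpy as np

def l21_norm(s):
    """Sum of the Euclidean norms of the rows of s."""
    s = np.asarray(s, dtype=float)
    return float(np.linalg.norm(s, axis=1).sum())

def prox_l21_rows(s, tau):
    """Shrink each row toward zero as a group.

    A row whose norm is at most tau is set entirely to zero; any other
    row keeps its direction and loses tau from its norm.
    """
    s = np.asarray(s, dtype=float)
    row_norms = np.linalg.norm(s, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    scale = np.maximum(1.0 - tau / np.maximum(row_norms, 1e-12), 0.0)
    return s * scale

# Three instances (rows): one heavily corrupted, one slightly, one clean.
candidate = np.array([[3.0, 4.0],
                      [0.1, 0.1],
                      [0.0, 0.0]])
shrunk = prox_l21_rows(candidate, tau=1.0)
outlier_rows = np.flatnonzero(np.linalg.norm(shrunk, axis=1) > 0)
```

Because the shrinkage acts on a whole row at once, only instances with large overall corruption survive, which is what lets a grouped norm flag entire rows as outliers rather than isolated entries.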
Further, we propose a model consisting of a hierarchical collection of RDAs, which maintains the spirit of stacked denoising autoencoders and hierarchical neural networks. Finally, by allowing any advanced autoencoder to replace the standard autoencoders used previously in the RDA framework, we demonstrate grouped-norm-regularized autoencoders, which are more faithful to the ideas of classic dimension reduction.
Prof. Randy C. Paffenroth, Worcester Polytechnic Institute. Adviser.
Prof. Xiangnan Kong, Worcester Polytechnic Institute.
Prof. Yanhua Li, Worcester Polytechnic Institute.
Dr. Partha Pal, BBN/Raytheon Company.