University of Minnesota
Institute of Technology



Electrical and Computer Engineering

Kernel Learning for Correlation Analysis of Big Data

S.Y. Kung, Princeton University

Abstract
It has been a major challenge to develop learning paradigms that effectively tackle the 3V issues of big data: volume, velocity, and variability [Laney2001]. Algorithmically, intuitive yet powerful correlation analysis has been proposed to unravel the information hidden in big data [Schonberger2012]. Kernel learning represents a major technical forefront for correlation analysis, in which the sample-based correlation matrix is represented by a kernel matrix, with the correlation between two samples modeled by a kernel function. It is from the kernel's perspective that we shall shed some light on big data's learning paradigms:
1. Volume/Velocity of data: We shall present some promising computational frameworks for managing large-scale and high-dimensional datasets. Based on the notion of the WEC [Kung2014], various iterative pruning methods have been proposed to quickly weed out non-support training vectors, substantially mitigating the learning complexity (see the first sketch after this list). Alternatively, divide-and-conquer techniques partition an application into many fragments, each executable on any processing node in a cluster, e.g., Google's MapReduce as adopted by the open-source Hadoop framework.
2. Visualization of data: Visualization of massive and messy data plays a vital role in bridging the gap between the apparent messiness and the hidden knowledge in big data. PCA is the classical method for dimension reduction and visualization; it is, however, by definition unsupervised and scale dependent. Consequently, many practical applications must opt for the more powerful Discriminant Component Analysis (DCA), computable as a closed-form solution under a maximal-SNR metric. DCA can thus be viewed as the supervised and scale-invariant counterpart of PCA (see the second sketch after this list).
3. Variability of data: Big data are often imprecise and sometimes even incomplete. For such "messy" data, an implicit principle of big data is to have all the data examined, i.e., to embrace even the imprecise and incomplete samples. This should facilitate the best correlation/trend analysis, which in turn yields optimal prediction and prioritization. In this talk, the learning principle will be exemplified by a novel and (per experimental results) effective scheme, the Kernel Approach to Incomplete Data Analysis (KAIDA), built upon a notion of pairwise partial correlation (see the third sketch after this list).
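To make item 1 concrete, the following is a minimal sketch of iterative pruning in the spirit of the WEC, assuming a kernel ridge regression learner whose small-magnitude dual weights flag likely non-support vectors; the RBF kernel, the keep-fraction rule, and all function names are illustrative assumptions, not the talk's actual algorithm.

```python
# Sketch: iterative pruning of non-support training vectors (hypothetical
# WEC-style criterion). Assumes kernel ridge regression, whose dual
# weights a = (K + rho*I)^{-1} y indicate each vector's contribution;
# vectors with small |a_i| are pruned. The threshold rule is illustrative.
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # K[i, j] = exp(-gamma * ||x_i - y_j||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def prune_and_train(X, y, rho=1e-2, keep_frac=0.5, n_rounds=3):
    idx = np.arange(len(X))
    for _ in range(n_rounds):
        K = rbf_kernel(X[idx], X[idx])
        a = np.linalg.solve(K + rho * np.eye(len(idx)), y[idx])  # dual weights
        # Keep the vectors with the largest |a_i| (likely support vectors).
        keep = np.argsort(-np.abs(a))[: max(1, int(keep_frac * len(idx)))]
        idx = idx[np.sort(keep)]
    K = rbf_kernel(X[idx], X[idx])
    a = np.linalg.solve(K + rho * np.eye(len(idx)), y[idx])
    return idx, a  # surviving "support" vectors and their weights
```

Each round retrains only on the survivors, so the kernel matrix shrinks geometrically, which is where the complexity savings come from.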

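For item 2, here is a hedged sketch of a DCA-like projection, under the assumption that the maximal-SNR components arise from a generalized eigenproblem with between-class scatter as signal and within-class scatter as noise; the ridge term and the function name are illustrative, and the talk's exact formulation may differ.

```python
# Sketch: supervised, scale-invariant projection in the spirit of DCA.
# Assumption: components maximize a signal-to-noise ratio
#   SNR(w) = (w' S_B w) / (w' S_W w),
# solved as a generalized eigenproblem; a small ridge keeps S_W invertible.
import numpy as np
from scipy.linalg import eigh

def dca_like(X, labels, n_components=2, ridge=1e-6):
    classes = np.unique(labels)
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_B = np.zeros((d, d))  # between-class ("signal") scatter
    S_W = np.zeros((d, d))  # within-class ("noise") scatter
    for c in classes:
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        S_B += len(Xc) * np.outer(mc - mu, mc - mu)
        S_W += (Xc - mc).T @ (Xc - mc)
    # Generalized eigenvectors of (S_B, S_W): maximal-SNR directions.
    vals, vecs = eigh(S_B, S_W + ridge * np.eye(d))
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order]  # columns are discriminant components
```

Because the objective is a ratio of quadratic forms, rescaling any feature rescales both scatters identically, which is one way to see the claimed scale invariance.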
 
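For item 3, one plausible reading of pairwise partial correlation for incomplete data is sketched below: each kernel entry is computed only over the features mutually observed in the two samples. The cosine-style normalization and the NaN-based missingness encoding are assumptions for illustration, not KAIDA's actual definition.

```python
# Sketch: a kernel matrix for incomplete data built from pairwise
# "partial" correlations: each entry uses only the features observed in
# BOTH samples (NaN marks a missing value). Normalization is illustrative.
import numpy as np

def partial_corr_kernel(X):
    n = X.shape[0]
    K = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            mask = ~np.isnan(X[i]) & ~np.isnan(X[j])  # mutually observed
            if mask.sum() == 0:
                continue  # no overlap: leave the off-diagonal entry at 0
            xi, xj = X[i, mask], X[j, mask]
            denom = np.linalg.norm(xi) * np.linalg.norm(xj)
            K[i, j] = K[j, i] = xi @ xj / denom if denom > 0 else 0.0
    return K
```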

Bio Sketch
S.Y. Kung is a Professor in the Department of Electrical Engineering at Princeton University. His research areas include machine learning, data mining and analysis, statistical estimation, system identification, wireless communication, VLSI array processors, genomic signal processing, and multimedia information processing. He was a founding member of several Technical Committees (TCs) of the IEEE Signal Processing Society, and was appointed the first Associate Editor in the VLSI area (1984) and later the first Associate Editor in neural networks (1991) for the IEEE Transactions on Signal Processing. He has been a Fellow of the IEEE since 1988. He served as a Member of the Board of Governors of the IEEE Signal Processing Society (1989-1991). Since 1990, he has been the Editor-in-Chief of the Journal of VLSI Signal Processing Systems. He was a recipient of the IEEE Signal Processing Society's Technical Achievement Award for contributions on "parallel processing and neural network algorithms for signal processing" (1992); a Distinguished Lecturer of the IEEE Signal Processing Society (1994); a recipient of the IEEE Signal Processing Society's Best Paper Award for his publication on principal component neural networks (1996); and a recipient of the IEEE Third Millennium Medal (2000). He has authored and co-authored more than 500 technical publications and numerous textbooks, including ``VLSI Array Processors'', Prentice-Hall (1988); ``Digital Neural Networks'', Prentice-Hall (1993); ``Principal Component Neural Networks'', John Wiley (1996); ``Biometric Authentication: A Machine Learning Approach'', Prentice-Hall (2004); and ``Kernel Methods and Machine Learning'', Cambridge University Press (2014).