University of Minnesota
Institute of Technology



Electrical and Computer Engineering

Kernel Learning for Correlation Analysis of Big Data

S.Y. Kung, Princeton University

Abstract
It has been a major challenge to develop learning paradigms that effectively tackle the 3V issues of big data: volume, velocity, and variability [Laney2001]. Algorithmically, intuitive yet powerful correlation analysis has been proposed to unravel the information hidden in big data [Schonberger2012]. Kernel learning represents a major technical forefront for correlation analysis, in which the sample-based correlation matrix is represented by a kernel matrix, with the correlation between two samples modeled by a kernel function. It is from the kernel's perspective that we shall shed some light on big data's learning paradigms:
1. Volume/Velocity of data: We shall present some promising computational frameworks for managing large-scale and high-dimensional datasets. Based on the notion of the WEC [Kung2014], various iterative pruning methods have been proposed to quickly weed out non-support training vectors, substantially mitigating the learning complexity (see the first sketch after this list). Alternatively, divide-and-conquer techniques partition an application into many fragments, each executable on any processing node in a cluster, e.g., Google's MapReduce as adopted by the open-source Hadoop framework.
2. Visualization of data: Visualization of massive and messy data plays a vital role in bridging the gap between the apparent messiness and the hidden knowledge in big data. PCA is the classical method for dimension reduction and visualization; it is, however, by definition unsupervised and scale dependent. Consequently, many practical applications must opt for the more powerful Discriminant Component Analysis (DCA), computable as a closed-form solution under a maximal-SNR metric. DCA can thus be viewed as the supervised and scale-invariant counterpart of PCA (see the second sketch after this list).
3. Variability of data: Big data are often imprecise and sometimes even incomplete. For such "messy" data, an implicit principle of big data is to have all the data examined, i.e., to embrace even the imprecise and incomplete samples. This should facilitate the best correlation/trend analysis, which in turn yields optimal prediction and prioritization. In this talk, the learning principle will be exemplified by a novel and (per experimental results) effective scheme, the Kernel Approach to Incomplete Data Analysis (KAIDA), built upon a notion of pairwise partial correlation (see the third sketch after this list).
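To make item 1 concrete, the following is a minimal sketch of iterative pruning in the spirit of the WEC, assuming a kernel ridge regression learner whose small-magnitude dual weights flag likely non-support vectors; the RBF kernel, the keep-fraction rule, and all function names are illustrative assumptions, not the talk's actual algorithm.

```python
# Sketch: iterative pruning of non-support training vectors (hypothetical
# WEC-style criterion). Assumes kernel ridge regression, whose dual
# weights a = (K + rho*I)^{-1} y indicate each vector's contribution;
# vectors with small |a_i| are pruned. The threshold rule is illustrative.
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # K[i, j] = exp(-gamma * ||x_i - y_j||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def prune_and_train(X, y, rho=1e-2, keep_frac=0.5, n_rounds=3):
    idx = np.arange(len(X))
    for _ in range(n_rounds):
        K = rbf_kernel(X[idx], X[idx])
        a = np.linalg.solve(K + rho * np.eye(len(idx)), y[idx])  # dual weights
        # Keep the vectors with the largest |a_i| (likely support vectors).
        keep = np.argsort(-np.abs(a))[: max(1, int(keep_frac * len(idx)))]
        idx = idx[np.sort(keep)]
    K = rbf_kernel(X[idx], X[idx])
    a = np.linalg.solve(K + rho * np.eye(len(idx)), y[idx])
    return idx, a  # surviving "support" vectors and their weights
```

Each round retrains only on the survivors, so the kernel matrix shrinks geometrically, which is where the complexity savings come from.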

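For item 2, here is a hedged sketch of a DCA-like projection, under the assumption that the maximal-SNR components arise from a generalized eigenproblem with between-class scatter as signal and within-class scatter as noise; the ridge term and the function name are illustrative, and the talk's exact formulation may differ.

```python
# Sketch: supervised, scale-invariant projection in the spirit of DCA.
# Assumption: components maximize a signal-to-noise ratio
#   SNR(w) = (w' S_B w) / (w' S_W w),
# solved as a generalized eigenproblem; a small ridge keeps S_W invertible.
import numpy as np
from scipy.linalg import eigh

def dca_like(X, labels, n_components=2, ridge=1e-6):
    classes = np.unique(labels)
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_B = np.zeros((d, d))  # between-class ("signal") scatter
    S_W = np.zeros((d, d))  # within-class ("noise") scatter
    for c in classes:
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        S_B += len(Xc) * np.outer(mc - mu, mc - mu)
        S_W += (Xc - mc).T @ (Xc - mc)
    # Generalized eigenvectors of (S_B, S_W): maximal-SNR directions.
    vals, vecs = eigh(S_B, S_W + ridge * np.eye(d))
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order]  # columns are discriminant components
```

Because the objective is a ratio of quadratic forms, rescaling any feature rescales both scatters identically, which is one way to see the claimed scale invariance.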
 
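For item 3, one plausible reading of pairwise partial correlation for incomplete data is sketched below: each kernel entry is computed only over the features mutually observed in the two samples. The cosine-style normalization and the NaN-based missingness encoding are assumptions for illustration, not KAIDA's actual definition.

```python
# Sketch: a kernel matrix for incomplete data built from pairwise
# "partial" correlations: each entry uses only the features observed in
# BOTH samples (NaN marks a missing value). Normalization is illustrative.
import numpy as np

def partial_corr_kernel(X):
    n = X.shape[0]
    K = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            mask = ~np.isnan(X[i]) & ~np.isnan(X[j])  # mutually observed
            if mask.sum() == 0:
                continue  # no overlap: leave the off-diagonal entry at 0
            xi, xj = X[i, mask], X[j, mask]
            denom = np.linalg.norm(xi) * np.linalg.norm(xj)
            K[i, j] = K[j, i] = xi @ xj / denom if denom > 0 else 0.0
    return K
```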

Bio Sketch
S.Y. Kung is a Professor in the Department of Electrical Engineering at Princeton University. His research areas include machine learning, data mining and analysis, statistical estimation, system identification, wireless communication, VLSI array processors, genomic signal processing, and multimedia information processing. He was a founding member of several Technical Committees (TCs) of the IEEE Signal Processing Society, and was appointed the first Associate Editor in the VLSI area (1984) and later the first Associate Editor in neural networks (1991) for the IEEE Transactions on Signal Processing. He has been a Fellow of the IEEE since 1988. He served as a Member of the Board of Governors of the IEEE Signal Processing Society (1989-1991). Since 1990, he has been the Editor-in-Chief of the Journal of VLSI Signal Processing Systems. He was a recipient of the IEEE Signal Processing Society's Technical Achievement Award for contributions on "parallel processing and neural network algorithms for signal processing" (1992); a Distinguished Lecturer of the IEEE Signal Processing Society (1994); a recipient of the IEEE Signal Processing Society's Best Paper Award for his publication on principal component neural networks (1996); and a recipient of the IEEE Third Millennium Medal (2000). He has authored and co-authored more than 500 technical publications and numerous textbooks, including ``VLSI Array Processors'', Prentice-Hall (1988); ``Digital Neural Networks'', Prentice-Hall (1993); ``Principal Component Neural Networks'', John Wiley (1996); ``Biometric Authentication: A Machine Learning Approach'', Prentice-Hall (2004); and ``Kernel Methods and Machine Learning'', Cambridge University Press (2014).