Arya Mazumdar (Research)


  • Information Theory and Coding Theory: Applications to Storage, Security and Biology.

  • Distributed and Networked Systems.


Complete List; DBLP

Students and Advising

Active Projects

Reliability in Large-Scale Storage

With the advent of large-scale distributed storage systems, cloud computing and commercial data storage applications, there is renewed interest in the coding and information theory that governs data reliability. In these large networked databases, fast updates and quick repair have to be integrated with reliable data storage protocols. These requirements add new dimensions and parameters to the traditional optimization problems of information theory.

In this project we propose, for the first time, a model of large-scale storage that accounts for the topology of storage networks. Previously, storage topology was never a concern for code designers. Further, we study the update-efficiency and local-repair (fast recovery) properties of error-correcting codes suitable for storage. For each of these, we will analyze the fundamental limits of such systems and propose explicit, algorithmically fast constructions of codes. Several tools from graph and network theory, combinatorics and optimization theory will be used to find performance limits and devise coding algorithms.
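As a rough illustration of the local-repair property mentioned above, the following Python sketch uses a toy code with one XOR parity block per small group of data blocks. The function names, the grouping scheme and the block contents are assumptions made for this example only; it is not one of the constructions studied in the project.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(data_blocks, group_size):
    """Split the data into groups and append one XOR parity block to each group."""
    groups = []
    for start in range(0, len(data_blocks), group_size):
        group = list(data_blocks[start:start + group_size])
        groups.append(group + [xor_blocks(group)])
    return groups

def repair(group, lost_index):
    """Rebuild one lost block by reading only the other blocks of its own group."""
    survivors = [block for j, block in enumerate(group) if j != lost_index]
    return xor_blocks(survivors)

# Hypothetical data: four 2-byte blocks stored in two groups of two.
data = [b"ab", b"cd", b"ef", b"gh"]
groups = encode(data, group_size=2)
recovered = repair(groups[0], lost_index=0)  # reads 2 blocks, not all 6
```

In this toy code a single lost block is recovered from `group_size` other blocks, regardless of how many blocks are stored in total, and changing one data block requires rewriting only that block and the parity of its group.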

Bioinformatics and Related Applications

We have recently started to look at potential applications of graph-based codes (LDPC codes), iterative decoding algorithms and network coding to the prediction of both gene functionality and cell fate. Traditionally, many of these systems, such as gene regulatory networks, are modeled by artificial neural networks.

Data Compression + Error-Correction

Algorithms for lossy/lossless compression and error-correcting codes have been at the core of the digital revolution. This project focuses on the particular set of applications in which both lossy compression and noise resilience are required. Examples include storage of high-resolution imagery on imperfect semiconductor (flash) memory and real-time video surveillance over jammed or noisy channels.

The state-of-the-art solution is “separation”: serial concatenation of an off-the-shelf compression algorithm with an off-the-shelf error-correcting code. However, as shown recently by the investigators, for worst-case guarantees the separated solution is far from being (even asymptotically) optimal. This provides the principal motivation for a multifaceted investigation of the combinatorial, geometric, algebraic and information-theoretic aspects of the joint source-channel coding problem.
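To make the "separation" architecture concrete, here is a minimal Python sketch that chains a lossy source code with an independent channel code. The uniform scalar quantizer, the 3-fold repetition code and all function names are assumptions chosen for brevity; they stand in for the off-the-shelf components and are not the schemes analyzed by the investigators.

```python
def quantize(samples, levels=4):
    """Lossy source code: map each sample in [0, 1) to one of `levels` indices."""
    return [min(int(x * levels), levels - 1) for x in samples]

def dequantize(indices, levels=4):
    """Reconstruct each sample as the midpoint of its quantization cell."""
    return [(i + 0.5) / levels for i in indices]

def to_bits(indices, width=2):
    """Write each index as `width` bits, most significant bit first."""
    return [(i >> k) & 1 for i in indices for k in reversed(range(width))]

def from_bits(bits, width=2):
    """Inverse of to_bits."""
    return [int("".join(map(str, bits[j:j + width])), 2)
            for j in range(0, len(bits), width)]

def rep_encode(bits, r=3):
    """Channel code: repeat every bit r times."""
    return [b for b in bits for _ in range(r)]

def rep_decode(coded, r=3):
    """Majority vote over each run of r received bits."""
    return [int(sum(coded[j:j + r]) > r // 2) for j in range(0, len(coded), r)]

# Separation: compress first, then protect the compressed bits.
samples = [0.1, 0.6, 0.9]
sent = rep_encode(to_bits(quantize(samples)))
received = list(sent)
received[0] ^= 1  # the channel flips two bits in different repetition groups
received[7] ^= 1
estimate = dequantize(from_bits(rep_decode(received)))
```

The two stages are designed independently: the quantizer knows nothing about the channel, and the repetition code knows nothing about the source. A joint source-channel code drops that restriction; this toy example only shows the separated pipeline and does not by itself demonstrate the gap described above.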