### What is distilled sensing?

Distilled sensing (DS) is a multi-step, selective (adaptive) sampling procedure for recovering sparse signals from noisy observations. DS yields dramatic, quantifiable improvements over the best non-adaptive sensing methods for the estimation and detection of sparse signals in noise.

The intuition behind the procedure is simple: given a collection of noisy data, it is easier to determine where signal isn't than where it is. DS is the formalization of this idea --- an iterative sensing and refinement procedure where, at each step, measurement resources are systematically focused toward relevant signal components and away from locations where no signal components are present. The action of the refinement steps is reminiscent of purification by distillation --- hence the name distilled sensing.


### How does it work?

Suppose the signal of interest is a sparse vector --- a vector having relatively few non-zero components. When trying to recover (detect, estimate, etc.) vectors of this form, the most effective and efficient acquisition procedures are those that concentrate sensing resources only on the relevant (non-zero) signal components. Of course, the locations of the relevant components will generally not be known ahead of time. The upshot is that DS effectively learns the signal component locations (with an increasing level of certainty) by utilizing information from earlier measurements.

But just how much information can be gleaned from a collection of noisy data? The short answer is that even when the observations are so noisy (or the signal is so weak) that you cannot easily identify a small set of locations that are likely to contain signal components, it is comparatively easy to identify a large set of locations that are unlikely to contain signal components. In other words, you may not be able to guarantee that observations of signal components are among the largest observations, but you can be very certain that they are not among the smallest.

Consider the example below. The sparse signal depicted in panel (a) has three nonzero components, which are identified in blue throughout the example. Panel (b) represents a collection of observations of the signal, each subject to independent zero-mean Gaussian noise with unit variance. From the noisy data, it is difficult to identify a small subset of locations that correspond to true signal components. Indeed, only one of the 5 largest noisy observations actually corresponds to a signal component!

On the other hand, it is comparatively easy to identify where the signal isn't. Since the signal consists of only positive components, the unlikely locations can be identified using a very nonaggressive threshold (here, thresholding at zero suffices). Panel (c) below illustrates this selection procedure --- locations identified with the green arrows may contain signal components, while those identified with red x's likely do not.

Now, subsequent samples can be focused directly onto the subset of locations deemed most likely to contain signal components. For example, instead of taking one new sample per entry, for the same sample budget we can instead collect two new samples at each of the more probable locations, and average to reduce the noise. The result of this observation step is depicted in panel (d) below.
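
As a quick sanity check on this reallocation, averaging two independent unit-variance noisy samples halves the noise variance at the retained locations. A minimal simulation (hypothetical values, using NumPy) confirms this:

```python
import numpy as np

rng = np.random.default_rng(2)
truth = 1.5  # hypothetical true value at one retained location

# 100,000 trials: one noisy sample vs. the average of two noisy samples,
# each corrupted by independent zero-mean, unit-variance Gaussian noise.
one = truth + rng.standard_normal(100_000)
two = truth + rng.standard_normal((100_000, 2)).mean(axis=1)

print(one.var())  # close to 1.0
print(two.var())  # close to 0.5 --- averaging halves the noise variance
```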

The key idea is that (with high probability) each refinement step retains most of the locations corresponding to non-zero signal components but only about half of the locations where the signal is zero. This phenomenon is not a coincidence --- thresholding at the median of the noise distribution will (on average) reject half of the observations corresponding to zero components of the signal.
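
This median-thresholding behavior is easy to verify numerically. The sketch below (hypothetical dimensions and amplitude, assuming unit-variance zero-mean Gaussian noise as in the example above) thresholds one round of noisy observations at zero and checks what fraction of null and signal locations survive:

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 10_000, 20        # hypothetical: ambient dimension, number of spikes
amplitude = 3.0          # hypothetical signal amplitude

x = np.zeros(n)
signal_idx = rng.choice(n, size=k, replace=False)
x[signal_idx] = amplitude

# One round of noisy observations: y_i = x_i + N(0, 1)
y = x + rng.standard_normal(n)

# Threshold at the noise median (0 for zero-mean Gaussian noise).
kept = y > 0

null_kept = kept[x == 0].mean()    # fraction of zero locations retained
signal_kept = kept[x > 0].mean()   # fraction of signal locations retained
print(f"nulls retained:  {null_kept:.2f}")   # close to 0.5
print(f"signal retained: {signal_kept:.2f}") # close to 1.0
```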

Thus, at each step we can spend proportionally more sensing resources (samples, time, etc.) on indices that are likely to contain signal components, while ignoring an increasing number of locations that likely do not. And since each refinement step cuts the number of candidate locations roughly in half, refocusing the sensing resources on the survivors means the SNR of the observations grows geometrically from step to step when the signal is sparse!

The overall DS procedure consists of a number of these measurement and refinement steps. Then, depending on the specific recovery goal (estimation, detection, etc.) the final set of output observations can be subjected to any one of a number of standard recovery procedures.
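
A minimal end-to-end sketch of such a procedure is given below. It is not the exact algorithm from the papers (which allocate the sensing budget across steps more carefully); it simply splits a fixed per-step budget uniformly over the surviving locations, thresholds at zero after each measurement step, and applies a universal threshold to the final high-SNR observations. All dimensions and amplitudes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def distilled_sensing(x, n_steps=5, step_budget=None, rng=rng):
    """Sketch of DS with a fixed per-step budget split uniformly over the
    surviving locations. Noise is zero-mean Gaussian, with variance at a
    location inversely proportional to the sensing resources spent there."""
    n = x.size
    step_budget = n if step_budget is None else step_budget
    idx = np.arange(n)                    # surviving candidate locations
    for step in range(n_steps):
        # Fewer survivors -> more budget per location -> smaller noise std.
        noise_std = np.sqrt(idx.size / step_budget)
        y = x[idx] + noise_std * rng.standard_normal(idx.size)
        if step < n_steps - 1:            # refine on all but the last step
            idx = idx[y > 0]              # distill: drop likely nulls
    return idx, y, noise_std

# Hypothetical test signal: 32 positive spikes of amplitude 2 in n = 16384.
n, k, amp = 2**14, 32, 2.0
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = amp

idx, y, noise_std = distilled_sensing(x)
# Recover by applying a universal threshold to the final observations.
found = idx[y > noise_std * np.sqrt(2 * np.log(idx.size))]
```

After four refinement steps the surviving set is roughly a sixteenth of the original, so the final observations are taken with about a quarter of the original noise standard deviation, and nearly all of the spikes clear the threshold.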


### Distilled sensing in action

Building upon the simple positive-component signal model used above, the following example from astronomical imaging (excerpted from our most recent paper) illustrates the improvement that DS can provide. The goal is to locate the stars using noisy observations that are obtained subject to a fixed sensing resource budget. Here, the budget could correspond to pixel samples, sensing time, etc.

The figure below illustrates recovery from non-adaptive sensing. The noise-free star image is shown in the top panel, and the second panel shows a collection of noisy observations that result from spreading sensing resources uniformly over the entire image. The "recovery" shown in the third panel is the result of a procedure that identifies as signal components any locations whose noisy observation exceeds a threshold. Here, we use the data-dependent threshold prescribed by the Benjamini-Hochberg procedure, which controls the false discovery rate (FDR), at FDR level 0.05.
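
For reference, the Benjamini-Hochberg step-up procedure can be sketched as follows, assuming one-sided p-values computed under a standard Gaussian null (matching the noise model used above). The test signal at the end is hypothetical:

```python
import numpy as np
from math import erfc, sqrt

def bh_select(y, q=0.05):
    """Benjamini-Hochberg step-up procedure on one-sided p-values under a
    standard Gaussian null; returns indices declared to be signal."""
    y = np.asarray(y, dtype=float)
    p = 0.5 * np.array([erfc(v / sqrt(2)) for v in y])  # P(N(0,1) > v)
    order = np.argsort(p)
    m = p.size
    # Step-up rule: find the largest k with p_(k) <= q * k / m.
    passed = p[order] <= q * np.arange(1, m + 1) / m
    if not passed.any():
        return np.array([], dtype=int)
    kmax = np.nonzero(passed)[0].max()
    return np.sort(order[: kmax + 1])

# Hypothetical example: 50 spikes of amplitude 4 among 5000 entries.
rng = np.random.default_rng(3)
x = np.zeros(5000)
support = rng.choice(5000, size=50, replace=False)
x[support] = 4.0
declared = bh_select(x + rng.standard_normal(5000), q=0.05)
```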

The next figure shows the result of applying DS. The top panel depicts the noise-free image, while the subsequent panels show the sequence of observations that result from 5 steps of the DS procedure (4 focusing steps). The same recovery procedure (based on FDR thresholding at level 0.05) is applied to the final set of observations, and the result is shown in the bottom panel.

Here's a side-by-side comparison of the reconstructions:

Notice many more stars are recovered using DS. This improvement can be quantified in terms of the false-discovery proportion (FDP), which is the number of falsely-discovered stars relative to the total number of discoveries, and the non-discovery proportion (NDP), which is the number of missed stars relative to the total number of stars. The goal of a given sensing and recovery procedure is to make both the FDP and NDP small.
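
The two proportions are straightforward to compute from a set of declared discoveries and the true support; a small helper (with hypothetical inputs) might look like:

```python
def fdp_ndp(discovered, true_support):
    """False-discovery proportion and non-discovery proportion."""
    discovered = set(discovered)
    true_support = set(true_support)
    false_disc = len(discovered - true_support)   # false alarms
    missed = len(true_support - discovered)       # misses
    fdp = false_disc / max(len(discovered), 1)
    ndp = missed / max(len(true_support), 1)
    return fdp, ndp

# Hypothetical example: one false discovery (1) and one miss (5).
print(fdp_ndp([1, 2, 3, 4], [2, 3, 4, 5]))  # (0.25, 0.25)
```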

The figure below illustrates the results of 10 independent trials of the star recovery problem, where the FDR-controlling thresholding recovery procedure is applied to a collection of non-adaptive samples (solid lines), and 10 trials using the same recovery procedure applied to the output of the DS procedure (dashed lines).

For the same FDP, the recovery based on DS results in a much lower NDP. In other words, DS results in far fewer "misses" for the same number of "false alarms." This is precisely what is observed visually in the star recovery images.


### Distilled sensing papers

J. Haupt, R. Castro, and R. Nowak, "Distilled sensing: Adaptive sampling for sparse detection and estimation," January 2010, submitted. (**PDF**)

J. Haupt, R. Baraniuk, R. Castro, and R. Nowak, "Compressive distilled sensing: Sparse recovery using adaptivity in compressive measurements," Proc. 43rd Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, CA, November 2009. (**PDF**)

J. Haupt, R. Castro, and R. Nowak, "Distilled sensing: selective sampling for sparse signal recovery," in D. van Dyk and M. Welling (Eds.), Proceedings of The 12th International Conference on Artificial Intelligence and Statistics (AISTATS) 2009, JMLR: W&CP 5, pp. 216-223, Clearwater Beach, Florida, April 2009. (**PDF**)
