January 2013 Newsletter
Placing you one click away from the best new CAD research!

EDITORIAL

Sapatnekar, S. S.
Editorial
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387704

ANNUAL LIST OF REVIEWERS

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387673

KEYNOTE PAPER

Gupta, P.; Agarwal, Y.; Dolecek, L.; Dutt, N.; Gupta, R. K.; Kumar, R.; Mitra,
S.; Nicolau, A.; Rosing, T. S.; Srivastava, M. B.; Swanson, S.; Sylvester, D.
Underdesigned and Opportunistic Computing in Presence of Hardware Variability
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387697

Microelectronic circuits exhibit increasing variations in performance, power
consumption, and reliability parameters across the manufactured parts and
across use of these parts over time in the field. These variations have led to
increasing use of overdesign and guardbands in design and test to ensure yield
and reliability with respect to a rigid set of datasheet specifications. This
paper explores the possibility of constructing computing machines that
purposely expose hardware variations to various layers of the system stack
including software. This leads to the vision of underdesigned hardware that
utilizes a software stack that opportunistically adapts to sensed or modeled
hardware. The envisioned underdesigned and opportunistic computing (UnO)
machines face a number of challenges related to the sensing infrastructure and
software interfaces that can effectively utilize the sensory data. In this
paper, we outline specific sensing mechanisms that we have developed and their
potential use in building UnO machines.

REGULAR PAPERS

ANALOG, MIXED-SIGNAL, AND RF CIRCUITS

Gong, F.; Basir-Kazeruni, S.; He, L.; Yu, H. 
Stochastic Behavioral Modeling and Analysis for Analog/Mixed-Signal Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387696

It has become increasingly challenging to model the stochastic behavior of
analog/mixed-signal (AMS) circuits under large-scale process variations. In
this paper, we propose a novel moment-matching-based method to
accurately extract the probabilistic behavioral distributions of AMS circuits.
This method first utilizes Latin hypercube sampling coupled with a correlation
control technique to generate a small number of samples (e.g., with sample size
linear in the number of variable parameters) and then analytically evaluates the
high-order moments of the circuit behavior with high accuracy. In this way,
arbitrary probabilistic distributions of the circuit behavior can be extracted
using the moment-matching method. More importantly, the proposed method has been
successfully applied to high-dimensional problems with linear complexity. The
experiments demonstrate that the proposed method can provide up to 1666X
speedup over the crude Monte Carlo method at the same accuracy.
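A minimal Python sketch of the sampling-and-moments step, assuming independent standard-normal process parameters and a made-up performance function `toy_performance` (both hypothetical). It uses plain Latin hypercube sampling only; the paper's correlation-control technique and the final moment-matching fit of a distribution are not reproduced here.

```python
import random
import statistics

def latin_hypercube(n_samples, n_dims, seed=0):
    """Return n_samples points in [0, 1)^n_dims with exactly one point per
    stratum [i/n, (i+1)/n) in every dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)  # random pairing of strata across dimensions
        columns.append(col)
    return [tuple(col[j] for col in columns) for j in range(n_samples)]

def to_standard_normal(point):
    """Map uniform coordinates to standard-normal process parameters."""
    nd = statistics.NormalDist()
    # Clamp away from 0 and 1, where inv_cdf is undefined.
    return tuple(nd.inv_cdf(min(max(u, 1e-12), 1 - 1e-12)) for u in point)

def raw_moments(values, order):
    """Sample estimates of E[X^k] for k = 1..order."""
    n = len(values)
    return [sum(v ** k for v in values) / n for k in range(1, order + 1)]

def toy_performance(x):
    # Hypothetical circuit metric of three process parameters.
    return 1.0 + 0.3 * x[0] - 0.2 * x[1] + 0.05 * x[0] * x[2]

samples = [to_standard_normal(p) for p in latin_hypercube(30, 3, seed=1)]
moments = raw_moments([toy_performance(x) for x in samples], order=4)
print(moments)
```

The stratification is what lets a few samples give stable low-order moment estimates; a moment-matching step would then fit a distribution to `moments`.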

EMBEDDED SYSTEMS

Ahn, J.; Choi, K. 
Isomorphism-Aware Identification of Custom Instructions With I/O Serialization 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387693

Extensible processors have been widely used to meet the conflicting demands
for performance improvement, low power consumption, and flexibility. As
extensible processors have become more popular, several algorithms have been
proposed for automatically identifying instruction-set extensions in order to
reduce the effort of manual design and verification. However, most of them
focus on finding large and complex instructions that are used only once, rather
than repeatedly used ones. Moreover, some other approaches that consider
recurrence are limited to finding small instructions. This paper proposes a
novel algorithm that considers the instruction reusability as well as
input/output (I/O) serialization. In order to overcome the high complexity of
the problem, we develop a canonical-form construction algorithm for fast
isomorphism detection on directed acyclic graphs and an incremental template
generation algorithm that identifies the best custom instruction in terms of a
user-defined fitness function. Moreover, our algorithm serializes I/O
operations so that the numbers of inputs and outputs of custom instructions are
not limited by the microarchitecture. This paper also proposes an algorithm for
selecting multiple custom instructions using a well-known iterative selection
algorithm. Finally, it presents a hybrid algorithm that combines our algorithm with
the previous algorithm that does not consider reusability. Experimental results
show that our isomorphism-aware algorithm achieves significant improvement over
previous approaches in terms of algorithm runtime, as well as performance gain
obtained by custom instructions.
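The abstract relies on a canonical form so that isomorphic candidate subgraphs (recurring instruction templates) compare equal. The Python sketch below illustrates that idea with a deliberately brute-force canonical form: it tries every node relabeling and keeps the lexicographically smallest encoding. It is exponential in the node count, so it suits only tiny templates, and it is not the authors' fast construction algorithm; the operation labels and example graphs are hypothetical.

```python
from collections import Counter
from itertools import permutations

def canonical_form(labels, edges):
    """Canonical encoding of a small labeled DAG.

    labels: list of operation names, indexed by node id.
    edges:  iterable of (src, dst) node-id pairs.
    Two labeled graphs are isomorphic iff their canonical forms are equal.
    """
    edges = list(edges)
    best = None
    for order in permutations(range(len(labels))):
        new_id = {old: new for new, old in enumerate(order)}
        code = (tuple(labels[old] for old in order),
                tuple(sorted((new_id[u], new_id[v]) for u, v in edges)))
        if best is None or code < best:
            best = code
    return best

# Three candidate templates found at different places in a dataflow graph.
candidates = [
    (["mul", "add"], [(0, 1)]),   # multiply feeding an add
    (["add", "mul"], [(1, 0)]),   # same shape, different node numbering
    (["add", "mul"], [(0, 1)]),   # add feeding a multiply: a different template
]
occurrences = Counter(canonical_form(l, e) for l, e in candidates)
print(occurrences.most_common(1))
```

Counting occurrences per canonical form is one simple way to measure how often a template recurs, which is the reusability the abstract emphasizes.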

EMERGING TECHNOLOGIES

Bhoj, A. N.; Joshi, R. V.; Jha, N. K.
Efficient Methodologies for 3-D TCAD Modeling of Emerging Devices and Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387694

Over the past decade, 3-D process simulation, which is central to the 3-D
Technology Computer-Aided Design (3-D TCAD) approach, has severely limited the
scope and applicability of TCAD to circuits with a small number of field-effect
transistors, owing to its prohibitively high computational costs for large
layouts. Due to rapidly changing process recipes and shorter production cycles
in the industry, design-time optimization and iterative layout-3-D TCAD
exploration for yield-critical or yield-characterizing circuits, such as static
random-access memories (SRAMs), ring oscillators, and others, is currently
impossible in a practical time frame. In this paper, we architect a novel
layout/process/device-independent TCAD methodology in the Sentaurus tool suite
to overcome the process simulation barrier for accurate 3-D TCAD structure
generation. We adopt an automated structure synthesis (SS) approach, thereby
bypassing the need for repetitive 3-D process simulations for different layouts
or different versions of the same layout. Results for 32-nm bulk process
simulations versus SS and 32-nm silicon-on-insulator (SOI) hardware
measurements versus corresponding synthesized structures indicate that the
method is an excellent substitute for 3-D process simulation of large layouts,
with extremely favorable time and memory scaling behavior. Finally, the
robustness and scalability of the proposed abstractions are highlighted through
the synthesis of 22-nm SOI 6T FinFET SRAMs and ring oscillator structures.

Luo, Y.; Chakrabarty, K.; Ho, T.-Y.
Error Recovery in Cyberphysical Digital Microfluidic Biochips
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387703

Droplet-based digital microfluidics technology has now come of age, and
software-controlled biochips for healthcare applications are starting to
emerge. However, today's digital microfluidic biochips suffer from the drawback
that there is no feedback to the control software from the underlying hardware
platform. Due to the lack of precision inherent in biochemical experiments,
errors are likely during droplet manipulation; error recovery based on the
repetition of experiments leads to wastage of expensive reagents and
hard-to-prepare samples. By exploiting recent advances in the integration of
optical detectors (sensors) into a digital microfluidic biochip, we present a
physical-aware system reconfiguration technique that uses sensor data at
intermediate checkpoints to dynamically reconfigure the biochip. A
cyberphysical resynthesis technique is used to recompute electrode-actuation
sequences, thereby deriving new schedules, module placement, and droplet
routing pathways, with minimum impact on the time-to-response.

MODELING AND SIMULATION

Aadithya, K. V.; Demir, A.; Venugopalan, S.; Roychowdhury, J.
Accurate Prediction of Random Telegraph Noise Effects in SRAMs and DRAMs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387692

With aggressive technology scaling and heightened variability, circuits such as
SRAMs and DRAMs have become vulnerable to random telegraph noise (RTN). The
bias dependence (i.e., non-stationarity), bi-directional coupling, and high
inter-device variability of RTN present significant challenges to understanding
its circuit-level effects. In this paper, we present two computer-aided design
(CAD) tools, SAMURAI and MUSTARD, for accurately estimating the impact of
non-stationary RTN on SRAMs and DRAMs. While traditional (stationary) analysis
is often overly pessimistic (e.g., it overestimates RTN-induced SRAM failure
rates), the predictions made by SAMURAI and MUSTARD are more reliable by virtue
of non-stationary analysis.
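To make "non-stationary" concrete: RTN is commonly modeled as a trap switching between empty and filled states with capture and emission rates, and non-stationarity means those rates depend on the device bias, which changes as the circuit operates. The Python sketch below simulates such a two-state trap under a bias that steps midway, using a simple fixed-step approximation. The exponential rate functions `capture` and `emission` and their constants are invented for illustration, and this is not how SAMURAI or MUSTARD are implemented.

```python
import math
import random

def simulate_trap(bias, dt, capture_rate, emission_rate, seed=0):
    """Return the trap occupancy (0 = empty, 1 = filled) at each time step.

    bias: one bias value per step; the rates are re-evaluated every step,
    which is what makes the process non-stationary. Assumes rate * dt << 1,
    so at most one transition per step is a good approximation.
    """
    rng = random.Random(seed)
    state, trace = 0, []
    for v in bias:
        rate = capture_rate(v) if state == 0 else emission_rate(v)
        if rng.random() < 1.0 - math.exp(-rate * dt):
            state = 1 - state
        trace.append(state)
    return trace

# Hypothetical bias dependence: higher bias favors capture over emission.
def capture(v):
    return 1e3 * math.exp(2.0 * (v - 0.5))

def emission(v):
    return 3e3 * math.exp(-2.0 * (v - 0.5))

dt, steps = 1e-5, 200_000
# Bias steps from 0.5 to 1.0 halfway through the simulation.
trace = simulate_trap([0.5] * steps + [1.0] * steps, dt, capture, emission, seed=1)
first = sum(trace[:steps]) / steps
second = sum(trace[steps:]) / steps
# Long-run occupancy is capture / (capture + emission):
# 0.25 at bias 0.5 and about 0.71 at bias 1.0 for these made-up rates.
print(first, second)
```

A stationary analysis would use one fixed pair of rates for the whole trace and so could not capture the shift in occupancy when the bias changes.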

Lin, I.-C.; Lin, C.-H.; Li, K.-H.
Leakage and Aging Optimization Using Transmission Gate-Based Technique 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387701

Negative bias temperature instability (NBTI), which can degrade the switching
speed of PMOS transistors, has become a major reliability challenge. Reducing
leakage power consumption is also a major design goal. The gate replacement (GR)
technique is an effective way to reduce both the NBTI effect and leakage. This
technique, however, has less flexibility because the replaced gate can only
produce one output value and careful algorithms are needed to decide the output
value of the replaced gate. In this paper, we propose a novel transmission
gate-based technique to minimize NBTI-induced degradation and leakage. This
technique, which can offer logic 1 for NBTI mitigation and logic 0 for leakage
reduction, provides higher flexibility, as compared to the GR technique.
Simulation results show that the proposed technique improves NBTI-induced
degradation by up to 20x (2.16x on average) with comparable leakage power
reduction. With a 19.19% area penalty, combining our technique with GR
reduces total leakage power by 17.92% and NBTI-induced circuit degradation
by 32.36%.

Firouzi, F.; Kiamehr, S.; Tahoori, M. B.
Power-Aware Minimum NBTI Vector Selection Using a Linear Programming Approach
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387695

Transistor aging is a major reliability concern for nanoscale CMOS technology
that can significantly reduce the operation lifetime of very large-scale
integration chips. Negative bias temperature instability (NBTI) is a major
contributor to transistor aging that affects pMOS transistors. On the other
hand, leakage power is becoming a dominant factor of the total power with
successive technology scaling. Since the input combinations applied to a logic
core have a significant impact on both NBTI and leakage power, input vector
control can be used to optimize both phenomena during idle cycles. In this
paper, we present an efficient input vector selection technique based on linear
programming for co-optimizing the NBTI-induced delay degradation and leakage
power consumption during standby mode. Since the NBTI-induced delay degradation
and leakage power are not affected by the input vector in the same direction,
we provide a Pareto curve based on both phenomena. A suitable point from such a
Pareto curve is chosen based on circuit conditions and requirements during
runtime.
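The Pareto-curve idea can be sketched without the LP formulation: given candidate standby input vectors scored on two costs to be minimized, keep those not dominated on both, then pick a point under a runtime constraint. The Python below does this by exhaustive comparison over a small, made-up candidate table; the cost numbers, the 2-input block, and the leakage budget are hypothetical, and the paper's linear-programming method is not reproduced.

```python
def pareto_front(candidates):
    """candidates: dict input_vector -> (nbti_delay_degradation, leakage).

    Both costs are minimized. Returns the non-dominated vectors sorted by
    increasing delay degradation.
    """
    front = []
    for vec, (d, l) in candidates.items():
        dominated = any(
            d2 <= d and l2 <= l and (d2 < d or l2 < l)
            for other, (d2, l2) in candidates.items() if other != vec
        )
        if not dominated:
            front.append((vec, d, l))
    return sorted(front, key=lambda t: (t[1], t[2]))

def pick_vector(front, leakage_budget):
    """Least-degrading vector whose leakage fits the budget, or None."""
    feasible = [t for t in front if t[2] <= leakage_budget]
    return min(feasible, key=lambda t: t[1])[0] if feasible else None

# Hypothetical costs (delay degradation in %, leakage in uW) for a 2-input block.
costs = {"00": (5.0, 1.0), "01": (3.0, 2.0), "10": (4.0, 3.0), "11": (2.0, 4.0)}
front = pareto_front(costs)
print(front)
print(pick_vector(front, leakage_budget=2.5))
```

Exhaustive enumeration is only viable for a handful of inputs, which is why a formulation that scales, such as the paper's LP approach, matters for real logic cores.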

Jung, J.; Kim, T.
Statistical Viability Analysis for Detecting False Paths Under Delay Variation
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387698

How long does an integrated circuit take to produce its result? To answer the
question, we must tackle the difficult and complex false path detection problem
first. Viability analysis is one of the most sophisticated approaches to
the false path detection problem. On the other hand, as technology scales
down, gate delay variation has had a significant impact on circuit
reliability. Nevertheless, previous timing analyzers have invariably
used the worst-case gate delay in their false path detection algorithms,
missing some important false or true path timing behavior. In this paper, we
propose a solid method of viability analysis to solve the false path detection
problem under delay variation, which no prior work on timing analysis has
addressed. In addition to the theoretical results, to cope with the runtime
of evaluating viability for large circuits in practice, we propose an efficient
viability evaluation technique that mitigates the complexity arising from the
number of input vectors. We tested the proposed method on ISCAS benchmark
circuits and carry bypass adders under delay variation, and showed its
effectiveness and usefulness for false-path-aware statistical timing analysis.

SYSTEM-LEVEL DESIGN

Gupta, V.; Mohapatra, D.; Raghunathan, A.; Roy, K.
Low-Power Digital Signal Processing Using Approximate Adders
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387646

Low power is an imperative requirement for portable multimedia devices
employing various signal processing algorithms and architectures. In most
multimedia applications, human beings can gather useful information from
slightly erroneous outputs. Therefore, we do not need to produce exactly
correct numerical outputs. Previous research in this context exploits error
resiliency primarily through voltage overscaling, utilizing algorithmic and
architectural techniques to mitigate the resulting errors. In this paper, we
propose logic complexity reduction at the transistor level as an alternative
approach to take advantage of the relaxation of numerical accuracy. We
demonstrate this concept by proposing various imprecise or approximate full
adder cells with reduced complexity at the transistor level, and utilize them
to design approximate multi-bit adders. In addition to the inherent reduction
in switched capacitance, our techniques result in significantly shorter
critical paths, enabling voltage scaling. We design architectures for video and
image compression algorithms using the proposed approximate arithmetic units
and evaluate them to demonstrate the efficacy of our approach. We also derive
simple mathematical models for error and power consumption of these approximate
adders. Furthermore, we demonstrate the utility of these approximate adders in
two digital signal processing architectures (discrete cosine transform and
finite impulse response filter) with specific quality constraints. Simulation
results indicate up to 69% power savings using the proposed approximate adders,
when compared to existing implementations using accurate adders.
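A bit-level Python model helps show where the error comes from: replace the full-adder cells in the k least-significant positions of a ripple-carry adder with a cheaper cell, and keep exact cells above. The approximate cell used here (sum = b, carry-out = a) is a deliberately crude, hypothetical choice for illustration; the paper's transistor-level cells and their truth tables are not reproduced, so the error figures this produces describe only this toy cell.

```python
def exact_full_adder(a, b, cin):
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))

def approx_full_adder(a, b, cin):
    # Hypothetical low-complexity cell: ignores cin entirely.
    return b, a

def ripple_add(x, y, width, approx_bits=0):
    """Add two unsigned width-bit integers; the lowest approx_bits
    positions use the approximate cell. Returns a (width + 1)-bit sum."""
    carry, result = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        cell = approx_full_adder if i < approx_bits else exact_full_adder
        s, carry = cell(a, b, carry)
        result |= s << i
    return result | (carry << width)

def mean_error_distance(width, approx_bits):
    """Average |approximate - exact| over all input pairs."""
    n = 1 << width
    total = sum(abs(ripple_add(x, y, width, approx_bits) - (x + y))
                for x in range(n) for y in range(n))
    return total / (n * n)

for k in range(4):
    print(k, mean_error_distance(6, k))
```

Confining the approximate cells to low-order bits keeps the error magnitude bounded, since those positions carry the smallest weights.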

TEST

Kim, T.-Y.; Kim, T.
Resource Allocation and Design Techniques of Prebond Testable 3-D Clock Tree
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387699

In 3-D stacked integrated circuit (IC) manufacturing, achieving acceptably high
yield requires stacking only known good dies, which are identified by testing
the individual dies at the prebond stage. While the postbonded 3-D IC is
operated by a low-power 3-D clock tree, prebond testing requires a 2-D clock tree on each
die. The previous work provided a prebond testable 3-D clock tree synthesis
solution by allocating through-silicon via (TSV) buffers and redundant trees
with transmission gates. However, optimization of the allocation and design of
these resources has not been addressed. In this paper, we propose practically
viable clock tree optimization techniques under prebond testability: 1)
TSV-buffer-aware topology generation techniques that enable an economical
buffer allocation by avoiding potentially "bad" TSV buffers; 2)
delay-locked loop (DLL)-based 2-D clock network design method that offers a
diverse exploration of 2-D clock tree synthesis and resource allocation for
prebond die testing; and 3) a new circuit design technique for transmission
gates that completely removes their control line. Compared to the existing
topology generation algorithms, our proposed TSV-buffer-aware topology
generation uses 68%-88% fewer TSVs, 36%-58% less wire resource, and 35%-69%
fewer buffers while consuming 17%-43% less clock power for the benchmark
circuits, and our proposed method of clock tree exploration provides many
alternative structures of a 2-D clock tree, considering the resource balance
between DLLs and wires. In addition, the use of our self-controlled clock
transmission gate enables a drastic reduction of the total wirelength, which
amounts to 18% on average.

Lien, W.-C.; Lee, K.-J.; Hsieh, T.-Y.; Chakrabarty, K.; Wu, Y.-H.
Counter-Based Output Selection for Test Response Compaction
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387700

Output selection is a recently proposed test response compaction method, where
only a subset of output response bits is selected for observation. It can
achieve zero aliasing, full X-tolerance, and high diagnosability. One critical
issue for output selection is how to implement the selection hardware. In this
paper, we present a counter-based output selection scheme that employs only a
counter and a multiplexer, hence involving very small area overhead and simple
test control. The proposed scheme is ATPG-independent and thus can easily be
incorporated into a typical design flow. Two efficient output selection
algorithms are presented to determine the desired output responses, one using a
single counter operation for simpler test control and the other using more
counter operations for achieving a better test-response reduction ratio.
Experimental results show that for stuck-at faults in large ISCAS'89 and ITC'99
benchmark circuits, 48%-90% reduction ratios on test responses can be achieved
with only one counter and one multiplexer employed. Even better results, i.e.,
76%-95% reductions, can be obtained for transition faults. It is also shown
that the diagnostic resolution of this method is almost the same as that
achieved by observing all output responses.
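The hardware idea is small enough to model directly: a counter drives the select input of a multiplexer, so each test pattern exposes one output bit, and the counter's starting value decides which bits get observed. The Python sketch below picks the start value that detects the most faults, assuming one observed output per pattern and a counter that steps once per pattern; the fault-to-bit table is invented, and this single search is much simpler than the two selection algorithms described in the paper.

```python
def observed_bits(n_patterns, n_outputs, start):
    """(pattern, output) pairs seen when the counter starts at `start`
    and advances once per pattern, wrapping modulo n_outputs."""
    return {(t, (start + t) % n_outputs) for t in range(n_patterns)}

def best_start(detecting_bits, n_patterns, n_outputs):
    """detecting_bits: fault -> set of (pattern, output) bits whose
    observed value would reveal the fault. Returns (start, detected_faults)."""
    best_s, best_detected = None, set()
    for start in range(n_outputs):
        seen = observed_bits(n_patterns, n_outputs, start)
        detected = {f for f, bits in detecting_bits.items() if bits & seen}
        if best_s is None or len(detected) > len(best_detected):
            best_s, best_detected = start, detected
    return best_s, best_detected

# Hypothetical fault table for 3 patterns and 4 outputs.
faults = {"f1": {(0, 2)}, "f2": {(1, 3), (2, 1)}, "f3": {(2, 0)}}
start, detected = best_start(faults, n_patterns=3, n_outputs=4)
reduction = 1 - 3 / (3 * 4)  # 3 of 12 response bits observed
print(start, sorted(detected), f"{reduction:.0%}")
```

Because only the counter's start value (and, in richer schemes, its operations) needs to be stored, the selection logic stays independent of how the test patterns were generated.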

SHORT PAPERS

Lu, S.-K.; Huang, H.-H.; Huang, J.-L.; Ning, P.
Synergistic Reliability and Yield Enhancement Techniques for Embedded SRAMs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387702

Single isolated faults (SIFs) account for about 60%-70% of the total number of
defects and are rather redundancy-hungry, since a spare row or column is
required to repair each SIF. Therefore, manufacturing yield will decrease
if sufficient spare resources are not allocated. In this paper, instead of the
traditional fault replacement techniques, synergistic techniques that integrate
both fault replacement and fault masking techniques are proposed. With our
approaches, SIFs are masked rather than repaired by traditional replacement.
For other minor fault types (e.g., faulty rows and faulty columns),
the fault replacement technique is used as usual. According to simulation
results, repair rates can be improved significantly. The proposed techniques
can be integrated with the conventional built-in self-repair with nearly
negligible hardware overhead.
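The split between masking and replacement can be illustrated with a simplified repair-feasibility check. The Python sketch below treats a faulty cell as a SIF when it is the only fault in both its row and its column, sends rows and columns with two or more faults to spare rows and spare columns, and compares replacement-only repair with repair in which SIFs are masked. The fault map, spare counts, and the fixed rows-to-spare-rows assignment are assumptions for illustration; the masking mechanism itself is not modeled, and practical redundancy-allocation algorithms are more flexible.

```python
from collections import Counter

def classify_faults(faulty_cells):
    """faulty_cells: set of (row, col). Returns (faulty_rows, faulty_cols, sifs)."""
    rows = Counter(r for r, _ in faulty_cells)
    cols = Counter(c for _, c in faulty_cells)
    faulty_rows = {r for r, n in rows.items() if n > 1}
    faulty_cols = {c for c, n in cols.items() if n > 1}
    sifs = {(r, c) for r, c in faulty_cells if rows[r] == 1 and cols[c] == 1}
    return faulty_rows, faulty_cols, sifs

def repairable(faulty_cells, spare_rows, spare_cols, mask_sifs):
    rows, cols, sifs = classify_faults(faulty_cells)
    if len(rows) > spare_rows or len(cols) > spare_cols:
        return False
    if mask_sifs:
        return True  # SIFs are assumed masked and consume no spares
    # Replacement only: each SIF sits alone in its row and column,
    # so it needs its own leftover spare row or spare column.
    leftover = (spare_rows - len(rows)) + (spare_cols - len(cols))
    return len(sifs) <= leftover

# Hypothetical fault map: one faulty row (row 0) plus two SIFs.
cells = {(0, 0), (0, 5), (3, 3), (7, 1)}
print(classify_faults(cells))
print(repairable(cells, 1, 1, mask_sifs=False), repairable(cells, 1, 1, mask_sifs=True))
```

With one spare row and one spare column, the map above fails under replacement alone but becomes repairable once the two SIFs are masked, which mirrors the yield argument in the abstract.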