March 2012 Newsletter 
Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2012-03.txt 


CALL FOR PAPERS: Special Section on Three-dimensional Integrated Circuits and Microarchitectures
(Deadline: March 15, 2012)  

Call for papers available at http://www.umn.edu/~tcad/special_sections/TCAD-3D-CFP.pdf 

Guest Editors
Yuan Xie, Pennsylvania State University, yuanxie@cse.psu.edu
Gabriel H. Loh, AMD Research, Gabe.loh@amd.com

REGULAR PAPERS

EMBEDDED SYSTEMS

Ejlali, A. Al-Hashimi, B. M. Eles, P. 
Low-Energy Standby-Sparing for Hard Real-Time Systems
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152774

Time-redundancy techniques are commonly used in real-time systems to achieve
fault tolerance without incurring high energy overhead. However, reliability
requirements of hard real-time systems that are used in safety-critical
applications are so stringent that time-redundancy techniques are sometimes
unable to achieve them. Standby sparing as a hardware-redundancy technique can
be used to meet high reliability requirements of safety-critical applications.
However, conventional standby-sparing techniques are not suitable for
low-energy hard real-time systems because they either impose considerable
energy overheads or cannot meet hard timing constraints. In this paper, we
provide a technique to use standby sparing for hard real-time systems with
limited energy budgets. The principal contribution of this paper is an online
energy-management technique which is specifically developed for standby-sparing
systems that are used in hard real-time applications. This technique operates
at runtime and exploits dynamic slack to reduce the energy consumption while
guaranteeing hard deadlines. We compared the low-energy standby-sparing (LESS)
system with a low-energy time-redundancy system (from a previous work). The
results show that for relaxed time constraints, the LESS system is more
reliable and provides about 26% energy saving as compared to the
time-redundancy system. For tight deadlines when the time-redundancy system is
not sufficiently reliable (for safety-critical applications), the LESS system
preserves its reliability but with about 49% more energy consumption.
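To make "exploiting dynamic slack" concrete, here is a minimal sketch of a generic greedy slack-reclamation rule for the primary processor only. It is not the LESS algorithm and it ignores the spare entirely. It assumes frame-based tasks sharing one deadline, known worst-case cycle counts, frequency normalized so that f_max = 1, and dynamic energy per cycle proportional to f^2 (voltage scaled linearly with frequency). The function name `reclaim_slack` is hypothetical.

```python
from typing import List, Tuple


def reclaim_slack(wcet: List[float], actual: List[float], deadline: float,
                  f_max: float = 1.0, f_min: float = 0.1) -> Tuple[List[float], float, float]:
    """Greedy dynamic slack reclamation for tasks that share one deadline.

    wcet[i] and actual[i] are the worst-case and actual cycle counts of task i.
    Returns (chosen frequencies, normalized dynamic energy, finish time).
    """
    if sum(wcet) / deadline > f_max:
        raise ValueError("task set is infeasible even at f_max")
    t, energy, freqs = 0.0, 0.0, []
    remaining = sum(wcet)  # worst-case cycles still to run, including the current task
    for w, a in zip(wcet, actual):
        # Slowest speed that still fits the remaining worst case before the deadline;
        # cycles saved by earlier tasks finishing early (dynamic slack) lower it.
        f = max(f_min, min(f_max, remaining / (deadline - t)))
        freqs.append(f)
        t += a / f
        energy += a * f ** 2  # assumed model: energy per cycle ~ V^2 ~ f^2
        remaining -= w
    return freqs, energy, t
```

For example, with `wcet=[2, 2]`, `actual=[1, 2]` and `deadline=4`, the first task runs at full speed and finishes one time unit early; the second task then runs at 2/3 speed and still completes exactly at the deadline.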

HIGH-LEVEL SYNTHESIS

Sarbishei, O. Radecka, K. Zilic, Z. 
Analytical Optimization of Bit-Widths in Fixed-Point LTI Systems
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152780

Analyses of range and precision are important for high-level synthesis and
verification of fixed-point circuits.  Conventional range and precision
analysis methods mostly focus on combinational arithmetic circuits and suffer
from major inefficiencies when dealing with sequential linear-time-invariant
circuits. These problems mainly include the inability to analyze precision when
the quantization of constant coefficients is taken into account, and the lack of
efficient word-length optimization algorithms that handle both variables and
constants while satisfying the error metrics. The algorithms presented in this
paper solve these problems. Experiments illustrate the efficiency and
robustness of our algorithms.
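For the range side of the problem, a standard bound for an LTI system is |y| <= ||h||_1 * max|x|, where h is the impulse response. The sketch below applies that bound to size the integer part of a two's-complement word; it is a textbook technique, not the paper's algorithm, and it does not address precision or coefficient quantization. For IIR filters it uses a truncated impulse response, which slightly underestimates the L1 norm, so a guard bit may be prudent. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def impulse_response(b: Sequence[float], a: Sequence[float], n: int) -> List[float]:
    """First n samples of h for a[0]*y[k] = sum b[i]*x[k-i] - sum_{i>=1} a[i]*y[k-i]."""
    h: List[float] = []
    for k in range(n):
        acc = b[k] if k < len(b) else 0.0  # the input is a unit impulse
        acc -= sum(a[i] * h[k - i] for i in range(1, len(a)) if k - i >= 0)
        h.append(acc / a[0])
    return h


def integer_bits(h: Sequence[float], x_max: float) -> int:
    """Integer bits (sign included) so that any |y| <= ||h||_1 * x_max fits."""
    y_max = sum(abs(v) for v in h) * x_max
    if y_max <= 0:
        return 1
    # I bits cover [-2^(I-1), 2^(I-1)); pick the smallest I with 2^(I-1) > y_max.
    return math.floor(math.log2(y_max)) + 2
```

For the FIR filter h = [0.5, 1.0, -0.25] with inputs bounded by 2, the worst-case output magnitude is 3.5, so three integer bits (range [-4, 4)) suffice.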

MODELING AND SIMULATION

Gao, M. Ye, Z. Wang, Y. Yu, Z. 
Efficient Full-Chip Statistical Leakage Analysis Based on Fast Matrix Vector Product
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152775

Power consumption has become a major concern since the integrated circuit
industry entered the nanometer design regime. Due to the increasing process
variation, deterministic leakage power analysis becomes inadequate and thus
statistical analysis is required. The challenges of statistical leakage
analysis are that the huge number of random variables makes naive computation
of the variance in O(N^2) time impractical for realistic designs, and that
knowing only the first two moments is not sufficient to obtain the distribution
of the full-chip leakage. In this paper, we introduce efficient linear-time
algorithms for statistical leakage analysis. To enable those algorithms, a fast
matrix vector product technique is crucial, being applied not only to compute
the second moment of the total leakage, but also, combined with a comonotonic
approximation, to estimate the distribution function of the total leakage
power. The computational complexity of the proposed algorithms is provably
O(N), and the experimental results, presented with detailed discussion,
indicate a promising improvement in accuracy.
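The O(N^2) cost comes from the fact that the variance of the total leakage is 1^T C 1, where C is the N x N covariance matrix of per-gate leakage; computing it is one matrix-vector product C 1 followed by a sum. The sketch below contrasts the naive sum with an O(N) evaluation for a hypothetical covariance with one chip-wide variable plus independent local variation, C = diag(local) + s * g g^T. That structure is an assumption chosen for illustration; the paper's fast matrix-vector product handles more general spatial correlation, and its comonotonic distribution estimate is not reproduced here.

```python
from typing import Sequence


def variance_naive(cov: Sequence[Sequence[float]]) -> float:
    """Var(sum_i L_i) = 1^T C 1: sums all N^2 covariance entries."""
    return sum(sum(row) for row in cov)


def variance_one_factor(local_var: Sequence[float], global_sens: Sequence[float],
                        global_var: float) -> float:
    """Same quantity in O(N) when C = diag(local_var) + global_var * g g^T.

    C @ 1 = local_var + global_var * g * sum(g), so 1^T C 1 needs only two sums.
    """
    return sum(local_var) + global_var * sum(global_sens) ** 2
```

Both functions return the same value for a covariance matrix built with that structure; only the second avoids touching all N^2 entries.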

Wei, C.-J. Chen, H. Chen, S.-J. 
Design and Implementation of Block-Based Partitioning for Parallel Flip-Chip Power-Grid Analysis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152783

Power-grid analysis is one of the critical design steps to ensure circuit
reliability and achieve performance targets for very large scale integration
systems. With each new technology generation, the feature size has decreased
and the power density has increased. Consequently, power-grid analysis has
become ever more complex, with greater CPU runtime and memory usage
requirements. For a state-of-the-art power-grid design with more than
100 million nodes, it is often desirable to partition the power grid into
smaller regions and analyze them in parallel by exploiting the locality of
flip-chip packages. However, the traditional area-based partitioning strategy
may not be best suited to analyze the DC current and ohmic IR voltage drop of a
design that has irregular power rails and nonuniform power consumption because
such nonuniformity affects the locality of the power supply network and the
accuracy of analysis. In this paper, we present the analysis of a
flip-chip design with 136 million nodes and propose a block-based partitioning
scheme to improve the accuracy of parallel power-grid analysis.
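DC IR-drop analysis reduces to nodal analysis: stamp each rail segment's conductance into a matrix G and solve G v = i for the node voltages. The sketch below does this for a whole grid at once (no partitioning), with flip-chip bumps modeled as ideal voltage sources, which is an assumption. A block-based scheme such as the one proposed would instead split G into regions and solve them in parallel; real grids of this size also require sparse solvers rather than the dense `numpy.linalg.solve` used here. The function name is hypothetical.

```python
from typing import Dict, Iterable, Sequence, Tuple

import numpy as np


def dc_ir_drop(n_nodes: int, resistors: Iterable[Tuple[int, int, float]],
               pads: Dict[int, float], currents: Sequence[float]) -> np.ndarray:
    """Node voltages of a resistive grid.

    resistors: (node_i, node_j, ohms); pads: node -> fixed supply voltage (ideal bump);
    currents: current drawn by each node's load in amperes (ignored at pad nodes).
    """
    G = np.zeros((n_nodes, n_nodes))
    for i, j, r in resistors:  # stamp each resistor's conductance
        g = 1.0 / r
        G[i, i] += g
        G[j, j] += g
        G[i, j] -= g
        G[j, i] -= g
    inj = -np.asarray(currents, dtype=float)  # loads draw current out of their node
    fixed = sorted(pads)
    free = [k for k in range(n_nodes) if k not in pads]
    v = np.zeros(n_nodes)
    v[fixed] = [pads[k] for k in fixed]
    # KCL on free nodes: G_ff v_f = i_f - G_fp v_p
    rhs = inj[free] - G[np.ix_(free, fixed)] @ v[fixed]
    v[free] = np.linalg.solve(G[np.ix_(free, free)], rhs)
    return v
```

For a 1-D rail of three 1-ohm segments fed at node 0 with 1.0 V, with each other node drawing 0.1 A, the node voltages come out as 1.0, 0.7, 0.5 and 0.4 V.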


Lee, J. Chen, D. Balakrishnan, V. Koh, C.-K. Jiao, D. 
A Quadratic Eigenvalue Solver of Linear Complexity for 3-D Electromagnetics-Based Analysis of Large-Scale Integrated Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152776

It is of critical importance to efficiently and accurately predict global
resonances of a 3-D integrated circuit system that involves arbitrarily shaped
lossy conductors and inhomogeneous materials. A quadratic eigenvalue solver of
linear complexity and electromagnetic accuracy is developed in this paper to
fulfill this task. Without sacrificing accuracy, the proposed eigenvalue solver
has shown a clear advantage in CPU time over state-of-the-art eigenvalue
solvers. It successfully solves a quadratic eigenvalue problem of over 2.5
million unknowns associated with a large-scale 3-D on-chip circuit embedded in
inhomogeneous materials in 40 min on a single 3 GHz 8222SE AMD Opteron
processor.
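A quadratic eigenvalue problem has the form (lambda^2 M + lambda C + K) x = 0; lossy conductors typically contribute the damping-like C term. The textbook way to solve it is companion linearization into a standard eigenproblem of twice the size, sketched below with dense NumPy under the assumption that M is invertible. This is not the paper's linear-complexity solver: the dense approach costs O(n^3), which is impractical at 2.5 million unknowns. The function name is hypothetical.

```python
import numpy as np


def quadratic_eigenvalues(M, C, K) -> np.ndarray:
    """Eigenvalues lam of (lam^2 M + lam C + K) x = 0, assuming M is invertible.

    Companion linearization: with z = [x, lam x], the problem becomes A z = lam z,
    where A = [[0, I], [-M^-1 K, -M^-1 C]].
    """
    M, C, K = (np.atleast_2d(np.asarray(m, dtype=float)) for m in (M, C, K))
    n = M.shape[0]
    A = np.block([
        [np.zeros((n, n)), np.eye(n)],
        [-np.linalg.solve(M, K), -np.linalg.solve(M, C)],
    ])
    return np.linalg.eigvals(A)  # 2n eigenvalues, generally complex
```

In the scalar case M = 1, C = 3, K = 2, the problem is lambda^2 + 3 lambda + 2 = 0, and the function returns the roots -1 and -2.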

PHYSICAL DESIGN

Seomun, J. Shin, I. Shin, Y. 
Synthesis of Active-Mode Power-Gating Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152781

Active leakage is transient and can be suppressed by design techniques such
as dual-Vt. Active-mode power-gating (AMPG) can further reduce active leakage
by power-gating groups of gates that perform computations with results that are
not loaded due to clock-gating. AMPG involves several challenges: the grouping
of gates must take circuit timing into account, and current switches need to be
sized to preserve power network integrity as well as circuit timing. We propose
solutions to these problems in the context of the entire process of
synthesizing AMPG circuits. The physical design of AMPG circuits is also
difficult due to the large number of virtual ground rails that must be mutually
isolated. We address these issues by integrating placement with power network
synthesis.  Experiments on several test circuits implemented in 45-nm
technology demonstrate the effectiveness of AMPG in the circuits that we
synthesized, in terms of power consumption, area, wirelength, and timing.
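The basic opportunity AMPG exploits can be sketched as a grouping step: if every flip-flop that captures a gate's result is clock-gated by the same enable, that gate's computation is wasted whenever the enable is off, so it can share a power-gated group with other such gates. The sketch below is a hypothetical simplification that assumes the transitive fanout flip-flops of each gate are already known; it ignores the timing-aware grouping and switch sizing the paper addresses. All names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def group_by_clock_gate(fanout_ffs: Dict[str, Iterable[str]],
                        ff_enable: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group gates whose results are captured only by flip-flops sharing one
    clock-gating enable; such a group could be power-gated whenever that enable is off.

    fanout_ffs: gate -> flip-flops in its transitive fanout.
    ff_enable: flip-flop -> clock-gating enable name, or None if never clock-gated.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for gate, ffs in fanout_ffs.items():
        enables = {ff_enable[ff] for ff in ffs}
        # Exactly one enable, and it is a real gating signal: the gate's result
        # is unused in every cycle where that enable is off.
        if len(enables) == 1 and None not in enables:
            groups[enables.pop()].append(gate)
    return dict(groups)
```

Gates feeding flip-flops under different enables, or any ungated flip-flop, are left out of every group in this simplified view.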

SYSTEM-LEVEL DESIGN

Kahng, A. B. Kang, S. Kumar, R. Sartori, J. 
Recovery-Driven Design: Exploiting Error Resilience in Design of Energy-Efficient Processors
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152777

Conventional computer-aided design (CAD) methodologies optimize a processor
module for correct operation and prohibit timing violations during nominal
operation. We propose recovery-driven design, a design approach that optimizes
a processor module for a target timing error rate (ER) instead of correct
operation. The target ER is chosen based on how many errors can be gainfully
tolerated by a hardware or software error resilience mechanism.  We show that
significant power benefits are possible from a recovery-driven design approach
that deliberately allows errors caused by voltage overscaling to occur during
nominal operation, while relying on an error resilience technique to tolerate
these errors. We present a detailed evaluation and analysis of such a CAD
methodology that minimizes the power of a processor module for a target ER. We
show how this design-level methodology can be extended to design
recovery-driven processors, i.e., processors that are optimized to take advantage of
hardware or software error resilience. We also discuss a gradual slack
recovery-driven design approach that optimizes for a range of ERs to create
soft processors, i.e., processors that have graceful failure characteristics and the
ability to trade throughput or output quality for additional energy savings
over a range of ERs. We demonstrate significant power benefits over
conventional design: 11.8% on average over all modules and ER targets, and up to
29.1% for individual modules. Processor-level benefits were 19.0%, on average.
Benefits increase when recovery-driven design is coupled with an error
resilience mechanism or when the number of available voltage domains increases.
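The underlying trade-off can be illustrated by choosing the lowest supply voltage whose timing error rate stays within the target. This sketch is not the paper's CAD methodology (which reshapes path slack during design); it only evaluates a fixed design. It assumes a crude delay model in which path delay scales as v_nom / v, bounds ER by summing the activation probabilities of paths that miss the clock, and takes dynamic power as proportional to V^2 at a fixed clock. All names and numbers are hypothetical.

```python
from typing import Sequence, Tuple


def error_rate(delays: Sequence[float], activity: Sequence[float],
               v: float, v_nom: float, clock_period: float) -> float:
    """Upper bound on timing error rate: summed activation probability of paths
    whose delay, scaled to supply v, exceeds the clock period."""
    scale = v_nom / v  # assumed first-order model: delay ~ 1/V
    return min(1.0, sum(p for d, p in zip(delays, activity) if d * scale > clock_period))


def lowest_voltage(delays: Sequence[float], activity: Sequence[float], v_nom: float,
                   clock_period: float, target_er: float,
                   candidates: Sequence[float]) -> Tuple[float, float]:
    """Lowest candidate supply meeting target_er, and its dynamic power relative to v_nom."""
    feasible = [v for v in candidates
                if error_rate(delays, activity, v, v_nom, clock_period) <= target_er]
    if not feasible:
        raise ValueError("no candidate voltage meets the target error rate")
    v = min(feasible)
    return v, (v / v_nom) ** 2  # dynamic power ~ V^2 at a fixed clock frequency
```

With three paths of nominal delay 0.9, 0.8 and 0.5 (clock period 1.0) activated with probabilities 0.01, 0.05 and 0.5, a 2% ER target allows scaling to 0.85 V among the candidates {1.0, 0.95, 0.88, 0.85, 0.7}: only the rarely exercised longest path fails there, and dynamic power drops to about 72% of nominal.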

Kakoulli, E. Soteriou, V. Theocharides, T. 
Intelligent Hotspot Prediction for Network-on-Chip-Based Multicore Systems
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152782

Hotspots are network-on-chip (NoC) routers or modules in multicore systems
which occasionally receive packetized data from other networked element
producers at a rate higher than they can consume it. This adverse phenomenon
may greatly reduce the performance of NoCs, especially when wormhole
flow-control is employed, as backpressure can cause the buffers of neighboring
routers to quickly fill up, leading to a spatial spread in congestion. This can
cause the network to saturate prematurely; in the worst case, the NoC
may be rendered unrecoverable. Thus, a hotspot prevention mechanism can be
greatly beneficial, as it can potentially enable the interconnection system to
adjust its behavior and prevent the rise of potential hotspots, subsequently
sustaining NoC performance.  The inherent unevenness of traffic patterns in an
NoC-based general-purpose multicore system such as a chip multiprocessor, due
to the diverse and unpredictable access patterns of applications, produces
unexpected hotspots whose appearance cannot be known a priori, as application
demands are not predetermined, making hotspot prediction and subsequently
prevention difficult. In this paper, we present an artificial neural
network-based (ANN) hotspot prediction mechanism that can be potentially used
in tandem with a hotspot avoidance or congestion-control mechanism to handle
unforeseen hotspot formations efficiently. The ANN uses online statistical data
to dynamically monitor the interconnect fabric and reactively predicts the
location of a hotspot (or hotspots) about to form, allowing enough time for the
multicore system to react to these potential hotspots. Evaluation results
indicate that a relatively lightweight ANN-based predictor can forecast hotspot
formation(s) with an accuracy ranging from 65% to 92%.
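As a minimal stand-in for a lightweight ANN predictor, the sketch below trains a single perceptron online. The paper's network architecture and input features are not described in this abstract; here the features are assumed to be per-port buffer occupancies of a router, and the label is assumed to be +1 if a hotspot forms there in a later window and -1 otherwise. Names and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def train_perceptron(samples: Sequence[Sequence[float]], labels: Sequence[int],
                     epochs: int = 100, lr: float = 0.1) -> Tuple[List[float], float]:
    """Classic perceptron rule; labels are +1 (hotspot forms) or -1 (no hotspot)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                # Misclassified: move the decision boundary toward this sample.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # converged on the training data
            break
    return w, b


def predict_hotspot(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

A single perceptron only separates linearly separable patterns; a multilayer network would be needed for the more irregular traffic the abstract describes.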

SHORT PAPERS

Maffezzoni, P. 
Stochastic Analysis of Switched-Capacitor Circuits for Sampled Data Converters
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152779

This paper describes an original simulation-based method to derive the
stochastic properties of the output noise of switched-capacitor circuits which
are used in sampled-data converters. The method relies on a linear time-varying
approximation of the large-signal transient response of the switched circuits.
It is shown how switched-capacitor circuit noise and quantization noise, due
to the presence of harsh comparators, can be analyzed in a unified frame where
the data converter is modeled as a discrete-time system.
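For orientation, the two noise sources named above have well-known first-order estimates: sampled thermal (kT/C) noise with rms value sqrt(kT/C), and uniform quantization noise with rms value LSB/sqrt(12). The sketch below computes both and combines them by root-sum-square, which assumes they are uncorrelated. These are textbook approximations, not the paper's linear time-varying simulation method. Function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def ktc_noise_rms(capacitance: float, temperature: float = 300.0) -> float:
    """RMS sampled thermal noise voltage on a capacitor: sqrt(kT/C)."""
    return math.sqrt(K_B * temperature / capacitance)


def quantization_noise_rms(full_scale: float, bits: int) -> float:
    """RMS error of a uniform quantizer with step LSB = full_scale / 2**bits: LSB/sqrt(12)."""
    return (full_scale / 2 ** bits) / math.sqrt(12)


def combined_rms(*sources: float) -> float:
    """Root-sum-square, valid only if the sources are uncorrelated."""
    return math.sqrt(sum(s * s for s in sources))
```

For a 1 pF sampling capacitor at 300 K, `ktc_noise_rms(1e-12)` gives roughly 64 microvolts rms.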

Li, Z. Zhou, Y. N. Shi, W. 
An O(mn) Time Algorithm for Optimal Buffer Insertion of Nets With m Sinks
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152778

Buffer insertion is an effective technique to reduce interconnect delay. In
this paper, we give a simple O(mn) time algorithm for optimal buffer insertion,
where m is the number of sinks and n is the number of buffer positions. When m
is small, our algorithm is a significant improvement over the recent
O(n log^2 n) time algorithm by Shi and Li, and the O(n^2) time algorithm of van
Ginneken. For b buffer types, our algorithm runs in O(b^2 n + bmn) time, an
improvement over the recent O(bn^2) algorithm by Li and Shi. The improvement is
made possible by an innovative linked list that can perform wire addition and
buffer addition in amortized O(1) time, and by a smart design of pointers. We
then present an extension of our algorithm to the buffer cost minimization
problem, which improves on the previous best algorithm. On industrial test cases,
the new algorithms are faster than the previous best algorithms by an order of
magnitude.
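For readers unfamiliar with the baseline, here is a minimal sketch of the classic van Ginneken dynamic program for a single-sink net (m = 1) under the Elmore delay model. It is not the paper's O(mn) linked-list algorithm; it only shows the candidate operations the abstract refers to: adding a wire, adding a buffer, and pruning dominated (capacitance, required-time) candidates. It assumes a single buffer type and a buffer position at the upstream end of every wire segment except the one attached to the driver; all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[float, float]  # (downstream capacitance C, required arrival time Q)


def prune(cands: List[Candidate]) -> List[Candidate]:
    """Drop dominated candidates: keep one only if its Q beats every smaller-C one."""
    cands = sorted(cands, key=lambda cq: (cq[0], -cq[1]))
    kept: List[Candidate] = []
    for c, q in cands:
        if not kept or q > kept[-1][1]:
            kept.append((c, q))
    return kept


def van_ginneken_slack(wires: Sequence[Tuple[float, float]], sink_cap: float,
                       sink_rat: float, buf_cap: float, buf_res: float,
                       buf_delay: float, driver_res: float) -> float:
    """Best slack at the driver of a single-sink net.

    wires: (resistance, capacitance) of each segment, listed from the sink toward
    the driver. A buffer may be placed at the upstream end of every segment except
    the last one, whose upstream end is the driver itself.
    """
    cands: List[Candidate] = [(sink_cap, sink_rat)]
    for idx, (r, c) in enumerate(wires):
        # Add a wire: Elmore delay of a pi-segment is r * (c / 2 + C_downstream).
        cands = [(C + c, Q - r * (c / 2 + C)) for C, Q in cands]
        if idx < len(wires) - 1:
            # Add a buffer: it isolates the downstream load, so keep the best option.
            best_q = max(Q - buf_res * C for C, Q in cands)
            cands.append((buf_cap, best_q - buf_delay))
        cands = prune(cands)
    return max(Q - driver_res * C for C, Q in cands)
```

With two unit wires, a unit sink load, a zero-delay buffer of input capacitance 1 and output resistance 0.5, and a driver resistance of 10, inserting the buffer improves the slack at the driver from -34 to -24. Since each position adds at most one candidate, the candidate list can grow linearly, which is where the quadratic cost of the classic approach comes from.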

Yang, J.-S. Touba, N. A. 
Efficient Trace Signal Selection for Silicon Debug by Error Transmission Analysis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152784

In this paper, a technique is presented for selecting signals to observe during
silicon debug. Internal signals are used to analyze, understand, and debug
circuit misbehavior. An automated procedure to select which signals to observe
is proposed to facilitate early detection of circuit malfunction and to enhance
the utilization of hardware resources for storage. Signals that are most often
sensitized to possible errors are observed in sequential circuits. Given a
functional input vector set, an error transmission matrix is generated by
analyzing which flip-flops are sensitized to other flip-flops. Relatively
independent flip-flops are identified, and a set of signals that maximally
covers the possible error sites under the given constraints is found through
integer linear programming. Experimental results show that the proposed approach can
rapidly and precisely identify the nonconforming chip behavior and thereby can
speed up the post-silicon debug process.
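The selection step can be illustrated with a greedy maximum-coverage heuristic over the error transmission matrix. This swaps in a greedy approximation for the paper's integer linear program, so its result is not guaranteed to be optimal. It assumes `etm[i][j]` is 1 when an error at flip-flop i is observable at flip-flop j under the given input vectors, and that the only constraint is a budget on how many flip-flops the trace buffer can record. Names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def select_trace_signals(etm: Sequence[Sequence[int]],
                         budget: int) -> Tuple[List[int], Set[int]]:
    """Greedily pick up to `budget` flip-flops to trace.

    etm[i][j] == 1 means an error at flip-flop i is observable at flip-flop j.
    Returns the chosen flip-flops and the set of error sites they cover.
    """
    n = len(etm)
    chosen: List[int] = []
    covered: Set[int] = set()
    for _ in range(budget):
        best, best_gain = -1, 0
        for j in range(n):
            if j in chosen:
                continue
            gain = sum(1 for i in range(n) if etm[i][j] and i not in covered)
            if gain > best_gain:
                best, best_gain = j, gain
        if best < 0:  # no remaining flip-flop covers a new error site
            break
        chosen.append(best)
        covered |= {i for i in range(n) if etm[i][best]}
    return chosen, covered
```

Each round picks the flip-flop that observes the most not-yet-covered error sites; an ILP formulation can instead trade coverage against additional constraints exactly.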

Das, S. Banerjee, A. Dasgupta, P. 
Early Analysis of Critical Faults: An Approach to Test Generation From Formal Specifications
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152773

This paper presents a formal methodology for test generation from formal
specifications. Our method can be used for test generation for critical faults
in component-based designs. Test generation for critical faults is done
entirely using formal specifications and therefore the theory inherently
guarantees that a generated test will be applicable to any implementation of
the specifications. The theory makes fault analysis possible at an abstract
level of design where the complete logic is not specified.