December 2012 Newsletter 
Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2012-12.txt 

REGULAR PAPERS

ANALOG, MIXED-SIGNAL, AND RF CIRCUITS

Lin, C.-W.; Lin, J.-M.; Chiu, Y.-C.; Huang, C.-P.; Chang, S.-J. 
Mismatch-Aware Common-Centroid Placement for Arbitrary-Ratio Capacitor Arrays
Considering Dummy Capacitors
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349434

Switched capacitors are commonly used in analog circuits to increase the
accuracy of analog signal processing and lower power consumption. To take full
advantage of switched capacitors, it is very important to achieve accurate
capacitance ratios in the layout of the capacitor arrays, which are affected by
systematic and random mismatches. A good capacitor placement should have a
common-centroid structure with the highest possible degree of dispersion to
mitigate mismatches. Several dummy units should be inserted to make the
placement shape more square and compact. This paper proposes a
simulated-annealing-based approach for mismatch-aware common-centroid placement
under the above constraints. A pair-sequence representation is used to record a
placement, and a couple of associated operations are developed to find better
solutions. The experimental results show that the proposed placements achieve
smaller oxide-gradient-induced mismatch and larger overall correlation
coefficients (i.e., higher degree of dispersion) than those of previous works.

FPGAS AND RECONFIGURABLE COMPUTING

Ansaloni, G.; Tanimura, K.; Pozzi, L.; Dutt, N. 
Integrated Kernel Partitioning and Scheduling for Coarse-Grained Reconfigurable
Arrays
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349426

Coarse-grained reconfigurable arrays (CGRAs) are a promising class of
architectures combining flexibility and efficiency. Devising effective
methodologies to map applications onto CGRAs is a challenging task, due to
their parallel execution paradigm and constrained hardware resources. In order
to handle complex applications, it is important to devise efficient strategies
to partition a kernel into pieces that obey resource constraints and
methodologies to schedule them on the underlying hardware. In this paper, we
tackle these problems by proposing algorithms to address partitioning based on
recursive searches over abstract trees. A novel scheduling strategy is also
described that, leveraging differences in delays of various operations, is able
to efficiently map operations on CGRA architectures. Experimental evidence on
kernels derived from a diverse set of data flow graphs and EEMBC benchmarks
demonstrates the efficacy of the described methods, which, when combined,
achieve a higher runtime performance on a given mesh size than state-of-the-art
approaches (as much as 38% for the benchmark applications considered).

HIGH-LEVEL SYNTHESIS

Del Barrio, A. A.; Hermida, R.; Memik, S. O.; Mendias, J. M.; Molina, M. C.
Multispeculative Addition Applied to Datapath Synthesis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349430

Addition is the key arithmetic operation in most digital circuits and
processors. Therefore, their performance and other parameters, such as area and
power consumption, are highly dependent on the adders' features. In this paper,
we present multispeculation as a way of increasing adders' performance with a
low area penalty. In our proposed design, dividing an adder into several
fragments and predicting the carry-in of each fragment enables computing every
addition in two very short cycles at the most, with 99% or higher probability.
Furthermore, based on multispeculation principles, we propose a new strategy
for implementing addition chains and hiding most of the penalty cycles due to
mispredictions, while keeping at the same time the resource sharing
capabilities that are sought in high-level synthesis. Our results show that it
is possible to build linear and logarithmic adders more than 4.7x and 1.7x
faster than the nonspeculative case, respectively. Moreover, this is achieved
with a low area penalty (38% for linear adders) or even an area reduction (-8%
for logarithmic adders). Finally, applying multispeculation principles to
signal processing benchmarks that use addition chains will result in 25%
execution time reduction, with an additional 3% decrease in datapath area with
respect to implementations with logarithmic fast adders.
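
The fragment-and-predict idea in the abstract above can be illustrated with a
small behavioral sketch. It is not the authors' design: the function name
speculative_add, the 8-bit fragment size, and the naive "carry-in is always 0"
predictor are assumptions made for illustration, and the correction cycle is
modelled simply by redoing the addition.

```python
def speculative_add(a, b, width=32, fragment_bits=8):
    """Add two unsigned integers using per-fragment carry speculation.

    Returns (result, cycles): cycles is 1 when every predicted carry-in
    was right and 2 when a correction pass is needed.
    Assumes 0 <= a, b < 2**width and width divisible by fragment_bits.
    """
    mask = (1 << fragment_bits) - 1
    n_fragments = width // fragment_bits

    # Cycle 1: every fragment adds independently, assuming carry-in = 0
    # (a hypothetical, deliberately simple predictor).
    partial_sums = []
    carry_outs = []
    for i in range(n_fragments):
        frag_a = (a >> (i * fragment_bits)) & mask
        frag_b = (b >> (i * fragment_bits)) & mask
        total = frag_a + frag_b
        partial_sums.append(total & mask)
        carry_outs.append(total >> fragment_bits)

    # A prediction is wrong when a fragment produces a carry that the
    # next fragment assumed to be 0. The last carry-out leaves the adder.
    if not any(carry_outs[:-1]):
        result = 0
        for i, s in enumerate(partial_sums):
            result |= s << (i * fragment_bits)
        return result, 1

    # Cycle 2 (modelled, not implemented at gate level): redo the
    # addition with the true carries.
    return (a + b) & ((1 << width) - 1), 2
```

In this sketch, operands whose fragments never overflow finish in one cycle,
and any inter-fragment carry costs one extra cycle.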

MODELING AND SIMULATION

Sun, S.; Feng, Y.; Dong, C.; Li, X.
Efficient SRAM Failure Rate Prediction via Gibbs Sampling
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349433

Statistical analysis of SRAM has emerged as a challenging issue because the
failure rate of SRAM cells is extremely small. In this paper, we develop an
efficient importance sampling algorithm to capture the rare failure event of
SRAM cells. In particular, we adapt the Gibbs sampling technique from the
statistics community to find the optimal probability distribution for
importance sampling with a low computational cost (i.e., a small number of
transistor-level simulations). The proposed Gibbs sampling method applies an
integrated optimization engine to adaptively explore the failure region in a
Cartesian or spherical coordinate system by sampling a sequence of 1-D
probability distributions. Several implementation issues such as 1-D random
sampling and starting point selection are carefully studied to make the Gibbs
sampling method efficient and accurate for SRAM failure rate prediction. Our
experimental results on a 90 nm SRAM cell demonstrate that the proposed Gibbs
sampling method achieves 1.4-4.9x runtime speedup over other state-of-the-art
techniques when a high prediction accuracy is required (e.g., the relative
error defined by the 99% confidence interval reaches 5%). In addition, we
further demonstrate an important example for which the proposed Gibbs sampling
algorithm accurately estimates the correct failure probability, while the
traditional techniques fail to work.
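
As background for the abstract above, the following sketch shows plain
importance sampling for a rare event in one dimension. It is not the paper's
Gibbs sampling method: the "failure" is simply a standard normal variable
exceeding a threshold, the sampling distribution is a mean-shifted normal
chosen by hand, and the function names are hypothetical.

```python
import math
import random


def tail_probability_is(threshold, shift, n_samples, seed=0):
    """Estimate P(x > threshold) for x ~ N(0, 1) by importance sampling.

    Samples are drawn from N(shift, 1) and reweighted by the likelihood
    ratio p(x) / q(x) = exp(-shift * x + shift**2 / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(shift, 1.0)
        if x > threshold:
            total += math.exp(-shift * x + 0.5 * shift * shift)
    return total / n_samples


def tail_probability_exact(threshold):
    """Closed-form reference value for the same tail probability."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))
```

Shifting the sampling distribution toward the failure region lets a few
thousand samples resolve a probability of roughly 3e-5 at a threshold of 4,
which plain Monte Carlo with the same budget would rarely hit at all.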

PHYSICAL DESIGN

Lak, Z.; Nicolici, N.
On Using On-Chip Clock Tuning Elements to Address Delay Degradation Due to Circuit Aging
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349436

Lifetime performance of digital integrated circuits degrades as a consequence
of circuit aging. In the past few years, there has been extensive research to
reduce the impact of aging by different design techniques, or to predict the
degradation and adapt the circuit accordingly. In this paper, we explore a
novel perspective to this problem by exploiting the presence of clock tuning
elements in high-performance designs. By combining on-chip sensors to predict
setup or hold-time violations with the clock tuning elements, we provide an
effective self-tuning mechanism for each circuit sample. The proposed method
can operate in-system to prolong the circuit's maximum performance in its
unique operating environment.


Chang, H.-Y.; Jiang, I. H.-R.; Chang, Y.-W.
Timing ECO Optimization Via Bézier Curve Smoothing and Fixability Identification
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349427

Due to the rapidly increasing design complexity in modern integrated circuit
design, more and more timing failures are detected at late stages. Without
deferring time-to-market, metal-only engineering change order (ECO) is an
economical technique to correct these late-found failures. Typically, a design
might need to undergo many ECO runs in design houses; consequently, the usage
of spare cells for ECO is of significant importance. In this paper, we aim at
timing ECO by using as few spare cells as possible. We observe that a path with
good timing is desired to be geometrically smooth. Unlike negative slack and
gate delay used in most prior work, we propose a new metric of timing
criticality, fixability, by considering the smoothness of timing violating
paths. To measure the smoothness of a path, we use the Bézier curve as the
golden path. Furthermore, in order to concurrently fix timing violations, we
derive a propagation property to divide violating paths into independent
segments. Based on Bézier curve smoothing, fixability identification, and
the propagation property, we develop an efficient algorithm to fix timing
violations. Experimental results show that we can effectively resolve all
timing violations with significant speedups over the state-of-the-art works.
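
To make the "Bézier curve as the golden path" idea above concrete, here is a
minimal sketch that evaluates a Bézier curve with de Casteljau's algorithm and
compares a path against it. The deviation measure and the function names are
assumptions for illustration only; the paper's fixability metric is not
reproduced here.

```python
import math


def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t using de Casteljau's algorithm."""
    points = [tuple(p) for p in control_points]
    while len(points) > 1:
        # Linearly interpolate between each pair of neighbouring points.
        points = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]


def max_deviation(path):
    """Largest distance between path point i and the curve at t = i/(n-1).

    The path points themselves serve as control points (a hypothetical
    choice); a straight, evenly spaced path yields a deviation of 0.
    """
    n = len(path)
    if n < 2:
        return 0.0
    return max(
        math.dist(p, bezier_point(path, i / (n - 1)))
        for i, p in enumerate(path)
    )
```

A detouring path such as (0, 0), (1, 2), (2, 0) scores higher under this
measure than a straight one, matching the intuition that smoother paths are
preferable.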

SYSTEM-LEVEL DESIGN

Srinivasan, S.; Ganeshpure, K. P.; Kundu, S. 
A Wavelet-Based Spatio-Temporal Heat Dissipation Model for Reordering of Program Phases to Produce 
Temperature Extremes in a Chip
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349437

Localized heating leads to generation of thermal hotspots that affect the
performance and reliability of an integrated circuit (IC). Functional workloads
determine the locations and temperatures of hotspots on a die. In this paper,
we present a systematic approach for developing a synthetic workload to
maximize the temperature of a target hotspot. Our approach is based on the
observation that hotspot temperature is determined not only by the current
activity in that region, but also by the past activities in the surrounding
regions. Accordingly, we develop a wavelet-based canonical spatio-temporal heat
dissipation model for program traces, and use a novel integer linear
programming formulation to rearrange program phases to generate target
worst-case hotspot temperature. Program phase behavior is rooted in the static
structure of programs. In this case, the initial set of program phases is
extracted from the SPEC 2000 benchmark. We apply this formulation to target
another well-known problem of maximizing the temperature between a pair of
coordinates in an IC. Experimental results show that by taking the
spatio-temporal effect into account, we can raise the temperature of a hotspot
higher than what is otherwise possible. Hotspot temperature maximization is
important in design verification and testing.

TEST

Constantin, N. G.; Kwok, K. H.; Shao, H.; Cismaru, C.; Zampardi, P. J.
Formulations and a Computer-Aided Test Method for the Estimation of IMD Levels in an Envelope 
Feedback RFIC Power Amplifier
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349429

This paper presents new formulations, together with an efficient computer-aided
test approach intended for radio frequency integrated circuit power amplifiers
(PAs), allowing the estimation of linearity requirements for the circuit blocks
typically found in the error signal path of an envelope feedback amplifier. The
formulations are based on a three-tone excitation, allowing analysis of
intermodulation distortion (IMD) within the feedback system using parameterized
peak-to-average envelope voltage. They are also based on a fifth-degree
representation, and may be extended to higher degrees of nonlinearities in the
RF PA block, enabling IMD analysis of envelope feedback amplifiers at low
power. The approach proposed in this paper circumvents the difficulty of
measuring error signals during closed-loop operation for troubleshooting
purposes. This approach is also very useful for computer-aided test setups
intended for development work independent of the often idealized circuit
simulation environment.

Janicki, J.; Kassab, M.; Mrugalski, G.; Mukherjee, N.; Rajski, J.; Tyszer, J.
EDT Bandwidth Management in SoC Designs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349438

This paper presents preemptive test application schemes for system-on-a-chip
(SoC) designs with embedded deterministic test-based compression. The schemes
seamlessly combine new test data reduction techniques with test scheduling
algorithms and novel test access mechanisms devised for both input and output
sides. In particular, they allow cores to interface with automatic test
equipment through an optimized number of channels. They are well suited for SoC
devices comprising both nonisolated cores, i.e., blocks that occasionally need
to be tested simultaneously, and completely wrapped modules. Experimental
results obtained for large industrial SoC designs illustrate the feasibility of the
proposed test application schemes and are reported herein.

Kochte, M. A.; Elm, M.; Wunderlich, H.-J.
Accurate X-Propagation for Test Applications by SAT-Based Reasoning
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349431

Unknown or X-values during test applications may originate from uncontrolled
sequential cells or macros, from clock or A/D boundaries, or from tristate
logic. The exact identification of X-value propagation paths in logic circuits
is crucial in logic simulation and fault simulation. In the first case, it
enables the proper assessment of expected responses and the effective and
efficient handling of X-values during test response compaction. In the second
case, it is important for a proper assessment of fault coverage of a given test
set and consequently influences the efficiency of test pattern generation. The
commonly employed n-valued logic simulation evaluates the propagation of
X-values only pessimistically, i.e., the X-propagation paths found by n-valued
logic simulation are a superset of the actual propagation paths. This paper
presents an efficient method for overcoming this pessimism and for determining
accurately the set of signals that carry an X-value for an input pattern. As
examples, it investigates the influence of this pessimism on the two
applications, X-masking and stuck-at fault coverage assessment. The
experimental results on benchmark and industrial circuits assess the pessimism
of classic algorithms and show that these algorithms significantly overestimate
the signals with X-values. The experiments show that overmasking of test data
during test compression can be reduced by an accurate analysis. In stuck-at
fault simulation, the coverage of the test set is increased by the proposed
algorithm without incurring any overhead.
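
The pessimism of n-valued simulation described above can be seen on a tiny
example. The sketch below compares three-valued simulation with an exact
answer obtained by enumerating all assignments to the X inputs. The
enumeration stands in for the paper's SAT-based reasoning and only scales to
toy circuits; the multiplexer-like circuit and all function names are
assumptions for illustration.

```python
from itertools import product

X = "X"  # unknown value


def and3(a, b):
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return X


def or3(a, b):
    if a == 1 or b == 1:
        return 1
    if a == 0 and b == 0:
        return 0
    return X


def not3(a):
    return X if a == X else 1 - a


def mux_like(s, d):
    """y = (s AND d) OR (NOT s AND d), which is logically just d."""
    return or3(and3(s, d), and3(not3(s), d))


def exact_value(circuit, inputs):
    """Return the output if it is the same for every 0/1 assignment
    of the X inputs, otherwise X (brute force, toy sizes only)."""
    x_positions = [i for i, v in enumerate(inputs) if v == X]
    seen = set()
    for bits in product((0, 1), repeat=len(x_positions)):
        concrete = list(inputs)
        for pos, bit in zip(x_positions, bits):
            concrete[pos] = bit
        seen.add(circuit(*concrete))
    return seen.pop() if len(seen) == 1 else X
```

With s unknown and d = 1, three-valued simulation reports X at the output
because the two branches reconverge, while the exact analysis finds the
output is 1.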

SHORT PAPERS

Viraraghavan, J.; Pandharpure, S. J.; Watts, J.
Statistical Compact Model Extraction: A Neural Network Approach
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349439

A technique for extracting statistical compact model parameters using
artificial neural networks (ANNs) is proposed. ANNs can model a much higher
degree of nonlinearity compared to existing quadratic polynomial models and,
hence, can even be used in sub-100-nm technologies to model leakage current
that exponentially depends on process parameters. Existing techniques cannot be
extended to handle such exponential functions. Additionally, ANNs can handle
multiple input multiple output relations very effectively. The concept applied
to CMOS devices improves the efficiency and accuracy of model extraction.
Results from the ANN match the ones obtained from SPICE simulators within 1%.

Maffezzoni, P.; Levantino, S. 
Phase-Noise Analysis and Simulation of LC Oscillator-Based Injection-Locked Frequency Dividers
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349435

This letter proposes a phase-noise analysis of injection-locked frequency
dividers that are based on LC oscillators. The proposed analysis method relies
on the concept of the perturbation projection vector and allows one to
investigate how the output phase noise is affected by the amplitude and the
frequency of the input signal. Closed-form expressions for the output
phase-noise spectrum under different injection conditions are provided and
validated against the periodic noise analysis of a commercial circuit
simulator. Indeed, the proposed semianalytical method can provide insights and
guidelines to improve the circuit design.

Li, K. S.-M.; Liao, Y.-Y.
Layout-Aware Multiple Scan Tree Synthesis for 3-D SoCs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349432

An interconnect-driven layout-aware multiple scan tree (MST) synthesis
methodology for 3-D integrated circuits (ICs) is proposed. MSTs, also known as
scan forests, greatly reduce test data volume and test application time in
system-on-a-chip testing. Previous studies on layout-aware scan tree synthesis
only address 2-D layouts, so they cannot be directly applied to 3-D ICs. The
proposed algorithm effectively optimizes both test compression rate and routing
length under 3-D IC-induced constraints, and produces better results than all
previously known methods.

Chung, J.; Abraham, J. A. 
On Computing Criticality in Refactored Timing Graphs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349428

The maximum operator in statistical static timing analysis (SSTA) is a decent
approximation for timing sign-off, but often causes significant error in SSTA
applications. This paper presents a timing criticality computation method based
on non-maximum analytic operators in a parameterized SSTA. After an SSTA run,
the proposed method computes the criticality for all edges and nodes in a
single graph traversal. Although we do not employ the max operator in the
computation process, the error in the maximum operator still degrades the
accuracy of the computed criticality because the criticality is a joint
probability of expressions, including arrival times, which are computed by the
maximum operator during SSTA. To address this issue, we employ the refactoring
technique, which was recently proposed to reduce common path pessimism in
combinational circuits. This paper shows that refactoring is also very useful
in reducing the maximum-induced error in arrival times, and how existing
graph-based algorithms can be geared toward refactoring. Our experimental
results show that the proposed method reduces the error of the criticality
significantly compared to the conventional cutset-based method.