September 2011 Newsletter
Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2011-09.txt 


REGULAR PAPERS

EMBEDDED SYSTEMS

Kinsman, A.B.  Nicolici, N.N. Automated Range and Precision Bit-Width
Allocation for Iterative Computations 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989989

As scientific computing becomes more widespread in environments where
form-factor considerations necessitate hardware acceleration, the problem of
selecting numerical data representations (bit-width allocation), key to
accelerator design, is faced with shortcomings in the existing techniques. To
address this problem for scientific computing dataflows, we propose a
methodology for determining custom hybrid fixed/floating-point data
representations for iterative computations.

LOGIC SYNTHESIS

Qian W. Riedel, M.D.  Zhou H. Bruck, J. Transforming Probabilities With
Combinational Logic 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989992

Schemes for probabilistic computation can exploit physical sources to generate
random values in the form of bit streams. Generally, each source has a fixed
bias and so provides bits with a specific probability of being one. If many
different probability values are required, it can be expensive to generate all
of these directly from physical sources. This paper demonstrates novel
techniques for synthesizing combinational logic that transforms source
probabilities into different target probabilities. We consider three scenarios
in terms of whether the source probabilities are specified and whether they can
be duplicated. In the case that the source probabilities are not specified and
can be duplicated, we provide a specific choice, the set {0.4, 0.5}; we show
how to synthesize logic that transforms probabilities from this set into
arbitrary decimal probabilities. Further, we show that for any integer n ≥ 2,
there exists a single probability that can be transformed into arbitrary base-n
fractional probabilities. In the case that the source probabilities are
specified and cannot be duplicated, we provide two methods for synthesizing
logic to transform them into target probabilities. In the case that the source
probabilities are not specified, but once chosen cannot be duplicated, we
provide an optimal choice.
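
As a rough illustration of the underlying idea (not the synthesis procedure
from the paper), the sketch below computes the exact output probability of a
small combinational function whose inputs are independent random bits. The
helper name output_probability and the three example gates are assumptions
made for this example only; it uses the source set {0.4, 0.5} mentioned in the
abstract.

```python
from fractions import Fraction
from itertools import product


def output_probability(logic, input_probs):
    """Exact P(logic(*bits) == 1) for independent input bits.

    input_probs[i] is the probability that input i is 1.
    Hypothetical helper for illustration; enumerates all input patterns.
    """
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(input_probs)):
        weight = Fraction(1)
        for bit, p in zip(bits, input_probs):
            weight *= p if bit else 1 - p
        if logic(*bits):
            total += weight
    return total


P04 = Fraction(2, 5)  # source probability 0.4
P05 = Fraction(1, 2)  # source probability 0.5

# AND multiplies probabilities of independent inputs: 0.4 * 0.5
p_and = output_probability(lambda a, b: a & b, [P04, P05])
# NOT complements a probability: 1 - 0.4
p_not = output_probability(lambda a: 1 - a, [P04])
# Chaining the two: (1 - 0.4) * 0.5
p_chain = output_probability(lambda a, b: (1 - a) & b, [P04, P05])

print(p_and, p_not, p_chain)
```

Under the independence assumption, these three gates already reach 0.2, 0.6,
and 0.3 from the two source values; each input here stands for a separate,
duplicated copy of a source.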

MODELING AND SIMULATION

Mizunuma, H.  Lu Y.-C. Yang C.-L. Thermal Modeling and Analysis for 3-D ICs
With Integrated Microchannel Cooling 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989994

Integrated microchannel liquid-cooling technology is envisioned as a viable
solution to alleviate an increasing thermal stress imposed by 3-D stacked ICs.
Thermal modeling for microchannel cooling is challenging due to its complicated
thermal-wake effect, a localized temperature wake phenomenon downstream of a
heated source in the flow. This paper presents a fast and accurate thermal-wake
aware thermal model for integrated microchannel 3-D ICs. A combination of the
microchannel thermal-wake function and the channel merging technique achieves
more than 3300× speedup with less than 5% error in comparison with a commercial
numerical finite volume simulation tool. With the proposed model, we
characterize thermal behaviors of microchannel-cooled 3-D ICs and compare them
with the case of conventional air-cooled 3-D ICs. We also demonstrate
thermal-aware placements using our thermal model. It shows that the proposed
model can be used to reduce peak temperatures, which is considered important
for 3-D IC designs.

Gu C. QLMOR: A Projection-Based Nonlinear Model Order Reduction Approach Using
Quadratic-Linear Representation of Nonlinear Systems 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5991229

We present a projection-based nonlinear model order reduction method, named
model order reduction via quadratic-linear systems (QLMOR). QLMOR employs two
novel ideas: 1) we show that nonlinear ordinary differential equations, and
more generally differential-algebraic equations (DAEs) with many commonly
encountered nonlinear kernels can be rewritten equivalently in a special
representation, quadratic-linear differential algebraic equations (QLDAEs), and
2) we perform a Volterra analysis to derive the Volterra kernels, and we adapt
the moment-matching reduction technique of nonlinear model order reduction
method (NORM) to reduce these QLDAEs into QLDAEs of much smaller size. Because
of the generality of the QLDAE representation, QLMOR has significantly broader
applicability than Taylor-expansion-based methods since there is no
approximation involved in the transformation from original DAEs to QLDAEs.
Because the reduced model has only quadratic nonlinearities, its computational
complexity is less than that of similar prior methods. In addition, QLMOR,
unlike NORM, totally avoids explicit moment calculations, hence it has improved
numerical stability properties as well. We compare QLMOR against prior methods
on a circuit and a biochemical reaction-like system, and demonstrate that
QLMOR-reduced models retain accuracy over a significantly wider range of
excitation than Taylor-expansion-based methods. QLMOR, therefore, demonstrates
that Volterra-kernel based nonlinear MOR techniques can in fact have far
broader applicability than previously suspected, possibly being competitive
with trajectory-based methods (e.g., trajectory piece-wise linear reduced order
modeling) and nonlinear-projection based methods (e.g., maniMOR).

Zhuo C.  Chopra, K.  Sylvester, D.  Blaauw, D. Process Variation and
Temperature-Aware Full Chip Oxide Breakdown Reliability Analysis 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989995

Gate oxide breakdown (OBD) is a key factor limiting the useful lifetime of an
integrated circuit. Unfortunately, the conventional approach for full chip OBD
reliability analysis assumes a uniform oxide thickness and worst-case
temperature for all devices. In practice, however, gate oxide thickness varies
from die-to-die and within-die and hence may cause different reliability for
different devices and even chips. Moreover, due to the increased across-die
temperature variation, such difference may be exacerbated. Thus, as the
precision of variation control worsens, an alternative reliability analysis
approach is needed. In this paper, we propose a statistical framework for
chip-level gate OBD reliability analysis while considering both die-to-die and
within-die components of thickness variations as well as the across-die
temperature variation. The thickness of each device is modeled as a distinct
random variable and thus the full chip reliability estimation problem is
defined on a huge sample space of several million devices. We observe that the
chip-level OBD reliability function is independent of the relative location of
the individual devices. This enables us to transform the problem such that the
resulting representation can be expressed in terms of far fewer random
variables. Using this transformation, we present a computationally efficient
and accurate approach for estimating the full chip reliability while
considering spatial correlations of gate oxide thickness as well as temperature
variation. We show that, compared to Monte Carlo simulation, the proposed
method incurs an error of only around 1% while improving the runtime by more
than three orders of magnitude.

PHYSICAL DESIGN

Lin Y.-H.  Chang S.-H. Li Y.-L. Critical-Trunk-Based Obstacle-Avoiding
Rectilinear Steiner Tree Routings and Buffer Insertion for Delay and Slack
Optimization 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989988

For modern designs, delay optimization significantly facilitates design closure
because delay is a more realistic routing metric than wirelength.
Obstacle-avoiding rectilinear Steiner tree (OARST) construction is an essential
routing problem. With the trends toward intellectual property (IP)-block-based
system-on-chip designs, OARST with buffer insertion has been surveyed to
diminish the delay of long wires. Previous works on performance-driven (PD)
OARST without and with buffer insertion can only handle small circuits. This
paper develops a novel routing algorithm in obstacle-avoiding spanning graph to
construct OARST with optimized delay efficiently. The proposed multisource
single-target maze routing is first employed to identify the critical trunks,
and the critical-trunk-based tree growth mechanism connects the unconnected
pins to critical trunks under delay constraints of every sink. We apply the
proposed critical-trunk-based tree growth mechanism to solve PD and
slack-driven (SD) OARST problems. The proposed algorithms are extended to
consider buffer insertion during PD and SD OARST constructions. Experimental
results demonstrate that the proposed algorithms achieve an average 25.84%
improvement in the maximum delay over obstacle-avoiding rectilinear Steiner
minimal tree in the PD OARST problem and successfully solve 66.67% worst
negative slack violations in the SD OARST problem. Compared to the simultaneous
routing and buffer insertion approach, the proposed buffer-aware (BA) algorithm
generates satisfactory timing results with almost identical wire length (WL).
Moreover, the proposed BA SD OARST algorithm utilizes less WL than the BA
rectilinear Steiner tree construction does by 17.99% on average. The runtime
comparison with previous works shows the efficiency and scalability of the
proposed algorithms.

Tolbert, J.R.  Zhao X.  Lim S. K. Mukhopadhyay, S. Analysis and Design of
Energy and Slew Aware Subthreshold Clock Systems 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989993

In this paper, we analyze the effect of clock slew in subthreshold circuits.
Specifically, we address the issue that variations in clock slew at the
register control can cause serious timing violations. We show that clock slew
variations can cause frequency targets to deviate by as much as 28% from the
design goals. Based on these observations, we recognize the importance of clock
slew control in subthreshold circuits. We propose a systematic approach to
design the clock tree for subthreshold circuits to reduce the clock slew
variations while minimizing the energy dissipation in the tree. The combined
approach, including the wire sizing and dynamic nodal capacitance control, can
achieve better slew control (and better timing control) at lower energy in
subthreshold circuits.

SYSTEM-LEVEL DESIGN

Ayoub, R.  Indukuri, K.  Rosing, T.S. Temperature Aware Dynamic Workload
Scheduling in Multisocket CPU Servers 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989984

In this paper, we propose a multitier approach for significantly lowering the
cooling costs associated with fan subsystems without compromising the system
performance. Our technique manages the fan speed by intelligently allocating
the workload at the core level as well as at the CPU socket level. At the core
level we propose a proactive dynamic thermal management scheme. We introduce a
new predictor that utilizes the band-limited property of the temperature
frequency spectrum. A big advantage of our predictor is that it does not
require the costly training phase and still maintains high accuracy. At the
socket level, we use a control-theoretic approach to develop a stable scheduler
that reduces the cooling costs further by providing a better thermal
distribution. Our thermal management scheme incorporates runtime workload
characterization to perform efficient thermally aware scheduling. The
experimental results show that our approach delivers an average cooling energy
savings of 80% compared to the state-of-the-art techniques. The reported
results also show that our formal technique maintains stability while heuristic
solutions fail in this aspect.

Thong, J.  Nicolici, N. An Optimal and Practical Approach to Single Constant
Multiplication 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989990

Existing optimal algorithms are limited to constants of up to 19 bits. Our
algorithm requires less than 10 s on average to find a solution for a 32 bit
constant. Optimality is guaranteed via an exhaustive search. We analyze two
common SCM frameworks and the corresponding search strategies that each
framework facilitates. Combining the strengths of both frameworks, we obtain
highly aggressive pruning. The various strategies used in our algorithm and
their underlying intuition are discussed extensively in this paper.
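
For readers unfamiliar with single constant multiplication (SCM), the sketch
below shows the basic shift-and-add idea using a simple digit-recoding
baseline (non-adjacent form). This is not the exhaustive optimal search
described in the abstract, and the function names are assumptions introduced
for this example.

```python
def naf_digits(c):
    """Non-adjacent form of a positive integer c.

    Returns a list of (shift, sign) pairs with sign in {+1, -1} such that
    c == sum(sign << shift). Illustrative baseline, not an optimal method.
    """
    digits = []
    shift = 0
    while c != 0:
        if c & 1:
            sign = 2 - (c & 3)  # +1 if c mod 4 == 1, -1 if c mod 4 == 3
            digits.append((shift, sign))
            c -= sign
        c >>= 1
        shift += 1
    return digits


def multiply_by_constant(x, c):
    """Multiply x by c using only shifts, additions, and subtractions."""
    result = 0
    for shift, sign in naf_digits(c):
        result += sign * (x << shift)
    return result


def times_45_two_adders(x):
    """45x with two additions by reusing an intermediate: 45 = 5 * 9."""
    y = (x << 2) + x      # 5x
    return (y << 3) + y   # 9 * 5x


print(naf_digits(7))                 # 7x = (x << 3) - x
print(len(naf_digits(45)) - 1)       # additions/subtractions used by the baseline
print(multiply_by_constant(3, 45), times_45_two_adders(3))
```

The baseline needs three additions or subtractions for 45, while reusing the
intermediate 5x needs only two. Gaps of this kind are what a search over
adder graphs, rather than digit recoding alone, is meant to close.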

Gebhardt, D.  You J.  Stevens, K.S. Design of an Energy-Efficient Asynchronous
NoC and Its Optimization Tools for Heterogeneous SoCs 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989987

The energy usage of on-chip interconnects is a concern for many systems-on-chip
targeting portable battery-powered devices. We have designed and evaluated a
network-on-chip (NoC) for such an application, including tools to optimize for
power and communication latency. Our asynchronous (clockless) network operates
with efficient two-phase bundled-data links and four-phase routers. The
topology and router floorplan are determined by our tool, ANetGen, which
optimizes the network for energy and latency using simulated annealing and
force-directed placement methods. We compare our solutions against a
traditional synchronous NoC as specified by the COSI-2.0 framework and ORION
2.0 router and wire energy models. Traffic is simulated with SystemC functional
models, and messages are generated with a "bursty" self-similar b-model.
Results indicate our asynchronous network was more energy-efficient, lower in
area, and provided comparable or superior message latency.

TEST

Bounceur, A.  Mir, S.  Stratigopoulos, H.-G. Estimation of Analog Parametric
Test Metrics Using Copulas 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5991228

A new technique for the estimation of analog parametric test metrics at the
design stage is presented in this paper. This technique employs the copulas
theory to estimate the distribution between random variables that represent the
performances and the test measurements of the circuit under test (CUT). A
copulas-based model separates the dependencies between these random variables
from their marginal distributions, providing a complete and scale-free
description of dependence that is more suitable to be modeled using well-known
multivariate parametric laws. The model can be readily used for the generation
of an arbitrarily large sample of CUT instances. This sample is thereafter used
for estimating parametric test metrics such as defect level (or test escapes)
and yield loss. We demonstrate the usefulness of the proposed technique to
evaluate a built-in-test technique for a radio frequency low noise amplifier
and to set test limits that result in a desired tradeoff between test metrics.
In addition, we compare the proposed technique with previous ones that rely on
direct density estimation.
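
The general flow described above (model the dependence separately from the
marginals, draw a large sample, then count outcomes) can be sketched as
follows. Everything specific here is an assumption for illustration: a
bivariate Gaussian copula, one performance and one test measurement, made-up
marginals and limits, and the function names. It is not the model or data
from the paper.

```python
import math
import random
from statistics import NormalDist


def gaussian_copula_sample(n, rho, inv_cdf_perf, inv_cdf_meas, seed=0):
    """Draw n (performance, measurement) pairs.

    Dependence comes from a Gaussian copula with correlation rho; each
    marginal is supplied separately as an inverse CDF.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf
    root = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + root * rng.gauss(0.0, 1.0)
        # phi maps each normal variate to a uniform value on (0, 1)
        pairs.append((inv_cdf_perf(phi(z1)), inv_cdf_meas(phi(z2))))
    return pairs


def estimate_metrics(pairs, spec_ok, test_ok):
    """Defect level: faulty share of circuits that pass the test.
    Yield loss: share of functional circuits that fail the test."""
    passed = [p for p, m in pairs if test_ok(m)]
    functional = [m for p, m in pairs if spec_ok(p)]
    defect_level = sum(1 for p in passed if not spec_ok(p)) / len(passed)
    yield_loss = sum(1 for m in functional if not test_ok(m)) / len(functional)
    return defect_level, yield_loss


# Hypothetical marginals: normal performance, lognormal test measurement.
perf_inv = NormalDist(20.0, 1.0).inv_cdf
log_meas = NormalDist(0.0, 0.1)
meas_inv = lambda u: math.exp(log_meas.inv_cdf(u))

pairs = gaussian_copula_sample(20000, 0.9, perf_inv, meas_inv)
defect_level, yield_loss = estimate_metrics(
    pairs, spec_ok=lambda p: p >= 19.0, test_ok=lambda m: m >= 0.9
)
print(defect_level, yield_loss)
```

Moving the test limit (0.9 in this sketch) trades defect level against yield
loss, which is the kind of tradeoff the abstract refers to when setting test
limits.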

SHORT PAPERS

Eggersgluss, S.  Drechsler, R. Efficient Data Structures and Methodologies for
SAT-Based ATPG Providing High Fault Coverage in Industrial Application 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989985

Automatic test pattern generation (ATPG) based on Boolean satisfiability (SAT)
has turned out to be a robust alternative to classical structural ATPG
algorithms, performing very well especially for hard-to-detect faults but
suffering from overhead for easy-to-detect faults. In this letter, we propose
new efficient data structures and methodologies for SAT-based ATPG. The novel
incremental SAT solving technique, dynamic clause activation, which makes use
of structural information using dedicated data structures, forms the core of a
new flexible SAT-based ATPG approach. Experimental results on large industrial
circuits show a significant performance gain and a removal of the limitations.
At the same time, the robustness of SAT-based ATPG can even be strengthened
resulting in very high fault efficiency and increased fault coverage for
transition faults.

Pomeranz, I. Scan Shift Power of Functional Broadside Tests
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989991

The power dissipation during the application of scan-based tests can be
significantly higher than during functional operation. An exception is the
second, fast functional capture cycles of functional broadside tests, where it
is guaranteed that the power dissipation will not exceed that possible during
functional operation. The power dissipation during the other clock cycles of
functional broadside tests is studied here for the first time. The clock cycles
under consideration are referred to as scan shift cycles. This paper describes
a test generation procedure that limits the power dissipation during scan shift
cycles of functional broadside tests. Experimental results for benchmark
circuits demonstrate the extent to which the power dissipation during scan
shift cycles can be limited without affecting the transition fault coverage.

Erdogan, E. S.  Ozev, S. A Multi-Site Test Solution for Quadrature Modulation
RF Transceivers 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989986

In this letter, we present a 2x-site test solution for radio frequency
transceivers using only baseband signals for analysis. We perform all
operations on communication standard-compliant signal packets, thereby putting
the device under normal operating conditions. The transmitter on one device
under test (DUT) is coupled with a receiver on another DUT to form a complete
transmitter-to-receiver path. Parameters of the two devices are decoupled from
one another by carefully modeling the system and using signal processing
techniques. Simulation as well as measurement results confirm the high accuracy
of the proposed technique.