October 2011 Newsletter
Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2011-10.txt

KEYNOTE PAPER

Reddi, V. J.; Brooks, D.
Resilient Architectures via Collaborative Design: Maximizing Commodity Processor Performance in the Presence of Variations
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022007

Unintended variations in circuit lithography and undesirable fluctuations in operating parameters such as supply voltage and temperature threaten the technology scaling on which microprocessor evolution relies. Although circuit-level solutions to some variation problems may be possible, they are prohibitively expensive and impractical for commodity processors, on which not only the consumer market but also a growing segment of the business market now depends. Solutions at the microarchitecture and even the software level, on the other hand, overcome some of these circuit-level challenges without significantly raising costs or lowering performance. Using examples drawn from our Alarms Project and related work, we illustrate how collaborative design that spans circuits, architecture, and chip-resident software leads to a cost-effective solution for inductive voltage noise, sometimes called the dI/dt problem. The strategy we use to assure correctness while preserving performance can be extended to other variation problems.

REGULAR PAPERS

ANALOG, MIXED-SIGNAL, AND RF CIRCUITS

Mukherjee, S.; Dasgupta, P.; Mukhopadhyay, S.
Auxiliary Specifications for Context-Sensitive Monitoring of AMS Assertions
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022013

As research on assertion languages for the analog and mixed-signal (AMS) domain gains momentum, it is increasingly felt that extensions of existing assertion languages, such as the Property Specification Language and SystemVerilog Assertions, into the AMS domain are inadequate for expressing analog design intent. This is largely because the intricacy of analog behavioral intent cannot be captured purely in terms of logic. In this paper, we show that by using auxiliary forms of formal specification, namely abstract state machines and real-valued functions (called auxiliary functions), as references for AMS assertions, it becomes possible to model complex AMS behavioral properties. In addition, we present complexity results for the satisfiability problem of such specifications. This approach leverages the growing adoption of AMS behavioral modeling in industry. We also show that auxiliary state machines allow the scopes of different analog assertions to be separated, leading to significant reductions in assertion-checking overhead.

Liu, B.; Zhao, D.; Reynaert, P.; Gielen, G. E.
Synthesis of Integrated Passive Components for High-Frequency RF ICs Based on Evolutionary Computation and Machine Learning Techniques
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022011

State-of-the-art synthesis methods for microwave passive components suffer from one of two drawbacks: they either are efficient but depend heavily on the accuracy of equivalent circuit models, which may fail the synthesis at high frequencies, or they rely entirely on electromagnetic (EM) simulation, which yields high solution quality but is too time consuming.
To combine high solution quality with good efficiency, a new method, called memetic machine-learning-based differential evolution (MMLDE), is presented. The key idea of MMLDE is an online surrogate-model-based memetic evolutionary optimization mechanism whose training data are generated adaptively during the optimization process. In particular, by using the differential evolution algorithm as the optimization kernel and EM simulation as the performance evaluation method, high-quality solutions can be obtained. By constructing surrogate models online with Gaussian processes and artificial neural networks, the proposed search mechanism predicts performances and thereby avoids many expensive EM simulations. Compared with the available methods of best solution quality, MMLDE obtains comparable results with roughly a tenfold improvement in computational efficiency, which makes the computation time for optimized component synthesis acceptable. Moreover, unlike many available methods, MMLDE needs neither equivalent circuit models nor coarse-mesh EM models. Experiments on 60 GHz syntheses and comparisons with state-of-the-art methods provide evidence of the important advantages of MMLDE.

MODELING AND SIMULATION

Xiong, X.; Wang, J.
Dual Algorithms for Vectorless Power Grid Verification Under Linear Current Constraints
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022016

Vectorless power grid verification makes it possible to evaluate worst-case voltage noise without enumerating possible current waveforms. Under linear current constraints, the problem can be formulated and solved as a linear program (LP). However, previous approaches suffer from long runtimes due to the large problem size. In this paper, we design two dual algorithms that efficiently evaluate voltage noise in a resistor/capacitor power grid.
Our algorithms combine novel dual approaches to the LP with a preconditioned conjugate gradient power grid analyzer. The first algorithm exploits the structure of the problem to simplify its dual into a convex problem, which is then solved by the cutting-plane method. The second algorithm formulates a reduced-size LP for each node to compute an upper bound on its voltage noise. Experimental results show that both algorithms are highly efficient.

Bhat, H. S.; Osting, B.
2-D Inductor-Capacitor Lattice Synthesis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022033

We consider a general class of 2-D passive propagation media, represented as a planar graph whose nodes are capacitors connected to a common ground and whose edges are inductors. Capacitances and inductances are fixed in time but vary in space. Kirchhoff's laws give the time dynamics of voltage and current in the system. By harmonically forcing input nodes and collecting the resulting steady-state signal at output nodes, we obtain a linear analog device that transforms the inputs to outputs. We pose the lattice synthesis problem: given a linear transformation, find the inductances and capacitances of an inductor-capacitor circuit that performs this transformation. Formulating this as an optimization problem, we numerically demonstrate its solvability using gradient-based methods. By solving the lattice synthesis problem for various desired transformations, we design several devices that can be used for signal processing and filtering.

Ghasemazar, M.; Pedram, M.
Optimizing the Power-Delay Product of a Linear Pipeline by Opportunistic Time Borrowing
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022006

In this paper, we present and solve the problem of power-delay optimal soft linear pipeline design.
The key idea is to use soft-edge flip-flops to allow time borrowing among consecutive pipeline stages, giving timing-critical stages more time and trading that slack for power savings. We formulate the problem of optimally designing the soft-edge flip-flops and setting the clock frequency and supply voltage so as to minimize the power-delay product of a linear pipeline under different scenarios, using both deterministic and statistical static delay models. In our first formulation, timing violations are avoided by respecting deterministic worst-case path delay bounds. Next, the same problem is formulated for a scenario in which stage delays are random variables, and we minimize the power-delay product while keeping the probability of timing violations bounded. The soft-edge flip-flops are equipped with dynamic error detection (and correction) circuitry to detect and fix errors that may arise from over-clocking. Although the system can recover from errors, there is a tradeoff between performance and power savings, which our third formulation exploits to further minimize the power-delay product of the pipeline. Experimental results demonstrate the efficacy of the proposed algorithms on each of these problems.

SYSTEM-LEVEL DESIGN

Chan, J.; Hendry, G.; Bergman, K.; Carloni, L. P.
Physical-Layer Modeling and System-Level Design of Chip-Scale Photonic Interconnection Networks
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022004

Photonic technology is becoming an increasingly attractive solution to the problems facing today's electronic chip-scale interconnection networks. Recent progress in silicon photonics research has enabled the demonstration of all the optical building blocks necessary to create extremely high-bandwidth-density, energy-efficient links for on-chip and off-chip communication.
From a feasibility and architecture perspective, however, photonics represents a dramatic paradigm shift from traditional electronic network design because of fundamental differences in how electronics and photonics function and behave. As a result, new modeling and analysis methods must be employed to realize a functional photonic chip-scale interconnect. In this paper, we present a methodology for characterizing and modeling fundamental photonic building blocks that can subsequently be combined into full photonic network architectures. We also describe a set of tools for assessing the physical-layer and system-level performance properties of a photonic network. The models and tools are integrated in a novel open-source design and simulation environment. We present a case study of two different photonic networks-on-chip to demonstrate how improved understanding and modeling of the physical-layer details of photonic communication can clarify the system-level performance impact.

Jang, W.; Pan, D. Z.
Application-Aware NoC Design for Efficient SDRAM Access
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022008

In systems-on-chip (SoCs), a microprocessor demands guaranteed synchronous dynamic random access memory (SDRAM) latency, whereas most other cores are served on a best-effort basis. However, strict priority service for the guaranteed-latency traffic critically degrades the SDRAM utilization and latency of the overall system. In addition, the data sizes requested by various cores often do not match the SDRAM access granularity, which deteriorates utilization and latency further. In this paper, we propose an application-aware network-on-chip (NoC) design for efficient SDRAM access that accounts for the memory latency demands and memory access granularities of different applications.
To provide short latency for priority memory requests with few penalties, memory request packets are scheduled by our guaranteed-SDRAM-service router, which includes a hybrid flow controller combining priority-first and priority-equal algorithms. In addition, our access-granularity-matching NoC design further improves memory performance by splitting a memory request packet into several short request packets and controlling them with a partially open-page mode and an auto-precharge operation in the memory subsystem. Experimental results show that, compared to a state-of-the-art SDRAM-aware NoC design, our cost-effective application-aware design improves memory latency for latency-sensitive cores by up to 32.8%, overall memory latency by up to 7.8%, and memory utilization by up to 3.4%.

TEST

Lee, K.-J.; Lien, W.-C.; Hsieh, T.-Y.
Test Response Compaction via Output Bit Selection
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022010

Conventional output compaction methods based on XOR networks and/or linear feedback shift registers may suffer from aliasing, unknown values, and/or poor diagnosability. In this paper, we present an alternative, the output-bit-selection method, to address the test compaction problem. By observing only a subset of output responses, this method can effectively deal with all of the above problems. We develop efficient algorithms that identify near-optimum subsets of output bits covering all detectable faults in very large circuits. Experimental results show that less than 10% of the output response bits of an already very compact test set suffice to achieve 100% single stuck-at fault coverage for most ISCAS benchmark circuits. Even better results are obtained for the ITC'99 benchmark circuits, where less than 3% of the output bits cover all stuck-at faults.
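At its heart, output bit selection is a covering problem: choose few output bits such that every detectable fault flips at least one chosen bit under some test. The sketch below illustrates this with a simple greedy heuristic over an invented fault-detection table (the bit names, fault names, and detection data are all hypothetical; the paper's near-optimum algorithms are considerably more sophisticated and scale to very large circuits):

```python
# Greedy output-bit selection as set cover (illustrative sketch only).
# detects[b] = set of faults whose effect is observable at output bit b
# (bit "o0_t2" means output o0 observed under test t2).
detects = {
    "o0_t1": {"f1", "f3"},
    "o1_t1": {"f2"},
    "o0_t2": {"f1", "f2", "f4"},
    "o2_t2": {"f3", "f5"},
    "o1_t3": {"f4", "f5"},
}

def select_bits(detects):
    """Pick output bits greedily until every detectable fault is covered."""
    uncovered = set().union(*detects.values())
    chosen = []
    while uncovered:
        # Take the bit that covers the most still-uncovered faults.
        best = max(detects, key=lambda b: len(detects[b] & uncovered))
        chosen.append(best)
        uncovered -= detects[best]
    return chosen

print(select_bits(detects))  # -> ['o0_t2', 'o2_t2']
```

Two of the five observed bits suffice here; the paper reports the analogous effect at scale, with under 10% of response bits preserving full stuck-at coverage.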
The increase in the number of selected bits needed to cover other fault types is quite small if those faults are taken into account during automatic test pattern generation. Furthermore, the diagnostic resolution of this method is almost the same as that achieved by observing all output response bits.

Li, M.; Hsiao, M. S.
3-D Parallel Fault Simulation With GPGPU
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022009

General-purpose computing on graphics processing units (GPGPU) is a paradigm shift in computing that promises a dramatic increase in performance, but it also brings an unprecedented level of complexity to algorithm design and software development. In this paper, we present an efficient parallel fault simulator, FSimGP^2, that exploits the high degree of parallelism supported by a state-of-the-art graphics processing unit (GPU) with the NVIDIA compute unified device architecture. A novel 3-D parallel fault simulation technique is proposed to achieve extremely high computational efficiency on the GPU, and global communication is minimized by concentrating as much work as possible in local device memory. Results on an NVIDIA GPU platform (a GeForce GTX 285 graphics card) demonstrate speedups of up to 63x and 4x over two other GPU-based fault simulators, and of up to 95x over a state-of-the-art algorithm on conventional processor architectures.

VERIFICATION

Shen, S.; Qin, Y.; Xiao, L.; Wang, K.; Zhang, J.; Li, S.
A Halting Algorithm to Determine the Existence of the Decoder
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022015

Complementary synthesis automatically synthesizes the decoder circuit of an encoder. It determines the existence of the decoder by checking whether the encoder's input can be uniquely determined by its output. However, this check will not halt if the decoder does not exist. To solve this problem, a novel halting algorithm is proposed.
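The uniqueness check at the core of this formulation can be sketched for a toy memoryless encoder, where a decoder exists exactly when the encoding map is injective (the 2-bit-to-3-bit table below is invented for illustration; the paper's contribution is making this check halt for sequential encoders, where paths must be unfolded over time):

```python
# Decoder existence for a toy memoryless encoder: a decoder exists
# iff the encoding map is injective, i.e., the input is uniquely
# determined by the output.
from itertools import product

def encode(bits):
    """Hypothetical 2-bit to 3-bit encoding table (illustration only)."""
    table = {(0, 0): (0, 0, 1), (0, 1): (0, 1, 0),
             (1, 0): (1, 0, 0), (1, 1): (1, 1, 1)}
    return table[bits]

def decoder_exists(encode, width):
    seen = {}
    for x in product((0, 1), repeat=width):
        y = encode(x)
        if y in seen and seen[y] != x:
            return False  # two inputs share one output: no decoder
        seen[y] = x
    return True

print(decoder_exists(encode, 2))  # -> True (the table is injective)
```

For a sequential encoder, the same question is asked per path of bounded length, which is why a naive check can run forever when no decoder exists.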
For every path of the encoder, the algorithm first checks whether the encoder's input can be uniquely determined by its output. If so, the decoder exists; otherwise, the algorithm checks whether the path contains loops that can be further unfolded to prove the non-existence of the decoder for all longer paths. To illustrate its usefulness and efficiency, the algorithm has been run on several complex encoders, including PCI-E and Ethernet. Experimental results indicate that it always halts, correctly distinguishing correct encoders from incorrect ones, and that it is more than three times faster than previous algorithms.

SHORT PAPERS

Yuan, L.; Leventhal, S. R.; Gu, J.; Qu, G.
TALk: A Temperature-Aware Leakage Minimization Technique for Real-Time Systems
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022017

Due to rising chip temperatures and the strong dependence of leakage power on temperature, thermal-aware power management has recently received considerable attention in energy-efficient system design. In this paper, we propose a temperature-aware intra-task scheduling algorithm to minimize leakage energy in real-time systems. The basic idea is to run tasks at full speed when the chip temperature is low or the work is urgent, and to switch the processor to a low-power state when the chip temperature is high or the workload is light. Our offline algorithm achieves optimal leakage reduction for a given task with worst-case execution time, while the online algorithm has very low runtime complexity. Simulation results show that the online algorithm reduces total leakage energy by 34% on average across both real-world and artificial benchmarks. Finally, we demonstrate how to combine our algorithm with existing dynamic voltage scaling techniques to optimize overall energy consumption.

Wei, T.; Chen, X.; Hu, S.
Reliability-Driven Energy-Efficient Task Scheduling for Multiprocessor Real-Time Systems
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022005

This paper proposes a reliability-driven task scheduling scheme for multiprocessor real-time embedded systems that optimizes system energy consumption under stochastic fault occurrences. The task scheduling problem is formulated as an integer linear program in which a novel fault adaptation variable models the uncertainty of fault occurrences. The proposed scheme considers both dynamic and leakage power, handles the scheduling of independent tasks as well as tasks with precedence constraints, and can schedule tasks with varying deadlines. Experimental results demonstrate that the proposed reliability-driven parallel scheduling scheme achieves energy savings of more than 15% compared to designing for the corner case of fault occurrences.

Maffezzoni, P.; D'Amore, D.
Analysis of Phase Diffusion Process in Oscillators Due to White and Colored-Noise Sources
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022012

This paper presents an original phase-domain analysis of the diffusion process that occurs in oscillators due to small-signal white and colored-noise sources. The proposed analysis is shown to correctly predict the oscillator's power spectrum, even very close to the fundamental harmonic. The correctness of the analysis is verified through comparisons with commercially available tools.

Pomeranz, I.
Subsets of Primary Input Vectors in Sequential Test Generation for Single Stuck-at Faults
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022014

The complexity of deterministic sequential test generation for a target fault in a circuit with n primary inputs is determined by the need to explore a search space of 2^n primary input vectors at every time unit.
This paper studies the possibility of reducing this complexity by using subsets of primary input vectors of limited size during test generation for target faults. It considers a test generation procedure that uses subsets of size N, for increasing values of N starting with N = 1. The subsets consist of primary input vectors from the test sequence already generated, together with random primary input vectors. The results indicate that all or most of the detectable single stuck-at faults in benchmark circuits can be detected using small subsets of primary input vectors.
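To make the growing-subset idea concrete, here is a toy sketch: stuck-at faults on the inputs of a tiny combinational circuit are targeted using a pool of only N input vectors, with N increased until every fault is detected. Everything here is invented for illustration (the circuit, the fault list, the random pool), and the paper itself addresses sequential circuits with deterministic test generation, where the savings come from shrinking the 2^n-per-time-unit search space:

```python
import random

# Toy circuit: y = (a AND b) OR c, with single stuck-at faults on inputs.
def circuit(a, b, c, fault=None):
    """fault = (line, value) forces that input line to a stuck value."""
    vals = {"a": a, "b": b, "c": c}
    if fault:
        line, v = fault
        vals[line] = v
    return (vals["a"] & vals["b"]) | vals["c"]

FAULTS = [(line, v) for line in "abc" for v in (0, 1)]

def detected(vec, fault):
    """A vector detects a fault if good and faulty outputs differ."""
    return circuit(*vec) != circuit(*vec, fault=fault)

def generate_with_subset(n, seed=0):
    """Detect faults using a pool of only n input vectors.

    The pool is drawn at random here, loosely mimicking the paper's
    strategy of trying subset sizes N = 1, 2, ... until coverage stops
    improving.
    """
    rng = random.Random(seed)
    all_vecs = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    pool = rng.sample(all_vecs, n)
    covered = {f for f in FAULTS if any(detected(v, f) for v in pool)}
    return pool, covered

for n in range(1, 9):
    pool, covered = generate_with_subset(n)
    if len(covered) == len(FAULTS):
        print(f"all {len(FAULTS)} faults detected with a pool of {n} vectors")
        break
```

Even in this toy case, a pool far smaller than the full 2^3 vector space typically covers every detectable fault, which is the effect the paper observes on benchmark circuits.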