July 2012 Newsletter
Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2012-07.txt 

REGULAR PAPERS

ANALOG, MIXED-SIGNAL, AND RF CIRCUITS

Liu, B.; Deferm, N.; Zhao, D.; Reynaert, P.; Gielen, G. G. E. 
An Efficient High-Frequency Linear RF Amplifier Synthesis Method Based on
Evolutionary Computation and Machine Learning Techniques
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6218230 

Existing radio frequency (RF) integrated circuit (IC) design automation methods
focus on the synthesis of circuits at a few GHz, typically less than 10 GHz.
That framework is difficult to apply to RF IC synthesis at mm-wave frequencies
(e.g., 60Ð100 GHz). In this paper, a new method, called efficient machine
learning-based differential evolution, is presented for mm-wave frequency
linear RF amplifier synthesis. By using electromagnetic (EM) simulations to
evaluate the key passive components, the evaluation of circuit performances is
accurate and solves the limitations of parasitic-included equivalent circuit
models and predefined layout templates used in the existing synthesis
framework. A decomposition method separates the design variables that require
expensive EM simulations and the variables that only need cheap circuit
simulations. Hence, a low-dimensional expensive optimization problem is
generated. By the newly proposed core algorithm integrating adaptive population
generation, naive Bayes classification, Gaussian process and differential
evolution, the generated low-dimensional expensive optimization problem can be
solved efficiently (by the online surrogate model), and global search (by
evolutionary computation) can be achieved. A 100 GHz three-stage differential
amplifier is synthesized in a 90 nm CMOS technology. The power gain reaches 10
dB with more than 20 GHz bandwidth. The synthesis costs only 25 h, having a
comparable result and a nine times speed enhancement compared with directly
using the EM simulator and global optimization algorithms.

EMERGING TECHNOLOGIES

Dong, X.; Xu, C.; Xie, Y.; Jouppi, N. P. 
NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging
Nonvolatile Memory
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6218223 

Various new nonvolatile memory (NVM) technologies have emerged recently. Among
all the investigated new NVM candidate technologies, spin-torque-transfer
memory (STT-RAM, or MRAM), phase-change random-access memory (PCRAM), and
resistive random-access memory (ReRAM) are regarded as the most promising
candidates. As the ultimate goal of this NVM research is to deploy them into
multiple levels in the memory hierarchy, it is necessary to explore the wide
NVM design space and find the proper implementation at different memory
hierarchy levels from highly latency-optimized caches to highly
density-optimized secondary storage. While abundant tools are available as
SRAM/DRAM design assistants, similar tools for NVM designs are currently
missing. Thus, in this paper, we develop NVSim, a circuit-level model for NVM
performance, energy, and area estimation, which supports various NVM
technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash. NVSim is
successfully validated against industrial NVM prototypes, and it is expected to
help boost architecture-level NVM-related studies.

MODELING AND SIMULATION

Xie, L.; Davoodi, A. 
Post-Silicon Failing-Path Isolation Incorporating the Effects of Process
Variations
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6218222 

This paper introduces a novel approach for isolating the failing paths for a
fabricated chip. Failing paths are defined as the paths violating the timing
constraint at the post-silicon stage, in the presence of process variations. To
achieve this goal, a framework is suggested in which, first, a large number of
statistically critical paths are extracted at the pre-silicon stage. These
paths are those with the highest probability to violate the timing and
therefore can form a ÒgoodÓ candidate set for failing paths. Next, a small set
of representative paths are selected from those in the candidate set. These
selected paths have their delays highly correlated with the delays of the
remaining statistically critical paths. By directly measuring the delays of
these selected representative paths at the post-silicon stage, the post-silicon
delays of the remaining candidate failing paths can be predicted accurately
that allows further isolating the failing paths for each fabricated chip.
Simulation results show that up to a few thousand candidate failing paths can
be accurately predicted using the post-silicon delays of less than 150
representative paths in the presence of more than 1000 independent process
parameter variations for the ISCAS89 benchmark circuits. For each fabricated
chip, the isolated failing paths are guaranteed to include all the ÒactualÓ
failing paths in the candidate set. The proportion of the ÒactualÓ nonfailing
paths from the isolated failing paths is shown to be very small.

Paik, S.; Han, I.; Kim, S.; Shin, Y. 
Clock Gating Synthesis of Pulsed-Latch Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6218237 

Pulsed-latch circuits, in which latches are triggered by a short pulse, can
reduce power consumption as well as increasing performance; and they can
largely be designed using conventional computer-aided design tools. We explore
the automatic synthesis of clock-gating logic for pulsed-latch circuits in
which gating is implemented by enabling and disabling several pulse generators.
The key problem is to arrange that each group of latches contains physically
close latches, so that a short pulse from a pulse generator is delivered
safely, and to ensure that the latches in a group have similar Boolean gating
conditions because their clock is gated and ungated together. The resulting
gating conditions should be implemented using as little extra logic as
possible; for this purpose we rely on Boolean division, with an internal node
of existing logic being used as the divisor. The proposed clock gating
synthesis is assessed in 45-nm technology.

Chen, Q.; Weng, S.-H.; Cheng, C.-K. 
A Practical Regularization Technique for Modified Nodal Analysis in Large-Scale
Time-Domain Circuit Simulation
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6218220 

Fast full-chip time-domain simulation calls for advanced numerical integration
techniques with capability to handle the systems with (tens of) millions of
variables resulting from the modified nodal analysis (MNA). General MNA
formulation, however, leads to a differential algebraic equation (DAE) system
with singular coefficient matrix, for which most of explicit methods, which
usually offer better scalability than implicit methods, are not readily
available. In this paper, we develop a practical two-stage strategy to remove
the singularity in MNA equations of large-scale circuit networks. A topological
index reduction is first applied to reduce the DAE index of the MNA equation to
one. The index-1 system is then fed into a systematic process to eliminate
excess variables in one run, which leads to a nonsingular system. The whole
regularization process is devised with emphasis on exact equivalence, low
complexity, and sparsity preservation, and is thus well suited to handle
extremely large circuits.

PHYSICAL DESIGN

Lee, M.-C.; Shi, Y.; Chang, S.-C. 
Efficient Wakeup Scheduling Considering Both Resource Usage and Timing Budget
for Power Gating Designs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6218224 

Power gating has been a very effective way to reduce power leakage. Normally,
wakeup scheduling is required to control the turn-on times of sleep transistors
to limit the surge current during the wakeup process. In this paper, a voltage
sensor is adopted to compare the virtual ground voltage with the predesigned
reference voltages, and use the result to determine the turn-on times of sleep
transistors. We then propose a novel wakeup scheduling formulation that
considers the tradeoff between wakeup times and hardware resources incurred by
the voltage sensor. To address this problem, on the one hand, an efficient
algorithm is proposed to find a wakeup scheduling with minimum wakeup time
under the resource constraint. On the other hand, the algorithm to find a
wakeup scheduling with minimum resource usage under the timing budget is
presented. Experimental results show that with little increase in wakeup times,
our algorithm can achieve significant hardware resource reduction for power
gating designs.

Liu, C.-H.; Kuo, S.-Y.; Lee, D. T.; Lin, C.-S.; Weng, J.-H.; Yuan, S.-Y. 
Obstacle-Avoiding Rectilinear Steiner Tree Construction: A Steiner-Point-Based
Algorithm
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6218228 

For the obstacle-avoiding rectilinear Steiner minimal tree (OARSMT) problem, we
present a Steiner-point-based algorithm that achieves the best practical
performance among existing heuristics. We first propose a new concept of
Steiner point locations, creating a linear-space routing graph with
satisfactory Steiner point candidates to resolve the bottleneck of most
existing heuristics. Then, we propose a Steiner-point-based framework to yield
a solution, which is close to the key to the handling of the OARSMT problem.
Experimental results show that this algorithm achieves excellent solution
quality and speed performance at the same time. We also extend the
Steiner-point-based framework to the obstacle-avoiding preferred direction
Steiner tree problem with a good performance.

SYSTEM-LEVEL DESIGN

Majumder, T.; Borgens, M. E.; Pande, P. P.; Kalyanaraman, A. 
On-Chip Network-Enabled Multicore Platforms Targeting Maximum Likelihood
Phylogeny Reconstruction
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6218233 

In phylogenetic inference, which aims at finding a phylogenetic tree that best
explains the evolutionary relationship among a given set of species,
statistical estimation approaches such as maximum likelihood (ML) and Bayesian
inference provide more accurate estimates than other nonstatistical approaches.
However, the improved quality comes at a higher computational cost, as these
approaches, even though heuristic driven, involve optimization over
multidimensional real continuous space. The number of possible search trees in
ML is at least exponential, thereby making runtimes on even modest-sized
datasets to clock up to several million CPU hours. Evaluation of these trees,
involving node-level likelihood vector computation and branch-length
optimization, can be partitioned into tasks (or kernels), providing the
application with the potential to benefit from hardware acceleration. The range
of hardware acceleration architectures tried so far offer limited degree of
fine-grain parallelism. Network-on-chip (NoC) is an emerging paradigm that can
efficiently support integration of massive number of cores on a chip. In this
paper, we explore the design and performance evaluation of 2-D and 3-D NoC
architectures for RAxML, which is one of the most widely used ML software
suites. Specifically, we implement the computation kernels of the top three
functions consuming more than 85% of the total software runtime. Simulations
show that through appropriate choice of NoC architecture, and novel core
design, allocation and placement strategies, our NoC-based implementation can
achieve individual function-level speedups of 390x to 847x, speed up the
targeted kernels in excess of 6500x, and provide end-to-end runtime reductions
up to 5x over state-of-the-art multithreaded software.

Shen, H.; Hamayun, M.-M.; Petrot, F. 
Native Simulation of MPSoC Using Hardware-Assisted Virtualization
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6218234 

Integration of multiple heterogeneous processors into a single system-on-a-chip
is a clear trend in embedded devices. Designing and verifying these devices
requires high-speed and easy-to-build simulation platforms. Among the software
simulation approaches, native simulation is a good candidate since the embedded
software is executed natively on the host machine, and no instruction set
simulator development effort is necessary. However, existing native simulation
approaches are such that the simulated software shares the memory space of the
modeled hardware modules and the host operating system, making impractical the
support of legacy code running on the target platform. To overcome this issue
seldom mentioned in the literature, we propose the addition of a transparent
address space translation layer to separate the target address space from the
host simulator one. For this, we exploit the hardware-assisted virtualization
technology now available on most general-purpose processors. Experiments show
that this solution does not degrade the native simulation speed, while keeping
the ability to accomplish software performance evaluation.

TEST

Natarajan, V.; Choi, H. W.; Banerjee, A.; Sen, S.; Chatterjee, A.; Srinivasan,
G.; Taenzler, F.; Bhattacharya, S. 
Low Cost EVM Testing of Wireless RF SoC Front-Ends Using Multitones
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6218236 

Error-vector-magnitude (EVM) is a system level specification that determines
the overall modulation quality and exhibits strong correlation to the inherent
nonidealities of a radio frequency (RF) system. In production testing, EVM
tests incur significant cost due to the large number of symbols required to
ensure test quality. In our approach, EVM is decomposed into its deterministic
(due to static impairments: IQ mismatch, gain, AM-AM and AM-PM) and random (due
to dynamic impairments: VCO phase noise, thermal noise) components. The static
impairments are computed from the device under test (DUT) response to an
optimized multitone test input. The dynamic impairments are computed using
signal processing algorithms from the DUT test response to the same test input.
The EVM of the RF system is then derived from the computed static and dynamic
impairments, respectively. Experimental results show that significant reduction
in test time is possible without compromising EVM test quality.

Chen, M.; Orailoglu, A. 
On Diagnosis of Timing Failures in Scan Architecture
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6218221 

Excessive test mode power-ground noise in nanometer scale chips causes large
delay uncertainties in scan chains, resulting in a highly elevated rate of
timing failures. The hybrid timing violation types in scan chains, compounded
by their possibly intermittent manifestations, invalidate the conventional
assumptions in scan chain fault behavior, significantly increasing the
ambiguity and difficulty in diagnosis. In this paper, we propose a novel
methodology to identify the root cause of scan chain timing failures. The
proposed work addresses the challenge of diagnosing multiple permanent or
intermittent timing faults in scan chains and the associated clock trees, which
closely approximate the realistic failure mechanisms observed in silicon.
Instead of relying on fault simulation that is incapable of approximating the
intermittent fault manifestation, the proposed technique characterizes the
impact of timing faults by analyzing the phase movement of scan patterns.
Extracting fault-sensitive statistical features of phase movement information
provides strong signals for the precise identification of fault locations and
types. The manifestation probability of each fault is furthermore computed
through a mathematical transformation framework which accurately models the
behavior of multiple faults as a Markov chain. The identification of failing
scan cells enables a further examination of the possible delay defects in the
scan clock buffers, which ascertains the possible root causes of the observed
scan chain failures. The proposed scheme characterizes the timing impact of the
defective clock buffers by extracting the change in the delay distribution of
the clock paths, enabling the effective pruning of unrealistic fault hypotheses
that would result in highly deviant timing behavior. Simulation results have
confirmed that the proposed methodology can yield highly accurate diagnosis
results for complex fault manifestations.

Stratigopoulos, H.-G. 
Test Metrics Model for Analog Test Development
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6218239 

The trend nowadays is to integrate more and more functionalities into a single
chip. This, however, has serious implications in the testing cost. Especially
for the analog circuits, the testing cost tends to be very high, despite the
fact they occupy a small fraction of the area of the chip. Therefore, to reduce
this cost, there is a high interest to replace the most demanding tests by
alternative measurements. However, such replacement may inadvertently result in
accepting faulty chips or rejecting functional chips. In this paper, we present
a method for estimating such test metrics in the general scenario where a
single test is replaced by a single measurement. The method is based on the
extreme value theory and the statistical blockade algorithm. It can be readily
applied during the test development phase to obtain estimates of the test
metrics and corresponding confidence intervals with parts-per-million
precision. For this purpose, the method requires a small number of selective
simulations that we can afford to run in practice.

SHORT PAPERS

Poikonen, J. H.; Lehtonen, E.; Laiho, M. 
On Synthesis of Boolean Expressions for Memristive Devices Using Sequential
Implication Logic
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6218235 

We determine an explicit procedure for representing any Boolean expression in a
recursive form which can be realized using memristive devices, and demonstrate
how the truth value of any Boolean expression can be determined using no more
than two computing memristive devices. We present an algorithm which can be
used to significantly reduce the number of implications required for
representing a specific Boolean expression, and consider device-specific
benefits in terms of reducing the lengths of computational sequences.

Liang, Y.-Y.; Kuo, T.-Y.; Wang, S.-H.; Mak, W.-K. 
ALMmap: Technology Mapping for FPGAs With Adaptive Logic Modules
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6218225

Modern field programmable gate arrays like Altera's Stratix Series have adopted
the adaptive logic module (ALM) structure due to its potential performance and
area advantages. An ALM can implement a single logic function or can be
fractured into two smaller lookup tables (LUTs). In this paper, we propose an
ALM mapping algorithm, ALMmap, for area minimization with bounded depth. We
revamp the traditional iterative cut-based mapping flow and introduce a
procedure for bounded depth mapping generation with dynamic area recovery that
effectively combines cut selection, mapping, and area recovery together. In
addition, we introduce a new procedure for computing cut set for ALM
minimization under a depth constraint. The notion of area flow which has been
used successfully for cut selection to reduce LUT count is revised for cut
selection to reduce ALM count. ALMmap obtains depth optimal solutions that are
25.6% and 11.6% smaller, on average, than those produced by a classical mapper
and WireMap, respectively.

Smy, T.; Gunupudi, P.  Robust Simulation of Opto-Electronic Systems by
Alternating Complex Envelope Representations
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6218238 

The increased use of optics in information processing and transmission systems
motivates the development of self-consistent opto-electronic transient
simulators. This letter presents a technique to significantly improve the DC
and transient convergence behavior of a modified nodal analysis based
optoelectronic simulation framework by allowing optical device models to be
ÒswitchedÓ between a linear formulation using real and imaginary fields to a
non-linear magnitude and phase representation. Configurable multilayer filters
and optical ring modulators are used as examples to demonstrate the
effectiveness of this approach in obtaining robust simulation convergence.