September 2012 Newsletter 
Placing you one click away from the best new CAD research!


Call for Nominations for Editor-in-Chief of the IEEE Transactions on CAD

The IEEE Council on Design Automation (CEDA) invites nominations for the
position of Editor-in-Chief (EiC) for the IEEE Transactions on Computer-Aided
Design (TCAD) of Integrated Circuits and Systems. Self-nominations are
permitted.  The term of duty of the new EiC begins on January 1, 2014 and ends
December 31, 2015. A period of approximately three months prior to the
beginning of the term will be spent on enabling a smooth transition from the
current EiC to the new EiC. 

A Selection Committee will evaluate all nominations and make recommendations to
the CEDA Executive Committee for appointment. After the EiC is chosen, the
Selection Committee will identify the Deputy EiC, in agreement with the newly
appointed EiC. The EiC will then constitute an own Editorial Board of Associate
Editors. 

Nominations should be sent electronically to Kristin Steen
(admin@ieee-ceda.com) and should be received by June 27, 2013 to ensure full
consideration. The nomination package should include: 
*	Name, postal address, email address and telephone numbers of the
and the publication for which the candidate is nominated.
*	Written confirmation that the candidate intends to accept a nomination
for the EiC position.
*	Curriculum vitae of the nominee, including the track record with IEEE
service and publications details, with particular reference to TCAD. 
*	Position statement of the nominee indicating the view of the candidate
on the future of IEEE TCAD, the opportunities and challenges, and any plans
that the candidate wants to pursue.

The nominee must be a member in good standing of the IEEE. Desirable qualities
of the candidates include: An established track record of technical
accomplishments, leadership, integrity and ethical standards, demonstrable
organizational and management skills, and an energetic eagerness to continue
moving the journal forward toward visibly higher levels of accomplishment and
contributions to the relevant technical community. 

REGULAR PAPERS

EMBEDDED SYSTEMS

Guo, Y. ; Zhuge, Q. ; Hu, J. ; Yi, J. ; Qiu, M. ; Sha, E.H.-M.
Data Placement and Duplication for Embedded Multicore Systems With Scratch Pad Memory 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516677

Scratch pad memories (SPM) are attractive alternatives for caches on multicore
systems since caches are relatively expensive in terms of area and energy
consumption. The key to effectively utilizing SPMs on multicore systems is the
data placement algorithm. In this paper, two polynomial time algorithms,
regional data placement for multicore (RDPM) and regional data placement for
multicore with duplication (RDPM-DUP), have been proposed to generate
near-optimal data placement with minimum total cost. There is only one copy for
each data in RDPM, while RDPM- DUP allows data duplication. Experimental
results show that the proposed RDPM algorithm alone can reduce the time cost of
memory accesses by 32.68% on average compared with existing algorithms. With
data duplication, the RDPM-DUP algorithm further reduces the time cost by
40.87%. In terms of energy consumption, the proposed RDPM algorithm with
exclusive copy can reduce the total cost by 33.47% on average. When RDPM-DUP is
applied, the improvement increases up to 38.15% on average.

EMERGING TECHNOLOGIES

Amy, M. ; Maslov, D. ; Mosca, M. ; Roetteler, M.
A Meet-in-the-Middle Algorithm for Fast Synthesis of Depth-Optimal Quantum Circuits 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516700

We present an algorithm for computing depth-optimal decompositions of logical
operations, leveraging a meet-in- the-middle technique to provide a significant
speedup over simple brute force algorithms. As an illustration of our method,
we implemented this algorithm and found factorizations of commonly used quantum
logical operations into elementary gates in the Clifford+T set. In particular,
we report a decomposition of the Toffoli gate over the set of Clifford and T
gates. Our decomposition achieves a total T-depth of 3, thereby providing a 40%
reduction over the previously best known decomposition for the Toffoli gate.
Due to the size of the search space, the algorithm is only practical for small
parameters, such as the number of qubits, and the number of gates in an optimal
implementation.

HIGH-LEVEL SYNTHESIS

Sarbishei, O. ; Radecka, K. 
On the Fixed-Point Accuracy Analysis and Optimization of Polynomial Specifications
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516678

Fixed-point accuracy analysis and optimization of polynomial data-flow graphs
with respect to a reference model is a challenging task in many digital signal
processing applications. Range and precision analysis are two important steps
of this process to assign suitable integer and fractional bit-widths to the
fixed-point variables and constant coefficients in a design such that no
overflow occurs and a given error bound on maximum mismatch (MM) or
mean-square-error (MSE) and signal-to-quantization-noise ratio (SQNR) is
satisfied. This paper explores efficient optimization algorithms based on
robust analyses of MM and MSE/SQNR for fixed-point polynomial data-flow graphs.
Experimental results illustrate the robustness of our analyses and the
efficiency of the optimization algorithms compared to previous work.

MODELING AND SIMULATION

Kim, D. ; Ciesielski, M. ; Yang, S. 
MULTES: Multilevel Temporal-Parallel Event-Driven Simulation
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516614

Multilevel temporal-parallel event-driven simulation is a new radically
different approach to simulation of designs described in Verilog HDL. It is
based on a concept of time-parallel simulation applied to gate-level timing
simulation. The simulation is performed in two steps: 1) fast reference
simulation that runs on a higher, reference- level design model (typically RTL)
and saves the design state at predetermined checkpoints; and 2) target
simulation, which runs on a lower, gate-level model and distributes the
simulation run slices to individual simulators. The paper addresses a number of
important issues that make this approach practical: 1) finding initial state
for each simulation slice; 2) resolving initial state mismatches; and 3)
handling designs with multiple asynchronous clocks. Experimental results
performed on industrial designs demonstrate the validity and efficiency of the
method in terms of its performance and the debugging efficiency.

Liu, H. ; Wong, N. 
Autonomous Volterra Algorithm for Steady-State Analysis of Nonlinear Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516712

We present a novel algorithm, named autonomous Volterra (AV), that achieves
efficient steady-state analysis of nonlinear circuits. With elegant analytic
forms and availability of efficient solvers, AV constitutes a competitive
steady-state algorithm besides the two mainstreams, namely, shooting Newton
(SN) and harmonic balance (HB).  Nonlinear systems are first captured in
nonlinear differential algebraic equations, followed by expansion into linear
Volterra subsystems. A key step of steady-state analysis lies in modeling each
Volterra subsystem with autonomous nonlinear inputs. The steady-state solution
of these subsystems then proceeds with a series of Sylvester equation solves,
completely avoiding the guesses of initial condition and time stepping as in
SN, as well as the uncertain length of Fourier series as in HB. Error control
in AV is also straightforward by monitoring the norms of the Sylvester equation
solutions. We further demonstrate that AV is readily parallelizable with
superior scalability toward large-scale problems.

Abrishami, H. ; Hatami, S. ; Pedram, M. 
Design and Multicorner Optimization of the Energy-Delay Product of CMOS
Flip-Flops Under the Negative Bias Temperature Instability Effect
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516601

With the CMOS transistors being scaled to 28 nm and lower, negative bias
temperature instability (NBTI) has become a major concern due to its impact on
pMOS transistor aging process and the corresponding reduction in the long-term
reliability of CMOS circuits. This paper investigates the effect of NBTI
phenomenon on the setup and hold times of CMOS flipÐflops. First, it is shown
that the NBTI effect tightens the setup and hold timing constraints imposed on
the flipÐflops in the design. Second, an efficient algorithm is introduced for
characterizing codependent setup and hold time contours of the flipÐflops.
Third, a multicorner optimization technique, which relies on mathematical
programming to find the best transistor sizes, is presented to minimize the
energy-delay product of the flipÐflops under the NBTI effect. Finally, the
proposed optimization technique is applied to true single-phase clock
flipÐflops to demonstrate its effectiveness.

PHYSICAL DESIGN

Huang, T. ; Young, E.F.Y. 
ObSteiner: An Exact Algorithm for the Construction of Rectilinear Steiner
Minimum Trees in the Presence of Complex Rectilinear Obstacles
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516642

In this paper, we present ObSteiner, an exact algorithm for the construction of
obstacle-avoiding rectilinear Steiner minimum trees (OARSMTs) among complex
rectilinear obstacles. This is the first paper to propose a geometric approach
to optimally solve the OARSMT problem among complex obstacles. The optimal
solution is constructed by the concatenation of full Steiner trees among
complex obstacles, which are proven to be of simple structures in this paper.
ObSteiner is able to handle complex obstacles, including both convex and
concave ones. Benchmarks with hundreds of terminals among a large number of
obstacles are solved optimally in a reasonable amount of time.

Kozik, A. 
Fully Dynamic Evaluation of Sequence Pair
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516638

In the electronic design automation field, as well as in other areas, problem
instances and solutions are often subject to discrete changes. The foundational
significance of efficient updates of the criterion value after dynamic updates,
instead of recomputing it from scratch each time, has attracted a lot of
research. In this paper, motivated by the significance of the sequence pair
(SP) representation for floorplanning, we develop a fully dynamic algorithm of
SP evaluation, that efficiently updates a criterion value after insertions and
deletions of SP elements and after modifications of element weights. Our result
is based on a new data structure for the predecessor problem, which maintains
the whole history of its dynamic modifications, and the path-compression
technique from the union-find problem, which efficiently support predecessor
queries. Numerical experiments showed that our algorithm exhibits linear-time
behavior and considerably reduces the time of SP evaluation, compared to other
approaches.

Athikulwongse, K. ; Yang, J.-S. ; Pan, D.Z. ; Lim, S.K. 
Impact of Mechanical Stress on the Full Chip Timing for Through-Silicon-Via-based 3-D ICs 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516720

In this paper, we study the impact of through-silicon-via (TSV) and shallow
trench isolation (STI) stress on the timing variations of 3-D IC. We also
propose the first systematic TSV-STI-stress-aware timing analysis and show how
to optimize layouts for better performance. First, we generate a stress contour
map with an analytical radial stress model for TSV. We also develop a stress
model for STI from finite element analysis results. Then, depending on
geometric relation between TSVs, STI, and transistors, the tensile and
compressive stresses are converted to hole and electron mobility variations.
Mobility-variation-aware cell library and netlist are generated and
incorporated into an industrial engine for timing analysis of 3-D IC. We
observe that TSV stress and STI stress interact with each other, and rise and
fall time react differently to stress and relative locations with respect to
both TSVs and STIs.  Overall, TSV-STI-stress-induced timing variations can be
as much as +/-15% at the cell level. Thus, as an application to layout
optimization, we exploit the stress-induced mobility enhancement to improve
performance of 3-D ICs. We show that stress-aware layout perturbation could
reduce cell delay by up to 23.37% and critical path delay by 6.67% in our test
case.

Chakraborty, A. ; Pan, D.Z. 
Skew Management of NBTI Impacted Gated Clock Trees 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516611

Negative bias temperature instability (NBTI) has emerged as the dominant
failure mechanism for PMOS devices in nanometer integrated circuit (IC)
designs, thus limiting their lifetime. There are several existing research
works that mitigate impact of NBTI on gate delay and reliability. However, its
impact on one of the most important components of modern IC designÑthe clock
treeÑhas not been researched enough. Clock gating impacts the extent of
NBTI-induced Vth degradation of clock buffers leading to nonuniform NBTI
degradation and, thus, increased clock skew. In this paper, we propose a
practical design-time technique of modifying the clock gating implementation by
selecting NAND or NOR gate as output stage of integrated clock gating cells
with the objective of minimizing NBTI-induced clock skew. This selection
intelligently modulates the signal probability and delay equations of clock
signal paths with no extra hardware penalty. We formulate the skew minimization
problem as an integer linear program which determines the optimal NAND or NOR
assignment of clock gating buffer.  Experimental results demonstrate the
effectiveness of our method as the NBTI-induced clock skew is reduced by more
than 74% compared to the traditional method. The impact of voltage and
temperature variation on the proposed technique was analyzed and we observed
reduced but still significant reduction in clock skew under variation as
compared to the traditional clock gating technique.

SYSTEM-LEVEL DESIGN

Yeh, Y.-F. ; Lin, H.-C. ; Huang, C.-Y. 
An Ultrasynchronization Checking Method With Trace-Driven Simulation for Fast and Accurate MPSoC 
Virtual Platform Simulation
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516693

Efficiency and accuracy are two critical concerns in multiprocessor system on
chip (MPSoC) virtual platform simulation. Traditional simulation approaches
that achieve high efficiency usually sacrifice accuracy. On the other hand, the
cycle-accurate simulation algorithms are generally slow in speed. In this
paper, we propose an ultrasynchronization checking method with an efficient
trace-driven simulation mechanism that can not only improve simulation speed,
but also maintain simulation accuracy. We build a SystemC-based MPSoC virtual
platform to evaluate the effectiveness of our approach. The experimental
results show that our proposed simulation scheme can improve simulation speed
up to 156x over the traditional clock-step simulation method. Furthermore, our
proposed method can also guarantee the cycle-accurate simulation result.

Jang, W. ; Pan, D.Z. 
Chemical-Mechanical Polishing-Aware Application-Specific 3D NoC Design
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516721

Three-dimensional (3D) integration with through-silicon vias (TSVs) is
promising in the integration of many cores into a single chip. Network-on-chip
(NoC) can efficiently manage the complicated 3D interconnections. However,
irregular and dense TSV arrays used as vertical links in 3D NoC cause severe
TSV height variation during silicon- thinning and chemical-mechanical polishing
(CMP) processes. It may lead to TSV bonding failure between silicon layers. In
this paper, we propose the first CMP-aware application-specific 3D NoC design
that minimizes such TSV height variation and thus reduces the bonding failure,
and meanwhile optimizes conventional NoC design objectives such as hop count,
wirelength, power consumption, and area. Our 3D NoC design assigns cores to
proper silicon layers, determines the 3D NoC topology, allocates routing paths,
and then floorplans all cores, routers, and TSV arrays in a CMP-aware manner.
The key idea behind this 3D NoC design flow is to determine the CMP-aware 3D
NoC topology where TSV arrays with low and uniform metal density are inserted
between adjacent layers.  Experimental results show that our CMP-aware 3D NoC
design achieves, on average, 17.9% lower TSV height variation, 15% lower hop
count, 2.3% shorter total wirelength, and 7.8% lower power consumption than the
previous state-of-the-art 3D NoC designs.

VERIFICATION

Hertz, S. ; Sheridan, D. ; Vasudevan, S. 
Mining Hardware Assertions With Guidance From Static Analysis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516599

We present GoldMine, a methodology for generating assertions automatically in
hardware. Our method involves a combination of data mining and static analysis
of the register transfer level (RTL) design. The RTL design is first simulated
to generate data about the design's dynamic behavior. The generated data is
then mined for Òcandidate assertionsÓ that are likely to be invariants. The
data mining algorithm is a decision-tree-based supervised learning algorithm.
These candidate assertions are then passed through a formal verification engine
to filter out the spurious candidates. The assertions that are attested as true
by the formal engine are system invariants. These are then evaluated by a
process of designer ranking that is provided as feedback to the data mining
engine. We demonstrate the scalability of GoldMine by showing assertion
generation of the RTL of Sun's OpenSparc T2 many-threaded processor. Our
results show that GoldMine can generate complex, high coverage assertions for
sequential as well as combinational designs in RTL, thereby minimizing human
effort in this process. GoldMine assertions distill the random input stimulus
space and can be used for calibrating directed tests. They can be used in a
regression test suite of an evolving RTL. They are also useful in providing
differing perspectives from the designer, as well as hints to designers for
manually writing assertions.

SHORT PAPERS

Khlopotine, A.B. ; Jandhyala, V. ; Kirkpatrick, D. 
A Variant of Parallel Plane Sweep Algorithm for Multicore Systems
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516597

Parallel algorithms used in VLSI physical design bring significant challenges
for their efficient and effective design and implementation. The rectangle
intersection problem is a subset of the plane sweep problem, a topic of
computational geometry and a component in design rule checking, parasitic
resistanceÐ capacitance extraction, and mask processing flows. A variant of a
plane sweep algorithm that is embarrassingly parallel and therefore easily
scalable on multicore machines and clusters, while exceeding the best-known
parallel plane sweep algorithms on real-world tests, is presented in this
letter.

Chang, C.-Y. ; Liao, K.-Y. ; Hsu, S.-C. ; Li, J.C.-M. ; Rau, J.-C. 
Compact Test Pattern Selection for Small Delay Defect
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516596

This letter proposes an algorithm that selects a small number of test patterns
for small delay defects from a large N- detect test set. This algorithm uses
static upper and lower bound analysis to quickly estimate the sensitized path
length so that the central processing unit (CPU) time can be reduced. By
ignoring easy faults, only a partial fault dictionary, instead of a complete
fault dictionary, is built for test pattern selection. Experimental results on
large International Test Conference benchmark circuits show that, with very
similar quality, the selected test set is 46% smaller and the CPU time is 42%
faster than that of timing-aware automated test pattern generation (ATPG). With
the proposed selection algorithm, small delay defect test sets are no longer
very expensive to apply.