August 2012 Newsletter 
Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2012-08.txt 

REGULAR PAPERS

ANALOG, MIXED-SIGNAL, AND RF CIRCUITS

Eick, M.; Graeb, H. E. 
MARS: Matching-Driven Analog Sizing
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238393

This paper presents a new approach for automatic computation of matching
constraints for analog sizing. The method automatically analyzes a circuit
netlist to generate sizing constraints. These sizing constraints are considered
during a subsequent numerical optimization. It is the first method that
computes symmetry constraints for sizing, based on the hierarchical structure
of the circuit and a qualitative signal flow. As a unique feature, the method
is capable of handling multiple signal and feedback paths, leading to multiple
symmetry axes. It can be applied to linear and nonlinear circuits and any
available sizing algorithm. Experimental results show that the runtime of
analog sizing, as well as of result quality, are greatly improved.

EMBEDDED SYSTEMS

Qin, X.; Wang, W.; Mishra, P. 
TCEC: Temperature and Energy-Constrained Scheduling in Real-Time Multitasking Systems
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238400

The ongoing scaling of semiconductor technology is causing severe increase of
on-chip power density and temperature in microprocessors. This urgently
requires both power and thermal management during system design.  In this
paper, we propose a model checking-based technique using extended timed
automata to solve the processor frequency assignment problem in a temperature
and energy-constrained multitasking system. We also develop a polynomial
time-approximation algorithm to address the state-space explosion problem
caused by symbolic model checker. Our approximation scheme is guaranteed to not
generate any false-positive answer, while it may return false-negative answer
in rare cases. Our method is universally applicable since it is independent of
any system and task characteristics. Experimental results demonstrate the
usefulness of our approach.

HIGH-LEVEL SYNTHESIS

Kong, B. Y.; Park, I.-C. 
FIR Filter Synthesis Based on Interleaved Processing of Coefficient Generation and Multiplier-Block 
Synthesis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238399

An efficient filter synthesis algorithm is proposed to minimize the number of
adders required in the design of finite- impulse response filters. Given a
specification, a filter can be synthesized by conducting two main steps:
coefficient generation and multiplier-block synthesis. While most of previous
works have focused on only one of the steps, the proposed algorithm integrates
the two steps in an interleaved manner so as to take into account the effect of
multiplier-block synthesis in generating coefficients. In addition, the concept
of sensitivity is developed to reduce the complexity of computing the variable
ranges of coefficients. Experimental results show that the proposed algorithm
outperforms previous algorithms in terms of adder cost and takes a relatively
short computation time.

MODELING AND SIMULATION

Weng, S.-H.; Chen, Q.; Cheng, C.-K. 
Time-Domain Analysis of Large-Scale Circuits by Matrix Exponential Method With Adaptive Control
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238402

We propose an explicit numerical integration method based on matrix exponential
operator for transient analysis of large-scale circuits. Solving the
differential equation analytically, the limiting factor of maximum time step
changes largely from the stability and Taylor truncation error to the error in
computing the matrix exponential operator. We utilize Krylov subspace
projection to reduce the computation complexity of matrix exponential operator.
We also devise a prediction-correction scheme tailored for the matrix
exponential approach to dynamically adjust the step size and the order of
Krylov subspace approximation. Numerical experiments show the advantages of the
proposed method compared with the implicit trapezoidal method.

PHYSICAL DESIGN

Jung, M.; Mitra, J.; Pan, D. Z.; Lim, S. K. 
TSV Stress-Aware Full-Chip Mechanical Reliability Analysis and Optimization for 3-D IC
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238396

In this paper, we propose an efficient and accurate full-chip thermomechanical
stress and reliability analysis tool and design optimization methodology to
alleviate mechanical reliability issues in 3-D integrated circuits (ICs).
First, we analyze detailed thermomechanical stress induced by through-silicon
vias in conjunction with various associated structures such as landing pad and
dielectric liner. Then, we explore and validate the linear superposition
principle of stress tensors and demonstrate the accuracy of this method against
detailed finite element analysis simulations.  Next, we apply this linear
superposition method to full-chip stress simulation and a reliability metric
named the von Mises yield criterion. Finally, we propose a design optimization
methodology to mitigate the mechanical reliability problems in 3-D ICs. Our
numerical experimental results demonstrate the effectiveness of the proposed
methodology.

Chen, J.; Zhu, W. 
An Analytical Placer for VLSI Standard Cell Placement
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238405

Placement is the process of determining the exact locations of circuit elements
within a chip. It is a crucial step in very large scale integration (VLSI)
physical design, because it affects routability, performance, and power
consumption of a design. In this paper, we develop a new analytical placer to
solve the VLSI standard cell placement problem. The placer consists of two
phases, multilevel global placement (GP) and detailed cell placement (DP). In
the stage of GP, during the clustering stage, we use a nonlinear programming
technique and a best-choice clustering algorithm to take a global view of the
whole netlist and placement information, and then use an iterative local
refinement technique during the declustering stage to further distribute the
cells and reduce the wirelength. In the stage of DP, we develop a fast
legalization algorithm to make the solution by global placement legal and use a
cell order polishing to improve the legal solution. The proposed algorithm is
tested on the IBM standard cell benchmark circuits and Peko suites.
Experimental results show that our placer obtains high-quality results in a
reasonable running time.

Zhao, X.; Tolbert, J. R.; Mukhopadhyay, S.; Lim, S. K. 
Variation-Aware Clock Network Design Methodology for Ultralow Voltage (ULV) Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238404

This paper presents a design methodology for robust and low-energy clock
networks for ultralow voltage (ULV) circuits. We show that both clock slew and
skew play important roles in achieving high maximum operating frequency Fmax
and low clock energy in ULV circuits. In addition, clock networks in ULV
circuits are highly sensitive to process variations. We propose a
variation-aware methodology that controls both clock skew and slew to maximize
Fmax and minimize clock power. In addition, we implement dynamic programming
(DP)-based ULV clock routing and buffering methods (deferred merging and
embedding) for deterministic and statistical conditions.  Experimental results
show that our clock network design method achieves lower energy (more than 20%
savings) at comparable or even higher Fmax compared with the existing methods.

SYSTEM-LEVEL DESIGN

Vitkovskiy, A.; Soteriou, V.; Nicopoulos, C. 
A Dynamically Adjusting Gracefully Degrading Link-Level Fault-Tolerant Mechanism for NoCs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238398

The rapid scaling of silicon technology has enabled massive transistor
integration densities. Nanometer feature sizes, however, are marred by
increasing variability and susceptibility to wear-out. Billion-transistor
designs, such as chip multiprocessors (CMPs), are especially vulnerable to
defects. CMPs rely on a network-on-chip for all their communication needs. A
single link failure within this on-chip fabric can impede, halt, or even
deadlock, intertile communication, which can render the entire chip
multiprocessor useless. In this paper, we present a technique capable of
handling very large numbers of permanent wire failures that occur in parallel
links either at manufacture- time or at runtime (dynamically). As opposed to
marking an entire parallel link as faulty, whenever some wires fail, the
proposed methodology employs these partially-faulty links (PFLs) to continue
the transfer of informationÑ albeit at a gracefully degraded modeÑin order to
maintain network connectivity. Furthermore, the presented technique can
designate PFLs as fully-faulty when several wires fail, by utilizing
appropriate routing algorithms that bypass nonoperational links, while still
maintaining load-balance in the vicinity of PFLs. The proposed scheme employs
architectural support within the on-chip routers to detect link failures and
enable reconfiguration at the granularity of individual wires. Hardware
synthesis confirms the low-cost nature of the proposed architecture, and
full-system simulations using both synthetic network traffic and real workloads
demonstrate its efficacy.

Le, H. M.; Grosse, D.; Drechsler, R. 
Automatic TLM Fault Localization for SystemC
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238395

To meet today's time-to-market demands, catching bugs as early as possible
during the design of a system is essential. In electronic system level design
where SystemC has become the de-facto standard due to transaction level
modeling (TLM), many approaches for verification have been developed. They
determine an error trace that demonstrates the difference between the required
and the actual behavior of the system. However, the subsequent debugging
process is very time-consuming, in particular due to TLM-related faults caused
by complex process synchronization and concurrency. In this paper, we present
an automatic fault localization approach for SystemC TLM designs. We target
typical TLM faults, such as accidentally swapped blocking and nonblocking
transactions, erroneous event notification, or incorrect transaction data. The
approach determines parts of the design that can be changed such that the
intended behavior of the design is obtained by removing the contradiction given
by the error trace. Single, as well as multiple faults, is considered.
Techniques based on bounded model checking are used to find the faulty parts.
We demonstrate the quality of our approach by several experiments. As shown in
the experiments, the fault locations are identified very fast and hence a
significant acceleration for the design of SystemC TLM models is achieved.

TEST

Liu, X.; Xu, Q. 
On Signal Selection for Visibility Enhancement in Trace-Based Post-Silicon Validation
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238403

Today's complex integrated circuit designs increasingly rely on post-silicon
validation to eliminate bugs that escape from pre-silicon verification. One
effective silicon debug technique is to monitor and trace the behaviors of the
circuit during its normal operation. However, due to the associated overhead,
designers can only afford to trace a small number of signals in the design.
Selecting which signals to trace is therefore a crucial issue for the
effectiveness of this technique. This paper proposes an automated trace signal
selection strategy with a new probability-based evaluation metric, which is
able to dramatically enhance the visibility in post-silicon validation.
Experimental results on benchmark circuits show that the proposed technique is
more effective than existing solutions.

Chung, J.; Xiong, J.; Zolotov, V.; Abraham, J. A. 
Testability-Driven Statistical Path Selection
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238392

In the face of large-scale process variations, statistical timing methodology
has advanced significantly over the last few years, and statistical path
selection takes advantage of it in at-speed testing. In deterministic path
selection, the separation of path selection and test generation is known to
require time consuming iteration between the two processes. This paper shows
that in statistical path selection, this is not only the case, but also the
quality of results can be severely degraded even after the iteration. To deal
with this issue, we consider testability in the first place by integrating a
satisfiability (SAT) solver, and this necessitates a new statistical path
selection method. We integrate the SAT solver in a novel way that leverages the
conflict analysis of modern SAT solvers, which provides more than 4X speedup
without special optimizations of the SAT solver for this particular
application. Our proposed method is based on a generalized path criticality
metric whose properties allow efficient pruning. Our experimental results show
that the proposed method achieves 47% better quality of results on average, and
up to 361X speedup compared to statistical path selection followed by test
generation.

SHORT PAPERS

Shen, S.; Qin, Y.; Wang, K.; Pang, Z.; Zhang, J.; Li, S. 
Inferring Assertion for Complementary Synthesis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238401

Complementary synthesis can automatically synthesize the decoder circuit of an
encoder. However, its user needs to manually specify an assertion on some
configuration pins to prevent the encoder from reaching the nonworking states.
To avoid this tedious task, this paper propose an automatic approach to infer
this assertion, by iteratively discovering and removing cases without decoders.
To discover all decoders that may exist simultaneously under this assertion, a
second algorithm based on functional dependency is proposed to decompose $BBR$,
the Boolean relation that uniquely determines the encoder's input, into all
possible decoders. To help the user select the correct decoder, a third
algorithm is proposed to infer each decoder's precondition formula, which
represents those cases that lead to this decoder's existence. Experimental
results on several complex encoders indicate that our algorithm can always
infer assertions and generate decoders for them. Moreover, when multiple
decoders exist simultaneously, the user can easily select the correct one by
inspecting their precondition formulas.

Foreman, E. A.; Habitz, P. A.; Cheng, M.-C.; Visweswariah, C. 
A Novel Method for Reducing Metal Variation With Statistical Static Timing Analysis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238394

Process variation continues to increase with new technologies. With the advent
of statistical static timing analysis (SSTA), multiple independent sources of
variation can be modeled. This paper proposes a novel technique to reduce
variability of metal process variation in SSTA. This novel method maximizes
sensitivity cancellation to minimize variability. The developed methodology is
simulated with SSTA in 65-nm technology and shows a reduction in variability.

Adir, A.; Nahir, A.; Ziv, A. 
Concurrent Generation of Concurrent Programs for Post-Silicon Validation
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238397

The continuing trend toward increased parallelism in processor design can be
seen in both the growing number of processor cores per system and in on-core
hardware mechanisms that assist parallelism, such as multithreading and cache
hierarchies. This complexity exacerbates the problem of ensuring the functional
correctness of such hardware systems. The growing importance of post-silicon
validation is leading to an emerging type of parallel application, namely, the
hardware exerciser. We describe a method for exercising parallel hardware by
generating pseudorandom concurrent test programs. The test generation is
carried out on the tested parallel platform and thus the generator itself is
also a concurrent program. We describe the challenges associated with this
technology and the approach used by the Threadmill hardware exerciser, a tool
developed for the post-silicon validation of the IBM POWER7 processor.