TCAD Newsletter - May 2011 Issue 
Placing you one click away from the best new CAD research!

REGULAR PAPERS

HIGH-LEVEL SYNTHESIS

Lee, G. Choi, K. Dutt, N. D. Mapping Multi-Domain Applications Onto
Coarse-Grained Reconfigurable Architectures

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752428

Coarse-grained reconfigurable architectures (CGRAs) have drawn increasing
attention due to their performance and flexibility. However, their applications
have been restricted to domains based on integer arithmetic since typical CGRAs
support only integer arithmetic or logical operations. This paper introduces
approaches to mapping applications onto CGRAs supporting both integer and
floating-point arithmetic. After presenting an optimal formulation using
integer linear programming, we present a fast heuristic mapping algorithm. Our
experiments on randomly generated examples generate optimal mapping results
using our heuristic algorithm for 97% of the examples within a few seconds. We
observe similar results for practical examples from multimedia and 3-D graphics
benchmarks. The applications mapped on a CGRA show up to 120 times performance
improvement compared to software implementations, demonstrating the potential
for application acceleration on CGRAs supporting floating- point operations.

LOGIC SYNTHESIS

Yang, Y.-S. Sinha, S. Veneris, A. Brayton, R. K. Automating Logic
Transformations With Approximate SPFDs

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752413

During the very large scale integration design process, a synthesized design is
often required to be modified in order to accommodate different goals. To
preserve the engineering effort already invested, designers seek small logic
structural transformations to achieve these logic restructuring goals. This
paper proposes a systematic methodology to devise such transformations
automatically. It first presents a simulation-based formulation to approximate
sets of pairs of functions to be distinguished and avoid the memory/time
explosion issue inherent with the original representation. Then, it uses this
new data structure to devise the required transformations dynamically without
the need of a static dictionary model.  The methodology is applied to both
combinational and sequential designs with transformations at a single or
multiple locations. An extensive suite of experiments documents the benefits of
the proposed methodology when compared to existing practices.

Gowda, T. Vrudhula, S. Kulkarni, N. Berezowski, K. Identification of Threshold
Functions and Synthesis of Threshold Networks

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752412

This paper presents a new and efficient heuristic procedure for determining
whether or not a given Boolean function is a threshold function, when the
Boolean function is given in the form of a decision diagram. The decision
diagram based method is significantly different from earlier methods that are
based on solving linear inequalities in Boolean variables that derived from
truth tables. This method's success depends on the ordering of the variables in
the binary decision diagram (BDD). An alternative data structure, and one that
is more compact than a BDD, called a max literal factor tree (MLFT) is
introduced.  An MLFT is a particular type of factoring tree and was found to be
more efficient than a BDD for identifying threshold functions. The threshold
identification procedure is applied to the MCNC benchmark circuits to
synthesize threshold gate networks.

Cui, A. Chang, C.-H. Tahar, S. Abdel-Hamid, A. T. A Robust FSM Watermarking
Scheme for IP Protection of Sequential Circuit Design

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752433

Finite state machines (FSMs) are the backbone of sequential circuit design. In
this paper, a new FSM watermarking scheme is proposed by making the authorship
information a non-redundant property of the FSM. To overcome the vulnerability
to state removal attack and minimize the design overhead, the watermark bits
are seamlessly interwoven into the outputs of the existing and free transitions
of state transition graph (STG). Unlike other transition-based STG
watermarking, pseudo input variables have been reduced and made functionally
indiscernible by the notion of reserved free literal. The assignment of
reserved literals is exploited to minimize the overhead of watermarking and
make the watermarked FSM fallible upon removal of any pseudo input variable. A
direct and convenient detection scheme is also proposed to allow the watermark
on the FSM to be publicly detectable. Experimental results on the watermarked
circuits from the ISCAS'89 and IWLS'93 benchmark sets show lower or acceptably
low overheads with higher tamper resilience and stronger authorship proof in
comparison with related watermarking schemes for sequential functions.

MODELING AND SIMULATION

Ghani, N. H. A. Najm, F. N. Fast Vectorless Power Grid Verification Under an
RLC Model

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752432

As part of early system design, one must verify that the power grid provides
the underlying logic circuitry with voltage levels that are within specified
ranges. In this paper, we describe a vectorless verification approach that can
be applied early in the design process. We adopt an RLC model of the grid in
the framework of current constraints that capture uncertainty about circuit
details and activity. With just a few linear programs and one linear system
solve, our proposed approach provides tight conservative bounds on the maximum
and minimum worst-case voltage drops at every node of the grid. Results show
the accuracy and speed of our technique thus making it practical and scalable.

PHYSICAL DESIGN

Papadopoulou, E. Net-Aware Critical Area Extraction for Opens in VLSI Circuits
Via Higher- Order Voronoi Diagrams

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752411

We address the problem of computing critical area for open faults (opens) in a
circuit layout in the presence of multilayer loops and redundant interconnects.
The extraction of critical area is the main computational bottleneck in
predicting the yield loss of a very large scale integrated design due to random
manufacturing defects. We first model the problem as a geometric graph problem
and we solve it efficiently by exploiting its geometric nature. To model open
faults, we formulate a new geometric version of the classic min-cut problem in
graphs, termed the geometric min-cut problem. Then the critical area extraction
problem gets reduced to the construction of a generalized Voronoi diagram for
open faults, based on concepts of higher order Voronoi diagrams. The approach
expands the Voronoi critical area computation paradigm with the ability to
accurately compute critical area for missing material defects even in the
presence of loops and redundant interconnects spanning over multiple layers.
The generalized Voronoi diagrams used in the solution are combinatorial
structures of independent interest. 

Huang, T. Li, L. Young, E. F. Y. On the Construction of Optimal
Obstacle-Avoiding Rectilinear Steiner Minimum Trees

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752407

This paper presents an efficient method to solve the obstacle-avoiding
rectilinear Steiner tree (OARSMT) problem optimally. Our work is developed
based on the GeoSteiner approach in which full Steiner trees (FSTs) are first
constructed and then combined into a rectilinear Steiner minimum tree (RSMT).
We modify and extend the algorithm to allow obstacles in the routing region.
For each routing obstacle, we first introduce four virtual terminals located at
its four corners. We then give the definition of FSTs with blockages and prove
that they will follow some very simple structures. Based on these observations,
a two-phase approach is developed for the construction of OARSMTs. In the first
phase, we generate a set of FSTs with blockages. In the second phase, the FSTs
generated in the first phase are used to construct an OARSMT. Finally,
experiments on several benchmarks are conducted. Results show that the proposed
method is able to handle problems with hundreds of terminals in the presence of
multiple obstacles, generating an optimal solution in a reasonable amount of
time.

Zhao, X. Lewis, D. L. Lee, H.-H. S. Lim, S. K. Low-Power Clock Tree Design for
Pre-Bond Testing of 3-D Stacked ICs

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752408

Pre-bond testing of 3-D stacked integrated circuits (ICs) involves testing each
individual die before bonding. The overall yield of 3-D ICs improves with
pre-bond testability because manufacturers can avoid stacking defective dies
with good ones. However, pre-bond testability presents unique challenges to 3-D
clock tree design. First, each die needs a complete 2-D clock tree to enable
pre-bond test. Second, the entire 3-D stack needs a complete 3-D clock tree for
post-bond test and operation. In the case of a two-die stack, a straightforward
solution is to have two complete 2-D clock trees connected with a single
through- silicon-via (TSV). We show that this solution suffers from long
wirelength (WL) and high clock power consumption. Our algorithm improves on
this solution, minimizes the overall WL and clock power consumption, and
provides both pre-bond testability and post-bond operability with minimum skew
and constrained slew. Compared with the single-TSV solution, SPICE simulation
results show that our multi- TSV approach significantly reduces the clock power
by up to 15.9% for two-die and 29.7% for four-die stacks. In addition, the WL
is reduced by up to 24.4% and 42.0%.

Ren, H. Dutt, S. Effective Power Optimization Under Timing and Voltage-Island Constraints Via 
Simultaneous Vdd, Vth Assignments, Gate Sizing, and Placement

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752436

We present a physical-synthesis based power optimization technique that
simultaneously explores the four transforms, multiple Vdd, multiple Vth , cell
sizing, and placement, to find a minimum power solution under timing and other
constraints. The optimal selection of the design options of all transforms for
all cells in the circuit is solved using a new optimization technique called
discretized network flow that we have recently developed. Among the constraints
we consider, timing and the voltage-island constraints are the two most
important and complex ones. The voltage-island constraint specifies the maximum
allowed number of voltage islands in the layout, and the requirement that each
island be rectangular. We develop an approach that along with the option
selection process can simultaneously determine the voltage islands needed, as
well as satisfy all given constraints. Experimental results on ISCAS'89 and
Faraday benchmarks show that compared to an initial wire length (WL)-optimized
placement with high supply and low threshold voltage levels, we obtain a power
reduction by up to 42% and an average of 30% for the same delay as that of the
initial design. These improvements are also 44?  50% relatively better than the
improvements yielded by sequentially applying the four power reduction
transforms, which is the currently standard method for applying multiple
transforms. Finally, compared to an industry tool Synopsys IC Compiler (ICC)
that also applies all four transforms, our method reduces power by an
additional amount of up to 19%, and an average of 16%.

SYSTEM-LEVEL DESIGN

Mintarno, E. Skaf, J. Zheng, R. Velamala, J. B. Cao, Y. Boyd, S. Dutton, R. W.
Mitra, S. Self- Tuning for Maximized Lifetime Energy-Efficiency in the Presence
of Circuit Aging

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752409

This paper presents an integrated framework, together with control policies,
for optimizing dynamic control of self-tuning parameters of a digital system
over its lifetime in the presence of circuit aging. A variety of self-tuning
parameters such as supply voltage, operating clock frequency, and dynamic
cooling are considered, and jointly optimized using efficient algorithms
described in this paper. Our optimized self-tuning approach satisfies
performance constraints at all times, and maximizes a lifetime computational
power efficiency (LCPE) metric, which is defined as the total number of clock
cycles achieved over lifetime divided by the total energy consumed over
lifetime. We present three control policies: 1) progressive-worst-case-aging
(PWCA), which assumes worst-case aging at all times; 2)
progressive-on-state-aging (POSA), which estimates aging by tracking
active/sleep modes, and then assumes worst-case aging in active mode and long
recovery effects in sleep mode; and 3) progressive- real-time-aging-assisted
(PRTA), which acquires real-time information and initiates optimized control
actions. Various flavors of these control policies for systems with dynamic
voltage and frequency scaling (DVFS) are also analyzed. Simulation results on
benchmark circuits, using aging models validated by 45 nm measurements,
demonstrate the effectiveness and practicality of our approach in significantly
improving LCPE and/or lifetime compared to traditional one-time worst-case
guardbanding. We also derive system design guidelines to maximize self-tuning
benefits.

Palesi, M. Ascia, G. Fazzino, F. Catania, V. Data Encoding Schemes in Networks
on Chip

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752410

An ever more significant fraction of the overall power dissipation of a
network-on-chip (NoC) based system-on-chip (SoC) is due to the interconnection
system. In fact, as technology shrinks, the power contribute of NoC links
starts to compete with that of NoC routers. In this paper, we propose the use
of data encoding techniques as a viable way to reduce both power dissipation
and energy consumption of NoC links. The proposed encoding scheme exploits the
wormhole switching techniques and works on an end-to-end basis. That is, flits
are encoded by the network interface (NI) before they are injected in the
network and are decoded by the destination NI. This makes the scheme
transparent to the underlying network since the encoder and decoder logic is
integrated in the NI and no modification of the routers architecture is
required. We assess the proposed encoding scheme on a set of representative
data streams (both synthetic and extracted from real applications) showing that
it is possible to reduce the power contribution of both the self-switching
activity and the coupling switching activity in inter-routers links.  As
results, we obtain a reduction in total power dissipation and energy
consumption up to 37% and 18%, respectively, without any significant
degradation in terms of both performance and silicon area.  

SHORT PAPERS

Kavousianos, X. Chakrabarty, K. Generation of Compact Stuck-At Test Sets
Targeting Unmodeled Defects

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752435

This letter presents a new method to generate compact stuck-at test sets that
offer high defect coverage.  The proposed method first selects the most
effective patterns from a large $N$-detect repository, by using a new output
deviation-based metric. Then it embeds complete coverage of stuck-at faults
within these patterns, and uses the proposed metric to further improve their
defect coverage. Results show that the proposed method outperforms a recently
proposed competing approach in terms of unmodeled defect coverage. In many
cases, higher defect coverage is obtained even than much larger N-detect test
sets for several values of N. Finally, results provide the insight that,
instead of using N-detect testing with as large N as possible, it is more
efficient to combine the output deviations metric with multi-detect testing to
get high-quality, compact test sets.