February 2012 Newsletter 
Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2012-02.txt 

SPECIAL SECTION ON THE 2011 INTERNATIONAL SYMPOSIUM ON PHYSICAL DESIGN

Hu, J. Koh, C. K. 
Guest Editorial
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132644

Yuan, K. Yu, B. Pan, D. Z. 
E-Beam Lithography Stencil Planning and Optimization With Overlapped Characters
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132658

Electronic beam lithography (EBL) is one of the promising emerging technologies
in the sub-22 nm regime. In EBL, the desired circuit patterns are directly shot
into the wafer, which overcomes the diffraction limit of light in the current
optical lithography system. However, the low throughput becomes its key
technical hurdle. In the conventional EBL system, each rectangle in the layout
will be projected by one electronic shot through a variable shape beam (VSB).
This could be extremely slow. As an improved EBL technology, character
projection (CP) shoots complex shapes, so-called characters, in one time, by
putting them into a predesigned stencil. However, only a limited number of
characters can be employed, due to the area constraint. Those patterns, not
contained by any character, are still required to be written by VSB. A key
problem is how to select an optimal set of characters and pack them on the CP
stencil to minimize total processing time. In this paper, we investigate a
problem of electronic beam lithography stencil design with overlapped
characters. Different from previous works, besides selecting appropriate
characters, their placements on the stencil are also optimized in our
framework. Specifically, we propose a Hamilton-path-based iterative algorithm
to handle 1-D stencil design problem, and an effective simulated annealing
framework for the generalized 2-D case with an efficient look-ahead sequence
pair evaluation technique.  The experimental results show that, compared to
conventional stencil design methodology without overlapped characters, we are
able to reduce total projection time by 51%.

Wang, S.-H. Liang, Y.-Y. Kuo, T.-Y. Mak, W.-K. 
Power-Driven Flip-Flop Merging and Relocation
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132656

We propose a power-driven flip-flop (FF) merging and relocation approach that
can be applied after conventional timing-driven placement and before clock
network synthesis. It targets to reduce the clock network size and thus the
clock power consumption while controlling the switching power of the nets
connected to the FFs by selectively merging FFs into multibit FFs and
relocating them under timing and placement density constraints. The
experimental results are very encouraging. For a set of benchmarks, our
approach reduced the switching capacitance of clock network by 36%Ð43% after
gated clock tree synthesis. Finally, the total switching capacitance of clock
network and nets connected to the FFs is reduced by 24%Ð29%.

Jiang, I. H.-R. Chang, C.-L. Yang, Y.-M. 
INTEGRA: Fast Multibit Flip-Flop Clustering for Clock Power Saving
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132648

Clock power is the major contributor to dynamic power for modern integrated
circuit design. A conventional single- bit flip-flop cell uses an inverter
chain with a high drive strength to drive the clock signal. Clustering several
such cells and forming a multibit flip-flop can share the drive strength,
dynamic power, and area of the inverter chain, and can even save the clock
network power and facilitate the skew control. Hence, in this paper, we focus
on postplacement multibit flip-flop clustering to gain these benefits.
Utilizing the properties of Manhattan distance and coordinate transformation,
we model the problem instance by two interval graphs and use a pair of
linear-sized sequences as our representation. Without enumerating all possible
combinations, we identify only partial sequences that are necessary to cluster
flip-flops, thus leading to an efficient clustering scheme. Moreover, our fast
coordinate transformation also makes the execution of our algorithm very
efficient. The experiments are conducted on industrial circuits. Our results
show that concise representation delivers superior efficiency and
effectiveness. Even under timing and placement density constraints, clock power
saving via multibit flip-flop clustering can still be substantial at
postplacement.

Lee, D.-J. Markov, I. L. 
Obstacle-Aware Clock-Tree Shaping During Placement
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132650

Traditional integrated circuit (IC) design flows optimize clock networks before
signal-net routing are limited by the quality of register placement. Existing
publications also reflect this bias and focus mostly on clock routing. The few
known techniques for register placement exhibit significant limitations and do
not account for recent progress in large-scale placement and obstacle-aware
clock-network synthesis. In this paper, we integrate clock network synthesis
within global placement by optimizing register locations. We propose: 1)
obstacle-aware virtual clock-tree synthesis; 2) arboreal clock-net contraction
force with virtual-node insertion, which can handle multiple clock domains and
gated clocks; and 3) an obstacle-avoidance force (OAF). Our work is validated
on large benchmarks with numerous macroblocks. Experimental results indicate
that our software implementation, called Lopper, prunes clock-tree branches to
reduce their length by 30.0%Ð36.6% and average total dynamic power consumption
by 6.8%Ð 11.6% versus conventional wirelength-driven approaches. SPICE-driven
simulations show that our methods improve robustness of clock trees.

Lu, J. Mao, X. Taskin, B. 
Integrated Clock Mesh Synthesis With Incremental Register Placement
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132652

A clock mesh planning and synthesis method is proposed which significantly
reduces the power dissipation on the network while considering the power
density and timing slack simultaneously. The proposed method is performed at
the postplacement stage and consists of three major steps: 1) feasible moving
region construction of each register considering timing slack; 2) mesh grid
wire generation and placement; and 3) incremental register placement for stub
wire minimization considering power density and timing slack. The advantages of
the proposed method are the reduced power dissipationÑ28% on average on the
benchmark circuitsÑthe optimized power density, and the guaranteed nonnegative
timing slack. These advantages are possible through a decreased timing slack
(1.1% of the clock period) and change in the logic wirelength $(+5.9%)$ on the
benchmark circuits.

Knechtel, J. Markov, I. L. Lienig, J. 
Assembling 2-D Blocks Into 3-D Chips
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132649

Despite numerous advantages of 3-D integrated circuits (ICs), their commercial
success remains limited. In part, this is due to the wide availability of
trustworthy intellectual property (IP) blocks developed for 2-D ICs and proven
through repeated use. Block-based design reuse is imperative for heterogeneous
3-D ICs where memory, logic, analog, and microelectromechanical systems dies
are manufactured at different technology nodes and circuit modules cannot be
partitioned among several dies. In this paper, we show how to integrate 2-D IP
blocks into 3-D chips without altering their layout. Experiments indicate that
the overhead of proposed integration is small, which can help accelerate
industry adoption of 3-D integration.

Zhao, Y. Chakrabarty, K. 
Simultaneous Optimization of Droplet Routing and Control-Pin Mapping to
Electrodes in Digital Microfluidic Biochips
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132659

The number of independent input pins used to control the electrodes in digital
microfluidic ÒbiochipsÓ is an important cost-driver in the emerging market
place, especially for disposable PCB devices that are being developed for
clinical and point-of-care diagnostics. However, most prior work on
pin-constrained biochip design considers droplet routing and the assignment of
pins to electrodes as independent problems. In this paper, we propose
optimization methods to solve the droplet routing and pin-constrained design
problems concurrently. First, we formulate the co-optimization problem
involving droplet routing and pin-mapping. Next, we present an integer linear
programming-based optimization method to solve the droplet-routing and the
pin-mapping design problems concurrently. The proposed co-optimization method
minimizes the number of control pins. We also present an efficient heuristic
approach to tackle the co-optimization problem. These methods overcome a major
drawback of a recently proposed method, which leads to infeasible solutions
involving conflicts in the mapping of pins to electrodes in different
droplet-routing stages. The effectiveness of the proposed co-optimization
method is demonstrated for two commercial biochips and an experimental
university chip for multiplexed in-vitro diagnostics. 

Tian, H. Tang, W.-C. Young, E. F. Y. Sze, C. N. 
Postgrid Clock Routing for High Performance Microprocessor Designs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132655

Designing a high-quality clock network is very important in very large-scale
integrated designs today, as it is the clock network that synchronizes all the
elements of a chip, and it is also a major source of power dissipation of a
system. Early study by Pham in 2006 shows that about 18.1% of the total clock
capacitance was due to this postgrid clock routing (i.e., lower mesh wires plus
clock twig wires). In this paper, we proposed a partition-based path expansion
algorithm to solve this postgrid clock routing problem effectively.
Experimental results on industrial test cases show that our algorithm can
improve over the latest work by Shelar on this problem significantly by
reducing the wire capacitance by 24.6% and the wirelength by 23.6%.

REGULAR PAPERS

LOGIC SYNTHESIS

Chen, Y.-C. Wang, C.-Y. 
Logic Restructuring Using Node Addition and Removal
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132646

This paper presents a logic restructuring technique named node addition and
removal (NAR). It works by adding a node into a circuit to replace an existing
node and then removing the replaced node. Previous node-merging techniques
focus on replacing one node with an existing node in a circuit, but fail to
replace a node that has no substitute node. To enhance the node-merging
techniques on logic restructuring and optimization, we propose an NAR approach
in this paper. We first present two sufficient conditions that state the
requirements of added nodes for safely replacing a target node. Then, an NAR
approach is proposed to quickly detect the added nodes by performing logic
implications based on these conditions. We apply the NAR approach to circuit
minimization together with two techniques: redundancy removal and mandatory
assignment reuse. We also apply it to satisfiability (SAT)-based bounded
sequential equivalence checking (BSEC) to reduce the computation complexity of
SAT solving. The experimental results show that our approach can enhance our
prior automatic test pattern generation-based node- merging approach.
Additionally, our approach has a competitive capability of circuit minimization
with 44 times speedup compared to a SAT-based node-merging approach. For BSEC,
our approach can work together with other optimization technique to save a
total of approximately 39-h verification time for all the benchmarks.

MODELING AND SIMULATION

Mangassarian, H. Veneris, A. Najm, F. N. 
Maximum Circuit Activity Estimation Using Pseudo-Boolean Satisfiability
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132653

With lower supply voltages, increased integration densities and higher
operating frequencies, power grid verification has become a crucial step in the
very large-scale integration design cycle. The accurate estimation of maximum
instantaneous power dissipation aims at finding the worst-case scenario where
excessive simultaneous switching could impose extreme current demands on the
power grid. This problem is highly input-pattern dependent and is proven to be
NP-hard. In this paper, we capitalize on the compelling advancements in
satisfiability (SAT) solvers to propose a pseudo-Boolean SAT-based framework
that reports the input patterns maximizing circuit activity, and consequently
peak dynamic power, in combinational and sequential circuits. The proposed
framework is enhanced to handle unit gate delays and output glitches. In order
to disallow unrealistic input transitions, we show how to integrate input
constraints in the formulation. Finally, a number of optimization techniques,
such as the use of gate switching equivalence classes, are described to improve
the scalability of the proposed method. An extensive suite of experiments on
ISCAS85 and ISCAS89 circuits confirms the robustness of the approach compared
to simulation- based techniques and encourages further research for low-power
solutions using Boolean SAT.

PHYSICAL DESIGN

Yan, T. Wong, M. D. F. 
Correctly Model the Diagonal Capacity in Escape Routing
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132657

Escape routing for packages and printed circuit boards (PCBs) has been studied
extensively in the past. Network flow is pervasively used to model this
problem. However, none of the previous works correctly models the diagonal
capacity, which is essential for 45 degree routing in most packages and PCBs.
As a result, existing algorithms may either produce routing solutions that
violate the diagonal capacity or fail to output a legal routing even though
there exists one. In this paper, we propose a new network flow model that
guarantees the correctness when diagonal capacity is taken into consideration.
This model leads to the first optimal algorithm for escape routing. We also
extend our model to handle missing pins.

SYSTEM-LEVEL DESIGN

Hernandez, C. Roca, A. Silla, F. Flich, J. Duato, J. 
On the Impact of Within-Die Process Variation in GALS-Based NoC Performance
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132647

Current integration scales allow designing chip multiprocessors (CMP), where
cores are interconnected by means of a network-on-chip (NoC). Unfortunately,
the small feature size of current integration scales causes some
unpredictability in manufactured devices because of process variation. In NoCs,
variability may affect links and routers causing them not to match the
parameters established at design time. In this paper, we first analyze the way
that manufacturing deviations affect the components of a NoC by applying a new
comprehensive and detailed within-die variability model to 200 instances of an
8x8 mesh NoC synthesized using 45 nm technology. Later, we show that GALS-based
NoCs present communication bottlenecks under process variation which cannot be
avoided by using just device-level solutions but higher level architectural
approaches are required. Therefore, to overcome this performance reduction, we
draft a novel architectural approach, called performance domains, intended to
reduce the negative impact of variability on application execution time. This
mechanism is suitable when several applications are simultaneously running in
the CMP chip.

TEST

Lin, Y.-T. Poku, O. Blanton, R. D. S. Nigh, P. Lloyd, P. Iyengar, V. 
Physically-Aware N-Detect Test
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132651

Physically-aware $N$-detect ($PAN$-detect) test improves defect coverage by
exploiting defect locality. This paper presents physically-aware test selection
(PATS) to efficiently generate $PAN$-detect tests for large industrial designs.
Compared to traditional $N$-detect test, the quality resulting from PAN-detect
is enhanced without any increase in test execution cost. Experiment results
from an IBM in-production application-specific integrated circuit demonstrate
the effectiveness of PATS in improving defect coverage. Moreover, utilizing
novel test-metric evaluation, we compare the effectiveness of traditional
N-detect and PAN-detect, and demonstrate the impact of automatic test pattern
generation parameters on the effectiveness of PAN-detect. 

SHORT PAPERS

Pomeranz, I. 
Multipattern Scan-Based Test Sets With Small Numbers of Primary Input Sequences
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132654 

When a multipattern scan-based test is applied at-speed to detect delay
defects, it is necessary to change the primary input vectors at-speed. However,
tester limitations can make this infeasible. The solution where the primary
input vectors are held constant during the test reduces the fault coverage. An
alternative solution is to store the primary input sequences of a multipattern
test set on-chip and apply them at-speed from an on-chip memory. To support
such a solution, this paper describes a procedure for computing a multipattern
test set that requires a small number of different primary input sequences.
Experimental results for single stuck-at faults and for transition faults show
that a multipattern test set that detects all the detectable faults requires a
number of primary input sequences that is significantly smaller than the number
of tests.

Errata to "Accelerating FPGA Routing Through Parallelization and Engineering
Enhancements Special Section on PAR-CAD 2010"
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132645