TCAD Newsletter – February 2010 Issue
Placing you one click away from the best new CAD research!

Regular Papers
==============

Enhanced Double Via Insertion Using Wire Bending
Lee, K.-Y.; Lin, S.-T.; Wang, T.-C.,
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395744&isnumber=5395722&tag=1

Redundant via insertion is highly recommended for improving chip yield and
reliability. In this paper, we study the problem of simultaneous double via
insertion and wire bending (DVI/WB) in a postrouting stage, where a single via
can have at most one redundant via inserted next to it. Aside from this, we are
allowed to bend existing signal wires for enhancing the insertion rate of
double vias. The primary goal of the DVI/WB problem is to insert as many double
vias as possible; the secondary objective is to minimize the amount of layout
perturbation. We formulate the DVI/WB problem as that of finding a
minimum-weight maximum independent set (mWMIS) on an enhanced conflict graph.
We propose algorithms to perform wire bending and to construct the enhanced
conflict graph from a given design. We also propose a zero-one integer linear
program (0-1 ILP)-based approach to solve the mWMIS problem. Moreover, we
study the DVI/WB problem with the consideration of via density and extend our
0-1 ILP-based approach to solve it. Experimental results show that our
approaches can improve the insertion rate by up to 6.34% at the expense of up
to 1.29% wirelength increase when compared with the state-of-the-art double
via insertion methods that do not consider wire bending. Moreover, when
compared with an existing method that considers wire bending, our DVI/WB
approach inserts 2% more double vias and incurs a 32% lower wirelength
increase rate on average.
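
The mWMIS formulation above can be illustrated on a toy conflict graph. The
following is a minimal Python sketch, not the authors' 0-1 ILP: it assumes
each vertex is a candidate double via with a perturbation weight, each edge
marks two candidates that cannot both be inserted, and it enumerates subsets
exhaustively, which is only practical for a handful of vertices. The function
name and the sample data are hypothetical.

```python
from itertools import combinations


def min_weight_max_independent_set(weights, edges):
    """Return a largest conflict-free vertex set, breaking ties by lowest weight.

    weights: list of perturbation costs, one per candidate double via.
    edges:   iterable of (u, v) pairs that cannot both be selected.
    """
    n = len(weights)
    edge_set = {frozenset(e) for e in edges}
    # Try the largest subset sizes first so the primary goal (count) wins.
    for size in range(n, 0, -1):
        best = None
        for subset in combinations(range(n), size):
            # Skip subsets that contain a conflicting pair.
            if any(frozenset(p) in edge_set for p in combinations(subset, 2)):
                continue
            cost = sum(weights[v] for v in subset)
            # Secondary goal: least total perturbation among equal-size sets.
            if best is None or cost < best[0]:
                best = (cost, subset)
        if best is not None:
            return best[1]
    return ()


# Hypothetical example: candidates 0/1 conflict, and candidates 2/3 conflict.
chosen = min_weight_max_independent_set([0, 2, 0, 1], [(0, 1), (2, 3)])
print(chosen)
```

Trying larger sizes first encodes the primary objective (insert as many double
vias as possible); the weight comparison encodes the secondary one.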

Double Patterning Layout Decomposition for Simultaneous Conflict and Stitch Minimization
Yuan, K.; Yang, J.-S.; Pan, D. Z.,
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395742&isnumber=5395722

Double patterning lithography (DPL) is considered a most likely solution for
32 nm/22 nm technology. In DPL, the layout patterns are decomposed into two
masks (colors), and manufactured through two exposures and etch steps. If the
spacing between two features (polygons) is less than a certain minimum coloring
distance, they have to be assigned opposite colors. However, a proper coloring
is not always feasible because two neighboring patterns within the minimum
distance may be in the same mask due to complex pattern configurations. In that
case, a feature may need to be split into two parts to resolve the conflict,
resulting in stitch insertion which causes yield loss due to overlay and
line-end effect. While previous layout decomposition approaches perform
coloring and splitting separately, in this paper, we propose a simultaneous
conflict and stitch minimization algorithm with an integer linear programming
(ILP) formulation. Since ILP is NP-hard, the algorithm includes three speed-up
techniques: 1) grid merging; 2) independent component computation; and 3)
layout partition. In addition, our algorithm can be extended to handle design
rules such as overlap margin and minimum width for practical use as well as
off-grid layout. Compared with two-phase greedy decomposition, our approach
reduces stitches by 33% and conflicts by 87.6%.
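
As a rough illustration of the coloring step described above (not the authors'
ILP formulation), the Python sketch below treats features as vertices and
too-close feature pairs as conflict edges, then attempts a two-mask assignment
by breadth-first search. Any edge left with the same mask on both ends is a
conflict that would call for a stitch or a layout change. The function name
`two_color` and the sample edges are assumptions made for illustration.

```python
from collections import deque


def two_color(num_features, conflict_edges):
    """Assign each feature to mask 0 or 1; return (colors, unresolved_edges)."""
    adj = [[] for _ in range(num_features)]
    for u, v in conflict_edges:
        adj[u].append(v)
        adj[v].append(u)

    colors = [None] * num_features
    for start in range(num_features):
        if colors[start] is not None:
            continue
        # Each connected component is colored independently.
        colors[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if colors[v] is None:
                    colors[v] = 1 - colors[u]  # neighbor gets the other mask
                    queue.append(v)

    # Edges whose endpoints share a mask cannot be fixed by coloring alone.
    unresolved = [(u, v) for u, v in conflict_edges if colors[u] == colors[v]]
    return colors, unresolved


# A chain of three features is decomposable; a triangle (odd cycle) is not.
print(two_color(3, [(0, 1), (1, 2)]))
print(two_color(3, [(0, 1), (1, 2), (0, 2)]))
```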

Layout Generator for Transistor-Level High-Density Regular Circuits
Lin, Y.-W.; Marek-Sadowska, M.; Maly, W. P.,
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395731&isnumber=5395722

In this paper, we describe an automatic place and route strategy for a
high-density, super-regular, double-gate, transistor-array-based layout.
Interconnects on all metal layers are strictly parallel and can be manufactured
by an optical proximity correction free process. Our objective is to achieve a
circuit layout area equal to the transistor footprint. Such layout constraints
limit routing flexibility and render traditional approaches impractical. Our
tools automatically generate circuits with several tens of transistors.
Experimental results demonstrate both the efficiency of the proposed algorithms
and the high quality of the layouts produced.

Capturing Post-Silicon Variations Using a Representative Critical Path
Liu, Q.; Sapatnekar, S. S.,
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395738&isnumber=5395722

In nanoscale technologies that experience large levels of process variation,
post-silicon adaptation is an important step in circuit design. These
adaptation techniques are often based on measurements of a replica of the
nominal critical path, whose variations are intended to reflect those of the
entire circuit after manufacturing. For realistic circuits, where the number of
critical paths can be large, the notion of using a single critical path is too
simplistic. This paper overcomes this problem by introducing the idea of
synthesizing a representative critical path (RCP), which captures these
complexities of the variations. We first prove that the requirement on the RCP
is that it should be highly correlated with the circuit delay. Next, we present
three novel algorithms to automatically build the RCP. Our experimental results
demonstrate that over a number of samples of manufactured circuits, the delay
of the RCP captures the worst case delay of the manufactured circuit. The
average prediction error of all circuits is shown to be below 2.8% for all
three approaches. For both our approach and the critical path replica method,
it is essential to guard-band the prediction to ensure pessimism: on average
our approach requires a guard band 31% smaller than for the critical path
replica method.

A New Algorithm for Simultaneous Gate Sizing and Threshold Voltage Assignment
Liu, Y.; Hu, J.,
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395732&isnumber=5395722

Gate sizing and threshold voltage (Vt) assignment are popular techniques for
circuit timing and power optimization. Existing methods, by and large, are
either sensitivity-driven heuristics or based on discretizing continuous
optimization solutions. Sensitivity-driven heuristics are easily trapped in
local optima and the discretization may be subject to remarkable errors. In
this paper, we propose a systematic combinatorial approach for simultaneous
gate sizing and Vt assignment. The core idea of this approach is joint
relaxation and restriction, which employs consistency relaxation and coupled
bi-directional solution search. The process of joint relaxation and restriction
is conducted iteratively to systematically improve solutions. Our algorithm is
compared with a state-of-the-art previous work on benchmark circuits. The
results from our algorithm can lead to about 22% less power dissipation subject
to the same timing constraints.

Modeling the Overshooting Effect for CMOS Inverter Delay Analysis in Nanometer Technologies
Huang, Z.; Kurokawa, A.; Hashimoto, M.; Sato, T.; Jiang, M.; Inoue, Y.,
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395729&isnumber=5395722

With the scaling of complementary metal–oxide–semiconductor (CMOS) technology
into the nanometer regime, the overshooting effect due to the input-to-output
coupling capacitance has more significant influence on CMOS gate analysis,
especially on CMOS gate static timing analysis. In this paper, the overshooting
effect is modeled for CMOS inverter delay analysis in nanometer technologies.
The results produced by the proposed model are close to those of simulation
program with integrated circuit emphasis (SPICE). Moreover, the influence of
the overshooting effect on CMOS inverter analysis is discussed. An analytical
model is presented to calculate the CMOS inverter delay time based on the
proposed overshooting effect model, which is verified to be in good agreement
with SPICE results. Furthermore, the proposed model is used to improve the
accuracy of the switch-resistor model for approximating the inverter output
waveform.

Victim Alignment in Crosstalk-Aware Timing Analysis
Gandikota, R.; Chopra, K.; Blaauw, D.; Sylvester, D.,
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395735&isnumber=5395722

Modeling the effect of coupling-noise on circuit delay is a key issue in static
timing analysis and involves the victim–aggressor alignment problem. As
delay-noise strongly depends on the skew between the victim–aggressor driver
input transitions, it is not possible to identify a priori the victim-driver input
transition that results in the worst-case delay-noise. Several approaches have
been proposed in literature which heuristically search for the worst-case
victim–aggressor alignment. This paper presents an analytical result that
obviates the need to search for the optimal victim-driver input transition,
thereby simplifying the victim–aggressor alignment problem significantly. Using
the properties of standard nonlinear complementary metal-oxide semiconductor
drivers, it is shown that for monotonic input transitions the worst-case
victim-driver input transition is the one that switches at the latest point in
its timing window. Similarly, the victim-driver input alignment at the earliest
point in the timing window is optimal for early-mode analysis. Although this
result has been empirically observed in the industry, to the best of our
knowledge this is the first paper which provides a rigorous analysis and shows
that the above result holds for both linear and nonlinear drivers. It is also
shown that the latest alignment of the victim-driver input transition results
in the latest victim receiver output arrival time even for the cases where the
victim is coupled to multiple aggressors. Finally, experimental results show
that limiting the alignment of the victim to only the latest victim-driver
input transition can significantly reduce the runtime of existing approaches
with no loss of accuracy.

New Reconfigurable Architectures for Implementing FIR Filters with Low Complexity
Mahesh, R.; Vinod, A. P.,
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395736&isnumber=5395722

Reconfigurability and low complexity are the two key requirements of finite
impulse response (FIR) filters employed in multistandard wireless communication
systems. In this paper, two new reconfigurable architectures of low complexity
FIR filters are proposed, namely constant shifts method and programmable shifts
method. The proposed FIR filter architecture is capable of operating for
different wordlength filter coefficients without any overhead in the hardware
circuitry. We show that dynamically reconfigurable filters can be efficiently
implemented by using common subexpression elimination algorithms. The proposed
architectures have been implemented and tested on Virtex 2v3000ff1152-4
field-programmable gate array and synthesized on 0.18 μm complementary
metal–oxide–semiconductor technology with a precision of 16 bits. Design
examples show that the proposed architectures offer good area and power
reductions and speed improvement compared to the best existing reconfigurable
FIR filter implementations in the literature.
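
For readers unfamiliar with multiplier-less FIR filtering, the Python sketch
below shows the generic shift-and-add idea that such architectures build on:
each coefficient multiplication is replaced by shifts and additions driven by
the coefficient's binary digits. It is not a model of the constant shifts or
programmable shifts architectures themselves, and it assumes non-negative
integer coefficients; the function names are hypothetical.

```python
def shift_add_multiply(x, coeff):
    """Multiply integer x by a non-negative integer coeff using shifts and adds."""
    acc = 0
    shift = 0
    while coeff:
        if coeff & 1:          # this bit of the coefficient is set
            acc += x << shift  # add the correspondingly shifted input
        coeff >>= 1
        shift += 1
    return acc


def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        y = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                y += shift_add_multiply(samples[n - k], c)
        out.append(y)
    return out


print(fir_filter([1, 2, 3], [3, 5]))
```

Coefficients that share bit patterns produce the same shifted terms, which is
the redundancy that common subexpression elimination exploits.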

A Physical-Location-Aware X-Filling Method for IR-Drop Reduction in At-Speed Scan Test
Hsieh, W.-W.; Chen, S.-L.; Lin, I.-S.; Hwang, T. T.,
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395734&isnumber=5395722

The IR-drop problem during test mode exacerbates delay defects and results in
false failures. In this paper, we take the X-filling approach to reduce the
IR-drop effect during an at-speed test. The main difference between our
approach and the previous X-filling approaches lies in two aspects. The first
one is that we take the spatial information into consideration in our approach.
The second one is how X-filling is performed. We propose a backward-propagation
technique instead of a forward-propagation approach taken in previous work. The
experimental results show that our approach reduces the maximum IR-drop by
21.1% in the best case and by 9.1% on average compared to previous work.

Using Launch-on-Capture for Testing BIST Designs Containing Synchronous and Asynchronous Clock Domains
Wang, L.-T.; Wen, X.; Wu, S.; Furukawa, H.; Chao, H.-J.; Sheu, B.; Guo, J.; Jone, W.-B.,
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395739&isnumber=5395722

This paper presents a new at-speed logic built-in self-test (BIST) architecture
supporting two launch-on-capture schemes, namely aligned double-capture and
staggered double-capture, for testing multi-frequency synchronous and
asynchronous clock domains in a scan-based BIST design. The proposed
architecture also includes BIST debug and diagnosis circuitry to help locate
BIST failures. The aligned scheme detects and allows diagnosis of structural
and delay faults among all synchronous clock domains, whereas the staggered
scheme detects and allows diagnosis of structural and delay faults among all
asynchronous clock domains. Both schemes solve the longstanding problem of
using the conventional one-hot scheme, which requires testing each clock domain
one at a time, or the simultaneous scheme, which requires adding isolation
logic to normal functional paths across interacting clock domains. Physical
implementation is easily achieved by the proposed solution due to the use of a
slow-speed, global scan enable signal and reduced timing-critical design
requirements. Application results for industrial designs demonstrate the
effectiveness of the proposed architecture.



Special Section Short Papers
============================

A Routing Approach to Reduce Glitches in Low Power FPGAs
Dinh, Q.; Chen, D.; Wong, M. D. F.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395747&isnumber=5395722

This paper presents a novel approach to reduce dynamic power in
field-programmable gate arrays (FPGAs) by reducing glitches during routing. It
finds alternative routes for early-arriving signals so that signal arrival
times at look-up tables are aligned. We developed an efficient algorithm to
find routes with target delays and then built a glitch-aware router aiming at
reducing dynamic power. To the best of our knowledge, this is the first
glitch-aware routing algorithm for FPGAs. Experiments show that an average of
27% reduction in glitch power is achieved, which translates into an 11%
reduction in dynamic power, compared to the glitch-unaware router of Versatile
Place and Route (VPR).

A Metal-Only-ECO Solver for Input-Slew and Output-Loading Violations
Lu, C.-P.; Chao, M. C.-T.; Lo, C.-H.; Chang, C. W.,
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395740&isnumber=5395722

To reduce the time-to-market and photomask cost for advanced process
technologies, metal-only engineering change order (ECO) has become a practical
and attractive solution to handle incremental design changes. Due to limited
spare cells in metal-only ECO, the newly added netlist may often violate the
input-slew and output-loading constraints and, in turn, delay or even fail the
timing closure. This paper presents a framework, named metal-only ECO slew/cap
solver (MOESS), to resolve the input-slew and output-loading violations by
connecting spare cells onto the violated nets as buffers. MOESS performs two
buffer insertion schemes in a sequential manner to first minimize the number of
inserted buffers and then resolve timing violations, if any. The experimental
results based on industrial designs demonstrate that MOESS can resolve more
violations with fewer inserted buffers and less central processing unit runtime
compared to an electronic design automation vendor’s solution.

Routing With Constraints for Post-Grid Clock Distribution in Microprocessors
Shelar, R. S.,
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395733&isnumber=5395722

Microprocessors typically employ a global grid followed by block-level buffered
trees for clock distribution. The trees are connected to the grid by routing
wires along reserved tracks. The routing of these clock wires, which present
load to the grid, is constrained by delay/slope requirements at inputs of the
block-level trees. This leads to a capacitance minimization problem during
multiterminal routing, where routes use the reserved tracks and obey the
constraints. This paper presents an algorithm that addresses the problem,
improving wirelength by 14% over a competitive approach. The algorithm is
employed for post-grid clock distribution in a 45 nm technology microprocessor.


Short Papers
============

Placement Optimization for Yield Improvement of Switched-Capacitor Analog Integrated Circuits
Chen, J.-E.; Luo, P.-W.; Wey, C.-L.,
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395737&isnumber=5395722

Capacitor mismatch can generally result from two sources of error: random
mismatch and systematic mismatch. Random mismatch is caused by process
variation, while systematic mismatch is mainly due to an asymmetrical layout
and processing gradients. A common centroid structure may be used to reduce
systematic mismatch errors, but not random mismatch errors. Based on the
spatial correlation model, this paper formulates the placement optimization
problem of analog circuits using switched-capacitor techniques. A placement
with higher correlation coefficients of the unit capacitors results in a higher
acceptance rate, or chip yield. This paper proposes a heuristic algorithm that
quickly and automatically derives the placement of the unit capacitors with the
highest, or near-highest, correlation coefficients for yield improvement.
Results show that the resultant placement derived from the proposed algorithm
achieves better yield improvement than that from a common centroid approach.
The proposed heuristic algorithm can be applied to arbitrary capacitor
ratios, i.e., ratios among more than two capacitors.

Optimal Double Via Insertion with On-Track Preference
Lee, K.-Y.; Wang, T.-C.; Koh, C.-K.; Chao, K.-Y.,
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395741&isnumber=5395722

As on-track double vias take fewer routing resources and have better electrical
characteristics, we study in this paper the problem of double via insertion
with a preference for on-track double vias (DVI/ON) in a postrouting stage. The
primary goal is to insert as many double vias as possible, and maximizing the
number of on-track double vias is a secondary objective. We present a zero-one
integer linear program-based approach to optimally solve the DVI/ON problem.
Moreover, we also discuss a special case of the DVI/ON problem and present a
maximum weighted bipartite matching-based optimal approach. Experimental
results indicate that our approaches outperform existing algorithms in terms of
solution quality.
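
The two-level objective above (as many double vias as possible, then as many
on-track ones as possible) can be folded into a single matching weight. The
Python sketch below is an illustration under assumptions, not the authors'
algorithm: it assumes each via may take at most one candidate slot, each slot
serves at most one via, and it searches exhaustively instead of using an
efficient bipartite matching routine. All names and the sample data are
hypothetical.

```python
def best_double_via_assignment(candidates):
    """Pick slots for vias: maximize count first, on-track count second.

    candidates: dict mapping via name -> list of (slot, on_track) pairs.
    Returns a dict mapping each served via to its chosen slot.
    """
    vias = list(candidates)
    # One extra double via must outweigh every possible on-track bonus.
    big = len(vias) + 1

    def search(i, used):
        if i == len(vias):
            return 0, {}
        # Option 1: leave via i without a redundant via.
        best_score, best_assign = search(i + 1, used)
        # Option 2: try every candidate slot that is still free.
        for slot, on_track in candidates[vias[i]]:
            if slot in used:
                continue
            score, assign = search(i + 1, used | {slot})
            score += big + (1 if on_track else 0)
            if score > best_score:
                best_score = score
                best_assign = {**assign, vias[i]: slot}
        return best_score, best_assign

    return search(0, frozenset())[1]


# Hypothetical layout: v1 and v2 compete for on-track slot "a".
layout = {"v1": [("a", True), ("b", False)], "v2": [("a", True)]}
print(best_double_via_assignment(layout))
```

Here `v1` gives up the on-track slot so that both vias can be doubled, which
is the behavior the primary objective demands.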