TCAD Newsletter - May 2010 Issue
Placing you one click away from the best new CAD research!


Regular Papers
==============

Paik, S.; Shin, I.; Kim, T.; Shin, Y., "HLS-l: A High-Level Synthesis Framework
for Latch-Based Architectures"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5452109&isn...

Abstract: Level-sensitive latches are widely used in high-performance custom
designs while edge-triggered flip-flops are predominantly used in
application-specific integrated circuits. We consider a latch as a basis for
storage and address each step of high-level synthesis (HLS), including
scheduling, allocation, and control synthesis. While the use of latches
provides an opportunity to reduce the latency during the scheduling, the
register allocation has to take extra conflicts caused by latches into account,
and the control synthesis has to be tailored to support the latch-based
data-path. Optimization potentials specific to this HLS are identified and
solutions are proposed. Specifically, the register allocation can be improved
by refining the operation schedule in a way to reduce the number of edges in a
register conflict graph; the latency can be reduced by adjusting the clock duty
cycle in a way to generate a tighter schedule. All the steps of HLS and
optimization procedures were integrated into a framework called HLS-l. It was
tested on benchmark designs implemented in 1.1-V, 45 nm complementary
metal-oxide-semiconductor technology. Compared to the conventional HLS, HLS-l
was able to reduce the latency by 18.2% on average with 9.2% less area and
16.0% less power consumption. The application of HLS-l to an industrial example
is demonstrated through the design of a module extracted from H.264/advanced
video coding.
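
As a rough illustration of why refining the operation schedule can help
register allocation, the sketch below builds a register conflict graph from
variable lifetimes and counts registers with a greedy left-edge pass. It uses
a plain interval model with made-up lifetimes (in control steps); the extra
latch-specific conflicts handled by HLS-l are not modeled, and all names are
hypothetical.

```python
from itertools import combinations

def conflict_edges(lifetimes):
    """Return pairs of variables whose half-open lifetimes [start, end) overlap."""
    edges = []
    for (a, (s1, e1)), (b, (s2, e2)) in combinations(sorted(lifetimes.items()), 2):
        if s1 < e2 and s2 < e1:
            edges.append((a, b))
    return edges

def left_edge_registers(lifetimes):
    """Greedy left-edge allocation; returns the number of registers used."""
    free_at = []  # free_at[i] = control step at which register i becomes free
    for _, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1][0]):
        for i, t in enumerate(free_at):
            if t <= start:          # reuse a register that is already free
                free_at[i] = end
                break
        else:                       # no free register: allocate a new one
            free_at.append(end)
    return len(free_at)

# Hypothetical lifetimes before and after moving two operations in the schedule.
before = {"a": (0, 3), "b": (1, 4), "c": (2, 5), "d": (4, 6)}
after = {"a": (0, 2), "b": (2, 4), "c": (2, 5), "d": (4, 6)}

edges_before = conflict_edges(before)
edges_after = conflict_edges(after)
regs_before = left_edge_registers(before)
regs_after = left_edge_registers(after)
```

In this toy case, shortening the lifetime of `a` and delaying `b` removes two
conflict edges and one register.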


Tong, Y.-S.; Chen, S.-J., "An Automatic Optical Simulation-Based Lithography
Hotspot Fix Flow for Post-Route Optimization"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5452094&isn...

Abstract: In this paper, an optical simulation-based lithography hotspot fix
guidance generator and an automatic hotspot fix flow are proposed. We develop
our aerial image simulation engine by enhancing the traditional sum of
coherence system method. Subject to the shape changes, a strong correlation
between the aerial image intensity difference maps of pre-optical proximity
correction (OPC) and post-OPC schemes is found. We collect near a litho hotspot
in a pre-OPC layout some fix actions that are local shape changes to optimize
the optical intensity. Then, fix guidances will be selected from the collected
fix actions by a heuristic algorithm and input to a router for fixing the
hotspot. We integrate the fix guidance generation method with a commercial
lithography hotspot detection tool to create an automatic post-route
optical-simulation-embedded local fix (OSELF) flow and test with industry 65 nm
designs. Compared with the commercial flow that uses only local fix, our method
has a $1.4\times$--$1.9\times$ fix rate, similar run time, no new design
rule check violation, and negligible circuit timing impacts. We also combine
our OSELF algorithm with a rip-up and reroute engine, and test on the same
designs. Compared to the commercial tool that uses a hybrid (local fix plus
reroute) fix flow, our combined flow runs $1.7\times$--$2.9\times$ faster
with 45–55% circuit timing impact. Both flows achieve a 100% hotspot fix
rate.


Hsu, C.-H.; Chen, H.-Y.; Chang, Y.-W., "Multilayer Global Routing With Via and Wire Capacity Considerations"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5452098&isn...

Abstract: Global routing for modern large-scale circuit designs has attracted
much attention in the recent literature. Most of the state-of-the-art academic
global routers just work on a simplified routing congestion model that ignores
the essential via capacity for routing through multiple metal layers. Such a
simplified model would easily cause fatal routability problems in subsequent
detailed routing. To remedy this deficiency, a more effective congestion metric
that considers both the in-tile nets and the residual via capacity for global
routing is presented. Experimental results show that our global router can
achieve very high-quality routing solutions with more reasonable via usage.


Ho, K.-H.; Chen, Y.-P.; Fang, J.-W.; Chang, Y.-W., "ECO Timing Optimization
Using Spare Cells and Technology Remapping"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5452097&isn...

Abstract: We introduce in this paper a new problem of post-mask engineering
change order (ECO) timing optimization using spare-cell rewiring and present a
two-phase framework for this problem. Spare-cell rewiring is a popular
technique for incremental timing optimization and/or functional change after
the placement stage. The spare-cell rewiring problem is very challenging
because of its dynamic wiring cost nature for selecting a spare cell, while the
existing related problems consider only static wiring cost: once a standard
cell is placed, its physical location is fixed and so is its wiring cost. For
the spare-cell rewiring problem, each rewiring could make some spare cells
become ordinary standard cells and some standard cells become new spare cells
simultaneously. As a result, the wiring cost becomes dynamic and further
complicates the optimization process. For the addressed problem, we present a
two-phase framework of 1) buffer insertion and gate sizing followed by 2)
technology remapping. For Phase 1, we present a dynamic programming algorithm
considering the dynamic cost, called dynamic cost programming, for the ECO
timing optimization with spare cells. Without loss of solution optimality, we
further present an effective pruning method by selecting spare cells only
inside an essential bounding polygon to reduce the solution space. For those
ECO timing paths that cannot be fixed during Phase 1, we apply technology
remapping on the spare cells to restructure the circuit to fix the timing
violations. The whole framework is integrated into a commercial design flow.
Experimental results based on five industry benchmarks show that our method is
very effective and efficient in fixing the timing violations of ECO paths.


Fang, J.-W.; Chang, Y.-W., "Area-I/O Flip-Chip Routing for Chip-Package
Co-Design Considering Signal Skews"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5452092&isn...

Abstract: The area-input/output (I/O) flip-chip package provides a high
chip-density solution to the demand of more I/Os in very large scale
integration designs; it can achieve smaller package size, shorter wirelength,
and better signal and power integrity. In this paper, we introduce the routing
problem for chip and package co-design and present the first work in the
literature to handle the multiple re-distribution layer (RDL) routing problem
(without RDL vias) for flip-chip designs, considering pin and layer assignment,
signal integrity, signal-skew and total wirelength minimization, and
chip-package co-design. Our router adopts a two-stage technique of global
routing followed by RDL routing. The global routing assigns each block port to
a unique bump pad via an I/O pad and decides the RDL routing among I/O pads and
bump pads. Based on the minimum-cost maximum-flow algorithm, we can guarantee
100% RDL routing completion after the assignment and the optimal solution with
the minimum wirelength. The RDL routing efficiently distributes the routing
points between two adjacent bump pads and then generates a 100% routable
sequence to complete the routing. Experimental results based on 12 industry
designs demonstrate that our router can achieve 100% routability and the
optimal routing wirelength under reasonable central processing unit times,
while related works cannot.
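
The assignment step can be pictured as a minimum-cost maximum-flow problem.
The sketch below is a generic successive-shortest-path solver applied to a
made-up instance: each hypothetical block port is matched to one bump pad and
the cost is Manhattan distance. It leaves out I/O pads, layers, and skew, so
it shows the formulation only, not the authors' router.

```python
from collections import deque

class MinCostFlow:
    """Successive shortest paths with a queue-based Bellman-Ford search."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # edge: [to, cap, cost, rev_index]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        flow = cost = 0
        while True:
            dist = [float("inf")] * self.n
            prev = [None] * self.n
            queued = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                queued[u] = False
                for i, (v, cap, c, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v] = dist[u] + c
                        prev[v] = (u, i)
                        if not queued[v]:
                            queued[v] = True
                            queue.append(v)
            if prev[t] is None:          # no augmenting path left
                return flow, cost
            push, v = float("inf"), t    # bottleneck capacity on the path
            while v != s:
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            v = t
            while v != s:                # update residual capacities
                u, i = prev[v]
                self.graph[u][i][1] -= push
                self.graph[v][self.graph[u][i][3]][1] += push
                v = u
            flow += push
            cost += push * dist[t]

def assign_ports(ports, pads):
    """Match each port to a distinct pad with minimum total Manhattan length."""
    n, m = len(ports), len(pads)
    source, sink = n + m, n + m + 1
    net = MinCostFlow(n + m + 2)
    for i, (px, py) in enumerate(ports):
        net.add_edge(source, i, 1, 0)
        for j, (bx, by) in enumerate(pads):
            net.add_edge(i, n + j, 1, abs(px - bx) + abs(py - by))
    for j in range(m):
        net.add_edge(n + j, sink, 1, 0)
    flow, cost = net.solve(source, sink)
    # A saturated port-to-pad edge means that pair was selected.
    pairs = {i: v - n for i in range(n)
             for v, cap, _, _ in net.graph[i] if n <= v < n + m and cap == 0}
    return flow, cost, pairs

# Hypothetical coordinates, for illustration only.
ports = [(0, 0), (4, 0), (2, 3)]
pads = [(1, 0), (3, 0), (2, 2)]
flow, total_length, pairs = assign_ports(ports, pads)
```

A flow equal to the number of ports means every port received a pad, which is
the sense in which a flow formulation can certify a complete assignment.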


Joshi, V.; Cline, B.; Sylvester, D.; Blaauw, D.; Agarwal, K., "Mechanical
Stress Aware Optimization for Leakage Power Reduction"

Abstract: Process-induced mechanical stress is used to enhance carrier
transport and achieve higher drive currents in current complementary
metal-oxide-semiconductor technologies. This paper explores how to fully
exploit the layout dependence of stress enhancement and proposes a
circuit-level, block-based, stress-enhanced optimization algorithm that uses
stress-optimized layouts in conjunction with dual-$V_{\mathrm{th}}$ assignment
to achieve optimal power-performance tradeoffs. We begin by studying how
channel stress and drive current depend on layout parameters such as active
area length and contact placement, while considering all layout-dependent
sources of mechanical stress in a 65 nm industrial process. We then
investigate the three main layout properties that impact mechanical stress in
this process and discuss how to improve stress-based performance enhancement
in standard cell libraries. While varying the stress-altering layout
properties of a number of standard cells in a 65 nm industrial library, we
show that "dual-stress" standard cell layouts (analogous to
"dual-$V_{\mathrm{th}}$") can be designed to achieve drive current differences
up to ${\sim}14\%$ while incurring less than half the leakage penalty of
dual-$V_{\mathrm{th}}$. Therefore, when the flexibility of "dual-stress"
assignment is combined with dual-$V_{\mathrm{th}}$ assignment (within the
proposed joint optimization framework), simulation results for a set of
benchmark circuits show that leakage is reduced by ${\sim}24\%$ on average,
for iso-delay, when compared to dual-$V_{\mathrm{th}}$ assignment. Since
mobility enhancement does not incur the exponential leakage penalty associated
with $V_{\mathrm{th}}$ assignment, our optimization technique is ideal for
leakage power reduction. However, our framework can also be used to achieve
higher performance circuits for iso-leakage and our joint optimization
framework can be used to reduce delay on average by ${\sim}5\%$. In both
cases, the proposed method only incurs a small area penalty (${<}0.5\%$).


Alizadeh, B.; Mirzaei, M.; Fujita, M., "Coverage Driven High-Level Test
Generation Using a Polynomial Model of Sequential Circuits"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5452110&isn...

Abstract: This paper proposes a high-level test generation method which
considers the control part as well as data path of a register transfer level
circuit as a set of polynomial functions to generate behavioral test patterns
from faulty behavior instead of comparing the faulty and fault-free circuits
based on a hybrid Boolean-word canonical representation called Horner expansion
diagram. Since this set of polynomial functions expresses primary outputs and
next states with respect to primary inputs and present states, it is not
necessary to perform a justification/propagation phase, which leads to a
minimum number of backtracks. It improves fault coverage and reduces test
generation time over logic-level techniques. We then assess the effectiveness of
high-level test generation with a simple gate-level automatic test pattern
generation algorithm. Experimental results show robustness and reliability of
our method compared to other contemporary approaches in terms of fault coverage
and CPU time.

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5452130&isn...

Zolotov, V.; Xiong, J.; Fatemi, H.; Visweswariah, C., "Statistical Path
Selection for At-Speed Test"

Abstract: Process variations make at-speed testing significantly more
difficult. They cause subtle delay changes that are distributed in contrast to
the localized nature of a traditional fault model. Due to parametric
variations, different paths can be critical in different parts of the process
space, and the union of such paths must be tested to obtain good process space
coverage. This paper proposes an integrated at-speed structural testing
methodology, and develops a novel branch-and-bound algorithm that elegantly and
efficiently solves the hitherto open problem of statistical path tracing. The
resulting paths are used for at-speed structural testing. A new test quality
metric is proposed, and paths which maximize this metric are selected. After
chip timing has been performed, the path selection procedure is extremely
efficient. Path selection for a multimillion gate chip design can be completed
in a matter of seconds.

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5452120&isn...

Yilmaz, M.; Chakrabarty, K.; Tehranipoor, M., "Test-Pattern Selection for
Screening Small-Delay Defects in Very-Deep Submicrometer Integrated Circuits"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5452121&isn...

Abstract: Timing-related defects are major contributors to test escapes and
in-field reliability problems for very-deep submicrometer integrated circuits.
Small delay variations induced by crosstalk, process variations, power-supply
noise, as well as resistive opens and shorts can potentially cause timing
failures in a design, thereby leading to quality and reliability concerns. We
present a test-grading technique that uses the method of output deviations for
screening small-delay defects (SDDs). A new gate-delay defect probability
measure is defined to model delay variations for nanometer technologies. The
proposed technique intelligently selects the best set of patterns for SDD
detection from an $n$-detect pattern set generated using timing-unaware
automatic test-pattern generation (ATPG). It offers significantly lower
computational complexity and excites a larger number of long paths compared to
a current generation commercial timing-aware ATPG tool. Our results also show
that, for the same pattern count, the selected patterns provide more effective
coverage ramp-up than timing-aware ATPG and a recent pattern-selection method
for random SDDs potentially caused by resistive shorts, resistive opens, and
process variations.


Ganeshpure, K.; Kundu, S., "On ATPG for Multiple Aggressor Crosstalk Faults"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5452112&isn...

Abstract: Crosstalk faults have emerged as a significant mechanism of circuit
failure due to decreasing process geometries and increasing operation
frequencies. Long signal nets are highly susceptible to crosstalk faults
because they tend to have a higher coupling capacitance to overall capacitance
ratio. Moreover, a typical long net also has multiple aggressors. In generating
patterns to create maximal crosstalk induced delay on a victim net, it may be
impossible to activate all aggressors logically or simultaneously to
constructively induce maximum noise at the victim. Therefore, pattern
generation must focus on activating a maximal subset of aggressors, weighted by
actual coupling capacitance value, in close temporal proximity of the victim
net transition. This max-satisfiability problem is constrained by fault effect
propagation condition which involves determining an input signal assignment so
as to propagate the fault effect at the victim to the primary output. In this
paper, we present Automatic Test Pattern Generation (ATPG) solutions for
multiple aggressor crosstalk faults for zero and unit delay models and compare
the magnitude of crosstalk induced delay at the victim net. Our solution
uses 0–1 Integer Linear Programming (ILP) for maximal aggressor excitation.
Fault effect propagation is solved independently
by using traditional stuck-at fault ATPG or by generating additional ILP
constraints thus forming an integrated ILP formulation with error propagation.
The effect of gate delays is summed by circuit transformation. The proposed
technique was applied to ISCAS85 benchmark circuits. Results indicate that the
percentage of total capacitance that can be switched varies from 75–100%
for zero delay and 30–80% for variable delay case while achieving
propagation of the fault effect to primary output.


Alves, N.; Buben, A.; Nepal, K.; Dworak, J.; Bahar, R. I., "A Cost Effective
Approach for Online Error Detection Using Invariant Relationships"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5452115&isn...

Abstract: This paper investigates the use of logic implication checkers for the
online detection of errors. A logic implication, or invariant relationship,
must hold for all valid input conditions; therefore, any violation of this
implication will indicate an error due to an intermittent fault. Techniques are
presented to efficiently identify the most useful logic implications to include
in checker hardware such that the probability of error detection is maximized
while minimizing the additional hardware and delay overhead. Results show that
significant error detection is possible, even with only a 10% area
overhead, while minimizing impact on delay and power.
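
To make the idea of an implication checker concrete, the sketch below finds
invariant relationships between nodes of a tiny made-up circuit by exhaustive
simulation, then flags a node assignment that breaks one of them. The circuit,
node names, and helper functions are hypothetical, and exhaustive enumeration
stands in for whatever identification technique the paper actually uses.

```python
from itertools import product

def simulate(x, y, z):
    """Fault-free values of the internal nodes of a toy three-input circuit."""
    n1 = x & y
    n2 = x | z
    out = n1 ^ n2
    return {"n1": n1, "n2": n2, "out": out}

def find_implications(sim, num_inputs):
    """List (a, va, b, vb) such that a == va forces b == vb on every input."""
    runs = [sim(*bits) for bits in product((0, 1), repeat=num_inputs)]
    nodes = sorted(runs[0])
    found = []
    for a, b in product(nodes, repeat=2):
        if a == b:
            continue
        for va, vb in product((0, 1), repeat=2):
            triggered = [r for r in runs if r[a] == va]
            if triggered and all(r[b] == vb for r in triggered):
                found.append((a, va, b, vb))
    return found

def violates(values, implication):
    """Checker: True when the antecedent holds but the consequent does not."""
    a, va, b, vb = implication
    return values[a] == va and values[b] != vb

implications = find_implications(simulate, 3)

# A hypothetical intermittent fault flips n2 to 0 while n1 is 1.
faulty_values = {"n1": 1, "n2": 0, "out": 1}
detected = any(violates(faulty_values, imp) for imp in implications)
```

Here `n1 = 1` implies `n2 = 1` because both depend on `x`, so a checker on that
pair would flag the faulty assignment. A real design would need to rank such
implications by detection probability and hardware cost, as the abstract
describes.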


Qian, Y.; Lu, Z.; Dou, W., "Analysis of Worst-Case Delay Bounds for On-Chip
Packet-Switching Networks"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5452093&isn...

Abstract: In network-on-chip (NoC), computing worst-case delay bounds for
packet delivery is crucial for designing predictable systems, yet it remains
an intractable problem. This paper presents an analysis technique to derive
per-flow communication delay bound. Based on a network contention model, this
technique, which is topology independent, employs network calculus to first
compute the equivalent service curve for an individual flow and then calculate
its packet delay bound. To exemplify this method, this paper also presents the
derivation of a closed-form formula to compute a flow's delay bound under
all-to-one gather communication. Experimental results demonstrate that the
theoretical bounds are correct and tight.
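
The two-step recipe (first an equivalent service curve, then a delay bound) can
be sketched with standard network-calculus results for token-bucket arrival
curves and rate-latency service curves. The contention model below is the
textbook leftover-service bound for one competing flow, not the model from the
paper, and all parameter values are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TokenBucket:
    """Arrival curve alpha(t) = burst + rate * t."""
    burst: float
    rate: float

@dataclass(frozen=True)
class RateLatency:
    """Service curve beta(t) = rate * max(0, t - latency)."""
    rate: float
    latency: float

def leftover(service, cross):
    """Service left for a flow after one competing token-bucket flow is served."""
    residual = service.rate - cross.rate
    if residual <= 0:
        raise ValueError("cross traffic saturates the server")
    latency = (cross.burst + service.rate * service.latency) / residual
    return RateLatency(residual, latency)

def concatenate(first, second):
    """Equivalent service curve of two servers in sequence."""
    return RateLatency(min(first.rate, second.rate), first.latency + second.latency)

def delay_bound(flow, service):
    """Worst-case delay: maximum horizontal gap between the two curves."""
    if flow.rate > service.rate:
        raise ValueError("flow rate exceeds service rate; delay is unbounded")
    return service.latency + flow.burst / service.rate

# Hypothetical two-hop path (units: flits and cycles).
router = RateLatency(rate=1.0, latency=2.0)
cross_flow = TokenBucket(burst=4.0, rate=0.5)   # contends at the first hop only
tagged_flow = TokenBucket(burst=2.0, rate=0.25)

hop1 = leftover(router, cross_flow)
equivalent = concatenate(hop1, router)
bound = delay_bound(tagged_flow, equivalent)
```

With these numbers the first hop offers a leftover rate of 0.5 with latency 12,
the path latency is 14, and the bound is 14 + 2 / 0.5 = 18 cycles.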


Majzoub, S. S.; Saleh, R. A.; Wilton, S. J. E.; Ward, R. K., "Energy
Optimization for Many-Core Platforms: Communication and PVT Aware
Voltage-Island Formation and Voltage Selection Algorithm"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5452113&isn...

Abstract: In this paper, we propose a novel approach to voltage-island
formation, for the energy optimization of many-core architectures, which
mitigates the impact of process, voltage, and temperature (PVT) variations. The
islands are created by balancing their shape constraints imposed by intra and
inter-island communication with the desire to limit the spatial extent of each
island to minimize PVT impact. In addition, to reduce the number of voltage
levels in the design, we propose an efficient voltage selection approach that
provides near optimal results, for a set of 33 examined cases, with more than a
ten times speedup compared to the best-known previous methods. This run-time
improvement is important, especially for large many-core platforms. Finally, we
present an evaluation platform considering pre-fabrication and post-fabrication
PVT scenarios where multiple applications with hundreds to thousands of tasks
are mapped onto many-core platforms with hundreds to thousands of cores to
evaluate the proposed techniques. Results show that the average energy savings
for 33 test cases using the proposed methods are 37% compared to 16% obtained
using previous methods.


Short Papers
============

Broussev, S. S.; Tchamov, N. T., "Time-Varying Root-Locus of Large-Signal LC
Oscillators"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5452116&isn...

Abstract: Time-varying root-locus of a large-signal LC oscillator is obtained
via semi-symbolic analysis, where the complex frequency $s$ is a symbol. The
steady-state circuit behavior is modeled by its time-varying small-signal
admittance matrix within one oscillation period. The roots of the
characteristic equation are computed with the QZ method. The time-varying
root-locus analysis capabilities are demonstrated on a GHz range 130 nm
complementary metal-oxide-semiconductor cross-coupled LC oscillator, and they
complement the results obtained with the traditional numerical computer-aided
design methods.


Tzeng, C.-W.; Huang, S.-Y., "Split-Masking: An Output Masking Scheme for
Effective Compound Defect Diagnosis in Scan Architecture With Test Compression"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5452123&isn...

Abstract: In modern scan architecture, it is often desirable to compact the
output response without jeopardizing the diagnostic resolution. In this paper,
we propose an output masking scheme to meet such a stringent requirement. We
consider a practical scenario in which an output compactor is in use. We aim to
support the harshest condition called compound defect diagnosis, in which
faults exist in both the scan chain and the core logic. To overcome the loss of
the diagnostic resolution, we incorporate a split-masking scheme, by which one
can easily separate the output responses of the faulty chains from those of the
fault-free ones. The experimental results demonstrate that the proposed scheme
can recover the diagnostic resolution loss induced by an output compactor
almost completely without sacrificing the compaction ratio.


Suissa, A.; Romain, O.; Denoulet, J.; Hachicha, K.; Garda, P., "Empirical
Method Based on Neural Networks for Analog Power Modeling"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5452129&isn...

Abstract: We introduce an empirical method for power consumption modeling of
analog components at system level. The principal step of this method uses
neural networks to approximate the mathematical curve of the power consumption
as a function of the inputs and parameters of the analog component. For a node
of a wireless sensor network, we found an average error of 1.53% with a
maximum error of 3.06% between our estimation and the measured power
consumption. This novel method is suitable for Platform-Based Design and has
three key features for architecture exploration purposes. Firstly, the method
is generic as it can be applied to any analog component in any modeling and
simulation environment. Secondly, the method is suitable for the total (analog
and digital) power consumption estimation of a heterogeneous system. Thirdly,
the method provides an online estimation of the instantaneous power consumption
of analog blocks.


Chang, C.-H.; Faust, M., "On A New Common Subexpression Elimination Algorithm
for Realizing Low-Complexity Higher Order Digital Filters"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5452122&isn...

Abstract: A thorough analysis of the paper above revealed several controversial
arguments about the superiority of binary representation over canonical signed
digits (CSD) for common subexpression elimination (CSE). It was improper to
model the number of logic operators (LO) required after CSE as a linear sum of
independently weighted numbers of nonzero bits, common subexpressions and
unpaired bits. The logic depth (LD) penalty of binary CSE had been deemphasized
by the errors in the reported LD. This comment corrects the LD of the
contention resolution algorithm, and points out some contradictions with
reference to the
latest experimentation of binary, CSD and minimal signed digit number
representations for CSE. Upon correcting the error in the reported filter
lengths for different stopband attenuations of digital advanced mobile phone
system specification, the LO and LD data of the CSE algorithms compared in the
above paper are recalculated using the corrected filter coefficient sets.
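
For readers unfamiliar with the number representations being compared, the
sketch below converts a non-negative integer coefficient to canonical signed
digit (CSD) form and counts nonzero digits against plain binary. It shows the
representations only; it is not the CSE algorithm or the LO/LD model discussed
in the comment, and the function names are hypothetical.

```python
def to_csd(n):
    """CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0, or +1, and no two adjacent digits are nonzero.
    """
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # +1 when n mod 4 == 1, -1 when n mod 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

def from_digits(digits):
    """Value represented by a least-significant-first signed-digit list."""
    return sum(d << i for i, d in enumerate(digits))

def nonzero_count(digits):
    return sum(1 for d in digits if d != 0)

coefficient = 23                      # binary 10111
csd = to_csd(coefficient)             # 32 - 8 - 1
binary_nonzeros = bin(coefficient).count("1")
csd_nonzeros = nonzero_count(csd)
```

A shift-and-add multiplier without any sharing needs one fewer adder or
subtractor than the number of nonzero digits, which is why the digit count
matters before subexpression sharing is even considered.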