TCAD Newsletter - January 2011 Issue
Placing you one click away from the best new CAD research!


Regular Papers

FPGAS AND RECONFIGURABLE COMPUTING

Banerjee, P.; Sangtani, M.; Sur-Kolay, S., "Floorplanning for Partially
Reconfigurable FPGAs"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5671549&isn...

Abstract: Partial reconfiguration on heterogeneous field-programmable gate
arrays with millions of gates yields better utilization of its different types
of resources by swapping in and out the appropriate modules of one or more
applications at any instant of time. Given a schedule of sub-task instances
where each instance is specified as a netlist of active modules,
reconfiguration overhead can be reduced by fixing the position and shapes of
modules common across all instances. We propose a global floorplan generation
method PartialHeteroFP to obtain same positions for the common modules across
all instances such that the heterogeneous resource requirements of all modules
in each instance are satisfied, and the total half-perimeter wirelength over
all instances is minimal. Experimental results establish that the proposed
PartialHeteroFP produces floorplans very fast, with 100% match of common
modules and thereby minimizing the partial reconfiguration overhead.


Ling, A. C.; Brown, S. D.; Safarpour, S.; Zhu, J., "Toward Automated ECOs in
FPGAs"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5671540&isn...

Abstract: Engineering change orders (ECOs), which are used to apply late-stage
specification changes and bug fixes, have become an important part of the
field-programmable gate array design flow. ECOs are beneficial since they are
applied directly to a placed-and-routed netlist which preserves most of the
engineering effort invested previously. Unfortunately, designers often apply
ECOs in a manual fashion which may have an unpredictable impact on the design's
final correctness and end costs. As a solution, this paper introduces an
automated method to tackle the ECO problem. This paper uses a novel resynthesis
technique which can automatically update the functionality of a circuit by
leveraging the existing logic within the design, thereby removing the
inefficient manual effort required by a designer. The technique presented in
this paper is robust enough to handle a wide range of changes. Furthermore, the
technique can successfully make late-stage functional changes while minimally
perturbing the placed-and-routed netlist: something that is necessary for ECOs.
Also, this technique does this with a minimal impact on the circuit performance
where on average over 90% of the placement and routing wires remain unchanged.


LOGIC SYNTHESIS

Keren, O.; Levin, I.; Stankovic, R. S., "Determining the Number of Paths in
Decision Diagrams by Using Autocorrelation Coefficients"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5671542&isn...

Abstract: This paper deals with the number of paths in multiterminal binary
decision diagrams (MTBDDs) and shared binary decision diagrams (SBDDs)
representing a set of Boolean functions. It is shown that the number of paths
in an MTBDD (SBDD) can be uniquely determined by values of specific
weighted-autocorrelation coefficients. An analytical expression for the number
of paths as a linear function of the values of the weighted-autocorrelation
coefficients is presented. Based on this expression, a method of minimization
of the number of paths is proposed. The method is based on replacing the
initial set of input variables with their linear combinations. By using this
method, a deterministic paths-reduction procedure, which provides MTBDDs and
SBDDs with a reduced number of paths, is presented. The efficiency of the
suggested approach is demonstrated on benchmark functions.

MODELING AND SIMULATION

Ye, X.; Dong, W.; Li, P.; Nassif, S., "Hierarchical Multialgorithm Parallel
Circuit Simulation"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5671539&isn...

Abstract: The emergence of multicore and many-core processors has introduced
new opportunities and challenges to electronic design automation research and
development. While the availability of increasing parallel computing power
holds new promise to address many challenges in computer-aided design (CAD),
the leverage of hardware parallelism can only be possible with a new generation
of parallel CAD applications. In this paper, we propose a novel hierarchical
multialgorithm (MA) parallel circuit simulation approach and its multicore
implementation to expedite one of the most fundamental CAD applications:
transistor-level transient circuit simulation. In our parallel circuit
simulation approach, we create two levels of parallelism. At the higher level
of parallelism, we start multiple simulation algorithms in parallel for a given
simulation task. Interalgorithm communication is established to enable
simulation algorithms to exchange useful information so that they could advance
faster than without doing so. At the lower level of parallelism, each algorithm
within the MA framework utilizes fine-grained parallel techniques such as
parallel device evaluation and parallel matrix solve to fully harness the
available hardware resources. By combining the two levels of parallelism, the
computing power of the multicore or many-core processor platforms can be fully
utilized to achieve superlinear speedup in circuit simulation.


Xie, L.; Davoodi, A., "Bound-Based Statistically-Critical Path Extraction Under Process Variations"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5671544&isn...

Abstract: This paper introduces a bound-based approach to extract a
pre-specified number of statistically-critical paths under process variations.
These are the paths with the highest "violation probability", which indicates
the probability that a path would violate a given timing constraint. Our
approach requires pre-computation of the violation probability of all the nodes
and edges in the circuit timing graph, which can be done using two rounds of
block-based statistical static timing analysis. Given these node/edge violation
probabilities, we derive tight upper and lower bounds for any arbitrary segment
of consecutive nodes and edges, which is the major contribution of this paper.
We further utilize these bounds to extract the statistically-critical paths and
show constant-time for incremental update of the bounds when extending a
segment to a longer one. If our goal is to extract the single most
statistically-critical path, we show a bound-based reduction that can prune a
large portion of circuit without losing optimality. In our simulations, we
verify the correctness and accuracy of our bounds for individual paths, and
compare with exact path extraction using Monte-Carlo-based simulation, and an
alternative which incorporates path-based statistical static timing analysis.

PHYSICAL DESIGN

Wu, T.-H.; Davoodi, A.; Linderoth, J. T., "GRIP: Global Routing via Integer
Programming"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5671547&isn...

Abstract: This paper introduces GRIP, a global routing technique via integer
programming. GRIP optimizes wirelength and via cost directly without going
through a traditional layer assignment phase. Candidate routes spanning all the
metal layers are generated using a linear programming pricing phase that
formally accounts for the impact of existing candidate routes when generating
new ones. To make an integer-programming-based approach applicable for today's
large-scale global routing instances, the original problem is decomposed into
smaller subproblems corresponding to rectangular subregions on the chip
together with their net assignments. Route fragments of nets that fall in
adjacent subproblems are connected in a flexible manner. In case of overflow,
GRIP applies a second-phase optimization that explicitly minimizes overflow. By
using integer programming in an effective manner, GRIP obtains high-quality
solutions. Specifically, for the ISPD 2007 and 2008 benchmarks, GRIP obtains an
average improvement in wirelength and via cost of 9.23% and 5.24%,
respectively, when compared to the best result in the open literature.


Ma, Q.; Xiao, L.; Tam, Y.-C.; Young, E. F. Y., "Simultaneous Handling of
Symmetry, Common Centroid, and General Placement Constraints"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5671537&isn...

Abstract: In today's system-on-chip designs, both digital and analog parts of a
circuit will be implemented on the same chip. Parasitic mismatch induced by
layout will affect circuit performance significantly for analog designs.
Consideration of symmetry and common centroid constraints during placement can
help to reduce these errors. Besides these two specific types of placement
constraints, other constraints, such as alignment, abutment, preplace, and
maximum separation, are also essential in circuit placement. In this paper, we
will present a placement methodology that can handle all these constraints at
the same time. To the best of our knowledge, this is the first piece of work
that can handle symmetry constraint, common centroid constraint, and other
general placement constraints, simultaneously. Experimental results do confirm
the effectiveness and scalability of our approach in solving this mixed
constraint-driven placement problem.


Jang, H.; Joo, D.; Kim, T., "Buffer Sizing and Polarity Assignment in Clock
Tree Synthesis for Power/Ground Noise Minimization"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5671543&isn...

Abstract: In synchronous systems, clock tree causes high peak current at clock
edges, increasing power/ground noise significantly, if the clock tree is not
carefully designed. This paper addresses the problem of minimizing power/ground
noise in the clock tree synthesis. Contrary to the previous approaches which
only make use of assigning polarities to clock buffers to reduce power/ground
noise, our approach solves a new problem of simultaneous consideration of
assigning polarities to clock buffers and determining buffer sizes to fully
exploit the effects of buffer sizing together with polarity assignment on the
minimization of power/ground noise while satisfying the clock skew constraint.
Specifically, the contributions of this paper are: 1) precisely estimating peak
currents by clock buffers and reflecting them on the power/ground noise
minimization; 2) proposing a pseudo-polynomial time optimal algorithm based on
dynamic programming for solving the integrated problem, together with the proof
of intractability of the problem; 3) devising a systematic design flow
framework for reducing the power/ground noise over the entire chip; and
4)considering the effect of thermal variation on the clock skew bound and the
noise minimization.

SYSTEM-LEVEL DESIGN

Kim, J.; Yoo, S.; Kyung, C.-M., "Program Phase-Aware Dynamic Voltage Scaling
Under Variable Computational Workload and Memory Stall Environment"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5671536&isn...

Abstract: Most complex software programs are characterized by program phase
behavior and runtime distribution. Dynamism of the two characteristics often
makes the design-time workload prediction difficult and inefficient.
Especially, memory stall time whose variation is significant in memory-bound
applications has been mostly neglected or handled in a too simplistic manner in
previous works. In this paper, we present a novel online dynamic voltage and
frequency scaling (DVFS) method which takes into account both program phase
behavior and runtime distribution of memory stall time, as well as
computational workload. The online DVFS problem is addressed in two ways:
intraphase workload prediction and program phase detection. The intraphase
workload prediction is to predict the workload based on the runtime
distribution of computational workload and memory stall time in the current
program phase. The program phase detection is to identify to which program
phase the current instant belongs and then to obtain the predicted workload
corresponding to the detected program phase, which is used to set voltage and
frequency during the program phase. The proposed method considers leakage power
consumption as well as dynamic power consumption by a temperature-aware
combined Vdd/Vbb scaling. Compared to a conventional method, experimental
results show that the proposed method provides up to 34.6% and 17.3% energy
reduction for two multimedia applications, MPEG4 and H.264 decoder,
respectively.


Loi, I.; Angiolini, F.; Fujita, S.; Mitra, S.; Benini, L., "Characterization
and Implementation of Fault-Tolerant Vertical Links for 3-D Networks-on-Chip"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5671541&isn...

Abstract: Through silicon vias (TSVs) provide an efficient way to support
vertical communication among different layers of a vertically stacked chip,
enabling scalable 3-D networks-on-chip (NoC) architectures. Unfortunately, low
TSV yields significantly impact the feasibility of high-bandwidth vertical
connectivity. In this paper, we present a semi-automated design flow for 3-D
NoCs including a defect-tolerance scheme to increase the global yield of 3-D
stacked chips. Starting from an accurate physical and geometrical model of
TSVs: 1) we extract a circuit-level model for vertical interconnections; 2) we
use it to evaluate the design implications of extending switch architectures
with ports in the vertical direction; moreover, 3) we present a
defect-tolerance technique for TSV-based multi-bit links through an effective
use of redundancy; and finally, 4) we present a design flow allowing for
post-layout simulation of NoCs with links in all three physical dimensions.
Experimental results show that a 3-D NoC implementation yields around 10%
frequency improvement over a 2-D one, thanks to the propagation delay advantage
of TSVs and the shorter links. In addition, the adopted fault tolerance scheme
demonstrates a significant yield improvement, ranging from 66% to 98%, with a
low area cost (20.9% on a vertical link in a NoC switch, which leads a modest
2.1% increase in the total switch area) in 130 nm technology, with minimal
impact on very large-scale integrated design and test flows.

TEST

Xiang, D.; Zhang, Y., "Cost-Effective Power-Aware Core Testing in NoCs Based on a New Unicast-Based Multicast Scheme"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5671538&isn...

Abstract: Reuse of network-on-chip (NoC) for test data and test response
delivery is attractive. However, previous techniques do not effectively use the
bandwidths of the network by delivering test packets to all cores separately,
which can make very much test cost and test data volume. The NoC core testing
problem is formulated as a unicast-based multicast problem in order to reduce
test data delivery time in the NoC. Test response data are forwarded back to
the automated test equipment (ATE) via the communication channels using the
reverse paths of test data delivery, which are compacted on the way from each
processor to the ATE. A new power-aware test scheduling scheme is proposed,
which is extended to cases for multiple port ATEs. Test data is further
compressed before delivering and a low-power test application scheme is used
for the cores because power produced by cores is the bottleneck of NoC test.
Experimental results are presented to show the effectiveness of the proposed
method in reducing the NoC test cost and test data volume by comparing to the
previous methods.


Biswas, S.; Blanton, R. D., "Reducing Test Execution Cost of Integrated,
Heterogeneous Systems Using Continuous Test Data"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5671546&isn...

Abstract: Integrated, heterogeneous systems are comprehensively tested to
verify whether their performance specifications fall within some acceptable
ranges. However, explicitly testing every manufactured instance against all of
its specifications can be expensive due to the complex requirements for test
setup, stimulus application, and response measurement. To reduce manufacturing
test cost, we have developed a methodology that uses binary decision forests
and several test-specific enhancements for identifying redundant tests of an
integrated system. Feasibility is empirically demonstrated using test data from
over 70 000 manufactured instances of an in-production microelectromechanical
system accelerometer, and over 4 500 manufactured instances of an RF
transceiver. Through our analysis, we have shown that the expensive
cold-mechanical test of the accelerometer and nine out of the 22 RF tests of
the transceiver are likely redundant.


Short Papers
============

Jaffari, J.; Anis, M., "On Efficient LHS-Based Yield Analysis of Analog
Circuits"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5671545&isn...

Abstract: The Latin hypercube sampling (LHS) has been used as a
variance-reduction estimation tool for an efficient sampling-based variability
analysis of analog circuits. For a certain estimation confidence interval, a
lower number of LHS samples is needed than that of Monte Carlo due to the
estimation variance reduction. In this paper, an analysis of variance
decomposition of the indicator function, the yield function, reveals strong
contribution of interactive terms in the variance of the yield function,
leading to limited performance gain of the traditional LHS sampling. In order
to improve its efficiency, two correlation-controlled LHS methods are developed
to reduce the required number of LHS samples for analog circuit yield
estimation.