TCAD Newsletter - December 2010 Issue
Placing you one click away from the best new CAD research!


Regular Papers

ANALOG, MIXED-SIGNAL, AND RF CIRCUITS

Maffezzoni, P., "Computing the Synchronization Regions of Injection-Locked
Strongly Nonlinear Oscillators for Frequency Division Applications"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621039&isn...

Abstract: This paper describes an original method to compute the
synchronization regions of injection-locked oscillators that exhibit a strong
nonlinear response, such as relaxation or ring architectures. The proposed
method is based on the theory of controllably periodically forced oscillators
and is implemented within the frame of the time-domain periodic steady-state
analysis. The novel method is able to predict, in an accurate way, the
conditions for which the correct frequency division occurs and can identify the
transitions to more complex regimes that limit the exploitable synchronization
region.


FPGAS AND RECONFIGURABLE COMPUTING

Lin, M.; Wawrzynek, J., "Improving FPGA Placement With Dynamically Adaptive
Stochastic Tunneling"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621041&isn...

Abstract: This paper develops a dynamically adaptive stochastic tunneling
(DAST) algorithm to avoid the "freezing" problem commonly found when using
simulated annealing for circuit placement on field-programmable gate arrays
(FPGAs). The main objective is to reduce the placement runtime and improve the
quality of final placement. We achieve this by allowing the DAST placer to
tunnel energetically inaccessible regions of the potential solution space,
adjusting the stochastic tunneling schedule adaptively by performing detrended
fluctuation analysis, and selecting move types dynamically by a multi-modal
scheme based on Gibbs sampling. A prototype annealing-based placer, called
DAST, was developed as part of this paper. It targets the same computer-aided
design flow as the standard versatile placement and routing (VPR) but replaces
its original annealer with the DAST algorithm. Our experimental results using
the benchmark suite and FPGA architecture file which comes with the Toronto
VPR5 software package have shown a 18.3% reduction in runtime and a 7.2%
improvement in critical-path delay over that of conventional VPR.


LOGIC SYNTHESIS

Maidee, P.; Bazargan, K., "Improvements on Efficiency and Efficacy of SPFD-Based Rewiring for LUT-Based Circuits"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621036&isn...

Abstract: This paper proposes two set-of-pairs-of-functions-to-be-distinguished
(SPFD)-based rewiring algorithms to be used in a multi-tier rewiring framework,
which employs multiple rewiring techniques. The first algorithm has two unique
features: 1) a satisfiability problem (SAT) instance was devised so that an
unsuccessful rewiring can be identified very quickly, and 2) unlike binary
decision diagram-based methods that require all pairs of SPFD, our algorithm
uses a few SAT instances to perform rewiring for a given wire without
explicitly enumerating all SPFDs. Experimental results show that the runtime of
our algorithm is about three times faster than that of a conventional one under
a simulated setting of such a framework and it scales well with the number of
candidate wires considered. The efficacy of the framework can be further
improved by the second proposed algorithm. The algorithm relies on a theory
presented herein to allow adding a new wire outside of the restricted set of
dominator nodes, a feature common in automatic-test-pattern-generation-based
rewiring, but absent in existing SPFD-based ones. Although this algorithm may
suffer from long runtimes in the same way conventional SPFD-based techniques
do, experiments show that the number of wires which can be rewired increases
13% on average and the number of alternative wires also increases.


MODELING AND SIMULATION

Maricau, E.; Gielen, G., "Efficient Variability-Aware NBTI and Hot Carrier
Circuit Reliability Analysis"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621034&isn...

Abstract: This paper discusses an efficient method to analyze the spatial and
temporal reliability of analog and digital circuits. First, a SPICE-based
reliability simulator with automatic step-size control is proposed. Both hot
carrier degradation and negative bias temperature instability are included in
the simulator. Next, a method to analyze the interaction between process
variability effects and circuit aging is introduced. This method is based on a
screening experimental design (DoE) succeeded by a set of regression DoEs,
resulting in a good speed-accuracy tradeoff with a nearly linear complexity for
all circuits under test. Finally, based on the DoE analysis, a circuit response
surface model (RSM) is derived. The RSM is used for further circuit reliability
analysis such as circuit weak spot detection and yield calculation as a
function of circuit lifetime. The proposed method is validated over a broad
range of both analog and digital circuits. Yield simulation time is reduced
with up to three orders of magnitude, when compared to standard Monte
Carlo-based techniques and while still maintaining simulation accuracy.


Jaffari, J.; Anis, M., "Advanced Variance Reduction and Sampling Techniques for Efficient Statistical Timing Analysis"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621042&isn...

Abstract: The Monte-Carlo (MC) technique is a traditional solution for a
reliable statistical analysis, and in contrast to probabilistic methods, it can
account for any complicate model. However, a precise analysis that involves a
traditional MC-based technique requires many simulation iterations, especially
for the extreme quantile points. In this paper, advanced sampling and variance
reduction techniques, along with applications for efficient digital circuit
timing yield analysis, are studied. Three techniques are proposed: 1) an
enhanced quasi-MC-based sampling which generates optimally low-discrepancy
samples suitable for yield estimation of digital circuits; 2) an
order-statistics based control-variate technique that improves the quality of
the yield estimations, when a moderate number of samples is needed; and 3) a
classical control-variate technique utilized for a variance-reduced critical
delay's statistical moment estimation. This solution is shown to be effective
even for a very low number of samples.


Wang, J.; Singhee, A.; Rutenbar, R. A.; Calhoun, B. H., "Two Fast Methods for
Estimating the Minimum Standby Supply Voltage for Large SRAMs"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621031&isn...

Abstract: The data retention voltage (DRV) defines the minimum supply voltage
for an SRAM cell to hold its state. Intra-die variation causes a statistical
distribution of DRV for individual cells in a memory array. We present two fast
and accurate methods to estimate the tail of the DRV distribution. The first
method uses a new analytical model based on the relationship between DRV and
static noise margin. The second method extends the statistical blockade
technique to a recursive formulation. It uses conditional sampling for rapid
statistical simulation and fits the results to a generalized Pareto
distribution (GPD) model. Both the analytical DRV model and the generic GPD
model show a good match with Monte Carlo simulation results and offer speedups
of up to four or five orders of magnitude over Monte Carlo at the 6 sigma
point. In addition, the two models show a very close agreement with each other
at the tail up to 8 sigma. For error within 5% with a confidence of 95%, the
analytical DRV model and the GPD model can predict DRV quantiles out to 8 sigma
and 6.6 sigma respectively; and for the mean of the estimate, both models offer
within 1% error relative to Monte Carlo at the 4 sigma point.


PHYSICAL DESIGN

Su, Y.-S.; Hon, W.-K.; Yang, C.-C.; Chang, S.-C.; Chang, Y.-J., "Clock Skew
Minimization in Multi-Voltage Mode Designs Using Adjustable Delay Buffers"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621030&isn...

Abstract: In synchronous circuit designs, clock skew is difficult to minimize
because a single physical layout of a clock tree must satisfy multiple
constraints in a complicated power mode environment where certain modules may
operate with different voltages. In this paper, we use adjustable delay buffers
(ADB) whose delays can be tuned or adjusted to minimize clock skew under
different power modes. Assuming that the positions of k ADBs are already
determined, we first propose a linear-time optimal algorithm which assigns the
values of ADBs so that the skew is optimal among all possible ADB assignments
with a possibility of latency penalty. Then, we propose a modified optimal
algorithm without latency penalty. We also propose an efficient heuristic to
determine good positions for ADBs. Our results show significant improvement
when compared to cases without ADBs.


Chang, Y.-J.; Lee, Y.-T.; Gao, J.-R.; Wu, P.-C.; Wang, T.-C., "NTHU-Route 2.0:
A Robust Global Router for Modern Designs"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621046&isn...

Abstract: This paper presents a robust global router called NTHU-Route 2.0 that
improves the solution quality and runtime of NTHU-Route by the following
enhancements: 1) a new history based cost function; 2) new ordering methods for
congested region identification and rip-up and reroute; and 3) two
implementation techniques. We report convincing experimental results to show
the effectiveness of each individual enhancement. With all these enhancements
together, NTHU-Route 2.0 solves all ISPD98 benchmarks with very good quality.
Moreover, NTHU-Route 2.0 routes 7 of 8 ISPD07 benchmarks and 12 of 16 ISPD08
benchmarks without any overflow. Compared with other state-of-the-art global
routers, NTHU-Route 2.0 is able to produce better solution quality and/or run
more efficiently.


Rajaram, A.; Pan, D. Z., "MeshWorks: A Comprehensive Framework for Optimized Clock Mesh Network Synthesis"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621032&isn...

Abstract: Clock mesh networks are well known for their variation tolerance. But
their usage is limited to high-end designs due to the significantly high
resource requirements compared to clock trees and the lack of automatic mesh
synthesis tools. Most existing works on clock mesh networks either deal with
semi-custom design or perform optimizations on a given clock mesh. However, the
problem of obtaining a good initial clock mesh has not been addressed. Also,
the problem of achieving a smooth tradeoff between variation tolerance and
resource requirements has not been addressed adequately. In this paper, we
present our MeshWorks framework, the first comprehensive automated framework
for planning, synthesis, and optimization of clock mesh networks that addresses
the above issues. Experimental results suggest that our algorithms can achieve
an additional reduction of 31% in buffer area, 21% in wirelength, and 23% in
power, compared to the best previous work, with similar worst case maximum
frequency. We also demonstrate the effectiveness of our framework under several
practical issues such as blockages, multiple clocks, uneven load distribution,
and electromigration violations.


SYSTEM-LEVEL DESIGN

Dong, A.; Zhao, J.; Xie, Y., "Fabrication Cost Analysis and Cost-Aware Design Space Exploration for 3-D ICs"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621044&isn...

Abstract: 3-D integration technology is emerging as an attractive alternative
to increase the transistor count for future chips. The majority of the existing
3-D integrated circuit (IC) research is focused on the performance, power,
density, and heterogeneous integration benefits offered by 3-D integration. All
such advantages, however, ultimately have to translate into cost evaluation
when a design strategy has to be decided. Consequently, system-level cost
analysis at early design stages is imperative to decide on whether 3-D
integration should be adopted. This paper presents a cost estimation method for
3-D ICs at early design stages and proposes a set of cost models that include
wafer cost, 3-D bonding cost, package cost, and cooling cost. The proposed 3-D
IC cost estimation method can help designers analyze the cost implication for
3-D ICs during the design space exploration at the early stage, and it enables
a cost-driven 3-D IC design flow that can guide the design choice toward a
cost-effective direction. Based on the proposed cost estimation method, this
paper demonstrates two case studies that explore the cost benefits of 3-D
integration for application-specific integrated circuit designs and many-core
microprocessor designs style, respectively. Finally, this paper suggests the
optimum partitioning strategy for future 3-D IC designs.


Jafari, F.; Lu, Z.; Jantsch, A.; Yaghmaee, M. H., "Buffer Optimization in Network-on-Chip Through Flow Regulation"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621045&isn...

Abstract: For network-on-chip (NoC) designs, optimizing buffers is an essential
task since buffers are a major source of cost and power consumption. This paper
proposes flow regulation and has defined a regulation spectrum as a means for
system-on-chip architects to control delay and backlog bounds. The regulation
is performed per flow for its peak rate and burstiness. However, many flows may
have conflicting regulation requirements due to interferences with each other.
Based on the regulation spectrum, this paper optimizes the regulation
parameters aiming for buffer optimization. Three timing-constrained buffer
optimization problems are formulated, namely, buffer size minimization, buffer
variance minimization, and multiobjective optimization, which has both buffer
size and variance as minimization objectives. Minimizing buffer variance is
also important because it affects the modularity of routers and network
interfaces. A realistic case study exhibits 62.8% reduction of total buffers,
84.3% reduction of total latency, and 94.4% reduction on the sum of variances
of buffers. Likewise, the experimental results demonstrate similar improvements
in the case of synthetic traffic patterns. The optimization algorithm has low
run-time complexity, enabling quick exploration of large design spaces. This
paper concludes that optimal flow regulation can be a highly valuable
instrument for buffer optimization in NoC designs.


Seiculescu, C.; Murali, S.; Benini, L.; De Micheli, G., "SunFloor 3D: A Tool for Networks on Chip Topology Synthesis for 3-D Systems on Chips"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621033&isn...

Abstract: Three-dimensional integrated circuits (3D-ICs) are a promising
approach to address the integration challenges faced by current systems on
chips (SoCs). Designing an efficient network on chip (NoC) interconnect for a
3-D SoC that meets not only the application performance constraints but also
the constraints imposed by the 3-D technology is a significant challenge. In
this paper, we present a design tool, SunFloor 3D, to synthesize
application-specific 3-D NoCs. The proposed tool determines the best NoC
topology for the application, finds paths for the communication flows, assigns
the network components to the 3-D layers, and places them in each layer. We
perform experiments on several SoC benchmarks and present a comparative study
between 3-D and 2-D NoC designs. Our studies show large improvements in
interconnect power consumption (average of 38%) and delay (average of 13%) for
the 3-D NoC when compared to the corresponding 2-D implementation. Our studies
also show that the synthesized topologies result in large power (average of
54%) and delay savings (average of 21%) when compared to standard topologies.


Ogras, U. Y.; Bogdan, P.; Marculescu, R., "An Analytical Approach for Network-on-Chip Performance Analysis"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621037&isn...

Abstract: Networks-on-chip (NoCs) have recently emerged as a scalable
alternative to classical bus and point-to-point architectures. To date,
performance evaluation of NoC designs is largely based on simulation which,
besides being extremely slow, provides little insight on how different design
parameters affect the actual network performance. Therefore, it is practically
impossible to use simulation for optimization purposes. In this paper, we
present a mathematical model for on-chip routers and utilize this new model for
NoC performance analysis. The proposed model can be used not only to obtain
fast and accurate performance estimates, but also to guide the NoC design
process within an optimization loop. The accuracy of our approach and its
practical use is illustrated through extensive simulation results.


TEST

Jeong, W.; Lee, J.; Han, T.; Lee, K.; Kang, S., "An Advanced BIRA for Memories With an Optimal Repair Rate and Fast Analysis Speed by Using a Branch Analyzer"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621043&isn...

Abstract: As memory capacity and density grow, a corresponding increase in the
number of defects decreases the yield and quality of embedded memories for
systems-on-chip as well as commodity memories. For embedded memories, built-in
redundancy analysis (BIRA) is widely used to solve quality and yield issues by
replacing faulty cells with healthy redundant cells. Many BIRA approaches
require extra hardware overhead in order to achieve optimal repair rates, or
they suffer a loss of repair rate in minimizing the hardware overhead. An
innovative BIRA approach is proposed to achieve optimal repair rates, lower
area overhead, and increase analysis speed. The proposed BIRA minimizes area
overhead by eliminating some storage coverage for only must-repair faulty
information. The proposed BIRA analyzes redundancies quickly and efficiently by
evaluating all nodes of a branch in parallel with a new analyzer which is
simple and easy-to-implement. Experimental results show that the proposed BIRA
allows for a much faster analysis speed than that of the state-of-the-art BIRA,
as well as the optimal repair rate, and relatively small area overhead.


VERIFICATION

Nocco, S.; Quer, S., "A Novel SAT-Based Approach to the Task Graph Cost-Optimal Scheduling Problem"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621035&isn...

Abstract: The task graph cost-optimal scheduling problem consists in scheduling
a certain number of interdependent tasks onto a set of heterogeneous processors
(characterized by idle and running rates per time unit), minimizing the cost of
the entire process. This paper provides a novel formulation for this scheduling
puzzle, in which an optimal solution is computed through a sequence of binate
covering problems, hinged within a bounded model checking paradigm. In this
approach, each covering instance, providing a min-cost trace for a given
schedule depth, can be solved with several strategies, resorting to
minimum-cost satisfiability solvers or pseudo-Boolean optimization tools.
Unfortunately, all direct resolution methods show very low efficiency and
scalability. As a consequence, we introduce a specialized method to solve the
same sequence of problems, based on a traditional all-solution SAT solver. This
approach follows the "circuit cofactoring" strategy, as it exploits a powerful
technique to capture a large set of solutions for any new SAT counter-example.
The overall method is completed with a branch-and-bound heuristic which
evaluates lower and upper bounds of the schedule length, to reduce the state
space that has to be visited. Our results show that the proposed strategy
significantly improves the blind binate covering schema, and it outperforms
general purpose state-of-the-art tools.


Short Papers
============

Jang, Y.; Kim, J.; Kyung, C.-M., "Topology Synthesis for Low Power Cascaded Crossbar Switches"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621040&isn...

Abstract: Crossbar switch network has an increasing impact on such critical
measures as throughput, latency, area, and power consumption of complex
system-on-chip as technology scales to deep submicrometer. With high clock
frequency crossbar switch network design, global wire delay and pipeline
registers inserted for throughput are important because they affect area,
frequency, and power consumption of the chip. The traffic congestion is also a
very important factor because it leads to the lowering of throughput of the
crossbar switch network. In this paper, we present a topology synthesis method
for the low-power cascaded crossbar switch network satisfying the given
bandwidth, latency, frequency, and area constraints. Unlike previous papers,
our paper considers wire delay, traffic congestion, and pipeline register
insertion at the same time. Experimental results show that the topologies
optimized for power consumption for a given clock frequency consume less power
than existing methods by up to 38.04%.


Chen, F.-W.; Liu, Y.-Y., "Performance-Driven Dual-Rail Routing Architecture for Structured ASIC Design Style"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621038&isn...

Abstract: In recent years, structured application-specific integrated circuit
(ASIC) design style has lessened the importance of mask cost. Multiple
structured ASIC chip designs share the same pre-fabricated device and wire
masks. Nevertheless, the interconnection delay in a pre-fabricated wire slows
down circuit performance as a result of high capacitive load. We propose a
dual-rail routing architecture that reduces wire delay by 10% to 15% compared
to the original routing architecture. Furthermore, we propose a dual-rail
insertion algorithm to reduce routing area overhead. The experimental results
demonstrate that our dual-rail technique reduces wire delay by 9.8% with 4.8%
routing area overhead and improves overall circuit performance by 7.0%.


Chun, S.; Orailoglu, A., "DiSC: A New Diagnosis Method for Multiple Scan Chain Failures"

URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621047&isn...

Abstract: In scan-based testing environments, identifying the scan chain
failures can be of significant help in guiding the failure analysis process for
yield improvement. In this paper, we propose an efficient scan chain diagnosis
method using a symbolic fault simulation to achieve high diagnostic resolution
and small candidate list for single and multiple defects in scan chains. The
main ideas of the proposed scan chain diagnosis method are twofold: 1) the
reduction of the candidate scan cells through the analysis of the symbolic
simulation responses, and 2) the identification of final candidate scan cells
using the backward tracing method with the symbolic simulation responses.
Experimental results show the effectiveness.