TCAD Newsletter - November 2010 Issue
Placing you one click away from the best new CAD research!

ANNOUNCEMENT: Call for Papers for an upcoming TCAD Special Section:
PAR-CAD: Parallel CAD Algorithms and CAD for Parallel Architectures/Systems
Submission deadline: Jan 30, 2011

Regular Papers
==============

ANALOG, MIXED-SIGNAL, AND RF CIRCUITS

Li, X., "Finding Deterministic Solution From Underdetermined Equation: Large-Scale Performance Variability Modeling of Analog/RF Circuits"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605327&isn...
Abstract: The aggressive scaling of integrated circuit technology results in high-dimensional, strongly nonlinear performance variability that cannot be efficiently captured by traditional modeling techniques. In this paper, we adapt a novel L0-norm regularization method to address this modeling challenge. Our goal is to solve for a large number of (e.g., 10^4-10^6) model coefficients from a small set of (e.g., 10^2-10^3) sampling points without over-fitting. This is facilitated by exploiting the underlying sparsity of the model coefficients: although numerous basis functions are needed to span the high-dimensional, strongly nonlinear variation space, only a few of them play an important role for a given performance of interest. An efficient orthogonal matching pursuit (OMP) algorithm is applied to automatically select these important basis functions based on a limited number of simulation samples. Several circuit examples designed in a commercial 65 nm process demonstrate that OMP achieves up to 25 times speedup compared to the traditional least-squares fitting method.

Ballicchia, M.; Orcioni, S., "Design and Modeling of Optimum Quality Spiral Inductors With Regularization and Debye Approximation"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605328&isn...
Abstract: This paper proposes a methodology, based on regularization theory, to find the planar spiral inductors with optimum quality factor in the whole space of geometric parameters. Moreover, a new technique, based on the Debye approximation, has been applied to extract an equivalent circuit model of these inductors. This model provides an accurate wideband frequency characterization up to 30 GHz.

EMERGING TECHNOLOGIES

Huang, T.-W.; Lin, C.-H.; Ho, T.-Y., "A Contamination Aware Droplet Routing Algorithm for the Synthesis of Digital Microfluidic Biochips"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605305&isn...
Abstract: Recent advances in digital microfluidic biochips (DMFBs) have revolutionized traditional laboratory procedures. By providing a droplet-based system, a DMFB can perform real-time biological analysis and safety-critical biomedical applications. However, the different droplets transported and manipulated on the DMFB may introduce a contamination problem caused by liquid residue between different biomolecules. To overcome this problem, a wash droplet is introduced to clean the contaminations on the surface of the microfluidic array. However, current wash-droplet scheduling does not restrict the extra cells used or the execution time of the bioassay, thereby degrading reliability and fault tolerance significantly. In this paper, we propose a contamination-aware droplet routing algorithm for DMFBs. To reduce the routing complexity and the number of cells used, we first construct preferred routing tracks, derived from the global moving vectors of the droplets, to guide the droplet routing. To cope with contaminations within one subproblem, we first apply a k-shortest-path routing technique to minimize the contaminated spots. Then, to take advantage of multiple wash droplets, we adopt a minimum cost circulation (MCC) algorithm for optimal wash-droplet routing that simultaneously minimizes the cells used and the cleaning time.
Since the droplet routing problem consists of several subproblems, a look-ahead prediction technique is further used to determine the contaminations between successive subproblems. After that, we can simultaneously clean both the contaminations within one subproblem and those between successive subproblems by using the MCC-based algorithm, reducing the execution time and the cells used significantly. On four widely used bioassays, our algorithm reduces the cells used and the execution time significantly compared with the state-of-the-art algorithm.

Roy, S.; Bhattacharya, B. B.; Chakrabarty, K., "Optimization of Dilution and Mixing of Biochemical Samples Using Digital Microfluidic Biochips"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605330&isn...
Abstract: The recent emergence of lab-on-a-chip (LoC) technology has led to a paradigm shift in many healthcare-related application areas, e.g., point-of-care clinical diagnostics, high-throughput sequencing, and proteomics. A promising category of LoCs is digital microfluidic (DMF)-based biochips, in which nanoliter-volume fluid droplets are manipulated on a 2-D electrode array. A key challenge in designing such chips and mapping lab-bench protocols to a LoC is to carry out the dilution process of biochemical samples efficiently. As an optimization and automation technique, we present a dilution/mixing algorithm that significantly reduces the production of waste droplets. The algorithm takes O(n) time to compute the at most n sequential mix/split operations required to achieve any given target concentration with an error in concentration factor of less than 1/2^n. To implement the algorithm, we design an architectural layout of a DMF-based LoC consisting of two O(n)-size rotary mixers and O(n) storage electrodes. Simulation results show that the proposed technique always yields nonnegative savings in the number of waste droplets, and also in the total number of input droplets, compared to earlier methods.
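The "n mix/split operations, error below 1/2^n" bound in the Roy et al. abstract comes from the standard bit-level 1:1 dilution scheme: round the target concentration to k/2^n and realize one bit of k per balanced mix/split step. A minimal sketch of that generic scheme (not the authors' waste-minimizing algorithm, whose droplet-reuse strategy is the paper's contribution):

```python
def mix_split_sequence(target, n):
    """Approximate a target concentration (in [0, 1]) as k / 2^n and return
    the n sequential 1:1 mix/split operations that realize it.

    Each step mixes the current droplet 1:1 with pure sample (concentration 1)
    or pure buffer (concentration 0), then splits, halving plus shifting the
    concentration. Processing the bits of k from LSB to MSB yields exactly
    k / 2^n after n steps.
    """
    k = round(target * 2 ** n)
    ops, conc = [], 0.0
    for i in range(n):
        bit = (k >> i) & 1          # i-th bit of k, least significant first
        conc = (conc + bit) / 2.0   # 1:1 mix with sample (bit=1) or buffer (bit=0)
        ops.append(("sample" if bit else "buffer", conc))
    return ops

# Example: target 0.7 with n = 4 rounds to 11/16 = 0.6875,
# within the guaranteed 1/2^4 error bound.
steps = mix_split_sequence(0.7, 4)
```

For target 0.7 and n = 4, k = 11 (binary 1011), so the sequence mixes with sample, sample, buffer, sample, reaching 0.5, 0.75, 0.375, and finally 0.6875.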
FPGAS AND RECONFIGURABLE COMPUTING

Chen, D.; Cong, J.; Dong, C.; He, L.; Li, F.; Peng, C.-C., "Technology Mapping and Clustering for FPGA Architectures With Dual Supply Voltages"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605304&isn...
Abstract: This paper presents a technology mapping algorithm for field-programmable gate array architectures with dual supply voltages (Vdds) for power optimization, with the guarantee that the mapping depth of the circuit will not increase compared to the single-Vdd circuit. This paper also presents an enhanced clustering algorithm that considers dual supply voltages and honors the dual-Vdd mapping solution. To carry out various comparisons, we first design a single-Vdd mapping algorithm, named SVmap-2, which achieves a 3.8% total power reduction (15.6% dynamic power reduction) over a previously published low-power mapping algorithm, Emap. We then show that our dual-Vdd mapping algorithm, named DVmap-2, can further improve total power savings by 12.8% over SVmap-2, with a 52.7% dynamic power reduction. Compared to the early single-Vdd version SVmap, DVmap-2 is 14.3% better for total power reduction. This is achieved through an ideal selection of the low-Vdd/high-Vdd ratio and the consideration of various voltage-changing scenarios during the mapping process.

Czajkowski, T. S.; Brown, S. D., "Decomposition-Based Vectorless Toggle Rate Computation for FPGA Circuits"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605306&isn...
Abstract: This paper presents a novel and accurate method of estimating the toggle rates of signals in field-programmable gate array (FPGA)-based logic circuits without the use of simulation vectors. Compared to previous vectorless techniques, our approach provides improved accuracy of results, especially for individual signals, which can be leveraged by computer-aided design (CAD) tools when performing power optimization of logic circuits.
Increased accuracy is achieved by using stochastic methods that estimate the transition densities at FPGA logic elements while accounting for both spatial and temporal correlation of logic signals. Spatial correlation is calculated by leveraging a unique XOR-based decomposition technique that provides both accurate results and fast computation times. We also consider the delay information of implemented circuits, providing a comprehensive treatment of glitches, including the effects of inertial limits on power dissipation. Our toggle-rate estimation approach has been tested on a commonly used set of Microelectronic Center of North Carolina circuits, as well as a set of industrial circuits targeted to Altera Stratix II FPGAs. Results show that our techniques provide a three times lower percent error, while maintaining a low processing time, when compared to two existing techniques: the vectorless estimation tool shipped with the commercial Quartus II 8.0 CAD tool, and the ACE v2.0 academic tool from the University of British Columbia, Vancouver, BC, Canada.

HIGH-LEVEL SYNTHESIS

Andriamisaina, C.; Coussy, P.; Casseau, E.; Chavet, C., "High-Level Synthesis for Designing Multimode Architectures"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605303&isn...
Abstract: This paper addresses the design of multimode architectures for digital signal and image processing applications. We present a dedicated design flow and its associated high-level synthesis tool, named GAUT. Given a unified description of a set of time-wise mutually exclusive tasks and their associated throughput constraints, a single register transfer level hardware architecture optimized in area is generated.
To reduce the complexity of the registers, the steering logic, and the controller, this paper proposes a joint scheduling algorithm that maximizes the similarities between control steps, together with dedicated binding approaches for both operators and storage elements that maximize the similarities between the datapaths. It is shown through a set of test cases that the proposed approach offers significant area savings with low performance penalties compared to both state-of-the-art techniques and dedicated mono-mode architectures.

MODELING AND SIMULATION

Cong, J.; Gupta, P.; Lee, J., "Evaluating Statistical Power Optimization"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605324&isn...
Abstract: In response to the increasing variations in integrated-circuit manufacturing, the current trend is to create designs that take these variations into account statistically. In this paper, we quantify the difference between the statistical and deterministic optima of leakage power while making no assumptions about the delay model. We develop a framework for deriving a theoretical upper bound on the suboptimality that is incurred by using the deterministic optimum as an approximation for the statistical optimum. For a suite of benchmark circuits in a 45 nm technology, we show that for the mean power measure the deterministic optimum is an excellent approximation, while for mean-plus-standard-deviation measures the optimality gap increases as the amount of inter-die variation grows. For large variations, we show that there are excellent linear approximations that can be used to approximate the effects of variation. Therefore, the need to develop special statistical power optimization algorithms is questionable.

Singhee, A.; Rutenbar, R. A., "Why Quasi-Monte Carlo is Better Than Monte Carlo or Latin Hypercube Sampling for Statistical Circuit Analysis"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605333&isn...
Abstract: At the nanoscale, no circuit parameters are truly deterministic; most quantities of practical interest present themselves as probability distributions. Thus, Monte Carlo techniques comprise the strategy of choice for statistical circuit analysis. There are many challenges in applying these techniques efficiently: circuit size, nonlinearity, simulation time, and required accuracy often conspire to make Monte Carlo analysis expensive and slow. Are we - the integrated circuit community - alone in facing such problems? As it turns out, the answer is "no". Problems in computational finance share many of these characteristics: high dimensionality, profound nonlinearity, stringent accuracy requirements, and expensive sample evaluation. We perform a detailed experimental study of how one celebrated technique from that domain - quasi-Monte Carlo (QMC) simulation - can be adapted effectively for fast statistical circuit analysis. In contrast to traditional pseudorandom Monte Carlo sampling, QMC uses a (shorter) sequence of deterministically chosen sample points. We perform rigorous comparisons with both Monte Carlo and Latin hypercube sampling across a set of digital and analog circuits, in 90 and 45 nm technologies, varying in size from 30 to 400 devices. We consistently see superior performance from QMC, giving 2 times to 8 times speedup over conventional Monte Carlo for roughly 1% accuracy levels. We present rigorous theoretical arguments that support and explain this superior performance. The arguments also reveal insights into the (low) latent dimensionality of these circuit problems; for example, we observe that over half of the variance in our test circuits comes from unidimensional behavior. This analysis provides quantitative support for the recent enthusiasm for dimensionality reduction of circuit problems.
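The contrast the Singhee-Rutenbar abstract draws between pseudorandom and deterministically chosen sample points can be seen on a toy integral. The sketch below uses a 2-D Halton sequence, one simple low-discrepancy construction (the paper's own point sets and circuit experiments are far more involved; this is illustrative only), to estimate E[xy] over the unit square, whose true value is 1/4:

```python
import random

def radical_inverse(i, base):
    """Van der Corput radical inverse: mirror the base-b digits of i
    around the radix point, giving a point in [0, 1)."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def halton(n):
    """First n points of the 2-D Halton sequence (bases 2 and 3)."""
    return [(radical_inverse(i, 2), radical_inverse(i, 3))
            for i in range(1, n + 1)]

N = 4096
# Deterministic low-discrepancy (quasi-Monte Carlo) estimate of E[xy].
qmc = sum(x * y for x, y in halton(N)) / N
# Conventional pseudorandom Monte Carlo estimate of the same quantity.
random.seed(0)
mc = sum(random.random() * random.random() for _ in range(N)) / N
```

On smooth, effectively low-dimensional integrands like this one, the Halton estimate typically sits much closer to 0.25 than a pseudorandom estimate of the same size, mirroring the low-latent-dimensionality argument in the abstract.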
SYSTEM-LEVEL DESIGN

Javaid, H.; Ignjatovic, A.; Parameswaran, S., "Rapid Design Space Exploration of Application Specific Heterogeneous Pipelined Multiprocessor Systems"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605323&isn...
Abstract: This paper describes a rapid design methodology to create a pipeline of processors to execute streaming applications. The methodology seeks the system with the smallest area whose runtime is within a specified runtime constraint. Initially, a heuristic is used to rapidly explore a large number of processor configurations to find the near-Pareto front of the design space, and then an exact integer linear programming (ILP) formulation (EIF) is used to find an optimal solution. A reduced ILP formulation (RIF) or the heuristic is used if the EIF does not find an optimal solution in a given time window. This design methodology was integrated into a commercial design flow and was evaluated on four benchmarks with design spaces containing up to 10^16 design points. For each benchmark, the near-Pareto front was found in less than 3 h using the heuristic, while EIF took up to 16 h. The results show that the average area error of the heuristic and RIF was within 2.25% and 1.25% of the optimal design points, respectively, for all the benchmarks. The heuristic is faster than RIF, while both are significantly faster than EIF.

VERIFICATION

Keng, B.; Safarpour, S.; Veneris, A., "Bounded Model Debugging"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605322&isn...
Abstract: Design debugging is a major bottleneck in modern very large scale integration design flows, as both the design size and the length of the error trace contribute to its inherent complexity. With typical design blocks exceeding half a million synthesized logic gates and error traces in the thousands of clock cycles, the complexity of the debugging problem poses a great challenge to automated debugging techniques.
This paper addresses this daunting challenge by introducing the bounded model debugging methodology, which iteratively analyzes bounded sequences of the error trace. Two techniques are introduced within this methodology. The first iteratively analyzes bounded subsequences of the error trace of increasing size until the error is found or the entire trace is analyzed. The second partitions the error trace into non-overlapping bounded sequences of clock cycles, each of which is analyzed separately. A discussion of the two techniques is presented, and a unified methodology that leverages the strengths of both is developed. Empirical results on real industrial designs show that, for large designs and long error traces, the proposed methodology finds the actual error in 79% of cases with the first technique and 100% of cases with the second; without the methodology, the actual error is found in only 21% of cases. These numbers confirm that the proposed methodology allows conventional automated debuggers to handle much larger real-life circuits.

Chen, Y.; Safarpour, S.; Marques-Silva, J.; Veneris, A., "Automated Design Debugging With Maximum Satisfiability"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605301&isn...
Abstract: As contemporary very large scale integration designs grow in complexity, design debugging has rapidly established itself as one of the largest bottlenecks in the design cycle. Automated debug solutions, such as those based on Boolean satisfiability (SAT), enable engineers to reduce the debug effort by localizing possible error sources in the design. Unfortunately, the adoption of these techniques in industrial designs is still limited by the performance and capacity of the underlying engines.
This paper presents a novel formulation of the debugging problem using maximum satisfiability (MaxSAT) to improve the performance and applicability of automated debuggers. Our technique not only identifies errors in the design but also indicates when the bug is excited in the error trace. MaxSAT allows for a simpler formulation of the debugging problem, reducing the problem size by 80% compared to a conventional SAT-based technique. Empirical results demonstrate the effectiveness of the proposed formulation, with run-time improvements of 4.5 times observed on average. This paper also introduces two performance improvements that further reduce the time required to find all error sources within the design by an order of magnitude.

Short Papers
============

Lucas, G.; Dong, C.; Chen, D., "Variation-Aware Placement With Multi-Cycle Statistical Timing Analysis for FPGAs"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605329&isn...
Abstract: Deep submicron processes have allowed field-programmable gate arrays (FPGAs) to grow in complexity and speed. However, such technology scaling has made FPGAs more susceptible to the effects of process variation. To obtain sufficient yield, it is now necessary to consider process variation during physical design. FPGA designs commonly contain multi-cycle paths to help increase performance, but current statistical static timing analysis (SSTA) techniques cannot support this type of timing constraint. In this paper, we propose an extension to block-based SSTA that considers multi-cycle paths. We then use this new SSTA in our tool VMC-Place to optimize FPGA placement for designs with multi-cycle paths. Experimental results show that our multi-cycle SSTA is accurate to 0.59% for the mean and 0.0024% for the standard deviation. Our results also show that VMC-Place reduces the 95% performance-yield clock period by 15.36% as compared to VPR.

Kim, J.; Vandenberghe, L.; Yang, C.-K. K., "Convex Piecewise-Linear Modeling Method for Circuit Optimization via Geometric Programming"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605325&isn...
Abstract: This paper presents a new method for fitting a convex piecewise-linear function to a given set of data, which can serve as an empirical modeling framework for circuit optimization via geometric programming. The method iteratively solves a series of linear optimization problems to minimize the fitting error. To reduce the fitting error in each iteration, an extra plane is added in the region where the largest error occurs. For verification, we apply the method to create transistor-level models in a 90-nm complementary metal-oxide-semiconductor technology. Numerical results indicate that the proposed method can generate process-dependent transistor-level models with reasonable modeling accuracy.

Chen, Y.-C.; Wang, C.-Y., "Fast Node Merging With Don't Cares Using Logic Implications"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605316&isn...
Abstract: Node merging is a popular and effective logic restructuring technique that has recently been applied to minimize logic circuits. However, previous satisfiability (SAT)-based methods search for node mergers by trial-and-error validity checking of a potentially large set of candidate mergers. Here, we propose a new method that directly identifies node mergers using logic implications, without any SAT solving calls. Although the efficiency benefits of the method come at the expense of quality, we further employ redundancy removal and wire replacement techniques to enhance its quality. The experimental results show that the proposed optimization method achieves approximately 46 times speedup while possessing a competitive capability of circuit minimization compared to the state-of-the-art method.

Pomeranz, I.; Reddy, S. M., "On Undetectable Faults and Fault Diagnosis"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605331&isn...
Abstract: The presence of an undetectable fault u_i may modify the response of a detectable fault d_j to a test set used for fault diagnosis. This may impact the accuracy of fault diagnosis based on the responses of single faults. Many state-of-the-art diagnosis processes are based on the responses of single stuck-at faults even though their goal is to diagnose defects (including multiple defects) that are different from stuck-at faults. Therefore, we study the effects of undetectable single stuck-at faults on the accuracy of fault diagnosis based on the responses of single stuck-at faults. For this purpose, we consider the cases where the response of a double stuck-at fault u_i&d_j, which consists of an undetectable fault u_i and a detectable fault d_j, is different from the response of the single fault d_j. We show that there are significant, yet manageable, numbers of such faults in benchmark circuits under test sets used for fault diagnosis. In all these cases, a fault diagnosis process based on single stuck-at faults may not identify the locations of d_j and u_i as candidate defect sites if a defect affects those sites. We conclude that it is important to consider u_i&d_j during fault diagnosis so as not to preclude the sites of d_j and u_i as candidate defect sites.

Chen, P.-L.; Huang, Y.-C.; Chang, T.-Y., "Fast Test Integration: Toward Plug-and-Play At-Speed Testing of Multiple Clock Domains Based on IEEE Standard 1500"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605317&isn...
Abstract: The rapid advance of semiconductor technology exposes multifrequency designs to severe reliability loss due to incomplete at-speed testing, which is induced by ignoring timing-related defects between clock domains.
However, the reduced testability caused by a core-based design strategy also aggravates the difficulty of applying on-chip at-speed testing. Although previous works were able to increase the quality of at-speed testing, the diversity of on-chip clock control schemes across different components may complicate test integration, increasing test costs. Therefore, to accelerate time-to-market and time-to-volume, the development of plug-and-play at-speed testing based on a well-defined test interface has become increasingly urgent. In this paper, a fast test integration approach for multi-clock-domain at-speed testing based on IEEE Standard 1500 is proposed. The proposed framework has been successfully integrated into an IEEE 1500-wrapped ultrawide-band design and a simple SoC design. Experimental results also confirm the feasibility of the proposed approach.

Li, J.-F.; Huang, Y.-J.; Hu, Y.-J., "Testing Random Defect and Process Variation Induced Comparison Faults of TCAMs With Asymmetric Cells"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605326&isn...
Abstract: This paper presents a march-like test, T_AC-P, to cover comparison faults of ternary content addressable memories (TCAMs) with asymmetric cells. T_AC-P requires only 4N Write operations and (3N+2B) Compare operations for an N x B-bit TCAM with Hit and priority address encoder outputs. We show that the test can also cover search-time failures induced by process variation in the comparison circuit of a TCAM. Furthermore, a test T_MF for match failures induced by process variation in the comparison circuit of a TCAM is also presented.
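T_AC-P's structure is specific to TCAM Compare operations, but the general march-test idiom it builds on - sweep the address space in a fixed order, applying a fixed element of read/write operations at each address - can be illustrated on an ordinary RAM model. A sketch of a MATS+-style march test against a simulated memory with one injected stuck-at bit (the fault model and march elements here are generic illustrations, not those of T_AC-P):

```python
class FaultyMem:
    """Tiny word-level memory model with one optional stuck-at cell."""
    def __init__(self, n, stuck_addr=None, stuck_val=0):
        self.cells = [0] * n
        self.stuck_addr, self.stuck_val = stuck_addr, stuck_val

    def write(self, a, v):
        # A stuck-at cell ignores writes and holds its stuck value.
        self.cells[a] = self.stuck_val if a == self.stuck_addr else v

    def read(self, a):
        return self.cells[a]

def mats_plus(mem, n):
    """MATS+ march elements: up/down(w0); up(r0, w1); down(r1, w0).
    Returns the set of addresses whose reads mismatched."""
    fails = set()
    for a in range(n):                # initialize: write 0 everywhere
        mem.write(a, 0)
    for a in range(n):                # ascending: expect 0, then write 1
        if mem.read(a) != 0:
            fails.add(a)
        mem.write(a, 1)
    for a in reversed(range(n)):      # descending: expect 1, then write 0
        if mem.read(a) != 1:
            fails.add(a)
        mem.write(a, 0)
    return fails
```

A stuck-at-0 cell passes the ascending read-0 phase but fails the descending read-1 phase (its write of 1 never took), while a stuck-at-1 cell fails the ascending read-0 phase, so both polarities are caught. TCAM tests such as T_AC-P replace these read checks with Compare operations against the Hit and priority-encoder outputs.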