TCAD Newsletter - September 2010 Issue
Placing you one click away from the best new CAD research!

Regular Papers
==============

EMBEDDED SYSTEMS

Chou, C.L.; Marculescu, R., "Designing Heterogeneous Embedded Network-on-Chip Platforms With Users in Mind"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552223&isn...
Abstract: In this paper, we propose a user-centric design methodology targeting heterogeneous embedded systems-on-chip where communication happens via the network-on-chip approach. More precisely, in this new design methodology, we explicitly consider information about the user experience and apply machine learning techniques to develop a design flow that aims at minimizing the workload variance; this allows the system to better adapt to different types of user needs and workload variations. Our experimental results show that, by incorporating the user experience into the design-space exploration step, the system platforms generated by our approach achieve more than 30% energy savings, on average, compared to the single platform derived from the traditional design flow; this implies that each system configuration we generate is highly suitable for the targeted class of user and workload behaviors.

EMERGING TECHNOLOGIES

Lin, C. C.Y.; Chang, Y.W., "ILP-Based Pin-Count Aware Design Methodology for Microfluidic Biochips"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552222&isn...
Abstract: Digital microfluidic biochips have emerged as a popular alternative for laboratory experiments. To make biochips feasible for practical applications, pin-count reduction is key to higher-level integration of reactions on a biochip. Most previous works approach the problem by post-processing the placement and routing solutions to share compatible control signals; however, the quality of such sharing algorithms is inevitably limited by the placement and routing solutions. In this paper, we present a comprehensive pin-constrained biochip design flow that addresses the pin-count issue at all design stages. The proposed flow consists of three major stages: 1) pin-count aware stage assignment, which partitions the reactions in the given bioassay into execution stages; 2) pin-count aware device assignment, which determines the specific device used for each reaction; and 3) guided placement, routing, and pin assignment, which utilize the pin-count saving properties of the stage and device assignments to optimize the assay time and pin count. For both the stage and device assignments, basic integer linear programming formulations and effective solution-space reduction schemes are proposed to minimize the assay time and pin count. Experimental results show the efficiency of our methods and a 55-57% pin-count reduction over the state-of-the-art algorithms/flow.
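
To give a concrete picture of the "pin-count aware stage assignment" step in the entry above, the following is a minimal, hypothetical sketch of an ILP of that general shape, written in Python with the open-source PuLP modeler. The reaction names, precedence constraints, and objective (minimize the number of execution stages subject to bioassay precedence) are illustrative assumptions and are not the formulation from the paper, which additionally models pin-count saving properties and solution-space reduction schemes.

    from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

    # Toy bioassay: four reactions with simple precedence constraints (all illustrative).
    reactions = ["mix1", "mix2", "dilute", "detect"]
    precedes = [("mix1", "dilute"), ("mix2", "dilute"), ("dilute", "detect")]
    stages = range(1, len(reactions) + 1)

    prob = LpProblem("stage_assignment", LpMinimize)
    # x[r, s] = 1 if reaction r is executed in stage s.
    x = {(r, s): LpVariable(f"x_{r}_{s}", cat=LpBinary) for r in reactions for s in stages}
    makespan = LpVariable("makespan", lowBound=1)

    for r in reactions:                      # each reaction is assigned to exactly one stage
        prob += lpSum(x[r, s] for s in stages) == 1
    for a, b in precedes:                    # b must run strictly after a
        prob += lpSum(s * x[b, s] for s in stages) >= lpSum(s * x[a, s] for s in stages) + 1
    for r in reactions:                      # makespan bounds the last stage that is used
        prob += makespan >= lpSum(s * x[r, s] for s in stages)
    prob += makespan                         # objective: fewest execution stages

    prob.solve()
    print({r: next(s for s in stages if x[r, s].value() > 0.5) for r in reactions})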

MODELING AND SIMULATION

Bayrakci, A. A.; Demir, A.; Tasiran, S., "Fast Monte Carlo Estimation of Timing Yield With Importance Sampling and Transistor-Level Circuit Simulation"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552215&isn...
Abstract: Considerable effort has been expended in the electronic design automation community in trying to cope with the statistical timing problem. Most of this effort has been aimed at generalizing static timing analyzers to the statistical case. On the other hand, detailed transistor-level simulations of the critical paths in a circuit are usually performed at the final stage of performance verification. We describe a transistor-level Monte Carlo (MC) technique which makes final transistor-level timing verification practically feasible. The MC method is used as a golden reference in assessing the accuracy of other timing yield estimation techniques; however, it is generally believed that it cannot be used in practice because it requires too many costly transistor-level simulations. We present a novel approach to constructing an improved MC estimator for timing yield which provides the same accuracy as standard MC but at the cost of far fewer transistor-level simulations. This improved estimator is based on a unique combination of a variance reduction technique (importance sampling) and a cheap but approximate gate delay model. The results we present demonstrate that our improved yield estimator achieves the same accuracy as standard MC at a cost reduction reaching several orders of magnitude.

Ye, X.; Li, P.; Zhao, M.; Panda, R.; Hu, J., "Scalable Analysis of Mesh-Based Clock Distribution Networks Using Application-Specific Reduced Order Modeling"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552177&isn...
Abstract: Clock meshes possess inherently low clock skew and excellent immunity to process-voltage-temperature variations, and have increasingly found their way into high-performance integrated circuit designs. However, analysis of such massively coupled networks is significantly hindered by the sheer size of the network and the tight coupling between non-tree interconnects and large numbers of clock drivers. While SPICE simulation of large clock meshes is often intractable, standard interconnect model order reduction algorithms also fail due to the large number of input/output ports introduced by clock drivers. The presented approach is motivated by the key observation that clock networks operate in a periodic steady state, while its efficiency is facilitated by a new clock-mesh-specific harmonic-weighted model order reduction algorithm and by locality analysis via port sliding. The scalability of the analysis is significantly improved by eliminating the need to compute infeasible multi-port passive reduced-order interconnect models with large port counts and by decomposing the overall task into tractable, naturally parallelizable model generation and fast Fourier transform/inverse fast Fourier transform operations, all on a per-driver or per-sink basis. We demonstrate the approach by analyzing large clock meshes with excellent accuracy.

Reis, T.; Stykel, T., "PABTEC: Passivity-Preserving Balanced Truncation for Electrical Circuits"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552178&isn...
Abstract: We present a passivity-preserving balanced truncation model reduction method for differential-algebraic equations arising in circuit simulation. This method is based on balancing the solutions of projected Lur'e equations. By making use of the special structure of circuit equations, we can reduce the numerical effort for balanced truncation significantly. It is shown that the property of reciprocity is also preserved in the reduced-order model. Network topological interpretations of certain circuit effects are given. The presented model reduction method is illustrated by numerical examples.
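
For readers new to balanced truncation, the sketch below implements the classical square-root algorithm for a stable state-space system x' = Ax + Bu, y = Cx with NumPy/SciPy, as a baseline for intuition only: PABTEC itself operates on differential-algebraic circuit equations and balances the solutions of projected Lur'e equations so that passivity and reciprocity are preserved, which this plain Lyapunov-based version does not guarantee. The small tridiagonal test system is an assumed toy example.

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov, cholesky, svd

    def balanced_truncation(A, B, C, r):
        """Classical square-root balanced truncation of a stable LTI system, keeping r states."""
        # Gramians: A P + P A^T + B B^T = 0 and A^T Q + Q A + C^T C = 0
        P = solve_continuous_lyapunov(A, -B @ B.T)
        Q = solve_continuous_lyapunov(A.T, -C.T @ C)
        Zp = cholesky(P, lower=True)          # P = Zp Zp^T (requires P > 0, i.e., minimality)
        Zq = cholesky(Q, lower=True)          # Q = Zq Zq^T
        U, hsv, Vt = svd(Zq.T @ Zp)           # hsv = Hankel singular values, descending
        S = np.diag(hsv[:r] ** -0.5)
        W = Zq @ U[:, :r] @ S                 # left projection
        T = Zp @ Vt[:r, :].T @ S              # right projection, with W^T T = I
        return W.T @ A @ T, W.T @ B, C @ T, hsv

    # Toy stable RC-ladder-like system, reduced from 6 states to 2 (illustrative only).
    n = 6
    A = -np.eye(n) + 0.1 * np.diag(np.ones(n - 1), 1) + 0.1 * np.diag(np.ones(n - 1), -1)
    B = np.eye(n)[:, :1]
    C = B.T
    Ar, Br, Cr, hsv = balanced_truncation(A, B, C, r=2)
    print("Hankel singular values:", np.round(hsv, 4))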

Meraji, S.; Zhang, W.; Tropper, C., "On the Scalability and Dynamic Load-Balancing of Optimistic Gate Level Simulation"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552191&isn...
Abstract: As predicted by Moore's law, the size of integrated circuits has grown geometrically, resulting in simulation becoming the major bottleneck in the circuit design process. Parallel simulation provides us with a way to cope with this growth. In this paper, we describe an optimistic (time warp) parallel discrete event simulator which can simulate all synthesizable Verilog circuits. We investigate its scalability and describe a machine-learning-based dynamic load balancing algorithm for use with the simulator. We initially developed two dynamic load balancing algorithms to balance the load and the communication, respectively, during the course of a simulation. Making use of reinforcement learning (RL), we then created an algorithm which is an amalgam of these two algorithms. To the best of our knowledge, this is the first time that RL has been used for the dynamic load-balancing of time warp. We investigated the scalability and the effectiveness of the dynamic load balancing algorithms on gate-level simulations of several realistic very large scale integration (VLSI) circuits. Our experimental results showed that our simulator is indeed scalable. They also revealed an 88.6% improvement in simulation time through the use of our RL algorithm.

SYSTEM-LEVEL DESIGN

Kang, K.; Kim, J.; Yoo, S.; Kyung, C.M., "Temperature-Aware Integrated DVFS and Power Gating for Executing Tasks With Runtime Distribution"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552187&isn...
Abstract: At high operating temperatures, chip cooling is crucial due to the exponential temperature dependence of leakage current. However, traditional cooling methods, e.g., power/clock gating applied when a temperature threshold is reached, often cause excessive performance degradation. In this paper, we propose a method that delivers lower energy consumption by integrating cooling and execution in a temperature-aware manner without incurring a performance penalty. To further reduce energy consumption, we exploit the runtime distribution of each sub-segment of a task, called a "bin", in an analytical manner such that the cooling time budget for each bin is allocated in proportion to the probability of that bin's occurrence. We apply the proposed method to two realistic software programs, an H.264 decoder and a ray tracer, and a benchmark program, equake. The experimental results show that the proposed method yields an additional 19.4%-27.2% reduction in energy consumption compared with existing methods.
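
The bin-proportional cooling budget described in the Kang et al. abstract reduces, at its core, to a proportional split. The sketch below illustrates just that split in Python; the bin names, probabilities, and total budget are assumed values for illustration, whereas the paper derives the allocation analytically together with the DVFS and power gating decisions.

    def allocate_cooling_budget(bin_probs, total_cooling_time):
        """Split a total cooling-time budget across task sub-segments ('bins')
        in proportion to how often each bin occurs."""
        total = sum(bin_probs.values())
        return {b: total_cooling_time * p / total for b, p in bin_probs.items()}

    # e.g., a decoder-like task whose runtime falls into three bins (hypothetical numbers)
    budgets = allocate_cooling_budget({"short": 0.6, "typical": 0.3, "long": 0.1},
                                      total_cooling_time=2.0)   # units: ms, illustrative
    print(budgets)   # {'short': 1.2, 'typical': 0.6, 'long': 0.2}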

Jung, H.; Pedram, M., "Supervised Learning Based Power Management for Multicore Processors"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552184&isn...
Abstract: This paper presents a supervised-learning-based power management framework for a multiprocessor system, in which a power manager (PM) learns to predict the system performance state from readily available input features (such as the occupancy state of a global service queue) and then uses this predicted state to look up the optimal power management action (e.g., a voltage-frequency setting) in a precomputed policy table. The motivation for utilizing supervised learning, in the form of a Bayesian classifier, is to reduce the overhead of the PM, which must repeatedly determine and assign voltage-frequency settings for each processor core in the system. Experimental results demonstrate that the proposed supervised-learning-based power management technique ensures system-wide energy savings under rapidly and widely varying workloads.

TEST

Khursheed, S.; Al-Hashimi, B. M.; Chakrabarty, K.; Harrod, P., "Gate-Sizing-Based Single V_{dd} Test for Bridge Defects in Multivoltage Designs"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552193&isn...
Abstract: The use of multiple voltage settings for dynamic power management is an effective design technique. Recent research has shown that testing for resistive bridging faults in such designs requires more than one voltage setting for 100% fault coverage; however, switching between several supply voltage settings has a detrimental impact on the overall cost of test. This paper proposes an effective gate-sizing technique for reducing the test cost of multi-V_{dd} designs with bridge defects. Using synthesized ISCAS and ITC benchmarks and a parametric fault model, experimental results show that, for all the circuits, the proposed technique achieves a single-V_{dd} test without affecting the fault coverage of the original test. In addition, the proposed technique performs better in terms of timing, area, and power than the recently proposed test point insertion technique. This is the first reported work that achieves a single-V_{dd} test for resistive bridge defects without compromising fault coverage in multi-V_{dd} designs.

VERIFICATION

Alizadeh, B.; Fujita, M., "Modular Datapath Optimization and Verification Based on Modular-HED"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552210&isn...
Abstract: This paper proposes an automatic design flow for datapath-dominated applications which is able to deal with optimization and equivalence checking of multi-output polynomials over Z_{2^{n}}. The paper makes four main contributions: 1) proposing a complete design flow for modular equivalence checking, high-level synthesis, and optimization; 2) considering hidden monomials to factorize polynomials that do not have any common monomials; 3) combining and improving our previous optimization heuristics to eliminate multi-operand common sub-expressions as much as possible; and 4) implementing all algorithms on top of the modular Horner expansion diagram (HED) package. Experimental results show an average saving of 9.9% and 4.5% in the number of gates and critical path delay, respectively, after applying modular reduction over Z_{2^{n}} alone, with no other optimization techniques. Moreover, after applying our optimization techniques, comparisons with state-of-the-art techniques show an average area improvement of 19.5% and an average delay decrease of 16.7%. Compared with our previous papers, area and delay are improved by 13.3% and 15.5%, respectively.
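
As intuition for "modular reduction over Z_{2^{n}}" in the entry above: working modulo 2^n makes some terms vanish that would not cancel over the integers, which is one source of the reported gate savings. The sketch below is a toy brute-force equivalence check of two small polynomials over Z_{2^4} using Horner-form evaluation; it only stands in for the canonical Horner expansion diagram representation the paper uses, since exhaustive checking would not scale to realistic word lengths.

    def horner_eval(coeffs, x, n):
        """Evaluate c0 + c1*x + ... + ck*x^k in Horner form, modulo 2^n."""
        mask = (1 << n) - 1
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) & mask
        return acc

    def equivalent_over_Z2n(coeffs_a, coeffs_b, n):
        """Brute-force check that two polynomials agree for every n-bit input."""
        return all(horner_eval(coeffs_a, x, n) == horner_eval(coeffs_b, x, n)
                   for x in range(1 << n))

    # 8*x + 8*x^2 is identically zero over Z_{2^4}: x^2 + x is always even,
    # so 8*(x^2 + x) is a multiple of 16, a cancellation invisible over the integers.
    print(equivalent_over_Z2n([0, 8, 8], [0], n=4))   # True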

Morin-Allory, K.; Boule, M.; Borrione, D.; Zilic, Z., "Validating Assertion Language Rewrite Rules and Semantics With Automated Theorem Provers"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552218&isn...
Abstract: Modern assertion languages such as the property specification language (PSL) and SystemVerilog assertions include many language constructs. By far the most economical way to process the full languages in automated tools is to rewrite the majority of operators into a small set of base cases, which are then processed in an efficient way. Recent rewrite attempts in the literature have shown that the rules can be quite involved, sometimes counterintuitive, and that they can make a significant difference in the complexity of interpreting assertions. Ensuring that the rewrite rules are correct is therefore a major contribution toward ensuring that the tools are correct, and even that the semantics of the assertion languages are well founded. This paper outlines a methodology for computer-assisted proofs of several publicly known rewrite rules for PSL properties. We first present ways to express the PSL syntax and semantics in the prototype verification system (PVS) theorem prover, and then prove or disprove the correctness of over 50 rewrite rules published without proofs in various sources in the literature. In doing so, we also demonstrate how to circumvent known issues with PSL semantics regarding the "never" and "eventually!" operators, and offer our proposals on assertion language semantics.

Short Papers
============

Pomeranz, I.; Reddy, S. M., "Hazard-Based Detection Conditions for Improved Transition Path Delay Fault Coverage"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552192&isn...
Abstract: Transition path delay faults were defined to capture the behavior of both small and large delay defects in a single fault model. The number of detectable transition path delay faults, as defined earlier, is the same as or close to the number of conventional path delay faults that are detectable under the strong non-robust propagation conditions. When the weak non-robust propagation conditions are used, the number of detectable conventional path delay faults is significantly higher. Using what are called the hazard-based detection conditions for transition faults, we define detection conditions for transition path delay faults under which the number of detectable faults is the same as or close to the number of conventional path delay faults detectable under the weak non-robust propagation conditions. The fault model still captures the behavior of both small and large delay defects, but the number of detectable faults is significantly higher.
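
Returning to the Morin-Allory et al. entry above, the sketch below conveys what validating a rewrite rule means, using a deliberately simplified finite-trace semantics and a bounded exhaustive check in Python. The operators, traces, and the sample rule (rewriting "never p" as "always (not p)") are illustrative assumptions; the paper handles full PSL semantics and proves or disproves its rules mechanically in the PVS theorem prover rather than by enumeration.

    from itertools import product

    # Simplified finite-trace semantics for a few temporal operators (not full PSL).
    def always(p, trace):
        return all(p(trace[i:]) for i in range(len(trace)))

    def eventually(p, trace):
        return any(p(trace[i:]) for i in range(len(trace)))

    def never(p, trace):
        return not eventually(p, trace)

    def atom(trace):
        """The proposition "p holds at the current position"."""
        return trace[0]

    # Candidate rewrite rule: "never p"  ==  "always (not p)".
    # Check it exhaustively on every Boolean trace of length 1..6.
    holds = all(
        never(atom, trace) == always(lambda t: not atom(t), trace)
        for length in range(1, 7)
        for trace in product([False, True], repeat=length)
    )
    print("rewrite rule holds on all bounded traces:", holds)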