June 2012 Newsletter 

Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2012-06.txt 


Announcement for a sister publication

IEEE Design & Test Magazine is looking for a new Editor-In-Chief
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6198427 

REGULAR PAPERS

EMERGING TECHNOLOGIES

Zhao, Y.; Chakrabarty, K. 
Cross-Contamination Avoidance for Droplet Routing in Digital Microfluidic
Biochips 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200446

Recent advances in digital microfluidics have enabled droplet-based biochip
devices for DNA sequencing, immunoassays, clinical chemistry, and protein
crystallization. Since cross-contamination between droplets of different
biomolecules can lead to erroneous outcomes for bioassays, the avoidance of
cross-contamination during droplet routing is a key design challenge for
biochips. We propose a droplet-routing method that avoids cross-contamination
in the optimization of droplet flow paths. The proposed approach targets
disjoint droplet routes and synchronizes wash-droplet routing with functional
droplet routing, in order to reduce the duration of droplet routing while
avoiding the cross-contamination between different droplet routes. In order to
avoid cross-contamination between successive routing steps, an optimization
technique is used to minimize the number of wash operations that must be used
between successive routing steps. Two real-life biochemical applications are
used to evaluate the proposed droplet-routing methods.

LOGIC SYNTHESIS

Zhu, X.-Y.; Basten, T.; Geilen, M.; Stuijk, S. 
Efficient Retiming of Multirate DSP Algorithms 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200447

Multirate digital signal processing (DSP) algorithms are often modeled with
synchronous dataflow graphs (SDFGs). A lower iteration period implies a faster
execution of a DSP algorithm. Retiming is a simple but efficient graph
transformation technique for performance optimization, which can decrease the
iteration period without affecting functionality. In this paper, we deal with
two problems: feasible retiming, i.e., retiming an SDFG to meet a given
iteration period constraint, and optimal retiming, i.e., retiming an SDFG to
achieve the smallest iteration period. We present a novel algorithm for feasible
retiming and, based on it, a new algorithm for optimal retiming, and prove their
correctness.
Both methods work directly on SDFGs, without explicitly converting them to
their equivalent homogeneous SDFGs. Experimental results show that our methods
give a significant improvement compared to the earlier methods.

MODELING AND SIMULATION

Lee, M.-S. M.; Liao, W.-T.; Liu, C.-N. J. 
Levelized High-Level Current Model of Logic Blocks for Dynamic Supply Noise Analysis 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200439

Since the problem of power integrity has become a critical issue that limits
design performance, obtaining the supply current waveforms at early design
stages is essential to achieve efficient reduction of supply noise. To this
end, Bodapati and Najm proposed a high-level current macro model for logic
blocks that provides fast current waveform estimation at the register-transfer
level (RTL). However, because internal signals have different arrival times,
modeling the supply current of the entire logic block accurately as specific
fixed templates is difficult. This paper thus proposes a levelized high-level
current model for logic blocks. By merging gates with similar arrival times into
a super-gate and recording its current waveforms separately, obtaining more
accurate supply current waveforms is possible by using a unified model, even
for multipeak cases. This paper also proposes a frequency-domain waveform
transformation method to consider the effects of nonideal supply resistance on
the supply current waveform. As shown in the experimental results, the peak
error and waveform correlation of the proposed current model are significantly
improved compared to the results of the single-stage current model. Using
accurate supply current waveforms can also help obtain precise IR-drop
estimation in RTL simulations for early system evaluation.

Badami, K. M. H.; Karmalkar, S. 
Quasi-Static Compact Model for Coupling Between Aligned Contacts on Finite Substrates With 
Insulating or Conducting Backplanes 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200437

We derive a closed-form quasi-static model for the coupling impedance between
aligned coplanar rectangular contacts on bulk and epitaxial semiconductor
substrates by resolving the 3-D field lines into simpler components (vertical,
lateral, fringing, 2-D, etc.). Both insulating and conducting (grounded or
floating) backplane conditions are considered. Our model reflects all the
geometry and process parameters, and its constants are process independent and
universal. The model also gives the capacitive coupling via ambient, i.e., via
the region outside the substrate, and specifies conditions under which a given
thickness or lateral extension of the substrate can be regarded as infinite.
Comparisons with technology computer-aided design simulations and measurements
validate the model over a wide range of width/length and width/separation
ratios of the contacts.

PHYSICAL DESIGN

Qian, H.; Restle, P. J.; Kozhaya, J. N.; Gunion, C. L. 
Subtractive Router for Tree-Driven-Grid Clocks 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200441

A tree-driven clock grid has become the choice of clock delivery for most
microprocessors, due to its ability to achieve lower skew and lower variability
than clock trees, and is becoming the choice of clock delivery for certain
high-end application-specific integrated circuit designs. This paper reports on
a clock routing tool that was used in designing multiple tree-driven clock
grids in a 2.3 GHz processor system-on-chip, which achieved below 5 ps skew
within 500 µm Manhattan distance and below 10 ps skew across each clock
grid. This clock routing tool employs a nonsequential algorithm composed of
linear programming and combinatorial heuristics. Its robust length-matching
capability enables flexible buffer placement, improved clock signal quality,
and robustness to variations.

Lin, C.-W.; Lee, P.-W.; Chang, Y.-W.; Shen, C.-F.; Tseng, W.-C. 
An Efficient Pre-Assignment Routing Algorithm for Flip-Chip Designs 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200434

The flip-chip package is introduced for modern integrated circuit (IC) designs
with higher integration density and larger I/O counts. In this paper, we
consider the pre-assignment flip-chip routing problem with predefined
connections between driver pads and bump pads. This problem has been shown to
be much more difficult than the free-assignment one, but is more popular in
real-world designs because the connections between driver pads and bump pads
are typically predetermined by IC or packaging designers. Based on the concept
of routing sequence exchange, we propose a very efficient approach to guide the
global routing by computing the longest common subsequence and the maximum
planar subset of chords for pre-assignment flip-chips. We observe that the
existing work over-constrains the capacity of a routing tile, which might miss
some critical solution space with a better routing solution (e.g., smaller
wirelength), and provide a remedy for this insufficiency to identify a better
solution in a more complete solution space. We also develop a constant-time
routability analyzer to check if a given set of wires can pass through a tile.
Experimental results show that our router can achieve a 125x speedup with
even better solution quality (same routability with slightly smaller
wirelength), compared with a state-of-the-art flip-chip router based on integer
linear programming.
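
As a rough illustration of one building block named in the abstract above, the
longest common subsequence of two orderings can be computed with the textbook
dynamic program. This is a minimal sketch and not the authors' router: the
function name longest_common_subsequence and the sample orderings driver_order
and bump_order are hypothetical, chosen only for illustration.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of sequences a and b."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back through the table to recover one optimal subsequence.
    result = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


# Hypothetical net orderings along the driver-pad side and the bump-pad side.
driver_order = [1, 2, 3, 4, 5]
bump_order = [2, 1, 3, 5, 4]
common = longest_common_subsequence(driver_order, bump_order)
```

Intuitively, nets that appear in the same relative order on both sides can be
connected without crossing one another. The abstract does not say how the
router uses this result beyond guiding global routing.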


SYSTEM-LEVEL DESIGN

Ren, P.; Lis, M.; Cho, M. H.; Shim, K. S.; Fletcher, C. W.; Khan, O.; Zheng, N.; Devadas, S. 
HORNET: A Cycle-Level Multicore Simulator 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200443

We present HORNET, a parallel, highly configurable, cycle-level multicore
simulator based on an ingress-queued wormhole router network-on-chip (NoC)
architecture. The parallel simulation engine offers cycle-accurate as well as
periodic synchronization; while preserving functional accuracy, this permits
tradeoffs between perfect timing accuracy and high speed with very good
accuracy. When run on six separate physical cores on a single die, speedups can
exceed a factor of 5, and when run on a two-die 12-core system with 2-way
hyperthreading, speedups exceed 12x. Most hardware parameters are
configurable, including memory hierarchy, interconnect geometry, bandwidth,
crossbar dimensions, and parameters driving power and thermal effects. A highly
parametrized table-based NoC design allows a variety of routing and virtual
channel allocation algorithms out of the box, ranging from simple
dimension-ordered routing to complex Valiant, ROMM, O1Turn or PROM schemes,
BSOR, and adaptive routing. HORNET can run in network-only mode using synthetic
traffic or traces, or directly emulate a MIPS-based multicore. HORNET is freely
available under the open-source MIT license at
http://csg.csail.mit.edu/hornet/.

Cui, J.; Maskell, D. L. 
A Fast High-Level Event-Driven Thermal Estimator for Dynamic Thermal Aware Scheduling 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200435

Thermal aware scheduling (TAS) is an important system level optimization for
many-core systems. A fast event driven thermal estimation method, which
includes both the dynamic and leakage power models, for monitoring temperature
and guiding dynamic TAS (DTAS) is proposed in this paper. The fast event driven
thermal estimation is based upon a thermal map, with occasional thermal
sensor-based calibration, which is updated only when a high level event occurs.
To minimize the overhead, while maintaining the estimation accuracy, prebuilt
look-up-tables and predefined leakage calibration parameters are used to speed
up the thermal solution. Experimental results show our method is accurate,
producing thermal estimations of similar quality to an existing open-source
thermal simulator, while having a considerably reduced computational
complexity. Based on this predictive approach, we take full advantage of a
projected future thermal map to develop several heuristic policies for DTAS. We
show that our proposed predictive policies are significantly better, in terms
of minimizing average/peak temperature, reducing the dynamic thermal management
overhead and improving other real-time features, than existing DTAS schedulers,
making them highly suitable for heuristically guiding thermal aware task
allocation and scheduling.

TEST

Tam, W. C.; Blanton, R. D. 
SLIDER: Simulation of Layout-Injected Defects for Electrical Responses 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200444

Logic-level simulation has been the de facto method for simulating
defect/faulty behavior for various testing tasks since it offers a good
tradeoff between accuracy and speed. Unfortunately, by abstracting defect
behavior to the logic level (i.e., a fault model), it also discards important
information that inevitably results in inaccuracies. This paper describes a
fast and accurate defect simulation framework called SLIDER (simulation of
layout-injected defects for electrical responses). SLIDER uses well-developed
mixed-signal simulation technology that is conventionally used for design
verification. There are three innovative aspects that distinguish SLIDER from
prior work in this area: 1) accuracy resulting from defect injection taking
place at the layout level; 2) speedup resulting from careful and automatic
partitioning of the circuit into maximal digital and minimal analog domains for
mixed-signal simulation; and 3) complete automation that includes defect
generation, defect injection, design partitioning, netlist extraction,
mixed-signal simulation, and test-data extraction. The virtual failure data
created by SLIDER is useful in a variety of settings that include diagnosis
resolution improvement, defect localization, fault model evaluation, and
evaluation of yield/test learning techniques that are based on failure data
analysis.

Chen, T.-J.; Li, J.-F.; Tseng, T.-W. 
Cost-Efficient Built-In Redundancy Analysis With Optimal Repair Rate for RAMs 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200438

Built-in self-repair (BISR) techniques are widely used for the repair of
embedded memories. One of the key components of a BISR circuit is the built-in
redundancy-analysis (BIRA) module, which allocates redundancies according to
the designed redundancy analysis algorithm. Thus, the BIRA module affects the
repair rate of the BISR circuit. Existing BIRA schemes for RAMs can provide the
optimal repair rate (the ratio of the number of repaired RAMs to the number of
defective RAMs), but they require either high area cost or multiple test runs.
This paper proposes a BIRA scheme for RAMs that provides the optimal repair
rate with very low area cost and a single test run. Furthermore, the BIRA module
is designed to be reconfigurable so that it can be shared by multiple RAMs.
Experimental results show that the area cost for implementing the proposed BIRA
scheme is much lower than that of existing BIRA schemes with optimal repair
rate. A test chip is also implemented to demonstrate the proposed BIRA scheme.

Harutyunyan, G.; Shoukourian, S.; Vardanian, V.; Zorian, Y. 
A New Method for March Test Algorithm Generation and Its Application for Fault Detection in RAMs 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200436

In this paper, all linked and unlinked static and two-operation dynamic faults
are considered. A classification for their description is introduced.
Generating a test algorithm that detects all the considered faults is shown to
be a nontrivial problem. For this purpose, a new
structure-oriented method is developed. Based on the proposed method, an
efficient test algorithm March LSD of complexity 75N is generated for the
detection of the considered linked static and dynamic faults.
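
For readers unfamiliar with March notation, a complexity of 75N means 75 read or
write operations per cell of an N-cell memory. The sketch below illustrates the
general form of a March test using the well-known March C- algorithm (10N),
not March LSD, whose elements are not given in the abstract. The FaultyMemory
class and its stuck-at fault model are hypothetical and only for illustration.

```python
class FaultyMemory:
    """Bit-addressable memory with optional stuck-at faults (hypothetical model)."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}  # maps address -> forced bit value

    def write(self, addr, value):
        # A stuck-at cell ignores the written value.
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])


# March C-: each element is (address order, operations applied to every cell).
# "w0"/"w1" write a bit; "r0"/"r1" read and expect that bit.
MARCH_C_MINUS = [
    ("up", ["w0"]),
    ("up", ["r0", "w1"]),
    ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("up", ["r0"]),
]


def run_march(memory, size, algorithm):
    """Return True if every read matches its expected value (no fault seen)."""
    for order, ops in algorithm:
        addrs = range(size) if order == "up" else range(size - 1, -1, -1)
        for addr in addrs:
            for op in ops:
                kind, bit = op[0], int(op[1])
                if kind == "w":
                    memory.write(addr, bit)
                elif memory.read(addr) != bit:
                    return False
    return True


def operations_per_cell(algorithm):
    """Count operations per cell, i.e., the k in a complexity of kN."""
    return sum(len(ops) for _, ops in algorithm)
```

Here operations_per_cell(MARCH_C_MINUS) evaluates to 10, matching the 10N
complexity usually quoted for March C-.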

Rabenalt, T.; Richter, M.; Poehl, F.; Goessel, M. 
Highly Efficient Test Response Compaction Using a Hierarchical X-Masking Technique 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200442

This paper presents a highly effective compactor architecture for processing
test responses with a high percentage of x-values. The key component is a
hierarchical configurable masking register, which allows the compactor to
dynamically adapt to and provide excellent performance over a wide range of
x-densities. A major contribution of this paper is a technique that enables the
efficient loading of the x-masking data into the masking logic in a parallel
fashion using the scan chains. A method for eliminating the requirement for
dedicated mask control signals using automated test equipment timing
flexibility is also presented. The proposed compactor is especially suited to
multisite testing. Experiments with industrial designs show that the proposed
compactor enables compaction ratios exceeding 200x.

Nassery, A.; Erol, O. E.; Ozev, S.; Verhelst, M. 
Test Signal Development and Analysis for OFDM Systems RF Front-End Parameter Extraction 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200440

Testing radio frequency (RF) transceivers requires the measurement of a diverse
set of specifications, requiring multiple testing setups. This complicates load
board design, debug, and diagnosis, and results in long testing time. In
this paper, we present a single setup testing solution for orthogonal
frequency-division multiplexing systems RF front-ends based on a loop-around
scheme. With this technique, it is possible to determine gain and phase
mismatch, inphase-quadrature time skew, and dc offset. Linear gain and IIP3
decouple the transmitter parameters from the receiver parameters. Although
loop-around has been used in many forms, the basic challenge is to determine
what input conditions will lead to accurate measurement and what form of
modeling will yield this accuracy. To this end, we develop test signal design
and multistep extraction techniques. Experimental results indicate that IIP3
can be extracted with 0.6 dB maximum error while phase mismatch and gain
mismatch can be extracted with 0.3° and 0.6% maximum error, respectively. Our
method is able to de-embed the characteristics of the transmitter from those of
the receiver
while it requires the analysis of only low-frequency digital baseband signals
(I and Q branches) and eliminates the need for RF testers.

Yilmaz, E.; Ozev, S. 
Test Application for Analog/RF Circuits With Low Computational Burden 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200445

In this paper, we propose an adaptive test strategy that tailors the test
sequence with respect to the properties of each individual instance of a
circuit. Reducing the test set by analyzing the dropout patterns during
characterization and eliminating the unnecessary tests has always been the
approach for high volume production in the analog domain. However, once
determined, the test set remains typically fixed for all devices. We propose to
exploit the statistical diversity of the manufactured devices and adaptively
eliminate tests that are determined to be unnecessary based on information
obtained on the circuit under test. Test time information is incorporated in
the method to yield short test time. The proposed methodology is
computationally efficient and imposes very little overhead on the tester. We
compare our results with other similar specification-based test reduction
techniques for a low noise amplifier (LNA) circuit and an analog industrial
circuit. Results show 85% test quality improvement for the same test time or
24% test time reduction for the same test quality for the LNA circuit.
Moreover, near zero defective parts per million is achieved for the industrial
circuit.