August 2011 Newsletter

Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2011-08.txt

TCAD NEWS

Impact factor data

The latest impact factor information is out: according to data from the Journal Citation Reports, the two-year impact factor for the journal in 2010 is 1.252 (roughly level with the 1.230 recorded in 2009), the top rating among journals in the area of design automation.

Changes to the editorial board

Professor Li-C. Wang from the University of California-Santa Barbara has joined the editorial board as an Associate Editor in the area of test, replacing Professor Hank Walker from Texas A&M University. We thank Hank for his service and warmly welcome Li.

REGULAR PAPERS

ANALOG, MIXED-SIGNAL, AND RF CIRCUITS

Habal, H. Graeb, H.
Constraint-Based Layout-Driven Sizing of Analog Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956867

A flow is presented for the automatic synthesis of an analog circuit layout based on a schematic and a list of circuit design parameter values. The flow is driven by design, placement, and routing constraints; no layout template is necessary. Every possible layout for each device in the circuit is investigated; the layouts with the best geometric features and smallest quantization error (due to manufacturing grid alignment) are kept. For circuit placement, a complete enumeration of possible circuit placements, limited only by the usual constraints of symmetry, proximity, and common centroid, is performed. Out of this enumeration a final circuit placement is selected and routed. The new flow is integrated with a deterministic nonlinear optimization algorithm to perform layout-driven circuit sizing; layouts are synthesized during both gradient approximation and next-step determination. Layout-driven circuit sizing was applied to two example circuits. Sizing of the first circuit example took $8\times$ the CPU time needed for traditional circuit sizing, but remained feasible at 2.1 h of wall clock time on a contemporary workstation.

EMERGING TECHNOLOGIES

Zhang, J. Patil, N. P. Hazeghi, A. Wong, H.-S. P. Mitra, S.
Characterization and Design of Logic Circuits in the Presence of Carbon Nanotube Density Variations
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956874

Variations in the spatial density of carbon nanotubes (CNTs), resulting from the lack of precise control over CNT positioning during chemical synthesis, are a major hurdle to the scalability of carbon nanotube field effect transistor (CNFET) circuits. Such CNT density variations can lead to non-functional CNFET circuits. This paper presents a probabilistic framework for modeling the CNT count distribution contained in a CNFET of given width, and establishes the accuracy of the model using experimental data obtained from CNT growth. Using this model, we estimate the impact of CNT density variations on the yield of CNFET very large-scale integrated circuits. Our estimation results demonstrate that CNT density variations can significantly degrade the yield of CNFETs, and can be a major concern for scaled CNFET circuits. Finally, we analyze the impact of CNT correlation (i.e., correlation of CNT count between CNFETs) that exists in CNT growth, and demonstrate how the yield of a CNFET storage circuit (primarily limited by its noise immunity) can be significantly improved by taking advantage of such correlation.
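The paper derives its CNT count distribution from measured growth data; purely as an illustrative sketch (not the authors' model), the effect of count variations on yield can be roughed out by assuming a Poisson-distributed CNT count per device and treating a zero-count CNFET as non-functional:

    import math

    def p_functional(width_um, cnts_per_um):
        # Illustrative assumption only: the CNT count under the gate is Poisson
        # with mean width * density; the paper instead fits measured spacing data.
        mean_count = width_um * cnts_per_um
        return 1.0 - math.exp(-mean_count)      # P(at least one CNT)

    def circuit_yield(n_cnfets, width_um, cnts_per_um):
        # Circuit fails if any CNFET has zero CNTs; devices are assumed
        # independent here (the paper additionally studies correlated counts).
        return p_functional(width_um, cnts_per_um) ** n_cnfets

    # Hypothetical numbers: one million CNFETs, 5 CNTs/um average density.
    for width in (0.5, 1.0, 2.0, 4.0):
        print(width, circuit_yield(1_000_000, width, 5.0))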
LOGIC SYNTHESIS

Paik, S. Lee, S. Shin, Y.
Retiming Pulsed-Latch Circuits With Regulating Pulse Width
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956870

A pulsed-latch is an ideal sequencing element for high-performance application-specific integrated circuit designs due to its simple timing model and reduced sequencing overhead. The possibility of time-borrowing while a latch is transparent is deliberately ignored in pulsed-latch circuits to simplify the timing model. However, using more than one pulse width allows another form of time-borrowing which preserves the simple timing model. The associated problem of allocating pulse widths is called pulse width allocation (PWA); we combine it with retiming to achieve a shorter clock period in pulsed-latch circuits than can be obtained by retiming or PWA alone, while requiring fewer extra latches than standard retiming. An exact solution can be obtained by integer linear programming, but this is restricted to small circuits. We therefore introduce a practical heuristic, which performs clock skew scheduling to find the minimum clock period and then brings the clock skew as close to zero as possible by converting it to combined retiming and PWA. Experiments with 45 nm technology circuits suggest that the heuristic algorithm achieves a clock period that is close to the minimum in most circuits, with an average 23% reduction compared to the initial circuit, at an average cost of a 16% increase in the number of latches. On the same benchmarks, standard retiming achieves a 20% reduction in clock period with a 29% increase in the number of latches. The cost in extra area and energy is reasonable.

Hu, W. Oberg, J. Irturk, A. Tiwari, M. Sherwood, T. Mu, D. Kastner, R.
Theoretical Fundamentals of Gate Level Information Flow Tracking
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5948366

Information flow tracking is an effective tool in computer security for detecting unintended information flows. However, software-based information flow tracking implementations have drawbacks in preciseness and performance. As a result, researchers have begun to explore tracking information flow in hardware, and more specifically, understanding the interference of individual bits of information through logical functions. Such gate level information flow tracking (GLIFT) can track information flow in a system at the granularity of individual bits. However, the theoretical basis for GLIFT, which is essential to its adoption in real applications, has never been thoroughly studied. This paper provides a fundamental analysis of GLIFT, introducing definitions, properties, and the imprecision problem that arises with a commonly used shadow logic generation method. The paper also presents a solution to this imprecision problem and provides results showing that this imprecision can be tolerated for the benefit of lower area and delay.
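As a concrete illustration of bit-level tracking (a textbook GLIFT example rather than anything specific to this paper), the sketch below contrasts a conservative taint rule for a 2-input AND gate with precise shadow logic, which propagates taint only when a tainted input can actually change the output:

    def and_taint_conservative(a, at, b, bt):
        # Conservative shadow logic: output is tainted whenever any input is tainted.
        return at | bt

    def and_taint_precise(a, at, b, bt):
        # Precise shadow logic for a 2-input AND: a tainted input taints the
        # output only if it can change the output. If a = 0 and untainted, the
        # output is 0 no matter what b is, so taint on b does not propagate.
        return (at & bt) | (at & b) | (bt & a)

    # Enumerate all 0/1 combinations and report where the conservative rule
    # over-taints relative to the precise rule.
    for a in (0, 1):
        for b in (0, 1):
            for at in (0, 1):
                for bt in (0, 1):
                    cons = and_taint_conservative(a, at, b, bt)
                    prec = and_taint_precise(a, at, b, bt)
                    if cons != prec:
                        print("a=%d at=%d b=%d bt=%d conservative=%d precise=%d"
                              % (a, at, b, bt, cons, prec))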
MODELING AND SIMULATION

Villena, J. F. Silveira, L. M.
Multi-Dimensional Automatic Sampling Schemes for Multi-Point Modeling Methodologies
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956873

This paper presents a methodology for optimizing sample point selection in the context of model order reduction (MOR). The procedure iteratively selects samples from a large candidate set in order to identify a projection subspace that accurately captures system behavior. Samples are selected in an efficient and automatic manner based on their relevance, measured through an error estimator. Projection vectors are computed only for the best samples according to the given criteria, thus minimizing the number of expensive solves. The scheme makes no prior assumptions about the system behavior, is general and valid in single and multiple dimensions, and is applicable to linear and parameterized MOR methodologies. The proposed approach is integrated into a multi-point MOR algorithm, with automatic sample and order selection based on a transfer-function error estimate. Different implementations and improvements are proposed, and results on a variety of industrial examples demonstrate the accuracy and robustness of the methodology.

PHYSICAL DESIGN

Ma, Q. Qian, Z. Young, E. F. Y. Zhou, H.
MSV-Driven Floorplanning
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956869

Power consumption has become a crucial problem in modern circuit design. Multiple supply voltage (MSV) design is introduced to provide higher flexibility in controlling the power and performance tradeoff. One important requirement of MSV design is that the timing constraints of the circuit must be satisfied after voltage assignment of the cells. In this article, we develop two algorithms to solve the voltage assignment problem under timing constraints, namely min-cost flow (MCF) and value-oriented branch-and-bound (VOBB). In the MCF algorithm, the voltage assignment problem is formulated as a convex cost dual network flow problem, which can be solved optimally in polynomial time under certain conditions by calling an MCF solver. The VOBB algorithm, a branch-and-bound-based search method, solves the voltage assignment problem optimally in general cases by employing the MCF algorithm and a linear programming solver as subroutines. Finally, we propose an MSV-driven floorplanning framework that optimizes the power consumption and physical layout of a circuit simultaneously during the floorplanning stage, by embedding the MCF algorithm into a simulated annealing-based floorplanner and applying the VOBB algorithm as a postprocessing step. We compared our approach with the latest work on this problem, and the experimental results show that it achieves significantly greater power savings in much less running time, confirming the effectiveness and efficiency of our method.
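The MCF formulation itself is beyond the scope of an abstract, but the underlying problem is easy to state. The hypothetical sketch below solves a toy instance by brute force (one timing path, made-up voltage/delay/power options), purely to illustrate the voltage assignment problem that the paper solves far more efficiently:

    from itertools import product

    # Hypothetical cell options as (voltage, delay, power); the numbers are
    # invented purely for illustration.
    OPTIONS = [(1.2, 1.0, 4.0), (1.0, 1.4, 2.5), (0.8, 2.0, 1.5)]

    def assign_voltages(n_cells, max_path_delay):
        # Brute-force voltage assignment for a single timing path of n_cells:
        # choose one option per cell, meet the path delay constraint, and
        # minimize total power. (The paper's MCF/VOBB algorithms solve the
        # full circuit-level problem optimally and efficiently.)
        best = None
        for choice in product(OPTIONS, repeat=n_cells):
            delay = sum(c[1] for c in choice)
            power = sum(c[2] for c in choice)
            if delay <= max_path_delay and (best is None or power < best[0]):
                best = (power, delay, [c[0] for c in choice])
        return best  # (total power, total delay, voltage per cell) or None

    print(assign_voltages(5, 8.0))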
SYSTEM-LEVEL DESIGN

Kim, S. H. Mukhopadhyay, S. Wolf, M.
Modeling and Analysis of Image Dependence and Its Implications for Energy Savings in Error Tolerant Image Processing
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5958188

We present an analysis of the relationship between input images and energy consumption in error tolerant image processing. Under aggressive voltage scaling, the output image quality of image processing depends on the input images for two reasons: 1) error tolerance among images is naturally disparate in terms of perceptual image quality assessment, and 2) the error rate under aggressive voltage scaling varies by input image type. Based on both effects, the supply voltage can be optimized for a given quality requirement so as to achieve ultralow power/energy dissipation. Our analysis demonstrates the significance of accurate delay estimation, which depends not only on the combinational inputs but also on the previous state of the logic. We present a new sequential model for accurate error estimation. Based on the model, our experimental results demonstrate that different input image types lead to very different output quality. We also present the effect of process variation on the relationship between input image and output quality. The dependence of energy consumption on input images provides a new perspective for low-power multimedia and image processing system design.

Irturk, A. Matai, J. Oberg, J. Su, J. Kastner, R.
Simulate and Eliminate: A Top-to-Bottom Design Methodology for Automatic Generation of Application Specific Architectures
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956868

There is an increasing trend toward application specific processing, particularly in embedded computing devices that have stringent performance requirements. Meeting the desired area and throughput constraints requires careful tuning of the underlying architecture, and high-level design tools are gaining acceptance as a way to achieve this goal while decreasing design time. Most existing tools employ a bottom-to-top methodology, which pieces together functional units, interconnect, and control logic based on the given application; this tends to scale poorly. We developed a tool, simulate and eliminate (S&E), that is fundamentally different from existing high-level design tools in that it employs a top-to-bottom methodology. S&E provides automatic generation of a variety of general purpose processing cores with different parameterization options. The provided application(s) are then simulated on this general-purpose architecture and the unneeded functionality is eliminated, resulting in an application specific architecture. S&E generates fully synthesizable hardware description language code from input C and/or MATLAB code. S&E provides different design methods and parameterization options that enable the user to study area and performance tradeoffs over a large number of different architectures and find the optimum architecture for the desired objective.
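The simulate-then-eliminate idea (shown here as a toy sketch, not the S&E tool itself) amounts to profiling the application on a fully general datapath and stripping the functional units the profile never touches:

    # Hypothetical opcode-to-functional-unit map for a generic core model.
    UNIT_OF_OP = {"add": "adder", "sub": "adder", "mul": "multiplier",
                  "div": "divider", "shl": "shifter", "ld": "mem_port", "st": "mem_port"}
    ALL_UNITS = set(UNIT_OF_OP.values())

    def eliminate_unused_units(instruction_trace):
        # Profile the trace on the general-purpose core, then keep only the
        # functional units that were actually exercised.
        used = {UNIT_OF_OP[op] for op in instruction_trace if op in UNIT_OF_OP}
        return used, ALL_UNITS - used

    # A multiply-accumulate-style kernel never divides or shifts, so those units go.
    trace = ["ld", "ld", "mul", "add", "st"] * 1000
    kept, removed = eliminate_unused_units(trace)
    print("kept:", sorted(kept), "removed:", sorted(removed))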
Jou, J. M. Lee, Y.-L. Wu, S.-S.
Model-Driven Design and Generation of New Multi-Facet Arbiters: From the Design Model to the Hardware Synthesis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5958189

The design of arbiters has become increasingly important due to their wide use in areas such as multi-processor systems-on-a-chip and on-chip or off-chip high-speed crossbar switches and networks. In this paper, we propose a new systematic model-driven flow for designing new scalable multi-facet arbiters through a 3-phase process combined with a template-based modular design approach, comprising a design model derivation phase, an architecture model (or template) design phase, and an arbiter hardware implementation and generation phase. We first describe phase 1 of the design flow, showing in detail how an arbiter design model is derived through careful analysis of arbiter design issues and systematic exploration of the design space used to construct the model. We then discuss phase 2, in which an architecture model or template is derived using reusability, modularity, and expansibility techniques. With both the design and architecture models, designers can efficiently design, or at least understand and thereby choose among, many kinds of different and better arbiters. Finally, we have developed a scalable decentralized parallel tree structure and corresponding modular algorithms for efficient arbiter hardware implementation, which constitutes the final phase of the proposed 3-phase model-driven design flow. Moreover, based on this modular and reusable hardware implementation structure and the accompanying algorithms, we have designed a parametric arbiter generator that automatically generates various multi-facet arbiters. Using this hardware implementation design and the generator, not only a fast and small round-robin arbiter but also other types of arbiters were designed and generated on the fly. The hardware implementation algorithms, the generator, and the experimental results are all presented to verify their performance. To our knowledge, this is the first time that such a systematic model-driven design approach has been proposed for practical hardware design and that such a multi-facet arbiter has been designed.

Bogdan, P. Marculescu, R.
Hitting Time Analysis for Fault-Tolerant Communication at Nanoscale in Future Multiprocessor Platforms
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956865

This paper investigates on-chip stochastic communication and proposes an analytical model for computing its mean hitting time. Toward this end, we model the on-chip stochastic communication of any source-destination pair as a branching and annihilating random walk taking place on a finite mesh. The evolution of this branching process is studied via a master equation which helps us estimate the mean number of communication rounds needed to reach a destination node from a particular source node. Besides the probabilistic performance analysis, we also present experimental results for two concrete platforms and assess the potential of stochastic communication for future nanotechnologies.
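The paper computes the mean hitting time analytically from a master equation; as a purely illustrative cross-check, the same quantity for a simple probabilistic-forwarding model (ignoring message annihilation) can be estimated by Monte Carlo simulation on a mesh:

    import random

    def mean_hitting_time(n, src, dst, p_forward=0.7, trials=2000, max_rounds=10000):
        # Monte Carlo estimate: each node holding the message forwards it to
        # each mesh neighbor with probability p_forward per round; count rounds
        # until dst first receives it. (Illustrative model only; the paper's
        # random walk also includes annihilation and is solved analytically.)
        def neighbors(x, y):
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                if 0 <= x + dx < n and 0 <= y + dy < n:
                    yield (x + dx, y + dy)

        total = 0
        for _ in range(trials):
            holders = {src}
            rounds = 0
            while dst not in holders and rounds < max_rounds:
                new_holders = set(holders)
                for (x, y) in holders:
                    for nb in neighbors(x, y):
                        if random.random() < p_forward:
                            new_holders.add(nb)
                holders = new_holders
                rounds += 1
            total += rounds
        return total / trials

    # Mean rounds to reach the far corner of an 8 x 8 mesh from the origin.
    print(mean_hitting_time(8, (0, 0), (7, 7)))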
Beretta, I. Rana, V. Atienza, D. Sciuto, D.
A Mapping Flow for Dynamically Reconfigurable Multi-Core System-on-Chip Design
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956864

Nowadays, multi-core systems-on-chip (SoCs) are typically required to execute multiple complex applications, which demand a large set of heterogeneous hardware cores of different sizes. In this context, the popularity of dynamically reconfigurable platforms is growing, as they increase the ability of the initial design to adapt to future modifications. This paper presents a design flow to efficiently map multiple multi-core applications onto a dynamically reconfigurable SoC. The proposed methodology is tailored to a reconfigurable hardware architecture based on a flexible communication infrastructure, and exploits similarities among applications to obtain an effective mapping. We also introduce a run-time mapper that is able to introduce new applications that were not known at design time, while preserving the mapping of the original system. We apply our design flow to a real-world multimedia case study and to a set of synthetic benchmarks, showing that it is indeed able to exploit similarities among the applications: it achieves an average improvement of 29% in reconfiguration latency with respect to a communication-oriented approach, while preserving the same communication performance.

TEST

Czysz, D. Mrugalski, G. Mukherjee, N. Rajski, J. Szczerbicki, P. Tyszer, J.
Deterministic Clustering of Incompatible Test Cubes for Higher Power-Aware EDT Compression
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5958190

Embedded deterministic test (EDT)-based compression uses cube merging to reduce pattern count, the amount of test data, and test time. It gradually expands a test pattern by incorporating compatible test cubes. This paper demonstrates that compression ratios can be an order of magnitude higher if cube merging continues despite conflicts at certain positions. Our novel solution produces test clusters, each comprising a parent pattern and a number of its derivatives obtained by imposing extra bits on it. In order to load scan chains with patterns that feature the original test cubes, only the data necessary to recreate the parent patterns, together with the locations and values of the corresponding conflicting bits, are required. A test controller can then deliver tests by repeatedly applying the same parent pattern, each time using a different control pattern to decide whether a given scan chain receives data from the parent pattern or from another pattern that recovers the content of the original test cube. Compression of incompatible test cubes preserves all the benefits of continuous-flow decompression and offers compression ratios on the order of $1000\times$ with encoding efficiency much higher than 1.0. We also demonstrate that test clusters make it possible to deliver test patterns in a flexible, power-aware fashion. This framework achieves significant reductions in switching activity during scan loading, as well as additional reductions in test data volume due to the encoding algorithms employed to compress the parent and control vectors.

VERIFICATION

Fey, G. Sulflow, A. Frehse, S. Drechsler, R.
Effective Robustness Analysis Using Bounded Model Checking Techniques
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956866

Continuously shrinking feature sizes result in an increasing susceptibility of circuits to transient faults, e.g., due to environmental radiation. Approaches to implement fault tolerance are known, but assessing the fault tolerance of a given implementation is a hard verification problem. Here, we propose the use of formal methods to assess the robustness of a digital circuit with respect to transient faults. Our formal model uses a fixed bound in time and exploits fault detection circuitry to cope with the complexity of the underlying sequential equivalence check. As a result, a lower and an upper bound on the robustness are returned, together with the vulnerable components. The underlying algorithm and techniques for improving its efficiency are presented. In experiments, we evaluate the method on circuits with different fault detection mechanisms.

SHORT PAPERS

Pomeranz, I.
Generation of Multi-Cycle Broadside Tests
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956871

The use of multi-cycle (or multi-pattern) tests for delay faults can reduce the number of clock cycles required for test application and enhance the ability of a test set to detect delay defects. This is achieved by exercising the circuit in functional mode for several clock cycles as part of each test. This advantage is especially important for multi-pattern functional broadside tests, which guarantee normal functional operation conditions during the functional clock cycles of the test. This paper describes a procedure for generating multi-pattern broadside tests. The procedure extends a two-pattern test set gradually to increase the number of patterns included in each test while reducing the number of tests. Experimental results demonstrate that significant reductions in the number of clock cycles are possible with the proposed procedure for both functional and arbitrary broadside test sets.
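A broadside test scans in a state and then clocks the circuit in functional mode; a multi-cycle test simply applies more functional cycles, so one scan operation yields several consecutive patterns. The toy sketch below (a hypothetical 3-bit next-state function, not from the paper) shows how those patterns follow from the scanned-in state:

    def broadside_patterns(scan_in_state, primary_inputs, next_state, k_cycles):
        # States applied by a k-cycle broadside test: the scanned-in state
        # followed by k functionally reached successors. Consecutive pairs act
        # as two-pattern tests for delay faults.
        states = [scan_in_state]
        for cycle in range(k_cycles):
            states.append(next_state(states[-1], primary_inputs[cycle]))
        return states

    def toy_next_state(state, pi):
        # Hypothetical 3-bit next-state function (rotate left, then XOR the
        # primary inputs); stands in for a real circuit's state transition logic.
        rotated = ((state << 1) | (state >> 2)) & 0b111
        return rotated ^ pi

    print([format(s, "03b") for s in
           broadside_patterns(0b101, [0b001, 0b010, 0b100], toy_next_state, 3)])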
Shebaita, A. Das, D. Petranovic, D. Ismail, Y.
A Novel Moment Based Framework for Accurate and Efficient Static Timing Analysis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956872

A novel methodology for accurate and efficient static timing analysis is presented in this paper. Our methodology uses the traditional cell library table structure with one modification: the cell library tables are filled with the gate output signal moments instead of the gate output 50% delay and output slew. Using only a few moments gives much better accuracy and visibility into the gate output waveform than using time-domain information. Simple convolution of the gate output moments with the interconnect moments yields the signal moments at the stage output. The parameters of the gate input signal, which are used for the table access of the successive stage, are computed directly from the predecessor stage's output moments using closed-form expressions, without having to explicitly transform the frequency-domain moments to the time domain. Thus, the interconnects and the gates are treated in a unified, homogeneous moment-based framework. The proposed approach inherits the efficiency of the classical cell library table approach with even lower computational complexity. Compared to the classical cell library table approach, the proposed approach accounts for the increasingly nonlinear and non-monotonic waveform shapes that are prohibitively difficult to represent in the classical approaches. In contrast to the classical approaches, increasing the accuracy of the novel approach is flexible and can be achieved by simply using more moments. To illustrate the concept and prove its merits, multiple examples are presented with 2-3 moments, which maintain accuracy within 1%-3% of SPICE.
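The "simple convolution" of moments follows from the fact that cascading two linear blocks multiplies their transfer functions, so the moment series of the cascade is the Cauchy (polynomial) product of the two moment sequences. The sketch below illustrates this with made-up moment values and recovers a crude Elmore-style delay from the first moment; the paper's closed-form delay and slew expressions go beyond this single-moment metric:

    import numpy as np

    def cascade_moments(gate_moments, wire_moments):
        # If each block's transfer function is written as a moment series
        # H(s) = m0 + m1*s + m2*s^2 + ..., the cascade H_gate(s) * H_wire(s)
        # has moments equal to the polynomial (Cauchy) product of the two
        # sequences, truncated here to the order kept in the library.
        n = min(len(gate_moments), len(wire_moments))
        return np.convolve(gate_moments, wire_moments)[:n]

    def elmore_delay(moments):
        # Crude one-moment delay metric: after normalizing so that m0 = 1,
        # the Elmore delay is -m1. (Metrics using m2, m3, ... are more
        # accurate; this only shows that delay falls out of the moments.)
        m = np.asarray(moments, dtype=float)
        m = m / m[0]
        return -m[1]

    # Hypothetical moments (m0, m1, m2) for a gate output and an interconnect.
    gate = [1.0, -12e-12, 90e-24]
    wire = [1.0, -8e-12, 50e-24]
    stage = cascade_moments(gate, wire)
    print("stage moments:", stage)
    print("Elmore-style stage delay (s):", elmore_delay(stage))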