Oct 2012 Newsletter 
Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2012-10.txt 

REGULAR PAPERS

KEYNOTE PAPER

Pedram, M.
Energy-Efficient Datacenters
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303941

Pervasive use of cloud computing and the resulting rise in the number of
datacenters and hosting centers (that provide platform or software services to
clients who do not have the means to set up and operate their own computing
facilities) have brought forth many concerns, including the electrical energy
cost, peak power dissipation, cooling, and carbon emission. With power
consumption becoming an increasingly important issue for the operation and
maintenance of the hosting centers, corporate and business owners are becoming
increasingly concerned.  Furthermore, provisioning resources in a cost-optimal
manner so as to meet different performance criteria, such as throughput or
response time, has become a critical challenge. The goal of this paper is to
provide an introduction to resource provisioning and power or thermal
management problems in datacenters, and to review strategies that maximize the
datacenter energy efficiency subject to peak or total power consumption and
thermal constraints, while meeting stipulated service level agreements in terms
of task throughput and/or response time.

ANALOG, MIXED-SIGNAL, AND RF CIRCUITS

Singh, A. K.; Ragab, K.; Lok, M.; Caramanis, C.; Orshansky, M.
Predictable Equation-Based Analog Optimization Based on Explicit Capture of
Modeling Error Statistics
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303944

Equation-based optimization using geometric programming (GP) for automated
synthesis of analog circuits has recently gained broader adoption. A major
outstanding challenge is the inaccuracy resulting from fitting the complex
behavior of scaled transistors to posynomial functions. In this paper, we
advance a novel optimization strategy that explicitly handles the error of the
model in the course of optimization. The innovation is in enabling the
successive refinement of transistor models within gradually reducing ranges of
operating conditions and dimensions. Refining via brute force requires
exponential complexity. The key contribution is the development of a framework
that optimizes efficient convex formulations, while using SPICE as a
feasibility oracle to identify solutions that are feasible with respect to the
accurate behavior rather than the fitted model. Due to the poor posynomial fit,
standard GP can return grossly infeasible solutions. Our approach dramatically
improves feasibility. We accomplish this by introducing robust modeling of the
fitting error's sample distribution information explicitly within the
optimization.  To address cases of highly stringent constraints, we introduce
an automated method for identifying a true feasible solution through minimal
relaxation of design targets. We demonstrate the effectiveness of our algorithm
on two benchmarks: a two-stage CMOS operational amplifier and a
voltage-controlled oscillator designed in TSMC 0.18um CMOS technology. Our
algorithm is able to identify superior solution points producing uniformly
better power and area values under a gain constraint with improvements of up to
50% in power and 10% in area for the amplifier design. Moreover, whereas
standard GP methods produced solutions with constraint violations as large as
45%, our method finds feasible solutions.

Levantino, S.; Maffezzoni, P.
Computing the Perturbation Projection Vector of Oscillators via Frequency
Domain Analysis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303938

This paper describes an original computational procedure to extract the
perturbation projection vector of oscillators by means of a frequency domain
technique. A key feature of the method is that it relies on the periodic
transfer function analysis, which is available in most circuit simulators, and
thus it can easily be exploited by oscillator designers. The accuracy of the
proposed extraction procedure is verified for two relevant oscillator
topologies.

HIGH-LEVEL SYNTHESIS

Sinha, R.; Patel, H. D.
synASM: A High-Level Synthesis Framework With Support for Parallel and Timed
Constructs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303940

This paper presents a high-level synthesis framework called synASM that
synthesizes abstract state machines (ASMs) to VHDL for field-programmable gate
arrays (FPGAs). In particular, this paper focuses on the specification,
scheduling, and synthesis of parallel and timed constructs. ASMs possess
well-defined formal semantics for sequential and parallel computation, and
their composition. We extend ASMs to support the specification of timing
requirements, which we call timed constructs. We also describe the composition
of timed constructs with sequential and parallel computation. A key
contribution of this paper is the extension of the force-directed scheduling
algorithm to support both parallel and timed constructs. We implement the
synthesis back-end in synASM that targets FPGAs. Our experiments show
improvements of up to 52% in lookup table usage and 34% in total area for
certain examples.

MODELING AND SIMULATION

Priyadarshi, S.; Saunders, C. S.; Kriplani, N. M.; Demircioglu, H.; Davis, W.
R.; Franzon, P. D.; Steer, M. B.
Parallel Transient Simulation of Multiphysics Circuits Using Delay-Based
Partitioning
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303942

A parallel transient simulation technique for multiphysics circuits is
presented. The technique develops partitions utilizing the inherent delay
present within a circuit and between physical domains. A state-variable-based
circuit delay element is presented, which implements the coupling between two
spatially or temporally isolated circuit partitions. A parallel delay-based
iterative approach for interfacing delay-partitioned subcircuits is applied,
which achieves the reasonable accuracy of nonparallel circuit simulation if
both incorporate the same interblock delay. The partitioned subcircuits are
distributed to different cores of a shared-memory multicore processor and
solved in parallel. A multithreaded implementation of the methodology using
OpenMP is presented. Examples showing superlinear speedup compared to
unpartitioned single-core simulation using the direct method are presented.
This paper also discusses the impact of load balancing and absolute delay on
simulation speedup.

Kim, D.; Kim, H.; Eo, Y.
Analytical Eye-Diagram Determination for the Efficient and Accurate Signal
Integrity Verification of Single Interconnect Lines
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303934

In this paper, a new efficient and accurate analytical eye-diagram
determination technique for interconnect lines is presented. The simplest input
test signal model for the intersymbol interference analysis of high-speed data
links is mathematically formulated. Because the input test patterns for the eye
boundaries are determined analytically, the technique is convenient and
efficient. The proposed technique shows excellent agreement with the
SPICE-based simulation in both eye height and jitter, i.e., within 5% error for
nondiscontinuous data paths and 10% error for discontinuous data paths. The
method is roughly an order of magnitude more computation-time-efficient than
pseudorandom bit sequence-based SPICE simulation.

PHYSICAL DESIGN

Hsu, K.-T.; Sinha, S.; Pi, Y.-C.; Ho, T.-Y.
A Hierarchy-Based Distributed Algorithm for Layout Geometry Operations
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303936

This paper introduces a novel distributed algorithm for performing the layout
geometry operations usually found in design rule checking, layout verification,
and mask synthesis. A large number of machines are typically available to the
user during the mask synthesis flow. As multiple machines or cores become more
ubiquitous, even designers using layout verification tools will have access to
a large set of machines. Therefore, an efficient and scalable distributed
algorithm for performing sequences of layout geometry operations will be of
great value to both designers and mask synthesis engineers. Given a layout and
a sequence of layout geometry operations, the proposed algorithm divides the
layout into several partitions. The given sequence of layout geometry
operations is executed in parallel on different partitions. New partitions are
derived from the original set of partitions and the sequence of geometry
operations is repeated on larger partitions with far fewer polygons. This
process continues until it produces a partition that covers the entire layout
area. A key feature of the proposed algorithm is that it is
correct-by-construction, i.e., each partition is guaranteed to generate a subset of the
correct results. Complete and correct results are generated for each layout
geometry operation for the entire layout when the operation completes execution
on all the partitions. The proposed algorithm was implemented in Gearman, an
open-source distributed framework. Results on large industrial layouts show
good performance and scalability.

Ozdal, M. M.; Burns, S.; Hu, J.
Algorithms for Gate Sizing and Device Parameter Selection for High-Performance
Designs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303939

It is becoming increasingly important to design high-performance circuits with
as low power as possible. In this paper, we study the gate sizing and device
parameter selection problem for today's industrial designs. We first outline
the typical practical problems that make it difficult to use traditional
algorithms on high-performance industrial designs. Then, we propose a
Lagrangian relaxation-based formulation that decouples timing analysis from
optimization without a resulting loss in accuracy. We also propose a graph
model that accurately captures discrete cell-type characteristics based on
library data. We model the relaxed Lagrangian subproblem as a graph problem and
propose algorithms to solve it. In our experiments, we demonstrate the
importance of using the signoff timing engine to guide the optimization. We
also show the benefit of the graph model we propose to solve the discrete
optimization problem. Compared to a state-of-the-art industrial optimization
flow, we show that our algorithms can obtain up to 38% leakage power reductions
and better overall timing for real high-performance microprocessor blocks.

SYSTEM-LEVEL DESIGN

Gladigau, J.; Haubelt, C.; Teich, J.
Model-Based Virtual Prototype Acceleration
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303935

Today, virtual prototypes are often employed for software development early in
the design flow. There, high simulation speed may support fast development. So,
the acceleration of virtual prototype simulation is important in the early
phases of design. To accelerate virtual prototypes, complex prototype
simulation can be prevented by exploiting model-specific knowledge. We replace
complex event-driven interaction with execution of predefined traces. In
particular, we show that, for many dataflow-dominated application models, such
accelerating traces may be efficiently determined. Trace determination is based
on a novel symbolic search technique. We show that virtual prototypes
exploiting such traces may lead to a significant simulation time reduction. The
benefits are quantified for the prototype of a SystemC/TLM network packet
filter, where traces result in up to 30% simulation acceleration.

TEST

Fang, H.; Chakrabarty, K.; Wang, Z.; Gu, X.
Diagnosis of Board-Level Functional Failures Under Uncertainty Using
Dempster-Shafer Theory
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303933

Despite recent advances in structural test methods, the diagnosis of the root
cause of board-level failures for functional tests remains a major challenge. A
promising approach to address this problem is to carry out fault diagnosis in
two phases: suspect faulty components on the board or modules within components
(together referred to as blocks in this paper) are first identified and ranked,
and then fine-grained diagnosis is used to target the suspect blocks in a
ranked order. We propose a new method based on dataflow analysis and
Dempster-Shafer (DS) theory for ranking faulty blocks in the first phase of
diagnosis. The proposed approach transforms the information derived from one
functional test failure into multiple-stage failures by partitioning the given
functional test into multiple stages. A measure of "belief" is then assigned to
each block based on the knowledge of each failing stage, and the DS theory is
subsequently used to aggregate the beliefs from multiple failing stages. Blocks
with higher beliefs are ranked at the top of the candidate list. Simulations on
an industry design for a network interface application as well as on an open
source system-on-a-chip show that the proposed method can provide accurate
ranking for most board-level functional failures.
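
For readers unfamiliar with DS theory, the aggregation step can be illustrated
with Dempster's rule of combination. The sketch below is not the authors'
implementation: the block names (cpu, ddr, phy), the mass values, and the
helper functions combine() and belief() are all hypothetical, and the abstract
does not say how per-stage beliefs are actually assigned.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule: fuse two mass functions keyed by frozenset of blocks."""
    fused = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        common = b & c
        if common:
            fused[common] = fused.get(common, 0.0) + mass_b * mass_c
        else:
            # Evidence pointing at disjoint suspect sets is conflicting.
            conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so the remaining masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in fused.items()}


def belief(m, hypothesis):
    """Belief = total mass committed to subsets of the hypothesis."""
    return sum(v for k, v in m.items() if k <= hypothesis)


# Hypothetical evidence from two failing stages of one functional test.
blocks = frozenset({"cpu", "ddr", "phy"})
stage1 = {frozenset({"cpu"}): 0.6, blocks: 0.4}
stage2 = {
    frozenset({"ddr"}): 0.2,
    frozenset({"cpu", "ddr"}): 0.5,
    blocks: 0.3,
}

fused = combine(stage1, stage2)
# Rank single blocks by belief, highest first.
ranking = sorted(
    blocks, key=lambda b: belief(fused, frozenset({b})), reverse=True
)
print(ranking)
```

Mass assigned to the full set of blocks stands for "no information", which is
how each stage can express partial ignorance without blaming a specific block.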

Huang, Y.-J.; Li, J.-F.
Built-In Self-Repair Scheme for the TSVs in 3-D ICs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303937

3-D integration using through-silicon-via (TSV) has been widely acknowledged as
one future integrated-circuit (IC) technology. Test and yield are two big
issues for volume production of 3-D ICs. In this paper, we propose a built-in
self-repair (BISR) scheme to test and repair TSVs in 3-D ICs. The BISR scheme,
arranging the TSVs into arrays similar to memories, can effectively enhance the
yield of TSVs in a 3-D IC such that the yield of the 3-D IC is boosted.
Furthermore, a global fusing methodology is proposed to reduce the requirement
of fuses. Simulation and analysis results show that the proposed BISR scheme
can drastically reduce the area cost and test time in comparison with an
existing TSV repair scheme for the same final yield of TSVs under repair. For a
3-D wide-IO DRAM with 512 TSVs, for example, the proposed repair scheme can
achieve 32.4% area reduction and 73.4% test time reduction.

Yu, X.; Blanton, R. D.
Improving Diagnosis Through Failing Behavior Identification
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303946

Logic diagnosis analyzes the observed failing circuit responses to derive the
potential defect sites. This paper describes a method for improving diagnosis
through failing behavior identification (FBI). FBI captures defect behavior
(i.e., activation conditions of the defect) by identifying the signal lines
related to defect activation. This additional information allows the root cause
to be estimated in order to improve yield, design quality, and test quality, as
well as guide physical failure analysis (PFA) to perform faster defect
localization. FBI is accomplished
by: 1) deriving the neighborhood states of the defect site, i.e., the actual
values on the signal lines within logical or physical proximity to the defect
site, and 2) identifying the signal lines that are most relevant to defect
activation. The efficacy of FBI is validated using circuit-level and
logic-level simulation experiments. The results show that FBI achieves an
average accuracy of 94% in identifying signal lines that are relevant to defect
activation, a 28% improvement over an existing approach. Moreover, by analyzing
the neighborhood states of each defect site reported by logic diagnosis, sites
that are not likely to be defective can be eliminated, which leads to
improvement in diagnosis resolution. Experimental results show that, with little
influence on diagnosis accuracy, the number of incorrect defective sites
reported by logic diagnosis can be reduced by 64%, on average.