March 2013 Newsletter
Placing you one click away from the best new CAD research!


CALL FOR NOMINATIONS: IEEE Transactions on CAD Donald O. Pederson Best Paper
Award (Deadline: March 1, 2013)  

The IEEE Transactions on CAD invites nominations for the 2013 Donald O.
Pederson Best Paper Award. All papers that appeared in TCAD between January
2011 and December 2012 (both inclusive) are eligible to be nominated.  By MARCH
1, 2013, please send your nomination(s) to tcad@umn.edu. The information
required in a nomination is: 

Nominator (should not be an author):
Title:
Authors:
Publication information:
- Volume:
- Issue:
- Page(s):
Basis for nomination (max 100 words):

REGULAR PAPERS

EMBEDDED SYSTEMS

Wu, J.; Wang, J.; Li, K.; Zhou, H.; Lv, Q.; Shang, L.; Sun, Y.  Large-Scale
Energy Storage System Design and Optimization for Emerging Electric-Drive
Vehicles
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6461989

Energy consumption and the associated environmental impact are a pressing
challenge faced by the transportation sector. Emerging electric-drive vehicles
have shown promise for substantial reductions in petroleum use and vehicle
emissions.  Their success, however, has been hindered by the limitations of
energy storage technologies. Existing in-vehicle lithium-ion battery systems
are bulky, expensive, and unreliable. Energy storage system (ESS) design and
optimization is essential for emerging transportation electrification. This
paper presents an integrated ESS modeling, design, and optimization framework
targeting emerging electric-drive vehicles. A large-scale ESS modeling solution
is first presented, which considers major runtime and long-term battery
effects, and uses fast frequency-domain analysis techniques for efficient and
accurate characterization of large-scale ESS. The proposed design framework
unifies design-time optimization and runtime control. It conducts statistical
optimization for ESS cost and lifetime, jointly considering the variation
of the ESS due to manufacturing tolerance and heterogeneous driver-specific
runtime usage. It also optimizes ESS design by incorporating complementary
energy storage technologies, e.g., lithium-ion batteries and ultracapacitors.
Using physical
measurements of battery manufacture variation and real-world user driving
profiles, our experimental study has demonstrated that the proposed framework
effectively explores the statistical design space and produces cost-efficient
ESS solutions with statistical system lifetime guarantees.

HIGH-LEVEL SYNTHESIS

Morvan, A.; Derrien, S.; Quinton, P.  Polyhedral Bubble Insertion: A Method to
Improve Nested Loop Pipelining for High-Level Synthesis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6461983

High-level synthesis (HLS) allows hardware to be directly produced from
behavioral description in C/C++, thus accelerating the design process. Loop
pipelining is a key transformation of HLS, as it improves the throughput of the
design at the price of a small hardware overhead. However, for small loops, its
use often results in poor hardware utilization due to the pipeline latency
overhead.  Overlapping the iterations of the whole loop nest instead of only
overlapping the innermost loop is a way to overcome this difficulty, but
currently available techniques are restricted to perfectly nested loops with
constant bounds, involving uniform dependences only. Using the polyhedral
model, we extend the applicability of the nested loop pipelining transformation
by proposing a new legality check and a new loop correction technique, called
polyhedral bubble insertion. This method was implemented in a source-to-source
compiler targeting HLS, and results on benchmark kernels show that polyhedral
bubble insertion is effective in practice on a much larger class of loop nests.
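
The pipeline latency overhead mentioned in this abstract can be illustrated
with a simple cycle count. The sketch below is not the paper's method: it
assumes a perfectly nested two-level loop with constant bounds, an initiation
interval of 1, and hypothetical function names chosen for illustration only.

```python
def inner_pipelined_cycles(outer_trips: int, inner_trips: int, depth: int) -> int:
    """Cycles when only the innermost loop is pipelined (II = 1 assumed).

    The pipeline fills and drains once per outer iteration, so every outer
    iteration costs inner_trips + depth - 1 cycles.
    """
    return outer_trips * (inner_trips + depth - 1)


def nest_pipelined_cycles(outer_trips: int, inner_trips: int, depth: int) -> int:
    """Cycles when iterations of the whole loop nest overlap (II = 1 assumed).

    The pipeline fills and drains only once for the entire nest.
    """
    return outer_trips * inner_trips + depth - 1


def utilization(outer_trips: int, inner_trips: int, cycles: int) -> float:
    """Fraction of cycles in which a new iteration is issued."""
    return (outer_trips * inner_trips) / cycles


if __name__ == "__main__":
    # Hypothetical small inner loop: 4 iterations, 8-stage pipeline.
    n, m, d = 100, 4, 8
    inner = inner_pipelined_cycles(n, m, d)
    nest = nest_pipelined_cycles(n, m, d)
    print(inner, round(utilization(n, m, inner), 3))
    print(nest, round(utilization(n, m, nest), 3))
```

With a short inner loop, most cycles in the first scheme are spent filling and
draining the pipeline, which is the motivation for overlapping the whole nest.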

MODELING AND SIMULATION

Yu, W.; Zhuang, H.; Zhang, C.; Hu, G.; Liu, Z.  RWCap: A Floating Random Walk
Solver for 3-D Capacitance Extraction of Very-Large-Scale Integration
Interconnects 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6461990

A floating random walk (FRW) solver, called RWCap, is presented for the
capacitance extraction of very-large-scale integration (VLSI) interconnects. An
approach, including the numerical characterization of the cross-interface
transition probability and weight value, is proposed to accelerate the
extraction of structures with multiple dielectric layers. A comprehensive
variance reduction scheme based on the importance sampling and stratified
sampling is proposed to improve the convergence rate of the FRW algorithm.
Finally, the space management technique using an octree data structure and the
parallel computing technique are presented to further improve the efficiency.
Numerical experiments are carried out with the test cases generated under the
180- and 45-nm process technologies. They demonstrate that the proposed
multidielectric FRW algorithm achieves up to 160× speedup over the FRW
algorithm using spherical transition domains to cross dielectric interfaces,
with very small memory overhead. The variance reduction techniques further
bring 3× or more speedup without memory overhead or loss of
accuracy. RWCap also outperforms other existing FRW algorithms and fast
boundary element method solvers in terms of computational time or scalability.
The experiments on an 8-core CPU machine show that the parallel RWCap is over
6× faster than its serial-computing version.
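
For readers unfamiliar with random-walk field solvers, the basic idea can be
sketched with the textbook walk-on-spheres method for Laplace's equation. This
is not RWCap or the algorithm in the paper: it is a 2-D toy on the unit square
with a single dielectric, no variance reduction, and a hypothetical boundary
potential u(x, y) = x chosen because the exact interior solution is known.

```python
import math
import random


def boundary_value(x: float, y: float) -> float:
    """Hypothetical boundary potential. u(x, y) = x is harmonic, so the exact
    solution inside the square is also u = x and the estimate can be checked."""
    return x


def walk_once(x: float, y: float, rng: random.Random, eps: float = 1e-3) -> float:
    """Run one walk from (x, y) until it is within eps of the square's boundary."""
    while True:
        # Radius of the largest circle centered at (x, y) inside the unit square.
        d = min(x, 1.0 - x, y, 1.0 - y)
        if d < eps:
            # Close enough: read the potential at the current point.
            return boundary_value(x, y)
        # Jump to a uniformly random point on that circle.
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x += d * math.cos(theta)
        y += d * math.sin(theta)


def estimate_potential(x: float, y: float, walks: int = 4000, seed: int = 0) -> float:
    """Monte Carlo estimate of the potential at (x, y): the mean over all walks."""
    rng = random.Random(seed)
    return sum(walk_once(x, y, rng) for _ in range(walks)) / walks


if __name__ == "__main__":
    # The exact value at (0.3, 0.6) is 0.3; the estimate carries sampling noise.
    print(estimate_potential(0.3, 0.6))
```

The statistical error shrinks only with the square root of the number of
walks, which is why variance reduction and parallel execution, as described in
the abstract, matter for this class of solver.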

Li, B.; Chen, N.; Xu, Y.; Schlichtmann, U.  On Timing Model Extraction and
Hierarchical Statistical Timing Analysis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6461978

In this paper, we investigate the challenges of applying statistical static
timing analysis in hierarchical design flow, where modules supplied by IP
vendors are used to hide design details for IP protection and to reduce the
complexity of design and verification. For the three basic circuit types,
combinational, flip-flop-based, and latch-controlled, we propose methods for
extracting timing models that contain interfacing and compressed internal
constraints. Using these compact timing models, the runtime of full-chip timing
analysis can be reduced, while circuit details from IP vendors are not exposed.
We also propose a method for reconstructing correlation between modules during
full-chip timing analysis. This correlation cannot be incorporated into timing
models because it depends on the layout of the corresponding modules in the
chip. In addition, we investigate how to apply the extracted timing models with
the reconstructed correlation to evaluate the performance of the complete
design. Experiments demonstrate that, using the extracted timing models and
reconstructed correlation, full-chip timing analysis can be several times faster
than analyzing the flattened circuit directly, while the accuracy of statistical
timing analysis is still well maintained.


PHYSICAL DESIGN

Chin, C.-Y.; Kuan, C.-Y.; Tsai, T.-Y.; Chen, H.-M.; Kajitani, Y.  Escaped
Boundary Pins Routing for High-Speed Boards
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6461975

Routing for high-speed boards is still achieved manually today. There have
recently been some related works to solve this problem; however, a more
practical problem has not been addressed. Usually, the packages or components
are designed with or without the requirement from board designers, and the
boundary pins are usually fixed or advised to follow when the board design
starts. In this paper, we describe this fixed-ordering boundary pin routing
problem, and propose a practical approach to solve it. Not only do we provide a
way to address it, we also further plan the wires in a better way to preserve
the precious routing resources in the limited number of layers on the board,
and to effectively deal with obstacles. Our approach has different features
compared with the conventional shortest-path-based routing paradigm. In
addition, we consider length-matching requirements and wire shape resemblance
for high-speed signal routes on board. Our results show that we can utilize
routing resources very carefully, and can account for the resemblance of nets
in the presence of the obstacles. Our approach is workable for board buses as
well.

Lim, K.-H.; Joo, D.; Kim, T.  An Optimal Allocation Algorithm of Adjustable
Delay Buffers and Practical Extensions for Clock Skew Optimization in Multiple
Power Mode Designs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6461977

Satisfying a clock skew constraint is one of the most important tasks in clock
tree synthesis. Moreover, the task becomes much harder to solve when the clock
tree is designed in a multiple power mode environment, in which the voltage
applied to some design module varies as the power mode changes. Recently, it
has been shown that an adjustable delay buffer (ADB), whose delay can be tuned
dynamically, can be used to solve the clock skew problem effectively under
multiple power modes. However, due to the area or control overhead of ADBs, it
is very important to minimize the number of ADBs to be allocated. This paper
provides a complete solution to the problem of clock skew optimization using
ADBs under multiple power modes. We propose a linear-time algorithm that
simultaneously solves the problems of computing: 1) the minimum (optimal)
number of ADBs to be used; 2) the location where each ADB is to be placed; and
3) the delay value of each ADB to be assigned to each power mode. Experimental
results show that, in comparison with the previous work, which iteratively
performs the ADB allocation, placement, and value assignment, our integrated
algorithm produces consistently better designs for all tested benchmarks; it
reduces the number of ADBs by 9.27% on average under the skew bound of 30-50
ps, even with shorter clock latencies than those of the previous algorithm for
ADB allocation, placement, and delay assignment. To make it practically
feasible, we also propose a new ADB design technique and systematic algorithmic
solutions to address the problems of discrete delay values, slew rate
variation, nonzero initial ADB delay, and a possible exploration of ADB
resizing.

Liu, W.; Calimera, A.; Macii, A.; Macii, E.; Nannarelli, A.; Poncino, M.
Layout-Driven Post-Placement Techniques for Temperature Reduction and Thermal
Gradient Minimization
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6461982

With the continuing scaling of CMOS technology, on-chip temperature and
thermal-induced variations have become a major design concern. To effectively
limit the high temperature in a chip equipped with a cost-effective cooling
system, thermal-specific approaches, besides low-power techniques, are
necessary at the chip design level. The high temperature in hotspots and large
thermal gradients are caused by the high local power density and the nonuniform
power dissipation across the chip. With the objective of reducing power density
in hotspots, we propose two placement techniques that spread cells in hotspots
over a larger area. Increasing the area occupied by the hotspot directly
reduces its power density, leading to a reduction in peak temperature and
thermal gradient. To minimize the introduced overhead in delay and dynamic
power, we maintain the relative positions of the coupling cells in the new
layout. We compare the proposed methods in terms of temperature reduction,
timing, and area overhead to the baseline method, which enlarges the circuit
area uniformly. The experimental results showed that our methods achieve a
larger reduction in both peak temperature and thermal gradient than the
baseline method. The baseline method, although reducing peak temperature in
most cases, has little impact on thermal gradient.

Wu, P.-H.; Lin, M. P.-H.; Chen, T.-C.; Ho, T.-Y.; Chen, Y.-C.; Siao, S.-R.;
Lin, S.-H.  1-D Cell Generation With Printability Enhancement
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6461981

As process technologies advance to the subwavelength era, the 1-D design style
is regarded as one of the most effective ways to continue scaling down the
minimum feature size. To improve the printability of 1-D cell design, it is
essential to insert dummy patterns and optimize line-end gap distribution for
each layer. This paper presents novel 1-D cell generation algorithms that
simultaneously minimize 1-D cell area and enhance the printability.
Experimental results show that the proposed algorithms can effectively and
efficiently reduce the number of diffusion gaps, minimize used routing tracks,
insert sufficient dummy patterns, and eliminate stage-like line-end gaps
without power and timing overhead. Consequently, the 1-D cell area is minimized
and the printability of the cell is enhanced. To the best of our knowledge,
this is also the first work in the literature that considers line-end gap
distribution during 1-D cell generation.

TEST

Basith, I. I.; Kandalaft, N.; Rashidzadeh, R.; Ahmadi, M.  Charge-Controlled
Readout and BIST Circuit for MEMS Sensors
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6461973

In this paper, we present a new readout circuit with an integrated built-in
self-test (BIST) structure for capacitive microelectromechanical systems (MEMS).
In the proposed solution, instead of commonly used voltage control signals to
test the device, charge-controlled stimuli are employed to cover a wider range
of structural defects. The proposed test solution eliminates the risk of
structural collapse in the test phase for gap-varying parallel-plate MEMS
devices. Measurement results using a prototype fabricated in TSMC 65-nm CMOS
technology indicate that the proposed BIST scheme can successfully detect minor
structural defects altering MEMS nominal capacitance.

Pomeranz, I.  Generation of Functional Broadside Tests for Logic Blocks With
Constrained Primary Input Sequences
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6461987

This paper describes a test generation procedure that produces functional
broadside tests for logic blocks whose primary input sequences are constrained.
The constraints are created during functional operation by logic blocks that
drive the logic block under consideration. Functional broadside tests avoid
overtesting of delay faults by creating functional operation conditions during
the clock cycles where delay faults are detected. Test generation procedures
for functional broadside tests typically assume that the primary input
sequences are unconstrained during functional operation. This paper shows that
the constraints, which are imposed by a logic block driving the primary inputs
of another block, can be time dependent and difficult to represent compactly.
The test generation procedure described in this paper addresses this issue by
separating the problem of test generation into the generation of constrained
primary input sequences for the block under consideration, and the extraction
of functional broadside tests from these sequences.

VERIFICATION

Nanshi, K.; Somenzi, F.  Using Abstraction to Guide the Search for Long Error
Traces
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6461986

Model checking is a formal method for verifying whether the system satisfies a
user-defined specification. Compared to simulation, model checking is
restricted in capacity. On the other hand, simulation is weak in detecting bugs
that require long and complex sequences of events to be exposed. This paper
combines model checking and simulation in an abstraction-refinement scheme to
mitigate the problems of both methods. Abstraction refinement iteratively
constructs a simplified model to verify the original model. While a simplified
model mitigates the weakness of model checking, the set of simplified error
traces helps guide simulation toward deep bugs. In abstraction
refinement, concretization (a process of deriving an error trace in the original
model from the abstract ones) is used to invalidate spurious abstract error
traces or to refute a property. In this paper, we describe a novel
concretization algorithm that combines simulation with satisfiability to
efficiently refute properties with very long error traces.

Huang, S.-L.; Lin, W.-H.; Huang, P.-K.; Huang, C.-Y.  Match and Replace: A
Functional ECO Engine for Multierror Circuit Rectification
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6461976

Functional engineering change order (ECO) is a popular technique for rectifying
design errors after synthesis and placement stages. We present a new approach
to generating the patch circuits for multierror circuit rectification. In this
paper, we propose a two-phase approach of: 1) discovering the functional
matches in two circuits followed by 2) determining the final patch circuits
from the matches. The ECO engine in this paper discovers functional and
structural matches in two circuits by coordinating the SAT-sweeping and the
cut-matching algorithms. Then, the patch selection is conducted by the
combinational equivalence checking technique and a linear-time selection
heuristic. The experimental results on public benchmark and industrial circuits
demonstrate that this ECO engine outperforms state-of-the-art
interpolation-based engines.

SHORT PAPERS

Reviriego, P.; Pontarelli, S.; Maestro, J. A.; Ottavi, M.  A Method to
Construct Low Delay Single Error Correction Codes for Protecting Data Bits Only
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6461988

Error correction codes (ECCs) have been used for decades to protect memories
from soft errors. Single error correction (SEC) codes that can correct 1-bit
error per word are a common option for memory protection. In some cases, SEC
codes are extended to also provide double error detection and are known as
SEC-DED codes. As technology scales, soft errors on registers have also become a
concern and, therefore, SEC codes are used to protect registers. The use of an
ECC impacts the circuit design in terms of both delay and area. Traditional SEC
or SEC-DED codes developed for memories have focused on minimizing the number
of redundant bits added by the code. This is important in a memory as those
bits are added to each word in the memory. However, for registers used in
circuits, minimizing the delay or area introduced by the ECC can be more
important. In this paper, a method to construct low delay SEC or SEC-DED codes
that correct errors only on the data bits is proposed. The method is evaluated
for several data block sizes, showing that the new codes offer significant
delay reductions when compared with traditional SEC or SEC-DED codes. The
results for the area of the encoder and decoder also show substantial savings
compared to existing codes.
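
As background for the traditional SEC codes this abstract compares against,
the sketch below shows the classic Hamming(7,4) code. It is not the paper's
construction: it corrects a single-bit error anywhere in the word, check bits
included, whereas the proposed codes correct errors on the data bits only. The
function names are hypothetical.

```python
from itertools import product


def encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Bit positions are 1-indexed; check bits sit at positions 1, 2, and 4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def syndrome(word):
    """XOR of the 1-indexed positions of all set bits.

    The result is 0 for a valid codeword; after a single-bit error it equals
    the position of the flipped bit.
    """
    s = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            s ^= pos
    return s


def decode(word):
    """Correct at most one flipped bit and return the 4 data bits."""
    fixed = list(word)
    s = syndrome(fixed)
    if s:
        fixed[s - 1] ^= 1
    return [fixed[2], fixed[4], fixed[5], fixed[6]]


if __name__ == "__main__":
    # Exhaustively check every data word against every single-bit error.
    for data in product([0, 1], repeat=4):
        codeword = encode(data)
        for i in range(7):
            corrupted = list(codeword)
            corrupted[i] ^= 1
            assert decode(corrupted) == list(data)
    print("all single-bit errors corrected")
```

In hardware, the syndrome computation and the correction logic both sit on the
read path, which is where the delay discussed in the abstract comes from.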