November 2013 Newsletter 
Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2013-11.txt 

REGULAR PAPERS

ANALOG, MIXED-SIGNAL, AND RF CIRCUITS

Martins, R. ; Lourenco, N. ; Horta, N.
LAYGEN II-Automatic Layout Generation of Analog Integrated Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634591

This paper describes an innovative design automation tool, LAYGEN II, for
analog integrated circuit (IC) layout generation based on template descriptions
and on evolutionary computation techniques. LAYGEN II was developed giving
special emphasis to the reusability of expert knowledge and to the efficiency
of retargeting operations. The designer specifies the sized circuit-level
structure, the required technology and also, the layout template consisting of
technology and specification independent high-level layout guidelines. For
placement, the topological relations present in the template are extracted to a
nonslicing B*-tree layout representation, and the tool automatically merges
devices and improves the floorplan quality. For routing an optimization kernel
consisting of a tailored version of the multiobjective multiconstraint
evolutionary algorithm NSGA-II is used. The Router optimizes all nets
simultaneously and uses a built-in engine to evaluate each of the layout
solutions. The automatic layout generation is demonstrated here using the
LAYGEN II tool for typical analog circuit structures, and the results in GDSII
format were validated using the industrial grade verification tool Calibre¨.

EMERGING TECHNOLOGIES

Chang, J.-W. ; Yeh, S.-H. ; Huang, T.-W. ; Ho, T.-Y. 
An ILP-Based Routing Algorithm for Pin-Constrained EWOD Chips With Obstacle
Avoidance
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634553

Electrowetting-on-dielectric (EWOD) chips have become the most popular
actuators, particularly for droplet-based digital microfluidic biochip (DMFB)
systems. In order to enable the electrical manipulations, wire routing is a key
problem in designing EWOD chips. Unlike traditional
very-large-scale-integration (VLSI) routing problems, in addition to
routing-path establishment on signal pins, the pin-constrained EWOD-chip
routing problem must address the issue of signal sharing for pin-count
reduction under a practical constraint posed by a limited pin-count supply.
Moreover, EWOD-chip designs might incur several obstacles in the routing region
due to embedded devices for specific fluidic protocols. However, no existing
work considers the EWOD-chip routing with obstacles and, therefore, lots of
manual design efforts are involved. To remedy this insufficiency, we propose in
this paper the first routing algorithm for pin-constrained EWOD chips with
obstacle avoidance. The proposed algorithm, based on effective
integer-linear-programming (ILP) formulation as well as efficient routing
framework, can achieve high routability with a low design complexity.
Experimental results based on real-life chips with obstacles demonstrate the
high routability of proposed algorithm for pin-constrained EWOD chips with
obstacle avoidance.

FPGAS AND RECONFIGURABLE COMPUTING

Mehta, G. ; Patel, K.K. ; Parde, N. ; Pollard, N.S. 
Data-Driven Mapping Using Local Patterns 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634546

The problem of mapping a data flow graph onto a reconfigurable architecture has
been difficult to solve quickly and optimally. Anytime algorithms have the
potential to meet both goals by generating a good solution quickly and
improving that solution over time, but they have not been shown to be practical
for mapping. The key insight into this paper is that mapping algorithms based
on search trees can be accelerated using a database of examples of high quality
mappings. The depth of the search tree is reduced by placing patterns of nodes
rather than single nodes at each level. The branching factor is reduced by
placing patterns only in arrangements present in a dictionary constructed from
examples. We present two anytime algorithms that make use of patterns and
dictionaries: Anytime A* and Anytime Multiline Tree Rollup. We compare these
algorithms to simulated annealing and to results from human mappers playing the
online game UNTANGLED. The anytime algorithms outperform simulated annealing
and the best game players in the majority of cases, and the combined results
from all algorithms provide an informative comparison between architecture
choices.


MODELING AND SIMULATION

Brachtendorf, H.G. ; Bittner, K. 
Grid Size Adapted Multistep Methods for High  Oscillators
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634587

The numerical calculation of the limit cycle of oscillators with resonators
exhibiting a high-quality factor Q such as quartz crystals is a difficult task
in the time domain. Time domain integration formulas, when not carefully
selected, introduce numerical damping that leads to erroneous limit cycles or
spurious oscillations. A novel class of adaptive multistep integration formulas
based on finite difference (FD) schemes is derived, which circumvent the
aforementioned problems. The results are compared with the well-known harmonic
balance (HB) technique.  Moreover, the range of absolute stability is derived
for these methods. The resulting discretized system by FD methods is sparser
than that of HB and, therefore, easier to solve and easier to implement.

Jung, M. ; Pan, D.Z. ; Lim, S.K. 
Chip/Package Mechanical Stress Impact on 3-D IC Reliability and Mobility
Variations
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634536

In this paper, we propose a fast and accurate chip/package thermomechanical
stress co-analysis tool for through- silicon-via (TSV)-based 3-D ICs. We use
our tool for full-stack mechanical reliability as well as stress-aware timing
analyses. First, we analyze the stress induced by chip/package interconnect
elements, i.e., TSV, micro-bump, and package bump. Second, we explore and
validate the principle of lateral and vertical linear superposition of stress
tensors (LVLS), considering all chip/package elements. The proposed LVLS method
greatly reduces the complexity of stress calculation compared with the
conventional finite element analysis method with high enough accuracy for
full-chip/package-scale stress simulations and reliability analysis. In
addition, we build hole and electron mobility variation maps based on LVLS.
Finally, we study the mechanical reliability issues and provide full-stack
timing analysis results in practical 3-D chip/package designs including
wide-I/O and block-level 3-D ICs.

PHYSICAL DESIGN

Cortadella, J. 
Area-Optimal Transistor Folding for 1-D Gridded Cell Design 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634571

The 1-D design style with gridded design rules is gaining ground for addressing
the printability issues in subwavelength photolithography. One of the synthesis
problems in cell generation is transistor folding, which consists of breaking
large transistors into smaller ones (legs) that can be placed in the active
area of the cell. In the 1-D style, diffusion sharing between differently sized
transistors is not allowed, thus implying a significant area overhead when
active areas with different sizes are required. This paper presents a new
formulation of the transistor folding problem in the context of 1-D design
style and a mathematical model that delivers area-optimal solutions.  The
mathematical model can be customized for different variants of the problem,
considering flexible transistor sizes and multiple-height cells. An innovative
feature of the method is that area optimality can be guaranteed without
calculating the actual location of the transistors. The model can also be
enhanced to deliver solutions with good routability properties.

Chang, H.-Y. ; Jiang, I.H.-R. ; Chang, Y.-W. 
ECO Optimization Using Metal-Configurable Gate-Array Spare Cells 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634578

Due to the rapidly increasing design complexity in modern IC designs,
metal-only engineering change order (ECO) becomes inevitable to achieve design
closure with a low respin cost. Traditionally, preplaced redundant standard
cells are regarded as spare cells. However, these cells are limited by
predefined functionalities and locations, and they always consume leakage power
despite their inputs being tied off. To overcome the inflexibility and power
overhead, a new type of spare cells, called metal-configurable gate-array spare
cells, are introduced. In this paper, we address a new ECO problem, which
performs design changes using metal-configurable gate-array spare cells.  We
first study the properties of this new ECO problem and propose a new cost
metric, aliveness, to model the capability of a spare gate array. Based on
aliveness and routability, we then develop two ECO optimization frameworks, one
for timing ECO and the other for functional ECO. Experimental results show that
our approach delivers superior efficiency and effectiveness.

Sai, M.P.D. ; Yu, H. ; Shang, Y. ; Tan, C.S. ; Lim, S.K. 
Reliable 3-D Clock-Tree Synthesis Considering Nonlinear Capacitive TSV Model
With Electrical-Thermal-Mechanical Coupling
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634568

A robust physical design of 3-D IC requires investigation on through-silicon
via (TSV). The large temperatures and stress gradients can severely affect TSV
delay with large variation. The traditional physical model treats TSV as a
resistor with linear electrical-thermal dependence, which ignores the
fundamental device physics. In this paper, a physics-based
electrical-thermalÐmechanical delay model is developed for signal TSVs in 3-D
IC. With consideration of liner material and also stress, a nonlinear model is
established between electrical delay with temperature and stress. Moreover,
sensitivity analysis is performed to relate the reduction of temperature and
stress gradients with respect to dummy TSVs insertion. Taking the design of 3-D
clock tree as a case study, we have formulated a nonlinear optimization problem
for clock-skew reduction. By allocating dummy TSVs to reduce the temperature
and stress gradients, the clock skew introduced by signal TSVs and drivers can
be minimized. A number of 3-D clock-tree benchmarks are utilized in
experiments. We have observed that with the use of dummy TSV insertion, clock
skew can be reduced by 61.3% on average when the accurate nonlinear
electrical-thermalÐ mechanical delay model is applied.

SYSTEM-LEVEL DESIGN

Lee, J. ; Chung, M.-K. ; Cho, Y.-G. ; Ryu, S. ; Ahn, J.H. ; Choi, K. 
Mapping and Scheduling of Tasks and Communications on Many-Core SoC Under Local
Memory Constraint 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634549

There has been extensive research on mapping and scheduling tasks on a
many-core SoC. However, none considers the optimization of communication types,
which can significantly affect performance, energy consumption, and local
memory usage of the SoC. This paper presents an approach to automatic mapping
and scheduling of tasks and communications on a many-core SoC. The key idea is
to decide the type of each communication between message passing and shared
memory when we do the mapping and scheduling. By assigning a proper type to
each communication, we can optimize the energy consumption, performance, or
energy-delay product. To solve the optimization problem, the approach adopts a
probabilistic algorithm coupled with some heuristics. To enhance throughput of
the system, it performs software pipelined scheduling of the tasks using a
modified iterative modulo scheduling technique. Experiments show that our
algorithm achieves on average 50.1% lower energy consumption, 21.0% higher
throughput, and 64.9% lower energy- delay product, compared to shared memory
only communication.

TEST

Sismanoglou, P. ; Nikolos, D. 
Input Test Data Compression Based on the Reuse of Parts of Dictionary Entries:
Static and Dynamic Approaches 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634582

In this paper, we present a new test data compression method for intellectual
property (IP) cores testing, based on the reuse of parts of dictionary entries.
Two approaches are investigated: the static and the dynamic. In the static
approach, the content of the dictionary is constant during the testing of a
core, while in the dynamic approach the testing of a core consists of several
test sessions and the content of the dictionary is different during each test
session. The efficiency of the proposed method is supported with extensive
simulation results and comparisons to already known test data compression
methods suitable for IP cores testing.

Janicki, J. ; Kassab, M. ; Mrugalski, G. ; Mukherjee, N. ; Rajski, J. ; Tyszer, J. 
Test Time Reduction in EDT Bandwidth Management for SoC Designs 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634561

This paper presents novel methods of reducing test time and enhancing test
compression for system-on-chip (SoC) designs armed with embedded deterministic
test (EDT)-based compression logic. The ability of the proposed scheme to
improve the encoding efficiency and test compression, while reducing test
application time, is accomplished by appropriate selecting and laying out
automatic test equipment channel injectors of every single core EDT-based
decompressor as well as appropriate bandwidth management of the entire test
procedure combined with new control data optimization techniques. The efficacy
of the proposed scheme is validated through experiments on several industrial
SoC designs and is reported herein.

VERIFICATION

Karfa, C. ; Banerjee, K. ; Sarkar, D. ; Mandal, C. 
Verification of Loop and Arithmetic Transformations of Array-Intensive
Behaviors 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634544

Loop transformation techniques along with arithmetic transformations are
applied extensively on array and loop intensive behaviors in design of
area/energy efficient systems in the domain of multimedia and signal processing
applications. Ensuring correctness of such transformations is crucial for the
reliability of the designed systems. In this paper, array data dependence
graphs (ADDGs) are used to represent both the input and the transformed
behaviors and the correctness of the transformations is ensured by proving
equivalence of the two ADDGs. A slice- based equivalence checking method of
ADDGs is proposed for this purpose. The method relies on the normalization of
arithmetic expressions and some simplification rules to handle arithmetic
transformations. Unlike many other reported techniques, our method is strong
enough to handle several arithmetic transformations along with all kinds of
loop transformations. Correctness and complexity of the method have been dealt
with. Experimental results on several test cases demonstrate the effectiveness
of the method.

Hazra, A. ; Mukherjee, R. ; Dasgupta, P. ; Pal, A. ; Harer, K.M. ; Banerjee, A. ; 
Mukherjee, S. 
POWER-TRUCTOR: An Integrated Tool Flow for Formal Verification and Coverage of
Architectural Power Intent
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634589

With the growing complexity and gradually shrinking power requirements in the
system-on-chip designs, sophisticated global power management policies (which
orchestrate the switching between power states of multiple power domains) are
commonplace. Recent research has paved some novel ways to verify the
sophisticated on-chip architectural power management decisions and analyze the
verification coverage. However, one of the primary challenges in verifying such
power management architectures stems from the mixed implementation of such
strategies, where the local power controllers are in hardware and the global
power management is implemented in software/firmware. There has been lack of
effort to build a unified and automated framework for power intent verification
and coverage analysis for generic power management logics. This paper tries to
develop an end-to-end automated framework enabled by a tool named POWER-TRUCTOR
for power intent validation.

SHORT PAPERS

Huang, L. ; Wang, Z. ; Xiao, N. ; Wang, Y. ; Dou, Q. 
Dynamic Streamization Model Execution for SIMD Engines on Multicore Architectures
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634570

This paper proposes dynamic streamization model execution (DSME), a dynamic
vectorization technique for single instruction multiple data (SIMD) engines on
multicore architectures. The technique uses stream model as intermediate
representation for programs to optimize the combination of computation and
memory accesses of SIMD engines in general-purpose (GP) designs. DSME allows
the dynamic placement of computations on different cores when they are not in
use to utilize multiple SIMD engines. This study also discusses hardware
extensions to existing GP processor designs as well as related compiler
extensions that use the special hardware components. Our extensive experiments
demonstrate that performance gains of DSME can be achieved.

Prakash, A. ; Patel, H.D. 
An Instruction Scratchpad Memory Allocation for the Precision Timed Architecture
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634576

This paper presents a static instruction scratchpad memory allocation scheme
for the precision timed architecture (PRET). Since PRET provides timing
instructions to control the temporal execution of programs, the objective of
the allocation scheme is to ensure that the explicitly specified temporal
requirements are met. Furthermore, this allocation incorporates the timing
requirements from the multiple hardware threads of the PRET architecture. We
formulate the allocation problem as an integer-linear programming problem, and
we implement a tool that takes compiled ARMv4 binaries, constructs a
timing-requirements-aware control-flow graph, performs a WCET analysis and SPM
allocation, and rewrites the binaries with the allocation. We evaluate our
approach using a modified version of the Malardalen benchmarks to show the
benefits of the proposed approach. We also present a UAV benchmark derived from
the PapaBench benchmark.

Kundu, S. ; Pal, S. ; Chattopadhyay, S. ; Sengupta, I. ; Kapur, R. 
A Metric for Test Set Characterization and Customization Toward Fault Diagnosis 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634573

This paper introduces a new metric to characterize test sets in terms of their
diagnostic power. Our method uses much less space compared to the existing ones
and is quite accurate. The metric can be utilized to increase the
diagnosability of incompletely specified test sets via don't care filling. The
X-filling approach can be integrated with test pattern generation tools to aid
in better diagnostic pattern set generation.

Teodorovic, P. ; Dautovic, S. ; Malbasa, V. 
Recursive Boolean Formula Minimization Algorithms for Implication Logic
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634562

In this paper are given two novel algorithms for minimization of recursive
Boolean formula (RBF), which is adequate for implementation of N-input 1-output
Boolean functions (BFs) over basis imply, false using two memristors. Both of
our algorithms are direct consequence of necessary and sufficient conditions
related to regular ordering of positive product terms within recursive formula.
The results demonstrate how developed algorithms provide up to 26% gain in
average number of implications and shorter recursive Boolean formula length in
up to 77% of problem instances than previously published algorithms, tested on
the set of all 4-input 1-output BFs.