August 2013 Newsletter

Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2013-08.txt

NEW EDITORIAL TEAM TO TAKE THE HELM IN 2014

CEDA is pleased to announce that Professor Vijaykrishnan Narayanan of Penn State University has been appointed as the next Editor-in-Chief of the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) for a two-year term starting January 2014. Professor Narayanan is an expert in the areas of power-aware and reliable systems, embedded systems, reconfigurable architectures, nanoarchitectures, and computer architecture. He has published over 350 papers with an h-index of 55, has received multiple awards for his research, and is a Fellow of the IEEE.

Dr. Charles J. Alpert, Manager of the Design Productivity group at the IBM Austin Research Laboratory, will also serve on the leadership team of TCAD as Deputy Editor-in-Chief. Dr. Alpert has published extensively in the area of physical design. He holds nearly 60 patents and has received three DAC Best Paper awards. He currently serves as Technical Program Cochair for DAC and as Chair of the Design Automation Technical Committee, and is a Fellow of the IEEE.

Interested in volunteering for TCAD? The incoming leadership team will be formulating the new editorial board over the next few months. If you are interested in serving as an Associate Editor, please send a recent CV to Professor Narayanan at vijay@cse.psu.edu before October 1, 2013.

REGULAR PAPERS

EMBEDDED SYSTEMS

Hsieh, J.-W.; Zheng, Y.-C.; Peng, Y.-S.; Yeh, P.-H.
VAST: Virtually Associative Sector Translation for MLC Storage Systems
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6559102

In recent years, multilevel cell (MLC) flash memory, which stores two or more bits per cell, has gradually replaced single-level cell flash memory due to its lower cost and higher density.
However, MLC also brings new constraints to flash management: partial programming is prohibited, and pages within a block must be written sequentially. This paper proposes a virtual log-block-based hybrid-mapping scheme, referred to as virtually associative sector translation (VAST), for MLC storage systems. Unlike traditional hybrid-mapping schemes, VAST combines block-level and segment-level mappings and manages log blocks in a flexible manner. The goals of our research are to avoid timeouts by decreasing dummy-page writes, to improve response time by decreasing live-page copies, and to prolong the life span of flash memory by decreasing total block erasures. Our trace-driven simulation shows that VAST can reduce up to 90% of dummy-page writes, 22%–52% of live-page copies, and 55%–83% of block erasures, compared to well-known hybrid-mapping schemes.

EMERGING TECHNOLOGIES

Chen, Y.-H.; Hsu, C.-L.; Tsai, L.-C.; Huang, T.-W.; Ho, T.-Y.
A Reliability-Oriented Placement Algorithm for Reconfigurable Digital Microfluidic Biochips Using 3-D Deferred Decision Making Technique
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6559249

In recent studies, digital microfluidic biochips (DMFBs) have emerged as a promising platform for lab-on-a-chip and bioassay experiments because of their application flexibility and low fabrication cost. However, reliability is a critical issue in guaranteeing correct DMFB operation. The reliability of a DMFB decreases when its electrodes are excessively actuated, preventing droplets from being controlled successfully. Because bioassay placement on a DMFB is a key step in generating the corresponding actuation signals, reliability must be considered during biochip placement to avoid excessive actuation. Although researchers have proposed several DMFB placement algorithms, they have failed to consider the reliability issue.
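The notion of "excessive actuation" is easy to make concrete. A toy bookkeeping sketch (illustration only, not this paper's placement algorithm; the actuation limit and droplet routes are hypothetical): count how often each electrode in the grid is actuated across a set of droplet routes and flag the ones that exceed a reliability limit.

```python
from collections import Counter

MAX_ACTUATIONS = 3   # hypothetical reliability limit per electrode


def over_actuated(routes, limit=MAX_ACTUATIONS):
    """Return electrodes actuated more than `limit` times by the given routes."""
    counts = Counter(cell for route in routes for cell in route)
    return sorted(cell for cell, n in counts.items() if n > limit)


# Four hypothetical droplet routes over a small electrode grid:
routes = [[(0, 0), (0, 1), (0, 2)],
          [(1, 0), (0, 1), (0, 2)],
          [(2, 2), (0, 1), (1, 2)],
          [(0, 1), (1, 1), (1, 2)]]
print(over_actuated(routes))   # [(0, 1)]: actuated 4 times, over the limit
```

A reliability-oriented placer would steer routes so that this set stays empty.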
In addition, previous algorithms were all based on the simulated-annealing (SA) method, which is time consuming and does not guarantee an optimal solution. This paper proposes the first reliability-oriented non-SA placement algorithm for DMFBs. The approach considers the reliability problem during placement and uses the 3-D deferred decision making (3D-DDM) technique to enumerate only feasible placement solutions. Large-scale DMFB placements can be synthesized efficiently by partitioning the operation sequencing graph of the bioassays. Experimental results demonstrate that the proposed technique achieves reliability-oriented placement without excessive actuation of any electrode, while optimizing bioassay completion time.

LOGIC SYNTHESIS

Choudhury, M.R.; Mohanram, K.
Low Cost Concurrent Error Masking Using Approximate Logic Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6558854

With technology scaling, logical errors arising from single-event upsets and timing errors arising from dynamic variability effects are increasing in logic circuits. Existing techniques for online resilience to logical and timing errors are limited to detection, and often incur a significant performance penalty and high area/power overhead. This paper proposes approximate logic circuits as a design approach for low cost concurrent error masking. An approximate logic circuit predicts the value of the outputs of a given logic circuit for a specified portion of the input space, and can indicate uncertainty about the outputs over the rest of the input space. By choosing the portions of the input space that are most vulnerable to errors as the specified input space, we show that approximate logic circuits can provide low overhead concurrent error masking for a given logic circuit. We describe efficient algorithms for synthesizing approximate circuits for concurrent masking of logical and timing errors.
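To make the masking idea concrete, here is a minimal behavioral sketch (a toy majority circuit with a hypothetical "vulnerable" input subspace, not the authors' synthesis flow): the approximate copy predicts the exact output on its specified subspace and flags uncertainty elsewhere; where it is certain, it overrides an erroneous primary output.

```python
def exact_circuit(a, b, c):
    # The protected circuit: 3-input majority function.
    return (a & b) | (b & c) | (a & c)


def approx_circuit(a, b, c):
    """Approximate copy, trusted only on the subspace a == b (hypothetical
    choice of the 'most vulnerable' inputs).  There, majority(a, a, c) == a,
    so the prediction is exact."""
    if a == b:
        return a, True        # (prediction, certain)
    return None, False        # uncertain: no masking possible


def masked_output(a, b, c, raw_output):
    """Mask an error in raw_output whenever the approximate copy is certain."""
    pred, certain = approx_circuit(a, b, c)
    return pred if certain else raw_output


# A single-event upset flips the primary output while a == b == 1:
faulty = exact_circuit(1, 1, 0) ^ 1          # erroneous value 0
print(masked_output(1, 1, 0, faulty))        # masked back to 1
```

Outside the specified subspace the raw output passes through unchanged, which is why the choice of subspace (the error-vulnerable inputs) matters.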
Results indicate that concurrent error masking based on approximate logic circuits can mask 88% of targeted logical errors for 34% area overhead and 17% power overhead; 100% of timing errors on all timing paths within 10% of the critical path delay for 23% area overhead and 8% power overhead; and 100% of timing errors on all timing paths within 20% of the critical path delay for 42% area overhead and 26% power overhead.

MODELING AND SIMULATION

Burke, D.; Smy, T.
Thermal Models for Optical Circuit Simulation Using a Finite Cloud Method and Model Reduction Techniques
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6558888

This paper presents a procedure for creating versatile and powerful thermal compact models of integrated optical devices and demonstrates their use in an optical circuit-level simulator. A detailed 3-D model of the device is first built using a meshless finite cloud method, producing a large sparse set of linear equations. This model is then reduced to a compact representation using a Krylov subspace model reduction (MR) technique. The reduced model is described by small dense matrices, yet reproduces the original temperature distribution within acceptable error. Three devices are used to demonstrate the technique: a microdisc laser and two microring-based devices, a modulator and an optical switch. All three devices are built on a silicon-on-oxide platform. Using MR, the linear systems describing these models are reduced from thousands of unknowns to systems with fewer than 100 reduced variables. It is then demonstrated how the reduced compact models can be linked together to describe a complete optical system with solution errors below 1%.
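The reduction step can be illustrated with a generic Arnoldi projection (a standard Krylov MR recipe applied to a random symmetric test system, not the paper's finite-cloud models): project a large conductance system G T = b onto a small Krylov subspace and compare the full and reduced steady-state solutions. Because the first Krylov vector is G⁻¹b, the DC response is matched by construction.

```python
import numpy as np


def arnoldi(A, b, m):
    """Orthonormal basis V of the order-m Krylov subspace K_m(A, b)."""
    n = len(b)
    V = np.zeros((n, m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m - 1):
        w = A @ V[:, j]
        w -= V[:, : j + 1] @ (V[:, : j + 1].T @ w)   # orthogonalize
        V[:, j + 1] = w / np.linalg.norm(w)
    return V


rng = np.random.default_rng(0)
n = 500
M = np.diag(rng.uniform(1.0, 2.0, n)) + 0.01 * rng.standard_normal((n, n))
G = M @ M.T                     # symmetric positive definite "conductance"
b = rng.standard_normal(n)      # heat-source vector

Ginv = np.linalg.inv(G)
V = arnoldi(Ginv, Ginv @ b, 20)
G_r = V.T @ G @ V               # 20x20 reduced system (from 500 unknowns)
b_r = V.T @ b

full = np.linalg.solve(G, b)                # full steady-state solution
reduced = V @ np.linalg.solve(G_r, b_r)     # lifted reduced solution
print(np.linalg.norm(full - reduced) / np.linalg.norm(full))  # tiny error
```

Real MR flows (e.g., PRIMA-style passive reduction) avoid the explicit inverse and also match higher-order moments for transient accuracy; this sketch only shows the projection mechanics.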
Finally, it is shown how this reduced thermal model can be utilized in a circuit-level opto-electronic circuit simulator, and simulations are presented demonstrating the effectiveness of the reduced models in speeding up simulation or enabling otherwise intractable problems.

Ardestani, E.K.; Mesa-Martinez, F.J.; Southern, G.; Ebrahimi, E.; Renau, J.
Sampling in Thermal Simulation of Processors: Measurement, Characterization, and Evaluation
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6560454

Power densities in modern processors induce thermal issues that limit performance. Power and thermal models add complexity to architectural simulators, limiting the depth of analysis. Prohibitive execution-time overheads may be circumvented using sampling techniques. While these approaches work well when characterizing processor performance, they introduce new challenges when applied to the thermal domain. This paper aims to improve the accuracy and performance of sampled thermal simulation at the architectural level. To the best of our knowledge, this paper is the first to evaluate the impact of statistical sampling on thermal metrics through direct temperature measurements performed at runtime. Experiments confirm that sampling can accurately estimate certain thermal metrics. However, extra care is needed to preserve the accuracy of temperature estimation in a sampled simulation, mainly because thermal phases are, on average, much longer than performance phases. Based on these insights, we introduce a framework that extends the statistical sampling techniques used at the performance and power stages to the thermal domain. The resulting technique yields an integrated performance, power, and temperature simulator that maintains accuracy while reducing simulation time by orders of magnitude. In particular, this paper shows how dynamic voltage and frequency adaptations can be evaluated in a statistically sampled simulation.
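Why thermal sampling needs extra care can be seen with a first-order RC toy model (illustrative only; the time constant and power numbers are made up, and this is not the paper's framework): temperature integrates power over long horizons, so a sample simulated from a cold start misses the thermal history accumulated before it.

```python
def thermal_trace(power, t0=40.0, r=2.0, tau=0.999):
    """First-order RC model: each step, T relaxes toward t0 + r * p.
    tau close to 1 gives a thermal time constant of ~1000 steps."""
    T, out = t0, []
    for p in power:
        T = tau * T + (1 - tau) * (t0 + r * p)
        out.append(T)
    return out


power = [10.0] * 5000 + [40.0] * 1000   # a long cool phase, then a hot burst
full = thermal_trace(power)

# Simulating only the sampled hot burst from the ambient temperature
# underestimates the end temperature, because the warm-up history is lost:
cold_start = thermal_trace(power[5000:])
print(round(full[-1], 1), round(cold_start[-1], 1))
```

The gap between the two final temperatures is exactly the kind of sampling error that requires carrying thermal state (or a warm-up estimate) between samples.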
We conclude by showing how the increased simulation speed benefits architects in design space exploration.

Yakopcic, C.; Taha, T.M.; Subramanyam, G.; Pino, R.E.
Generalized Memristive Device SPICE Model and its Application in Circuit Design
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6558886

This paper presents a SPICE model for memristive devices. It builds on existing models and is correlated against published characterization data for several devices, with an average error of 6.04%. Compared to existing alternatives, the proposed model can accurately simulate a wider range of published memristors. The model is also tested in large circuits with up to 256 memristors, and was less likely to cause convergence errors than other models. We show that the model can be used to study the impact of memristive device variation within a circuit. We examine the impact of nonuniformity in device state-variable dynamics and conductivity on individual memristors as well as on a four-memristor read/write circuit. These studies show that the model can be used to predict how variation across a memristor wafer may impact circuit performance.

PHYSICAL DESIGN

Brenner, U.
BonnPlace Legalization: Minimizing Movement by Iterative Augmentation
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6559150

We describe BonnPlaceLegal, an algorithm for VLSI placement legalization. Based on a minimum-cost flow algorithm that iteratively augments flows along paths, our approach considers only augmentations that can be realized exactly by cell movements. Hence, the method avoids the realization problems inherent to previous flow-based legalization algorithms. As a result, it combines the global perspective of minimum-cost flow approaches with the efficiency of local search algorithms.
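For readers new to legalization, the task itself (remove cell overlaps while staying near the global-placement positions) can be shown with a deliberately naive single-row sketch. This is nothing like BonnPlaceLegal's min-cost flow formulation: it keeps the left-to-right order and only pushes cells right, so it is legal but not movement-optimal.

```python
def legalize_row(cells):
    """cells: list of (desired_x, width), already ordered left to right.
    Returns legal (non-overlapping) x positions, pushing right as needed."""
    placed, cursor = [], 0.0
    for x, w in cells:
        x = max(x, cursor)     # shift right only if it overlaps the previous cell
        placed.append(x)
        cursor = x + w
    return placed


# Cells 1 and 2 overlap in the input; cell 2 is pushed right by one unit.
print(legalize_row([(0.0, 2.0), (1.0, 2.0), (5.0, 1.0)]))  # [0.0, 2.0, 5.0]
```

A flow-based legalizer instead trades off movements globally (including pushing earlier cells left), which is what lets it minimize total and maximum movement.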
The tool is mainly designed to minimize total and maximum cell movement, but it is flexible enough to optimize other objective functions, provided that the effect of single cell movements on them can be estimated efficiently. We compare our approach to legalization tools from industry and academia through experiments on recent dense real-world designs and public benchmarks. The results show that our tool is much faster and produces significantly better results in terms of average (linear and quadratic) and maximum movement than any other tool. The experiments also demonstrate that, by minimizing squared movement, we also produce a smaller increase in net length than the other tools.

Xiao, Z.; Du, Y.; Zhang, H.; Wong, M.D.F.
A Polynomial Time Exact Algorithm for Overlay-Resistant Self-Aligned Double Patterning (SADP) Layout Decomposition
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6558884

Double patterning lithography (DPL) technologies have become a must for today's sub-32 nm technology nodes. Currently, there are two leading DPL technologies: self-aligned double patterning (SADP) and litho-etch-litho-etch (LELE). SADP has a significant advantage over LELE in its ability to avoid overlay, making it the likely DPL candidate for the next technology node of 14 nm. In any DPL technology, layout decomposition is the key problem. While the layout decomposition problem for LELE has been well studied in the literature, only a few attempts have been made to address the SADP layout decomposition problem. In this paper, we present a polynomial-time exact (optimal) algorithm to determine whether a given layout has an SADP decomposition with no overlay at specified critical edges. Previous approaches tried to minimize the total overlay of a given layout, which may be a problematic objective. Furthermore, all previous exact algorithms were computationally expensive exponential-time algorithms based on SAT or ILP.
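For background, the simpler LELE flavor of layout decomposition is commonly modeled as two-coloring a conflict graph: features closer than the minimum pitch must go on different masks. A textbook sketch of that formulation (not the SADP algorithm this paper proposes, which must additionally reason about spacer-defined patterns and overlay):

```python
from collections import deque


def decompose(num_features, conflicts):
    """Return a mask assignment (0/1 per feature) or None if impossible.
    conflicts: pairs of feature indices that are too close for one mask."""
    color = [None] * num_features
    adj = [[] for _ in range(num_features)]
    for u, v in conflicts:
        adj[u].append(v)
        adj[v].append(u)
    for start in range(num_features):         # handle disconnected layouts
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]   # opposite mask
                    queue.append(v)
                elif color[v] == color[u]:
                    return None               # odd cycle: no 2-mask solution
    return color


print(decompose(4, [(0, 1), (1, 2), (2, 3)]))  # [0, 1, 0, 1]
print(decompose(3, [(0, 1), (1, 2), (2, 0)]))  # None (odd conflict cycle)
```

This bipartiteness check runs in linear time; the paper's contribution is showing that the harder, overlay-aware SADP variant also admits a polynomial-time exact algorithm.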
Other previous algorithms for the problem were heuristics with no guarantee of finding an overlay-free solution even when one exists.

SYSTEM-LEVEL DESIGN

van Stralen, P.; Pimentel, A.
Fitness Prediction Techniques for Scenario-Based Design Space Exploration
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6559130

Modern embedded systems are becoming increasingly multifunctional. The dynamism in multifunctional embedded systems manifests itself in more dynamic applications and in multiple applications executing on a single embedded system. This dynamism in the application workload must be taken into account during early system-level design space exploration (DSE) of multiprocessor system-on-chip (MPSoC)-based embedded systems. Scenario-based DSE utilizes the concept of application scenarios to search for optimal mappings of a multi-application workload onto an MPSoC. It uses a multi-objective genetic algorithm (GA) to identify the mapping with the best average quality over all the application scenarios in the workload. To keep the exploration efficient, fitness prediction is used to estimate the quality of a mapping. This fitness prediction is performed using a representative subset of application scenarios, obtained by co-exploring the scenario subset space. In this paper, multiple fitness prediction techniques are presented: stochastic, deterministic, and a hybrid combination. Results show that, for our test cases, accurate fitness prediction is already provided by subsets containing only 1–4% of the application scenarios. Larger subsets achieve similar accuracy, but the DSE requires more time to identify promising mappings that meet the requirements of multifunctional embedded systems.

TEST

Lien, W.-C.; Lee, K.-J.; Hsieh, T.-Y.; Ang, W.-L.
An Efficient On-Chip Test Generation Scheme Based on Programmable and Multiple Twisted-Ring Counters
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6559094

Twisted-ring counters (TRCs) have been used as built-in test pattern generators for high-performance circuits because of their small area overhead, low performance impact, and simple control circuitry. However, previous work based on a single, fixed-order TRC often requires long test times to achieve high fault coverage and large storage space for the required control data and TRC seeds. In this paper, a novel programmable multiple-TRC-based on-chip test generation scheme is proposed to minimize both the required test time and the test data volume. The scan path of a circuit under test is divided into multiple equal-length scan segments, each converted to a small TRC controlled by a programmable control logic unit. An efficient algorithm is developed to determine the required seeds and control vectors. Experimental results on ISCAS'89, ITC'99, and IWLS'05 benchmark circuits show that, on average, the proposed scheme using only a single programmable TRC design achieves 35.58%–98.73% reductions in the number of test application cycles with a smaller storage data volume than previous work. When more programmable TRC designs are used, 83.60%–99.59% reductions can be achieved with only a slight increase in test data volume.

Huang, S.-Y.; Lin, Y.-H.; Huang, L.-R.; Tsai, K.-H.; Cheng, W.-T.
Programmable Leakage Test and Binning for TSVs With Self-Timed Timing Control
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6559087

Leakage testing has been a challenge for through-silicon vias (TSVs) in 3-D ICs. Most existing methods are still inadequate in terms of the range of testable leakage currents. In this paper, we adapt the well-established IO-pin leakage test and enhance it with two features. First, we make it more suitable for a TSV, which has a much smaller capacitance than an IO pin.
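The capacitance remark matters because, in a charge-sharing style leakage test, the node capacitance and the wait time before sampling together set the smallest detectable current. A back-of-the-envelope sketch (generic first-order physics with hypothetical numbers, not the paper's programmable delay-line circuit):

```python
def detectable_leakage(cap_farads, delta_v, wait_seconds):
    """Smallest leakage current that discharges a precharged node by
    delta_v volts within the wait time (I = C * dV / dt)."""
    return cap_farads * delta_v / wait_seconds


# Hypothetical numbers: a 50 fF TSV and a 0.5 V detection threshold.
# A longer programmable wait time detects proportionally smaller leakage.
for wait_ns in (25, 250):
    i = detectable_leakage(50e-15, 0.5, wait_ns * 1e-9)
    print(f"wait {wait_ns} ns -> detects leakage above {i * 1e6:.2f} uA")
```

Making that wait time programmable is what gives a leakage-binning scheme its wide testable-current range.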
Second, we support a wide range of leakage currents (e.g., from 0.125 uA to 16 uA), thereby allowing flexible test-threshold setting and leakage characterization. To achieve this goal, we present two sets of techniques: 1) wait-time generation by a programmable delay line, and 2) wait-time propagation with a self-timed timing control scheme to overcome the timing skew caused by signal routing. We demonstrate that the entire scheme can be implemented using only logic gates, making it easy to integrate into a common design flow.

VERIFICATION

Mitra, S.; Banerjee, A.; Dasgupta, P.; Ghosh, P.; Kumar, H.
Formal Guarantees for Localized Bug Fixes
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6558891

Bug traces produced in simulation serve as the basis for patching RTL code to fix a bug. It is important to prove that the patch covers all instances of the bug scenario; otherwise, the bug may return with a different valuation of the variables involved in the bug scenario. For large circuits, formal methods do not scale well enough to comprehensively eliminate the bug, and achieving adequate coverage in simulation and regression testing becomes expensive. This paper proposes formal methods for analyzing the control trace leading to the observed manifestation of the bug and for verifying the robustness of the bug fix with respect to that control trace. We propose a classification of bug fixes based on the guarantee that our analysis can provide about the quality of the fix. Our method also prescribes the types of tests recommended to validate the bug fix on other types of scenarios. Since our methods are orders of magnitude more scalable than model checking the entire design, we believe the proposed formal methods hold immense promise for analyzing bug fixes in practice.

SHORT PAPERS

Kahng, A.B.; Kang, S.; Rosing, T.S.; Strong, R.
Many-Core Token-Based Adaptive Power Gating
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6559076

Among power dissipation components, leakage power has become more dominant with each successive technology node. Leakage energy waste can be reduced by power gating. In this paper, we extend token-based adaptive power gating (TAP), a technique that power gates an actively executing core during memory accesses, to many-core chip multiprocessors (CMPs). TAP tracks every system memory request and its estimated time of arrival so that a core may power gate itself without performance or energy loss. Previous work on TAP shows several benefits compared to earlier state-of-the-art techniques, including zero performance hit and 2.58 times average energy savings for out-of-order cores. We show that TAP can adapt to increasing memory contention, increasing power-gated time by 3.69 times compared to a low memory-pressure case. We also scale TAP to many-core architectures with a distributed wake-up controller that supports staggered wake-ups and can power gate each core for 99.07% of the time achieved by a non-scalable centralized scheme.
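The break-even intuition behind this style of power gating can be sketched in a few lines (the threshold and timings are hypothetical, and this is not the TAP controller itself): a core that has issued a memory request knows the estimated time of arrival (ETA) of the response, and it gates itself only when the remaining stall is long enough to amortize the energy cost of switching the power gate.

```python
BREAKEVEN_CYCLES = 50   # hypothetical cycles needed to amortize gating cost


def should_power_gate(now, response_eta, breakeven=BREAKEVEN_CYCLES):
    """Gate only if the predicted idle window exceeds the break-even time."""
    return response_eta - now > breakeven


# Under heavy memory contention the ETA moves out, so gating windows grow:
print(should_power_gate(now=0, response_eta=30))    # False: stall too short
print(should_power_gate(now=0, response_eta=200))   # True: gate the core
```

This is why the abstract's contention result makes sense: higher memory pressure lengthens the predicted stalls, which lets the controller keep cores gated for more of the time.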