May 2012 Newsletter

Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2012-05.txt

Announcing the 2012 Donald O. Pederson Best Paper Award...

The Donald O. Pederson Award recognizes the best paper published in the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems in the two calendar years preceding the award. This year's award, which will be presented at the Design Automation Conference in San Francisco, goes to:

Ogras, U. Y.; Bogdan, P.; Marculescu, R.
An Analytical Approach for Network-on-Chip Performance Analysis
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5621037
Networks-on-chip (NoCs) have recently emerged as a scalable alternative to classical bus and point-to-point architectures. To date, performance evaluation of NoC designs has been based largely on simulation which, besides being extremely slow, provides little insight into how different design parameters affect actual network performance. Therefore, it is practically impossible to use simulation for optimization purposes. In this paper, we present a mathematical model for on-chip routers and utilize this new model for NoC performance analysis. The proposed model can be used not only to obtain fast and accurate performance estimates, but also to guide the NoC design process within an optimization loop. The accuracy of our approach and its practical use are illustrated through extensive simulation results.

REGULAR PAPERS

EMBEDDED SYSTEMS

Jin, X.; Goto, S.
Hilbert Transform-Based Workload Prediction and Dynamic Frequency Scaling for Power-Efficient Video Encoding
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6186868
With the growing popularity of mobile devices with embedded video cameras, real-time video encoding on hand-held devices is increasingly common. Reducing the power consumed during real-time video encoding, and thereby extending battery life at the same encoding performance, is very important for improving quality of service. Although workload estimation techniques have been developed for video decoding to reduce the power consumption of video playback applications, they transfer poorly to video encoding because the compressed information cannot be retrieved before encoding and the future input video content is often nondeterministic. In this paper, a workload estimation scheme targeting video encoding applications is proposed. Based on a definition of video encoding workload and an analysis of its features, a Hilbert transform-based workload estimation model is proposed to predict the overall variation trend of the encoding workload, overcoming workload fluctuations and nondeterministic content variations such as burst motion. The effectiveness of the proposed algorithm is demonstrated with two H.264/AVC encoders, on a PC and on an embedded platform, encoding different video contents at different bit-rates. The proposed algorithm yields a low deadline-miss ratio of around 4.8%, much lower than previous solutions, together with platform and content robustness. Compared with previous solutions, the proposed algorithm provides up to 61.69% power reduction under the same performance constraint.
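The trend-extraction idea above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' model: it assumes a per-frame cycle-count history and a hypothetical set of DVFS operating points, and uses the Hilbert-transform envelope as a smoothed predictor of the next frame's workload.

    import numpy as np
    from scipy.signal import hilbert

    def predict_next_workload(cycles, window=30):
        """Predict the next frame's workload (in cycles) from recent history."""
        recent = np.asarray(cycles[-window:], dtype=float)
        # The magnitude of the analytic signal is a smooth envelope of the
        # workload trend, damping frame-to-frame fluctuations and short bursts.
        envelope = np.abs(hilbert(recent - recent.mean())) + recent.mean()
        return envelope[-1]

    def pick_frequency(pred_cycles, frame_deadline_s, freqs_hz):
        """Choose the lowest clock frequency that still meets the frame deadline."""
        for f in sorted(freqs_hz):
            if pred_cycles / f <= frame_deadline_s:
                return f
        return max(freqs_hz)

    # Hypothetical usage: 30 fps encoding with three DVFS operating points.
    history = list(np.random.randint(8_000_000, 12_000_000, size=60))
    pred = predict_next_workload(history)
    freq = pick_frequency(pred, frame_deadline_s=1 / 30, freqs_hz=[200e6, 400e6, 800e6])
    print(pred, freq)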
LOGIC SYNTHESIS

Wan, L.; Chen, D.
Analysis of Digital Circuit Dynamic Behavior With Timed Ternary Decision Diagrams for Better-Than-Worst-Case Design
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6186868
Modern logic optimization tools tend to optimize circuits in a balanced way so that all primary outputs (POs) have similar delays close to the cycle time. In reality, however, certain POs are exercised more frequently than the rest. Among the critical POs, some may be stabilized very quickly by input vectors, even if their topological delays from the primary inputs are very long. Knowing the dynamic behavior of a circuit can help optimize the most commonly activated paths and help engineers understand how resilient a PO is against dynamic environmental variations such as voltage fluctuations. In this paper, we describe a tool that analyzes the dynamic behavior of a digital circuit using probabilistic information. The technique uses timed ternary decision diagrams (tTDDs) to encode stabilization conditions for POs. To compute probabilities based on a tTDD, we propose false-assignment pruning and random-variable compaction to preserve the accuracy of the probability calculation. To address scalability, this paper proposes a new circuit partitioning heuristic that reduces the inaccuracy introduced by partitioning. Compared to timed simulation results, our tool has, on average, a mean absolute error of 1.7% and a root mean square error of 3.9% for MCNC benchmarks, and a mean absolute error of 2.2% and a root mean square error of 4.8% for ISCAS benchmarks. Compared to a state-of-the-art dynamic behavior analysis tool, our tool is 15x faster on average for MCNC benchmarks and 65x faster on average for ISCAS benchmarks, and can also handle circuits that the previous tool cannot. This dynamic behavior analyzer enables fast and effective circuit optimizations for better-than-worst-case design.
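For readers unfamiliar with the quantity being analyzed, the Python sketch below estimates per-output stabilization times for a tiny gate netlist by brute-force timed evaluation over random vectors, modeling early stabilization at controlling input values. It only illustrates what a stabilization-time distribution is; the paper computes such probabilities symbolically with tTDDs rather than by sampling, and the netlist and delays here are hypothetical.

    import random
    from statistics import mean

    # Hypothetical netlist: (output, gate_type, inputs, delay), in topological order.
    NETLIST = [
        ("n1", "AND", ("a", "b"), 2),
        ("n2", "OR",  ("c", "n1"), 3),
        ("out", "AND", ("n2", "d"), 2),
    ]
    PRIMARY_INPUTS = ("a", "b", "c", "d")

    def stabilization_times(vector):
        """Value and stabilization time of every signal for one input vector.

        A gate output stabilizes early when a controlling input arrives
        (0 for AND, 1 for OR), even if its other inputs settle much later.
        """
        val = dict(vector)
        t = {pi: 0 for pi in PRIMARY_INPUTS}
        for out, kind, ins, delay in NETLIST:
            ctrl = 0 if kind == "AND" else 1
            controlling = [t[i] for i in ins if val[i] == ctrl]
            if controlling:                        # earliest controlling input decides
                val[out] = ctrl
                t[out] = min(controlling) + delay
            else:                                  # must wait for the last input
                val[out] = 1 - ctrl
                t[out] = max(t[i] for i in ins) + delay
        return t

    # Monte Carlo estimate of the output's stabilization-time distribution.
    samples = []
    for _ in range(10000):
        vec = {pi: random.randint(0, 1) for pi in PRIMARY_INPUTS}
        samples.append(stabilization_times(vec)["out"])
    print("mean stabilization time:", mean(samples))
    print("P(stable by t=5):", sum(s <= 5 for s in samples) / len(samples))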
MODELING AND SIMULATION

Melamed, S.; Thorolfsson, T.; Harris, T. R.; Priyadarshi, S.; Franzon, P.; Steer, M. B.; Davis, W. R.
Junction-Level Thermal Analysis of 3-D Integrated Circuits Using High Definition Power Blurring
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6186865
The degraded thermal path of 3-D integrated circuits (3DICs) makes chip-scale thermal analysis an essential part of the design process. Performing an appropriate thermal analysis on such circuits requires a model with junction-level fidelity; however, the computational burden imposed by such a model is tremendous. In this paper, we present enhancements to two thermal modeling techniques for integrated circuits to make them applicable to 3DICs. First, we present a resistive mesh-based approach that improves on the fidelity of prior approaches by constructing a thermal model of the full structure of 3DICs, including the interconnect. Second, we introduce a method for dividing the thermal response caused by a heat load into a high-fidelity "near response" and a lower-fidelity "far response" in order to implement Power Blurring high definition (HD), a hierarchical thermal simulation approach based on Power Blurring that incorporates the resistive mesh-based models and allows for junction-level accuracy at the full-chip scale. The Power Blurring HD technique yields approximately three orders of magnitude of improvement in memory usage and up to six orders of magnitude of improvement in runtime for a three-tier synthetic aperture radar circuit, compared to a full-chip junction-scale resistive mesh-based model. Finally, measurement results show that Power Blurring HD accurately determines the shape of the thermal profile of the 3DIC surface after a correction factor is added to adjust for a discrepancy in the absolute temperature values.

PHYSICAL DESIGN

Kagalwalla, A. A.; Gupta, P.; Progler, C. J.; McDonald, S.
Design-Aware Mask Inspection
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6186861
Mask inspection has become a major bottleneck in the manufacturing flow, taking up as much as 40% of total mask manufacturing time. In this paper, we explore techniques to improve the reticle inspection flow by increasing its design awareness. We develop an algorithm to locate nonfunctional features in a post-optical proximity correction layout without using any design information. Using this, together with the timing information of the design (if available), the smallest defect size that could cause the design to fail is assigned to each reticle feature. The criticality of the reticle features is then used to partition the reticle such that each partition is inspected at a different pixel size and sensitivity, so that the false and nuisance defect count is reduced without missing any critical defect. We also develop an analytical model to estimate the false and nuisance defect count. Using those models, our simulation results show that this design-aware mask inspection can reduce the false and nuisance defect count for a critical polysilicon layer from 80 defects down to 49 defects, leading to a substantial reduction in defect review load. We also develop a model to estimate first pass yield (FPY) and show that our method can improve the FPY for a polysilicon layer from 11% to 30%. Beyond the polysilicon layer, the potential benefit of this approach is analyzed for the active, contact, and all metal/via layers.

Fang, S.-Y.; Chen, S.-Y.; Chang, Y.-W.
Native-Conflict and Stitch-Aware Wire Perturbation for Double Patterning Technology
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6186851
Double patterning technology (DPT), in which a dense layout pattern is decomposed into two separate masks to relax its pitch, is the most popular lithography solution for the sub-22 nm node to enhance pattern printability. Previous work focused on stitch insertion to improve the decomposition success rate. However, there exist native conflicts (NCs) which cannot be resolved by any kind of stitch insertion. A design with NCs is not DPT-compliant and may fail the decomposition, resulting in design-for-manufacturability redesign and longer design cycles. In this paper, we give a sufficient condition for NC existence and propose a geometry-based method for NC prediction to develop an early-stage analyzer for DPT decomposability checking. Then, a wire perturbation algorithm is presented to fix as many NCs in the layout as possible. The algorithm is based on iterative 1-D compaction and can easily be embedded into existing industrial compaction systems. The algorithm is then applied to further reduce the number of stitches required for the decomposition process. Experimental results show that the proposed algorithm reduces the number of NCs by an average of 85% and the number of stitches by an average of 39%, which may effectively increase the decomposition success rate for the next stage.
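DPT decomposition can be viewed as 2-coloring a conflict graph whose nodes are layout features and whose edges connect features spaced closer than the same-mask minimum; a component that is not bipartite contains an odd cycle that no two-mask assignment can satisfy. The Python sketch below performs that bipartiteness check on a hypothetical conflict graph. It is only a simplified illustration: stitch insertion, which can break some odd cycles by splitting a feature across both masks, is ignored here but is central to the paper's notion of a native conflict.

    from collections import deque

    def check_dpt_decomposability(features, conflict_edges):
        """Try to 2-color a DPT conflict graph (one color per mask).

        features: iterable of feature names.
        conflict_edges: pairs of features spaced below the same-mask minimum.
        Returns (mask_assignment, has_odd_cycle).
        """
        adj = {f: set() for f in features}
        for a, b in conflict_edges:
            adj[a].add(b)
            adj[b].add(a)

        color = {}
        odd_cycle = False
        for start in features:
            if start in color:
                continue
            color[start] = 0
            queue = deque([start])
            while queue:
                u = queue.popleft()
                for v in adj[u]:
                    if v not in color:
                        color[v] = 1 - color[u]      # put it on the other mask
                        queue.append(v)
                    elif color[v] == color[u]:       # odd cycle: not 2-colorable
                        odd_cycle = True
        return color, odd_cycle

    # Hypothetical example: features A, B, C are mutually too close (a triangle),
    # so the component contains an odd cycle and cannot be split over two masks.
    masks, conflict = check_dpt_decomposability(
        ["A", "B", "C", "D"], [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
    print(masks, conflict)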
SYSTEM-LEVEL DESIGN

Salamy, H.; Ramanujam, J.
An Effective Solution to Task Scheduling and Memory Partitioning for Multiprocessor System-on-Chip
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6186867
The growing trend in current complex embedded systems is to deploy a multiprocessor system-on-chip (MPSoC). An MPSoC consists of multiple heterogeneous processing elements, a memory hierarchy, and input/output components linked together by an on-chip interconnect structure. Such an architecture provides the flexibility to meet the performance requirements of multimedia applications while respecting constraints on memory, cost, size, time, and power. Many embedded systems employ software-managed memories known as scratch-pad memories (SPMs). Unlike caches, SPMs are software-controlled, so the execution time of applications on such systems can be accurately predicted. Scheduling the tasks of an embedded application on the processors and partitioning the available SPM budget among those processors are two critical issues in such systems. They are often considered separately, and such a decoupled approach may miss better-quality schedules. In this paper, we present an integrated approach to task scheduling and SPM partitioning to further reduce the execution time of embedded applications. Results on several real-life benchmarks show significant improvements from our proposed technique.

DeOrio, A.; Fick, D.; Bertacco, V.; Sylvester, D.; Blaauw, D.; Hu, J.; Chen, G.
A Reliable Routing Architecture and Algorithm for NoCs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6186857
Aggressive transistor scaling continues to drive increasingly complex digital designs. The large number of transistors available today enables the development of chip multiprocessors that include many cores on one die communicating through an on-chip interconnect. As the number of cores increases, scalable communication platforms, such as networks-on-chip (NoCs), have become more popular. However, as the sole communication medium, these interconnects are a single point of failure: any permanent fault in the NoC can cause the entire system to fail. Compounding the problem, transistors have become increasingly susceptible to wear-out related failures as their critical dimensions shrink. As a result, the on-chip network has become a critically exposed unit that must be protected. To this end, we present Vicis, a fault-tolerant architecture and companion routing protocol that is robust to a large number of permanent failures, allowing communication to continue in the face of permanent transistor failures. Vicis uses a two-level approach. First, it attempts to work around errors within a router by leveraging reconfigurable architectural components. Second, when faults within a router disable a link's connectivity, or even an entire router, Vicis reroutes around the faulty node or link with a novel, distributed routing algorithm for meshes and tori. Tolerating permanent faults in both the router components and the reliability hardware itself, Vicis enables graceful performance degradation of networks-on-chip.
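As a point of reference for what fault-tolerant NoC routing must accomplish, the Python sketch below rebuilds shortest-path next-hop tables for a 2-D mesh after a set of links has failed, using a breadth-first search from each destination. It is a centralized toy, not Vicis's distributed algorithm, and the mesh size and failed-link list are hypothetical.

    from collections import deque

    def mesh_routing_tables(width, height, failed_links):
        """Next-hop tables for a width x height mesh NoC with failed links.

        failed_links: set of frozenset({u, v}) for adjacent nodes u, v = (x, y).
        Returns route[node][dest] = next hop toward dest, or None if unreachable.
        """
        nodes = [(x, y) for x in range(width) for y in range(height)]
        node_set = set(nodes)

        def neighbors(n):
            x, y = n
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in node_set and frozenset({n, nb}) not in failed_links:
                    yield nb

        route = {n: {} for n in nodes}
        for dest in nodes:
            # BFS outward from the destination: the parent of a node in this
            # search tree is that node's next hop toward dest.
            parent = {dest: dest}
            queue = deque([dest])
            while queue:
                u = queue.popleft()
                for v in neighbors(u):
                    if v not in parent:
                        parent[v] = u
                        queue.append(v)
            for n in nodes:
                route[n][dest] = dest if n == dest else parent.get(n)
        return route

    # Hypothetical 4x4 mesh with two broken links.
    broken = {frozenset({(1, 1), (2, 1)}), frozenset({(2, 2), (2, 3)})}
    tables = mesh_routing_tables(4, 4, broken)
    print(tables[(1, 1)][(3, 1)])   # next hop from (1,1) to (3,1), avoiding the fault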
Mariani, G.; Palermo, G.; Zaccaria, V.; Silvano, C.
OSCAR: An Optimization Methodology Exploiting Spatial Correlation in Multicore Design Spaces
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6186864
This paper presents OSCAR, an optimization methodology exploiting the spatial correlation of multicore design spaces. It builds on the observation that the power consumption and performance metrics of spatially close design configurations (or points) are statistically correlated. We propose to exploit this correlation by using a response surface model (RSM), i.e., a closed-form expression suitable for predicting the quality of nonsimulated design points. This model is useful during the design space exploration (DSE) phase to quickly converge to the Pareto set of the multiobjective problem without executing lengthy simulations. To this end, we introduce a multiobjective optimization heuristic that iteratively updates and queries the RSM to identify the design points with the highest expected improvement. The RSM allows the Pareto set to be consolidated with fewer simulations, thus speeding up the exploration process. We compare the proposed heuristic with state-of-the-art approaches [conventional, RSM-based, and structured design of experiments (DoEs)]. Experimental results show that OSCAR is faster than state-of-the-art techniques such as response-surface Pareto iterative refinement (ReSPIR) and the nondominated-sorting genetic algorithm (NSGA-II). OSCAR needs fewer simulations to produce a similar solution: an average of 150 simulations instead of 320 (NSGA-II) and 178 (ReSPIR). When the number of design points is fixed at an average of 300, OSCAR achieves an average distance of less than 0.6% from the reference solution, while NSGA-II achieves 3.4%. Reported results also show that OSCAR can significantly improve structured DoE approaches by slightly increasing the number of experiments.

TEST

Lee, K.-J.; Hsieh, T.-Y.; Breuer, M. A.
Efficient Overdetection Elimination of Acceptable Faults for Yield Improvement
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6186858
Acceptable faults in a circuit under test (CUT) are faults that have no impact, or only a minor impact, on the performance of the CUT. A circuit with an acceptable fault may be marketable for some specific applications; therefore, by carefully dealing with these faults during testing, significant yield improvement can be achieved. Previous studies have shown that the patterns generated by a conventional automatic test pattern generation procedure to detect all unacceptable faults also detect many acceptable ones, resulting in a severe loss of achievable yield improvement. In this paper, we present a novel test methodology called multiple test set detection (MTSD) to eliminate this overdetection problem entirely. A basic test set generation method is first presented, which provides a fundamental scheme for generating appropriate test sets for MTSD. We then describe an enhanced test generation method that significantly reduces the total number of test patterns. Solid theoretical derivations are provided to validate the effectiveness of the proposed methods. Experimental results show that, in general, an 80%-99% reduction in the number of test patterns can be achieved compared with previous work addressing this problem.
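The overdetection problem can be made concrete with a small selection exercise: given which patterns detect which faults, choose patterns that cover every unacceptable fault while detecting as few acceptable faults as possible. The greedy Python sketch below does exactly that on a made-up detection matrix; it illustrates the problem only and is not the paper's MTSD test generation method.

    def select_patterns(detects, unacceptable, acceptable):
        """Greedily cover all unacceptable faults while penalizing patterns
        that also detect (still-marketable) acceptable faults.

        detects: {pattern: set of faults it detects}.
        """
        uncovered = set(unacceptable)
        chosen = []
        while uncovered:
            candidates = [p for p in detects if p not in chosen]
            if not candidates:
                break                                 # some faults are undetectable

            def score(p):
                gain = len(detects[p] & uncovered)
                penalty = len(detects[p] & set(acceptable))
                return (gain - 1.5 * penalty, gain)   # penalty weight is arbitrary

            best = max(candidates, key=score)
            if not detects[best] & uncovered:
                break                                 # no candidate helps anymore
            chosen.append(best)
            uncovered -= detects[best]
        return chosen

    # Made-up example: t2 alone covers u1 and u2 but also hits acceptable fault
    # a1, so the greedy pass prefers t1 + t3, which overdetects nothing.
    detects = {"t1": {"u1"}, "t2": {"u1", "u2", "a1"}, "t3": {"u2"}}
    print(select_patterns(detects, unacceptable={"u1", "u2"}, acceptable={"a1"}))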
VERIFICATION

Chockler, H.; Kroening, D.; Purandare, M.
Computing Mutation Coverage in Interpolation-Based Model Checking
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6186866
Coverage is a means to quantify the quality of a system specification, and is frequently applied to assess progress in system validation. Coverage is a standard measure in testing, but is very difficult to compute in the context of formal verification. We present efficient algorithms for identifying those parts of the system that are covered by a given property. Our algorithm is integrated into state-of-the-art Boolean satisfiability-based model checking using Craig interpolation. The key insight behind our algorithm is the reuse of previously computed inductive invariants and counterexamples. This reuse permits a rapid completion of the vast majority of tests and enables the computation of a coverage measure with 96% accuracy at only 5x the runtime of the model checker.

Welp, T.; Kitchen, N.; Kuehlmann, A.
Hardware Acceleration for Constraint Solving for Random Simulation
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6186869
Constrained random simulation has been widely adopted in contemporary hardware verification flows. In this methodology, a set of user-specified declarative constraints describes the valid input stimuli for the design under test (DUT). A constraint solver produces the simulation input vectors, and their generation is interleaved with the actual simulation of the design on those vectors. Besides the distribution of the generated stimuli, the solver's performance is one of the most critical characteristics determining overall verification efficiency. There are no general approaches to hardware acceleration for solving declarative constraints. Current setups for hardware acceleration-based verification combine a software constraint solver running on a general-purpose processor with the hardware-accelerated DUT. This approach suffers from a major efficiency bottleneck caused by the significant performance mismatch between the solver executed in software and the DUT running on an accelerator. In this paper, we present a hardware constraint solver that uses a set of parallel solving units executing Markov chain Monte Carlo sampling. We propose to combine this solver and the DUT on the same device and run both entities hardware-accelerated in order to eliminate the performance mismatch. We discuss the details of the solver architecture and its implementation, and report comprehensive results on performance and distribution characteristics, as well as experience from a case study in which we used our solver to verify a real-world hardware design.
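To illustrate the sampling style the solver above accelerates, the Python sketch below draws random stimuli for a toy constraint set with a Metropolis-style Markov chain Monte Carlo walk over bit vectors, using a soft penalty for violated constraints. It is a software caricature of the idea under made-up constraints, not the paper's hardware solver.

    import math
    import random

    WIDTH = 8

    def violations(bits):
        """Count violated constraints for a toy 8-bit stimulus (bits[0] is the MSB).

        Made-up constraints: the value is even, lies in [32, 200], and
        the MSB being set implies the LSB is set.
        """
        v = int("".join(map(str, bits)), 2)
        count = 0
        count += v % 2 != 0
        count += not (32 <= v <= 200)
        count += bits[0] == 1 and bits[-1] != 1
        return count

    def mcmc_stimuli(n, beta=3.0):
        """Metropolis walk over bit vectors: flip one bit per step and accept
        uphill moves with probability exp(-beta * extra_violations)."""
        bits = [random.randint(0, 1) for _ in range(WIDTH)]
        out = []
        while len(out) < n:
            candidate = bits[:]
            candidate[random.randrange(WIDTH)] ^= 1
            delta = violations(candidate) - violations(bits)
            if delta <= 0 or random.random() < math.exp(-beta * delta):
                bits = candidate
            if violations(bits) == 0:          # emit only constraint-legal stimuli
                out.append(int("".join(map(str, bits)), 2))
        return out

    print(mcmc_stimuli(5))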
Liu, L.; Sheridan, D.; Tuohy, W.; Vasudevan, S.
A Technique for Test Coverage Closure Using GoldMine
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6186862
We propose a methodology for generating input stimulus to achieve coverage closure using GoldMine, an automatic assertion generation engine that uses data mining and formal verification. GoldMine mines the simulation traces of a behavioral register transfer level (RTL) design with a decision tree based learning algorithm to produce candidate assertions. These candidate assertions are passed to a formal verification engine; if a candidate assertion is false, a counterexample trace is generated. In this paper, we feed these counterexample traces back to iteratively refine the original simulation trace data, and introduce an incremental decision tree to mine the new traces in each iteration. The algorithm converges when all the candidate assertions are true. We formally prove that our algorithm always converges and, on convergence, captures the complete functionality of each output of a sequential design. We show that our method always yields a monotonic increase in simulation coverage. We also present an output-centric notion of coverage and argue that coverage closure can be attained with respect to it. We walk through the technique step by step using a nontrivial arbiter design. Experimental results validating our arguments are presented on several designs from Rigel, OpenRisc, and SpaceWire. Some practical limitations to achieving 100% coverage, as well as the differences between the final decision tree and a binary decision diagram, are discussed.

SHORT PAPERS

Jiang, Y.-L.; Chen, H.-B.
Application of General Orthogonal Polynomials to Fast Simulation of Nonlinear Descriptor Systems Through Piecewise-Linear Approximation
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6186859
In this letter, we report an approach that combines piecewise-linear (PWL) approximation with general orthogonal polynomials to efficiently simulate large nonlinear descriptor systems in the time domain. The main idea is first to approximate a nonlinear function by a piecewise-linear representation. Then, using the recursive formulae of general orthogonal polynomials, orthonormal bases are produced for fast simulation of the PWL model. The effectiveness of our approach is demonstrated on two nonlinear circuit models.

Alpaslan, E.; Kruseman, B.; Majhi, A. K.; Heuvalman, W. M.; Dworak, J.
NIM-X: A Noise Index Model-Based X-Filling Technique to Overcome the Power Supply Switching Noise Effects on Path Delay Test
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6186856
Power supply noise (PSN) has become a critical issue during high-quality at-speed testing. Discrepancies between the circuit's switching activity in functional mode and in test mode can cause overtesting and lead to yield loss. Conversely, reduced PSN effects around critical paths can result in undertesting the chip, causing test escapes. To achieve a high-quality at-speed test, both problems must be solved simultaneously. Our previous work introduced a noise index model (NIM), which can be used to predict the mismatch between expected and real path delays. This paper quantitatively investigates and compares NIM values for critical paths during functional and test mode. We then propose a test pattern modification method that harnesses the NIM: it fills a subset of the don't-care bits in partially specified test vectors such that the worst observed functional NIM for the targeted critical path is replicated during test mode.
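As a rough picture of what X-filling does, the Python sketch below fills the don't-care ('X') bits of a partially specified test vector so that the number of bit transitions relative to the previous vector approaches a target activity level. The toggle count is a deliberately crude proxy standing in for the paper's noise index model, and the vectors are hypothetical.

    def x_fill(prev_vector, pattern, target_toggles):
        """Fill 'X' bits in `pattern` to steer the toggle count (relative to
        `prev_vector`) toward `target_toggles`.

        prev_vector: string of '0'/'1'; pattern: string of '0'/'1'/'X'.
        """
        filled = list(pattern)
        toggles = sum(1 for p, c in zip(prev_vector, pattern)
                      if c in "01" and c != p)
        for i, c in enumerate(pattern):
            if c != "X":
                continue
            if toggles < target_toggles:
                filled[i] = "1" if prev_vector[i] == "0" else "0"   # force a toggle
                toggles += 1
            else:
                filled[i] = prev_vector[i]                          # keep it quiet
        return "".join(filled), toggles

    # Hypothetical vectors: replicate a functional activity level of 3 toggles.
    prev = "01100101"
    pat = "0X10XX01"
    print(x_fill(prev, pat, target_toggles=3))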