October 2013 Newsletter
Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2013-10.txt


KEYNOTE PAPER

Pan, D.Z. ; Yu, B. ; Gao, J.-R.
Design for Manufacturing With Emerging Nanolithography
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6601025

In this paper, we survey key design for manufacturing issues for extreme
scaling with emerging nanolithography technologies, including double/multiple
patterning lithography, extreme ultraviolet lithography, and electron-beam
lithography. These nanolithography and nanopatterning technologies have
different manufacturing processes and their unique challenges to very large
scale integration (VLSI) physical design, mask synthesis, and so on. It is
essential to have close VLSI design and underlying process technology
co-optimization to achieve high product quality (power/performance, etc.) and
yield while making future scaling cost-effective and worthwhile. Recent results
and examples will be discussed to show the enablement and effectiveness of such
design and process integration, including lithography model/analysis, mask
synthesis, and lithography friendly physical design.

REGULAR PAPERS

EMERGING TECHNOLOGIES

Chen, Y.-C. ; Wang, C.-Y. ; Huang, C.-Y. 
Verification of Reconfigurable Binary Decision Diagram-Based Single-Electron
Transistor Arrays
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600827

Recently, single-electron transistors (SETs) have been attracting substantial
attention and are considered candidate devices for future integrated circuits
due to their ultralow power consumption. To realize SETs, a binary decision
diagram-based SET array is proposed as a suitable candidate for implementing
Boolean circuits. Then, some works started developing computer-aided design
techniques for this new architecture. However, most of them focused on the
development of mapping techniques. How to verify the mapping results is still
an open problem. Thus, in this paper, we address this problem and develop a
satisfiability (SAT)-based verification method. We propose a transformation
approach to model the functionality of a mapped SET array as a conjunctive
normal form formula.  Then, the problem that whether the SET array is
functionally equivalent to its specification circuit can be solved with a SAT
solver. The experimental results show that the proposed method can successfully
verify correct and incorrect SET array implementations with reasonable
verification time.

Huang, J.-D. ; Liu, C.-H. ; Lin, H.-S. 
Reactant and Waste Minimization in Multitarget Sample Preparation on Digital
Microfluidic Biochips 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600878

Sample preparation is one of essential processes in biochemical reactions. Raw
reactants are diluted in this process to achieve given target concentrations. A
bioassay may require several different target concentrations of a reactant.
Both the dilution operation count and the reactant usage can be minimized if
multiple target concentrations are considered simultaneously during sample
preparation. Hence, in this paper, we propose a multitarget sample preparation
algorithm that extensively exploits the ideas of waste recycling and
intermediate droplet sharing to reduce both reactant usage and waste amount for
digital microfluidic biochips. Experimental results show that our waste
recycling algorithm can reduce the waste and operation count by 48% and 37%,
respectively, as compared to an existing state-of-the-art multitarget sample
preparation method if the number of target concentrations is ten. The reduction
can be up to 97% and 73% when the number of target concentrations goes even
higher.

HIGH-LEVEL SYNTHESIS

Damavandpeyma, M. ; Stuijk, S. ; Basten, T. ; Geilen, M. ; Corporaal, H.
Schedule-Extended Synchronous Dataflow Graphs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600844

Synchronous dataflow graphs (SDFGs) are used extensively to model streaming
applications. An SDFG can be extended with scheduling decisions, allowing SDFG
analysis to obtain properties, such as throughput or buffer sizes for the
scheduled graphs. Analysis times depend strongly on the size of the SDFG. SDFGs
can be statically scheduled using static-order schedules. The only generally
applicable technique to model a static-order schedule in an SDFG is to convert
it to a homogeneous SDFG (HSDFG). This may lead to an exponential increase in
the size of the graph and to suboptimal analysis results (e.g., for buffer
sizes in multiprocessors). We present techniques to model two types of
static-order schedules, i.e., periodic schedules and periodic single appearance
schedules, directly in an SDFG. Experiments show that both techniques produce
more compact graphs compared to the technique that relies on a conversion to an
HSDFG. This results in reduced analysis times for performance properties and
tighter resource requirements.

MODELING AND SIMULATION

Paek, S. ; Shin, W. ; Sim, J. ; Kim, L.-S.
PowerField: A Probabilistic Approach for Temperature-to-Power Conversion Based
on Markov Random Field Theory
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600867

Temperature-to-power technique is useful for post-silicon power model
validation. However, the previous works were applicable only to the
steady-state analysis. In this paper, we propose a new temperature-to-power
technique, named PowerField, supporting both transient and steady-state
analysis based on a probabilistic approach. Unlike the previous works,
PowerField uses two consecutive thermal images to find the most feasible power
distribution that causes the change between the two input images. To obtain the
power map with the highest probability, we adopted maximum a posteriori Markov
random field (MAP-MRF). For MAP-MRF framework, we modeled the spatial thermal
system as a set of thermal nodes and derived an approximated transient heat
transfer equation that requires only the local information of each thermal
node. Experimental results with a thermal simulator show that PowerField
outperforms the previous method in transient analysis reducing the error by
half on average. We also show that our framework works well for steady-state
analysis by using two identical steady-state thermal maps as inputs. Lastly, an
application to determining the binary power patterns of an FPGA device is
presented achieving 90.7% average accuracy.

Han, Y. ; Zhao, J. 
Accurate Substrate Analysis Based on a Novel Finite Difference Method via
Synchronization Method on Layered and Adaptive Meshing 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600985

In this paper, we present a finite difference method (FDM) based on layered and
adaptive meshing to extract substrate resistance. To allow adaptive meshing, a
novel synchronization method is introduced which expresses the potential at
other points than the centroids of panels by modifying the finite difference
equations. This method makes it possible to overcome the high computational
cost of the FDM due to tight uniform meshing requirements while enjoying a
straightforward implementation and easy handling of irregular/arbitrary
substrate structures to extract the substrate coupling of integrated circuits.

Zhang, Z. ; El-Moselhy, T.A. ; Elfadel, I.M. ; Daniel, L.
Stochastic Testing Method for Transistor-Level Uncertainty Quantification Based
on Generalized Polynomial Chaos
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600994

Uncertainties have become a major concern in integrated circuit design. In
order to avoid the huge number of repeated simulations in conventional Monte
Carlo flows, this paper presents an intrusive spectral simulator for
statistical circuit analysis. Our simulator employs the recently developed
generalized polynomial chaos expansion to perform uncertainty quantification of
nonlinear transistor circuits with both Gaussian and non-Gaussian random
parameters. We modify the nonintrusive stochastic collocation (SC) method and
develop an intrusive variant called stochastic testing (ST) method. Compared
with the popular intrusive stochastic Galerkin (SG) method, the coupled
deterministic equations resulting from our proposed ST method can be solved in
a decoupled manner at each time point. At the same time, ST requires fewer
samples and allows more flexible time step size controls than directly using a
nonintrusive SC solver. These two properties make ST more efficient than SG and
than existing SC methods, and more suitable for time-domain circuit simulation.
Simulation results of several digital, analog and RF circuits are reported.
Since our algorithm is based on generic mathematical models, the proposed ST
algorithm can be applied to many other engineering problems.

PHYSICAL DESIGN

He, X. ; Huang, T. ; Xiao, L. ; Tian, H. ; Young, E.F.Y.
Ripple: A Robust and Effective Routability-Driven Placer 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600938

The significant mismatch between the objective of wirelength and routing
congestion makes the routability issue even more important in placement. In
this paper, we describe a routability-driven placer called Ripple. Each step,
including global placement, legalization, and detailed placement, is made to
trade-off between routability and wirelength. We propose a robust and effective
flow by using cell inflation to relieve routing congestion. Cell inflation has
traditionally been used to deal with congestion and we will discuss how this
technique can be used easily and robustly in the global placement. Besides,
unlike many previous works that focus on different types of swapping
strategies, we analyze and propose some simple and effective approaches when
considering routability in the legalization and detailed placement steps.
Experimental results show that Ripple is particularly effective in improving
routability. When compared to the top results in the ISPD 2011 Contest and
SimPLR, Ripple can obtain the smallest overflow and half-perimeter wirelength
on average, while the congestion hot spots are also distributed sparsely in
Ripple.

Chang, F.-Y. ; Tsay, R.-S. ; Mak, W.-K. ; Chen, S.-H. 
MANA: A Shortest Path Maze Algorithm Under Separation and Minimum Length
Nanometer Rules 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600923

Due to process limitations, wiring rules are imposed on chip layout by
foundries. Under nanometer wiring rules, the required separation between two
wire ends is dependent on their surrounding wires, and there is a limit on the
minimum length of each wire segment. However, traditional shortest path
algorithms are not properly designed for these rules. In the paper, we propose
a maze routing algorithm, called MANA, capable of finding legal shortest paths
under these rules. Experiments with seven industrial cases show that by
handling these rules during maze routing, 94% of the violations are prevented
on average, and the overall runtime of a commercial router is reduced by 71%.
In addition, the total wire length is also reduced by 3% on average.

SYSTEM-LEVEL DESIGN

Nikitin, N. ; de San Pedro, J. ; Cortadella, J.
Architectural Exploration of Large-Scale Hierarchical Chip Multiprocessors 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600876

The continuous scaling of nanoelectronics is increasing the complexity of chip
multiprocessors (CMPs) and exacerbating the memory wall problem. As CMPs become
more complex, the memory subsystem is organized into more hierarchical
structures to better exploit locality. To efficiently discover promising
architectures within the rapidly growing search space, exhaustive exploration
is replaced with tools that implement intelligent search strategies. Moreover,
faster analytical models are preferred to costly simulations for estimating the
performance and power of CMP architectures. The memory traffic generated by CMP
cores has a cyclic dependency with the latency of the memory subsystem, which
critically affects the overall system performance. Based on this observation, a
novel scalable analytical method is proposed to estimate the performance of
highly parallel CMPs (hundreds or thousands of cores) with hierarchical
interconnect networks. The method can use customizable probabilistic models and
solves the cyclic dependencies between traffic and latency by using a
fixed-point strategy. By using the analytical model as a performance and power
estimator, an efficient metaheuristic-based search is proposed for the
exploration of large design spaces. The proposed techniques are shown to be
very accurate and a promising strategy when compared to the results obtained by
simulation.

TEST

Bao, F. ; Peng, K. ; Tehranipoor, M. ; Chakrabarty, K.
Generation of Effective 1-Detect TDF Patterns for Detecting Small-Delay Defects
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6601027

Testing for small-delay defects (SDDs) has become necessary for high-quality
products (e.g., automotive applications) as technology scales further and
functional frequency increases. Traditional timing-unaware transition- delay
fault (TDF) ATPG is not adequate for detecting SDDs. Commercial timing-aware
ATPGs suffer from large CPU runtime and huge pattern count. However, a small
pattern set with high SDD coverage is desired by industry.  In this paper, a
comprehensive procedure named parametric pattern generation (PPG) is proposed
in order to meet these requirements. In the evaluation phase of PPG, a new
metric is proposed to represent pattern characteristics when detecting SDDs and
gross TDFs. In the selection phase of PPG, pattern quality is considered by
excluding detection redundancy. PPG is mathematically modeled and solved using
the gradient descent concept. By learning from the previous pattern selection
efficiency, a new selection is performed in a more effective way until the
pattern set is finally generated. While utilizing PPG, the final pattern set
can be framed in 1-detect volume with high SDD detection efficiency while
meeting the target TDF coverage. Experimental results on both IWLS and ISCAS'89
benchmarks demonstrate the efficiency of the proposed method.

Guo, X. ; Karri, R. 
Recomputing with Permuted Operands: A Concurrent Error Detection Approach
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600917

Naturally occurring and maliciously injected faults reduce the reliability of
cryptographic hardware and may leak confidential information. We develop a
concurrent error detection technique (CED) called recomputing with permuted
operands (REPO). We show that it is cost effective in advanced encryption
standard (AES) and a secure hash function Græstl. We provide experimental
results and formal proofs to show that REPO detects all single-bit and
single-byte faults. Experimental results show that REPO achieves close to 100%
fault coverage for multiple byte faults. The hardware and throughput overheads
are compared with those of previously reported CED techniques on two Xilinx
Virtex FPGAs. The hardware overhead is 12.4%Š27.3%, and the throughput is
1.2Š23 Gbps, depending on the AES architecture, FPGA family, and detection
latency. The performance overhead ranges from 10% to 100% depending on the
security level. Moreover, the proposed technique can be integrated into various
block cipher modes of operation. We also discuss the limitation of REPO and its
potential vulnerabilities.

VERIFICATION

Keng, B. ; Veneris, A. 
Path-Directed Abstraction and Refinement for SAT-Based Design Debugging 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6601029

Functional verification has become one of the most time-consuming tasks in the
very large scale integration design flow accounting for up to 57% of the total
project time. The largest component of this task is that of design debugging
due to its resource-intensive manual nature. With the ever growing size of
modern designs and their error traces, the complexity of the debugging problem
poses a great challenge to automated debugging techniques. To overcome this
challenge, this paper introduces a novel path-directed abstraction and
refinement algorithm for design debugging to manage excessive error trace
lengths. A sliding window of the error trace is iteratively analyzed in a
time-windowing framework, which is made possible by the use of the
path-directed abstraction. This abstraction forms a concise approximation of
nonmodeled parts of the error trace while simultaneously providing an efficient
representation for refinement. The result is an algorithm that dramatically
reduces the memory requirements of debugging while mitigating the incomplete
results of past techniques. Experimental results on industrial designs with
long error traces show that the proposed approach can analyze traces that are
64.6% longer while simultaneously decreasing peak memory usage compared to
previous work.

SHORT PAPERS

Shelar, R.S. ; Patyra, M.
Impact of Local Interconnects on Timing and Power in a High Performance
Microprocessor
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6601022

In nanometer technologies, local interconnects are believed to cause a major
impact on timing and power in VLSI circuits. To assess the impact of the
interconnects on timing and power in a real high performance microprocessor
design in a quantitative manner, this article presents results from an
extensive study carried out on RTL-to-layout synthesized blocks in a 45-nm
technology core. The study shows that the interconnects in these blocks account
for 30% of the cycle time, on an average, on the worst internal timing paths
and contribute nearly one-third to the power dissipation. This points to
severity of impact due to the interconnects in today's high performance
designs.

Long, J. ; Li, D. ; Memik, S.O. ; Ulgen, S. 
Theory and Analysis for Optimization of On-Chip Thermoelectric Cooling Systems
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6601030

We established a novel theoretical analysis framework for optimizing the
cooling system configuration of chips employing thermoelectric cooling (TEC)
elements by extending the theory of inverse-positive matrices and the
eigenvalue/eigenvector theory in linear algebra. In this brief, we present a
new theorem and its formal proof, which is the key enabler to achieving a
provably optimal solution for configuring bias current levels of TEC devices.

Zhang, C. ; Yu, W. 
Efficient Space Management Techniques for Large-Scale Interconnect Capacitance
Extraction With Floating Random Walks
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600918

In the capacitance extraction with the floating random walk (FRW) algorithm,
the space management approach is required to facilitate finding the nearest
conductor. The Octree and grid-based spatial structures have been used to
decompose the whole domain into cells and to store information of local
conductors. In this letter, the techniques with the distance limit of cell and
only searching in cell's neighbor region are proposed to accelerate the
construction of the spatial structures. A fast inquiry technique is proposed to
fasten the nearest conductor query. We also propose a gridŠOctree hybrid
structure, which has advantages over existing structures. Experiments on large
very large scale integration structures with up to 484441 conductors have
validated the efficiency of the proposed techniques. The improved FRW algorithm
is faster than RWCap for thousands times while extracting a single net, and
several to tens times while extracting 100 nets.