February 2013 Newsletter
Placing you one click away from the best new CAD research!
Plain-text version at 
http://www.umn.edu/~tcad/newsletter/2013-02.txt


SPECIAL SECTION ON CROSS-DOMAIN PHYSICAL OPTIMIZATION

GUEST EDITORIAL
Hu, J.; Koh, C.-K.
Guest Editorial
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416102

SPECIAL SECTION REGULAR PAPERS

Yan, J. Z.; Chu, C.
SDS: An Optimal Slack-Driven Block Shaping Algorithm for Fixed-Outline
Floorplanning
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416107

This paper presents an efficient, scalable, and optimal slack-driven shaping
algorithm for soft blocks in nonslicing floorplan. The proposed algorithm is
called SDS. SDS is specifically formulated for fixed-outline floorplanning.
Given a fixed upper bound on the layout width, SDS minimizes the layout height
by only shaping the soft blocks in the design. Iteratively, SDS shapes some
soft blocks to minimize the layout height with the guarantee that the layout
width would not exceed the given upper bound. Rather than using some simple
heuristic as in previous work, the amount of change on each block is determined
by systematically distributing the global total amount of available slack to
individual block. During the whole shaping process, the layout height
monotonically reduces and eventually converges to an optimal solution. Two
optimality conditions are presented to check the optimality of a shaping
solution for fixed-outline floorplanning. In practice, to terminate the process
of convergence early, we propose two different stopping criteria. We also
extend SDS to handle other floorplanning problems, e.g., classical
floorplanning. To validate the efficiency and effectiveness of SDS,
comprehensive experiments are conducted on MCNC and HB benchmarks. Compared
with previous work, SDS achieves the best experimental result with a
significantly faster runtime.

Fang, S.-Y.; Chen, W.-Y.; Chang, Y.-W.
Graph-Based Subfield Scheduling for Electron-Beam Photomask Fabrication
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416095

Electron beam lithography has shown great promise in photomask fabrication;
however, its successive heating process centralizing in a small region may
cause a severe problem of critical dimension (CD) distortion. Consequently,
subfield scheduling that reorders the sequence of the writing process is needed
to avoid successive writing of neighboring subfields. In addition, the writing
process of a subfield raises the temperature of neighboring regions and may
block other subfields for writing. This paper presents the first work to solve
the subfield scheduling problem while considering blocked regions by
formulating the problem into a constrained maximum scatter traveling salesman
problem (constrained MSTSP). To tackle the constrained MSTSP that can be shown
to be NP-complete in general, we identify a special case thereof with points on
two parallel lines and solve it optimally in linear time. We then decompose the
constrained MSTSP into subproblems conforming to the special case, solve each
subproblem optimally and efficiently by a graph-based algorithm, and then merge
the subsolutions into a complete scheduling solution. We also extend our
algorithm to handle the cases when the moving time of an e-beam writing head is
comparable with the writing time of a subfield. Experimental results show that
our algorithms are effective and efficient in finding good subfield scheduling
solutions that can alleviate the successive heating problem (and thus reduce CD
distortion) for e-beam photomask fabrication.

Ghaida, R. S.; Agarwal, K. B.; Nassif, S. R.; Yuan, X.; Liebmann, L. W.; Gupta, P.
Layout Decomposition and Legalization for Double-Patterning Technology
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416090

The use of multiple-patterning (MP) optical lithography for sub-20 nm
technologies has inevitably become slow to adopt the next generation of
lithography systems. The biggest technical challenge of MP is failure to reach
a manufacturable layout-coloring solution, especially in dense layouts. This
paper offers a postlayout solution for the removal of conflicts, i.e., patterns
that cannot be assigned to different masks without violating spacing rules. The
proposed method essentially consists of three steps: 1) layout coloring; 2)
exposure layers; 3) geometric rules definition; and 4) layout legalization
using compaction and MP rules as constraints. The method is general and can be
used for different MP technologies, including lithography-etch,
lithography-etch double-patterning (DP), triple patterning/MP (i.e., multiple
litho-etch steps), and self-aligned DP (SADP). For demonstration purposes, we
apply the proposed method in this paper to remove conflicts in DP. We offer an
$O(n)$ layout-coloring heuristic algorithm for DP, which is up to $80times$
faster than the integer linear program-based approach. The conflict-removal
problem is formulated as a linear program, which permits an extremely fast
runtime (less than 1 min in real time for macro layouts). The method was tested
on standard cells and macro layouts from a commercial 22-nm library designed
without any MP awareness. For many cells, the method removes all conflicts
without any area increase. For some complex cells and macros, the method still
removes all conflicts but with a modest 6% average increase in area.

Chang, J.-W.; Yeh, S.-H.; Huang, T.-W.; Ho, T.-Y.
Integrated Fluidic-Chip Co-Design Methodology for Digital Microfluidic Biochips
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416099

Recently, digital microfluidic biochips (DMFBs) have revolutionized many
biochemical laboratory procedures and received much attention due to their many
advantages, such as high throughput, automatic control, and low cost. To meet
the challenges of increasing design complexity, computer-aided-design (CAD)
tools have been used to build DMFBs efficiently. Current CAD tools generally
conduct a two-stage based design flow of fluidic-level synthesis followed by
chip-level design to optimize fluidic behaviors and chip architecture
separately. Nevertheless, existing fluidic-chip design gap will become even
wider with a rapid escalation in the number of assay operations incorporated
into a single DMFB. As more and more large-scale assay protocols are delivered
in the current emerging marketplace, this problem may potentially restrict the
effectiveness and feasibility of the entire DMFB realization and thus needs to
be solved quickly. In this paper, we propose the first fluidic-chip co-design
methodology for DMFBs to effectively bridge the fluidic-chip design gap. Our
work provides a comprehensive integration throughout fluidic-operation
scheduling, chip layout generation, control pin assignment, and wiring solution
to achieve higher design performance and feasibility. Experimental results show
the effectiveness, robustness, and scalability of our co-design methodology on
a set of real-life assay applications.

Ward, S. I.; Kim, M.-C.; Viswanathan, N.; Li, Z.; Alpert, C. J.; Swartzlander,
E. E.; Pan, D. Z.
Structure-Aware Placement Techniques for Designs With Datapaths
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416106

As technology scales and frequencies increase, a new hybrid design style
emerges, wherein designs contain a mixture of random logic and datapath
standard-cell components. This paper demonstrates that conventional
half-perimeter wirelength driven placers underperform in terms of regularity
and Steiner wirelength (StWL) for such hybrid designs. In addition, the quality
gap between manual and automatic placement is more pronounced as the designs
become more datapath oriented. To effectively handle hybrid designs, this paper
proposes a new unified placement flow that simultaneously places random logic
and datapath cells. This flow is built on the top of a leading academic
force-directed placer and significantly improves the quality of datapath
placement while leveraging the speed and flexibility of existing random-logic
placement algorithms. It consists of a suite of novel global and detailed
placement techniques, collectively called structure-aware placement techniques
(SAPT). These techniques effectively integrate alignment constraints into
placement, thereby overcoming the deficiencies of existing random-logic placers
when handling designs with embedded datapaths. Compared to other
state-of-the-art placers, SAPT improves total StWL by more than 28% and total
routing overflow by over six times on the ISPD 2011 datapath benchmark suite.
In addition, it improves total StWL by 5.8% on industrial hybrid designs.

SPECIAL SECTION SHORT PAPER

Chang, C.-L.; Jiang, I. H.-R.
Pulsed-Latch Replacement Using Concurrent Time Borrowing and Clock Gating
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416100

Flip-flops are the most common form of sequencing elements; however, they have
a significantly higher sequencing overhead than latches in terms of delay,
power, and area. Hence, pulsed latches are a promising option to reduce power
for high-performance circuits. In this paper, to save power and compensate for
timing violations, we fully utilize the intrinsic time borrowing property of
pulsed latches and consider clock gating during pulsed-latch replacement.
Experimental results show that our approach can generate very power efficient
results.

REGULAR PAPERS

EMBEDDED SYSTEMS

Lee, J.; Kim, Y.; Shipman, G. M.; Oral, S.; Kim, J.
Preemptible I/O Scheduling of Garbage Collection for Solid State Drives
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416101

Unlike hard disks, flash devices use out-of-place updates operations and
require a garbage collection (GC) process to reclaim invalid pages to create
free blocks. This GC process is a major cause of performance degradation when
running concurrently with other I/O operations as internal bandwidth is
consumed to reclaim these invalid pages. The invocation of the GC process is
generally governed by a low watermark on free blocks and other internal device
metrics that different workloads meet at different intervals. This results in
an I/O performance that is highly dependent on workload characteristics. In
this paper, we examine the GC process and propose a semipreemptible GC (PGC)
scheme that allows GC processing to be preempted while pending I/O requests in
the queue are serviced. Moreover, we further enhance flash performance by
pipelining internal GC operations and merge them with pending I/O requests
whenever possible. Our experimental evaluation of this semi-PGC scheme with
realistic workloads demonstrates both improved performance and reduced
performance variability. Write-dominant workloads show up to a 66.56%
improvement in average response time with a 83.30% reduced variance in response
time compared to the non-PGC scheme. In addition, we explore opportunities of a
new NAND flash device that supports suspend/resume commands for read, write,
and erase operations for fully PGC (F-PGC). Our experiments with an F-PGC
enabled flash device show that request response time can be improved by up to
14.57% compared to semi-PGC.

MODELING AND SIMULATION

Chen, X.; Wang, Y.; Yang, H. 
NICSLU: An Adaptive Sparse Matrix Solver for Parallel Circuit Simulation
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416098

The sparse matrix solver has become a bottleneck in simulation program with
integrated circuit emphasis (SPICE)-like circuit simulators. It is difficult to
parallelize the solver because of the high data dependency during the numeric
LU factorization and the irregular structure of circuit matrices. This paper
proposes an adaptive sparse matrix solver called NICSLU, which uses a
multithreaded parallel LU factorization algorithm on shared-memory computers
with multicore/multisocket central processing units to accelerate circuit
simulation. The solver can be used in all the SPICE-like circuit simulators. A
simple method is proposed to predict whether a matrix is suitable for parallel
factorization, such that each matrix can achieve optimal performance. The
experimental results on 35 matrices reveal that NICSLU achieves speedups of
$2.08timessim 8.57times~({rm on~the~geometric~mean})$, compared with KLU, with
1Ð12 threads, for the matrices which are suitable for the parallel algorithm.
NICSLU can be downloaded from http://nicslu.weebly.com.

Shi, G.
Graph-Pair Decision Diagram Construction for Topological Symbolic Circuit
Analysis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416104

Symbolic circuit analysis is concerned with analytical construction of circuit
response in the frequency (or time) domain, for which an efficient data
structure is required. Recent research has justified that the binary decision
diagram (BDD) is a superior data structure with the following distinguishing
feature: a large number of product terms can be compactly represented by a BDD,
on which numerical computations and analytical deductions can be performed
directly. Using BDD for symbolic circuit analysis requires an efficient method
for construction. In this paper, a graph-based construction method, called
graph-pair decision diagram (GPDD), is developed. Given a small-signal circuit,
a pair of graphs representing the circuit is created, from which a GPDD is
constructed by successively reducing the graph pair. The GPDD algorithm, which
generates cancellation-free symbolic terms, differs from the existing
determinant decision diagram (DDD) algorithm. Detailed theory and implementable
algorithms for the GPDD construction are developed, and a runtime performance
comparison to DDD is made. It is demonstrated that the runtime performance
using GPDD is comparable to that of DDD in terms of time and memory complexity
for exact symbolic analysis, although the GPDD algorithm has to generate a much
larger number of symbolic product terms.

TEST

Sinanoglu, O.  Scan to Nonscan Conversion via Test Cube Analysis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416105

Increasing complexity of integrated circuits has forced the industry to abandon
partial scan, which necessitates a computationally demanding and unaffordable
sequential automatic test pattern generation (ATPG), and to instead adopt full
scan, despite its costs. In this paper, we propose a partial scan scheme driven
by a computationally efficient test cube analysis. We tackle the challenges
associated with the identification of the conditions to restore the
controllability and observability compromised due to partial scan, and with the
formulation of these conditions in terms of test cube operations. Upon the
identification of a maximal-sized set of scan flip-flops that are converted to
nonscan, a simple postprocessing of the test cubes helps compute the values to
be loaded into the scan flip-flops, eliminating the need to rerun ATPG, while
at the same time ensuring the quality of full scan. We further enhance this
framework through techniques that process the test data before and after the
application of the proposed test cube analysis-driven partial scan technique,
in order to enlarge the size of the nonscan flip-flop set. The proposed scheme
combines the simplicity of the conventional ATPG flow with the area,
performance, test time, and test power reduction benefits of partial scan. The
proposed test cube analysis-driven partial scan scheme is orthogonal and thus
fully compatible with other test cost-reduction techniques, such as test data
compression and test power reduction, which can be applied in conjunction.

Arumi, D.; Rodriguez-Montanes, R.; Figueras, J.; Eichenberger, S.; Hora, C.;
Kruseman, B.
Diagnosis of Interconnect Full Open Defects in the Presence of Gate Leakage
Currents
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416092

Accurate diagnosis of open defects is key to identifying process problems and
achieving fast yield improvement. Current diagnosis methodologies for
interconnect full open defects have demonstrated their efficiency, assuming
that the defective line voltage is mainly determined by neighboring lines and
downstream transistor parasitic capacitances. However, the continuous reduction
of oxide thickness with every technology node increases gate leakage current
significantly, even for high-$k$ dielectrics. In this context, in nanometer
CMOS technologies, the defective line cannot be assumed to be electrically
isolated because of the impact of gate leakage currents, which may invalidate
diagnosis results provided by present methodologies. In this paper, a new
methodology is proposed to diagnose interconnect full open defects under the
influence of gate leakage currents in CMOS technologies. To the authors'
knowledge, such influence has not been previously considered by any of the
existing methodologies. Simulation and experimental results demonstrate the
proposal efficiency.

SHORT PAPERS

Traversa, F. L.; Bonani, F.
Selective Determination of Floquet Quantities for the Efficient Assessment of
Limit Cycle Stability and Oscillator Noise
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416094

We present a mixed time-frequency domain algorithm for the approximate
computation of the Floquet quantities (exponents and both direct and adjoint
eigenvectors) resulting from the linearization of index-1
differential-algebraic equations around a periodic limit cycle. The approach
allows to select the number of Floquet exponents to be calculated approximating
the matrix of the obtained eigenvalue problem. The error in the evaluation (for
both exponents and eigenvectors) is proved to tend to zero along with the ratio
between the norms of the neglected and retained rows of the relevant matrix.

Li, M.; Chen, R.-S.; Wang, H.; Fan, Z.; Hu, Q.
A Multilevel FFT Method for the 3-D Capacitance Extraction
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416103

We present a new multilevel fast Fourier transform (MLFFT) method and its
application for the 3-D structures capacitance extraction. The multilevel
octree structure, multilevel interpolation/projection, and subdomain FFTs
techniques are employed in the MLFFT. Distinctions between the MLFFT and the
conventional FFT-based methods are discussed carefully. Numerical results
demonstrate that the MLFFT is more efficient than conventional FFT based
methods, especially when analyzing inhomogeneous problems.