February 2013 Newsletter Placing you one click away from the best new CAD research! Plain-text version at http://www.umn.edu/~tcad/newsletter/2013-02.txt SPECIAL SECTION ON CROSS-DOMAIN PHYSICAL OPTIMIZATION GUEST EDITORIAL Hu, J.; Koh, C.-K. Guest Editorial http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416102 SPECIAL SECTION REGULAR PAPERS Yan, J. Z.; Chu, C. SDS: An Optimal Slack-Driven Block Shaping Algorithm for Fixed-Outline Floorplanning http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416107 This paper presents an efficient, scalable, and optimal slack-driven shaping algorithm for soft blocks in nonslicing floorplan. The proposed algorithm is called SDS. SDS is specifically formulated for fixed-outline floorplanning. Given a fixed upper bound on the layout width, SDS minimizes the layout height by only shaping the soft blocks in the design. Iteratively, SDS shapes some soft blocks to minimize the layout height with the guarantee that the layout width would not exceed the given upper bound. Rather than using some simple heuristic as in previous work, the amount of change on each block is determined by systematically distributing the global total amount of available slack to individual block. During the whole shaping process, the layout height monotonically reduces and eventually converges to an optimal solution. Two optimality conditions are presented to check the optimality of a shaping solution for fixed-outline floorplanning. In practice, to terminate the process of convergence early, we propose two different stopping criteria. We also extend SDS to handle other floorplanning problems, e.g., classical floorplanning. To validate the efficiency and effectiveness of SDS, comprehensive experiments are conducted on MCNC and HB benchmarks. Compared with previous work, SDS achieves the best experimental result with a significantly faster runtime. Fang, S.-Y.; Chen, W.-Y.; Chang, Y.-W. Graph-Based Subfield Scheduling for Electron-Beam Photomask Fabrication http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416095 Electron beam lithography has shown great promise in photomask fabrication; however, its successive heating process centralizing in a small region may cause a severe problem of critical dimension (CD) distortion. Consequently, subfield scheduling that reorders the sequence of the writing process is needed to avoid successive writing of neighboring subfields. In addition, the writing process of a subfield raises the temperature of neighboring regions and may block other subfields for writing. This paper presents the first work to solve the subfield scheduling problem while considering blocked regions by formulating the problem into a constrained maximum scatter traveling salesman problem (constrained MSTSP). To tackle the constrained MSTSP that can be shown to be NP-complete in general, we identify a special case thereof with points on two parallel lines and solve it optimally in linear time. We then decompose the constrained MSTSP into subproblems conforming to the special case, solve each subproblem optimally and efficiently by a graph-based algorithm, and then merge the subsolutions into a complete scheduling solution. We also extend our algorithm to handle the cases when the moving time of an e-beam writing head is comparable with the writing time of a subfield. Experimental results show that our algorithms are effective and efficient in finding good subfield scheduling solutions that can alleviate the successive heating problem (and thus reduce CD distortion) for e-beam photomask fabrication. Ghaida, R. S.; Agarwal, K. B.; Nassif, S. R.; Yuan, X.; Liebmann, L. W.; Gupta, P. Layout Decomposition and Legalization for Double-Patterning Technology http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416090 The use of multiple-patterning (MP) optical lithography for sub-20 nm technologies has inevitably become slow to adopt the next generation of lithography systems. The biggest technical challenge of MP is failure to reach a manufacturable layout-coloring solution, especially in dense layouts. This paper offers a postlayout solution for the removal of conflicts, i.e., patterns that cannot be assigned to different masks without violating spacing rules. The proposed method essentially consists of three steps: 1) layout coloring; 2) exposure layers; 3) geometric rules definition; and 4) layout legalization using compaction and MP rules as constraints. The method is general and can be used for different MP technologies, including lithography-etch, lithography-etch double-patterning (DP), triple patterning/MP (i.e., multiple litho-etch steps), and self-aligned DP (SADP). For demonstration purposes, we apply the proposed method in this paper to remove conflicts in DP. We offer an $O(n)$ layout-coloring heuristic algorithm for DP, which is up to $80times$ faster than the integer linear program-based approach. The conflict-removal problem is formulated as a linear program, which permits an extremely fast runtime (less than 1 min in real time for macro layouts). The method was tested on standard cells and macro layouts from a commercial 22-nm library designed without any MP awareness. For many cells, the method removes all conflicts without any area increase. For some complex cells and macros, the method still removes all conflicts but with a modest 6% average increase in area. Chang, J.-W.; Yeh, S.-H.; Huang, T.-W.; Ho, T.-Y. Integrated Fluidic-Chip Co-Design Methodology for Digital Microfluidic Biochips http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416099 Recently, digital microfluidic biochips (DMFBs) have revolutionized many biochemical laboratory procedures and received much attention due to their many advantages, such as high throughput, automatic control, and low cost. To meet the challenges of increasing design complexity, computer-aided-design (CAD) tools have been used to build DMFBs efficiently. Current CAD tools generally conduct a two-stage based design flow of fluidic-level synthesis followed by chip-level design to optimize fluidic behaviors and chip architecture separately. Nevertheless, existing fluidic-chip design gap will become even wider with a rapid escalation in the number of assay operations incorporated into a single DMFB. As more and more large-scale assay protocols are delivered in the current emerging marketplace, this problem may potentially restrict the effectiveness and feasibility of the entire DMFB realization and thus needs to be solved quickly. In this paper, we propose the first fluidic-chip co-design methodology for DMFBs to effectively bridge the fluidic-chip design gap. Our work provides a comprehensive integration throughout fluidic-operation scheduling, chip layout generation, control pin assignment, and wiring solution to achieve higher design performance and feasibility. Experimental results show the effectiveness, robustness, and scalability of our co-design methodology on a set of real-life assay applications. Ward, S. I.; Kim, M.-C.; Viswanathan, N.; Li, Z.; Alpert, C. J.; Swartzlander, E. E.; Pan, D. Z. Structure-Aware Placement Techniques for Designs With Datapaths http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416106 As technology scales and frequencies increase, a new hybrid design style emerges, wherein designs contain a mixture of random logic and datapath standard-cell components. This paper demonstrates that conventional half-perimeter wirelength driven placers underperform in terms of regularity and Steiner wirelength (StWL) for such hybrid designs. In addition, the quality gap between manual and automatic placement is more pronounced as the designs become more datapath oriented. To effectively handle hybrid designs, this paper proposes a new unified placement flow that simultaneously places random logic and datapath cells. This flow is built on the top of a leading academic force-directed placer and significantly improves the quality of datapath placement while leveraging the speed and flexibility of existing random-logic placement algorithms. It consists of a suite of novel global and detailed placement techniques, collectively called structure-aware placement techniques (SAPT). These techniques effectively integrate alignment constraints into placement, thereby overcoming the deficiencies of existing random-logic placers when handling designs with embedded datapaths. Compared to other state-of-the-art placers, SAPT improves total StWL by more than 28% and total routing overflow by over six times on the ISPD 2011 datapath benchmark suite. In addition, it improves total StWL by 5.8% on industrial hybrid designs. SPECIAL SECTION SHORT PAPER Chang, C.-L.; Jiang, I. H.-R. Pulsed-Latch Replacement Using Concurrent Time Borrowing and Clock Gating http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416100 Flip-flops are the most common form of sequencing elements; however, they have a significantly higher sequencing overhead than latches in terms of delay, power, and area. Hence, pulsed latches are a promising option to reduce power for high-performance circuits. In this paper, to save power and compensate for timing violations, we fully utilize the intrinsic time borrowing property of pulsed latches and consider clock gating during pulsed-latch replacement. Experimental results show that our approach can generate very power efficient results. REGULAR PAPERS EMBEDDED SYSTEMS Lee, J.; Kim, Y.; Shipman, G. M.; Oral, S.; Kim, J. Preemptible I/O Scheduling of Garbage Collection for Solid State Drives http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416101 Unlike hard disks, flash devices use out-of-place updates operations and require a garbage collection (GC) process to reclaim invalid pages to create free blocks. This GC process is a major cause of performance degradation when running concurrently with other I/O operations as internal bandwidth is consumed to reclaim these invalid pages. The invocation of the GC process is generally governed by a low watermark on free blocks and other internal device metrics that different workloads meet at different intervals. This results in an I/O performance that is highly dependent on workload characteristics. In this paper, we examine the GC process and propose a semipreemptible GC (PGC) scheme that allows GC processing to be preempted while pending I/O requests in the queue are serviced. Moreover, we further enhance flash performance by pipelining internal GC operations and merge them with pending I/O requests whenever possible. Our experimental evaluation of this semi-PGC scheme with realistic workloads demonstrates both improved performance and reduced performance variability. Write-dominant workloads show up to a 66.56% improvement in average response time with a 83.30% reduced variance in response time compared to the non-PGC scheme. In addition, we explore opportunities of a new NAND flash device that supports suspend/resume commands for read, write, and erase operations for fully PGC (F-PGC). Our experiments with an F-PGC enabled flash device show that request response time can be improved by up to 14.57% compared to semi-PGC. MODELING AND SIMULATION Chen, X.; Wang, Y.; Yang, H. NICSLU: An Adaptive Sparse Matrix Solver for Parallel Circuit Simulation http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416098 The sparse matrix solver has become a bottleneck in simulation program with integrated circuit emphasis (SPICE)-like circuit simulators. It is difficult to parallelize the solver because of the high data dependency during the numeric LU factorization and the irregular structure of circuit matrices. This paper proposes an adaptive sparse matrix solver called NICSLU, which uses a multithreaded parallel LU factorization algorithm on shared-memory computers with multicore/multisocket central processing units to accelerate circuit simulation. The solver can be used in all the SPICE-like circuit simulators. A simple method is proposed to predict whether a matrix is suitable for parallel factorization, such that each matrix can achieve optimal performance. The experimental results on 35 matrices reveal that NICSLU achieves speedups of $2.08timessim 8.57times~({rm on~the~geometric~mean})$, compared with KLU, with 1Ð12 threads, for the matrices which are suitable for the parallel algorithm. NICSLU can be downloaded from http://nicslu.weebly.com. Shi, G. Graph-Pair Decision Diagram Construction for Topological Symbolic Circuit Analysis http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416104 Symbolic circuit analysis is concerned with analytical construction of circuit response in the frequency (or time) domain, for which an efficient data structure is required. Recent research has justified that the binary decision diagram (BDD) is a superior data structure with the following distinguishing feature: a large number of product terms can be compactly represented by a BDD, on which numerical computations and analytical deductions can be performed directly. Using BDD for symbolic circuit analysis requires an efficient method for construction. In this paper, a graph-based construction method, called graph-pair decision diagram (GPDD), is developed. Given a small-signal circuit, a pair of graphs representing the circuit is created, from which a GPDD is constructed by successively reducing the graph pair. The GPDD algorithm, which generates cancellation-free symbolic terms, differs from the existing determinant decision diagram (DDD) algorithm. Detailed theory and implementable algorithms for the GPDD construction are developed, and a runtime performance comparison to DDD is made. It is demonstrated that the runtime performance using GPDD is comparable to that of DDD in terms of time and memory complexity for exact symbolic analysis, although the GPDD algorithm has to generate a much larger number of symbolic product terms. TEST Sinanoglu, O. Scan to Nonscan Conversion via Test Cube Analysis http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416105 Increasing complexity of integrated circuits has forced the industry to abandon partial scan, which necessitates a computationally demanding and unaffordable sequential automatic test pattern generation (ATPG), and to instead adopt full scan, despite its costs. In this paper, we propose a partial scan scheme driven by a computationally efficient test cube analysis. We tackle the challenges associated with the identification of the conditions to restore the controllability and observability compromised due to partial scan, and with the formulation of these conditions in terms of test cube operations. Upon the identification of a maximal-sized set of scan flip-flops that are converted to nonscan, a simple postprocessing of the test cubes helps compute the values to be loaded into the scan flip-flops, eliminating the need to rerun ATPG, while at the same time ensuring the quality of full scan. We further enhance this framework through techniques that process the test data before and after the application of the proposed test cube analysis-driven partial scan technique, in order to enlarge the size of the nonscan flip-flop set. The proposed scheme combines the simplicity of the conventional ATPG flow with the area, performance, test time, and test power reduction benefits of partial scan. The proposed test cube analysis-driven partial scan scheme is orthogonal and thus fully compatible with other test cost-reduction techniques, such as test data compression and test power reduction, which can be applied in conjunction. Arumi, D.; Rodriguez-Montanes, R.; Figueras, J.; Eichenberger, S.; Hora, C.; Kruseman, B. Diagnosis of Interconnect Full Open Defects in the Presence of Gate Leakage Currents http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416092 Accurate diagnosis of open defects is key to identifying process problems and achieving fast yield improvement. Current diagnosis methodologies for interconnect full open defects have demonstrated their efficiency, assuming that the defective line voltage is mainly determined by neighboring lines and downstream transistor parasitic capacitances. However, the continuous reduction of oxide thickness with every technology node increases gate leakage current significantly, even for high-$k$ dielectrics. In this context, in nanometer CMOS technologies, the defective line cannot be assumed to be electrically isolated because of the impact of gate leakage currents, which may invalidate diagnosis results provided by present methodologies. In this paper, a new methodology is proposed to diagnose interconnect full open defects under the influence of gate leakage currents in CMOS technologies. To the authors' knowledge, such influence has not been previously considered by any of the existing methodologies. Simulation and experimental results demonstrate the proposal efficiency. SHORT PAPERS Traversa, F. L.; Bonani, F. Selective Determination of Floquet Quantities for the Efficient Assessment of Limit Cycle Stability and Oscillator Noise http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416094 We present a mixed time-frequency domain algorithm for the approximate computation of the Floquet quantities (exponents and both direct and adjoint eigenvectors) resulting from the linearization of index-1 differential-algebraic equations around a periodic limit cycle. The approach allows to select the number of Floquet exponents to be calculated approximating the matrix of the obtained eigenvalue problem. The error in the evaluation (for both exponents and eigenvectors) is proved to tend to zero along with the ratio between the norms of the neglected and retained rows of the relevant matrix. Li, M.; Chen, R.-S.; Wang, H.; Fan, Z.; Hu, Q. A Multilevel FFT Method for the 3-D Capacitance Extraction http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6416103 We present a new multilevel fast Fourier transform (MLFFT) method and its application for the 3-D structures capacitance extraction. The multilevel octree structure, multilevel interpolation/projection, and subdomain FFTs techniques are employed in the MLFFT. Distinctions between the MLFFT and the conventional FFT-based methods are discussed carefully. Numerical results demonstrate that the MLFFT is more efficient than conventional FFT based methods, especially when analyzing inhomogeneous problems.