October 2013 Newsletter Placing you one click away from the best new CAD research! Plain-text version at http://www.umn.edu/~tcad/newsletter/2013-10.txt KEYNOTE PAPER Pan, D.Z. ; Yu, B. ; Gao, J.-R. Design for Manufacturing With Emerging Nanolithography http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6601025 In this paper, we survey key design for manufacturing issues for extreme scaling with emerging nanolithography technologies, including double/multiple patterning lithography, extreme ultraviolet lithography, and electron-beam lithography. These nanolithography and nanopatterning technologies have different manufacturing processes and their unique challenges to very large scale integration (VLSI) physical design, mask synthesis, and so on. It is essential to have close VLSI design and underlying process technology co-optimization to achieve high product quality (power/performance, etc.) and yield while making future scaling cost-effective and worthwhile. Recent results and examples will be discussed to show the enablement and effectiveness of such design and process integration, including lithography model/analysis, mask synthesis, and lithography friendly physical design. REGULAR PAPERS EMERGING TECHNOLOGIES Chen, Y.-C. ; Wang, C.-Y. ; Huang, C.-Y. Verification of Reconfigurable Binary Decision Diagram-Based Single-Electron Transistor Arrays http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600827 Recently, single-electron transistors (SETs) have been attracting substantial attention and are considered candidate devices for future integrated circuits due to their ultralow power consumption. To realize SETs, a binary decision diagram-based SET array is proposed as a suitable candidate for implementing Boolean circuits. Then, some works started developing computer-aided design techniques for this new architecture. However, most of them focused on the development of mapping techniques. How to verify the mapping results is still an open problem. Thus, in this paper, we address this problem and develop a satisfiability (SAT)-based verification method. We propose a transformation approach to model the functionality of a mapped SET array as a conjunctive normal form formula. Then, the problem that whether the SET array is functionally equivalent to its specification circuit can be solved with a SAT solver. The experimental results show that the proposed method can successfully verify correct and incorrect SET array implementations with reasonable verification time. Huang, J.-D. ; Liu, C.-H. ; Lin, H.-S. Reactant and Waste Minimization in Multitarget Sample Preparation on Digital Microfluidic Biochips http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600878 Sample preparation is one of essential processes in biochemical reactions. Raw reactants are diluted in this process to achieve given target concentrations. A bioassay may require several different target concentrations of a reactant. Both the dilution operation count and the reactant usage can be minimized if multiple target concentrations are considered simultaneously during sample preparation. Hence, in this paper, we propose a multitarget sample preparation algorithm that extensively exploits the ideas of waste recycling and intermediate droplet sharing to reduce both reactant usage and waste amount for digital microfluidic biochips. Experimental results show that our waste recycling algorithm can reduce the waste and operation count by 48% and 37%, respectively, as compared to an existing state-of-the-art multitarget sample preparation method if the number of target concentrations is ten. The reduction can be up to 97% and 73% when the number of target concentrations goes even higher. HIGH-LEVEL SYNTHESIS Damavandpeyma, M. ; Stuijk, S. ; Basten, T. ; Geilen, M. ; Corporaal, H. Schedule-Extended Synchronous Dataflow Graphs http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600844 Synchronous dataflow graphs (SDFGs) are used extensively to model streaming applications. An SDFG can be extended with scheduling decisions, allowing SDFG analysis to obtain properties, such as throughput or buffer sizes for the scheduled graphs. Analysis times depend strongly on the size of the SDFG. SDFGs can be statically scheduled using static-order schedules. The only generally applicable technique to model a static-order schedule in an SDFG is to convert it to a homogeneous SDFG (HSDFG). This may lead to an exponential increase in the size of the graph and to suboptimal analysis results (e.g., for buffer sizes in multiprocessors). We present techniques to model two types of static-order schedules, i.e., periodic schedules and periodic single appearance schedules, directly in an SDFG. Experiments show that both techniques produce more compact graphs compared to the technique that relies on a conversion to an HSDFG. This results in reduced analysis times for performance properties and tighter resource requirements. MODELING AND SIMULATION Paek, S. ; Shin, W. ; Sim, J. ; Kim, L.-S. PowerField: A Probabilistic Approach for Temperature-to-Power Conversion Based on Markov Random Field Theory http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600867 Temperature-to-power technique is useful for post-silicon power model validation. However, the previous works were applicable only to the steady-state analysis. In this paper, we propose a new temperature-to-power technique, named PowerField, supporting both transient and steady-state analysis based on a probabilistic approach. Unlike the previous works, PowerField uses two consecutive thermal images to find the most feasible power distribution that causes the change between the two input images. To obtain the power map with the highest probability, we adopted maximum a posteriori Markov random field (MAP-MRF). For MAP-MRF framework, we modeled the spatial thermal system as a set of thermal nodes and derived an approximated transient heat transfer equation that requires only the local information of each thermal node. Experimental results with a thermal simulator show that PowerField outperforms the previous method in transient analysis reducing the error by half on average. We also show that our framework works well for steady-state analysis by using two identical steady-state thermal maps as inputs. Lastly, an application to determining the binary power patterns of an FPGA device is presented achieving 90.7% average accuracy. Han, Y. ; Zhao, J. Accurate Substrate Analysis Based on a Novel Finite Difference Method via Synchronization Method on Layered and Adaptive Meshing http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600985 In this paper, we present a finite difference method (FDM) based on layered and adaptive meshing to extract substrate resistance. To allow adaptive meshing, a novel synchronization method is introduced which expresses the potential at other points than the centroids of panels by modifying the finite difference equations. This method makes it possible to overcome the high computational cost of the FDM due to tight uniform meshing requirements while enjoying a straightforward implementation and easy handling of irregular/arbitrary substrate structures to extract the substrate coupling of integrated circuits. Zhang, Z. ; El-Moselhy, T.A. ; Elfadel, I.M. ; Daniel, L. Stochastic Testing Method for Transistor-Level Uncertainty Quantification Based on Generalized Polynomial Chaos http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600994 Uncertainties have become a major concern in integrated circuit design. In order to avoid the huge number of repeated simulations in conventional Monte Carlo flows, this paper presents an intrusive spectral simulator for statistical circuit analysis. Our simulator employs the recently developed generalized polynomial chaos expansion to perform uncertainty quantification of nonlinear transistor circuits with both Gaussian and non-Gaussian random parameters. We modify the nonintrusive stochastic collocation (SC) method and develop an intrusive variant called stochastic testing (ST) method. Compared with the popular intrusive stochastic Galerkin (SG) method, the coupled deterministic equations resulting from our proposed ST method can be solved in a decoupled manner at each time point. At the same time, ST requires fewer samples and allows more flexible time step size controls than directly using a nonintrusive SC solver. These two properties make ST more efficient than SG and than existing SC methods, and more suitable for time-domain circuit simulation. Simulation results of several digital, analog and RF circuits are reported. Since our algorithm is based on generic mathematical models, the proposed ST algorithm can be applied to many other engineering problems. PHYSICAL DESIGN He, X. ; Huang, T. ; Xiao, L. ; Tian, H. ; Young, E.F.Y. Ripple: A Robust and Effective Routability-Driven Placer http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600938 The significant mismatch between the objective of wirelength and routing congestion makes the routability issue even more important in placement. In this paper, we describe a routability-driven placer called Ripple. Each step, including global placement, legalization, and detailed placement, is made to trade-off between routability and wirelength. We propose a robust and effective flow by using cell inflation to relieve routing congestion. Cell inflation has traditionally been used to deal with congestion and we will discuss how this technique can be used easily and robustly in the global placement. Besides, unlike many previous works that focus on different types of swapping strategies, we analyze and propose some simple and effective approaches when considering routability in the legalization and detailed placement steps. Experimental results show that Ripple is particularly effective in improving routability. When compared to the top results in the ISPD 2011 Contest and SimPLR, Ripple can obtain the smallest overflow and half-perimeter wirelength on average, while the congestion hot spots are also distributed sparsely in Ripple. Chang, F.-Y. ; Tsay, R.-S. ; Mak, W.-K. ; Chen, S.-H. MANA: A Shortest Path Maze Algorithm Under Separation and Minimum Length Nanometer Rules http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600923 Due to process limitations, wiring rules are imposed on chip layout by foundries. Under nanometer wiring rules, the required separation between two wire ends is dependent on their surrounding wires, and there is a limit on the minimum length of each wire segment. However, traditional shortest path algorithms are not properly designed for these rules. In the paper, we propose a maze routing algorithm, called MANA, capable of finding legal shortest paths under these rules. Experiments with seven industrial cases show that by handling these rules during maze routing, 94% of the violations are prevented on average, and the overall runtime of a commercial router is reduced by 71%. In addition, the total wire length is also reduced by 3% on average. SYSTEM-LEVEL DESIGN Nikitin, N. ; de San Pedro, J. ; Cortadella, J. Architectural Exploration of Large-Scale Hierarchical Chip Multiprocessors http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600876 The continuous scaling of nanoelectronics is increasing the complexity of chip multiprocessors (CMPs) and exacerbating the memory wall problem. As CMPs become more complex, the memory subsystem is organized into more hierarchical structures to better exploit locality. To efficiently discover promising architectures within the rapidly growing search space, exhaustive exploration is replaced with tools that implement intelligent search strategies. Moreover, faster analytical models are preferred to costly simulations for estimating the performance and power of CMP architectures. The memory traffic generated by CMP cores has a cyclic dependency with the latency of the memory subsystem, which critically affects the overall system performance. Based on this observation, a novel scalable analytical method is proposed to estimate the performance of highly parallel CMPs (hundreds or thousands of cores) with hierarchical interconnect networks. The method can use customizable probabilistic models and solves the cyclic dependencies between traffic and latency by using a fixed-point strategy. By using the analytical model as a performance and power estimator, an efficient metaheuristic-based search is proposed for the exploration of large design spaces. The proposed techniques are shown to be very accurate and a promising strategy when compared to the results obtained by simulation. TEST Bao, F. ; Peng, K. ; Tehranipoor, M. ; Chakrabarty, K. Generation of Effective 1-Detect TDF Patterns for Detecting Small-Delay Defects http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6601027 Testing for small-delay defects (SDDs) has become necessary for high-quality products (e.g., automotive applications) as technology scales further and functional frequency increases. Traditional timing-unaware transition- delay fault (TDF) ATPG is not adequate for detecting SDDs. Commercial timing-aware ATPGs suffer from large CPU runtime and huge pattern count. However, a small pattern set with high SDD coverage is desired by industry. In this paper, a comprehensive procedure named parametric pattern generation (PPG) is proposed in order to meet these requirements. In the evaluation phase of PPG, a new metric is proposed to represent pattern characteristics when detecting SDDs and gross TDFs. In the selection phase of PPG, pattern quality is considered by excluding detection redundancy. PPG is mathematically modeled and solved using the gradient descent concept. By learning from the previous pattern selection efficiency, a new selection is performed in a more effective way until the pattern set is finally generated. While utilizing PPG, the final pattern set can be framed in 1-detect volume with high SDD detection efficiency while meeting the target TDF coverage. Experimental results on both IWLS and ISCAS'89 benchmarks demonstrate the efficiency of the proposed method. Guo, X. ; Karri, R. Recomputing with Permuted Operands: A Concurrent Error Detection Approach http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600917 Naturally occurring and maliciously injected faults reduce the reliability of cryptographic hardware and may leak confidential information. We develop a concurrent error detection technique (CED) called recomputing with permuted operands (REPO). We show that it is cost effective in advanced encryption standard (AES) and a secure hash function Gr¿stl. We provide experimental results and formal proofs to show that REPO detects all single-bit and single-byte faults. Experimental results show that REPO achieves close to 100% fault coverage for multiple byte faults. The hardware and throughput overheads are compared with those of previously reported CED techniques on two Xilinx Virtex FPGAs. The hardware overhead is 12.4%Ð27.3%, and the throughput is 1.2Ð23 Gbps, depending on the AES architecture, FPGA family, and detection latency. The performance overhead ranges from 10% to 100% depending on the security level. Moreover, the proposed technique can be integrated into various block cipher modes of operation. We also discuss the limitation of REPO and its potential vulnerabilities. VERIFICATION Keng, B. ; Veneris, A. Path-Directed Abstraction and Refinement for SAT-Based Design Debugging http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6601029 Functional verification has become one of the most time-consuming tasks in the very large scale integration design flow accounting for up to 57% of the total project time. The largest component of this task is that of design debugging due to its resource-intensive manual nature. With the ever growing size of modern designs and their error traces, the complexity of the debugging problem poses a great challenge to automated debugging techniques. To overcome this challenge, this paper introduces a novel path-directed abstraction and refinement algorithm for design debugging to manage excessive error trace lengths. A sliding window of the error trace is iteratively analyzed in a time-windowing framework, which is made possible by the use of the path-directed abstraction. This abstraction forms a concise approximation of nonmodeled parts of the error trace while simultaneously providing an efficient representation for refinement. The result is an algorithm that dramatically reduces the memory requirements of debugging while mitigating the incomplete results of past techniques. Experimental results on industrial designs with long error traces show that the proposed approach can analyze traces that are 64.6% longer while simultaneously decreasing peak memory usage compared to previous work. SHORT PAPERS Shelar, R.S. ; Patyra, M. Impact of Local Interconnects on Timing and Power in a High Performance Microprocessor http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6601022 In nanometer technologies, local interconnects are believed to cause a major impact on timing and power in VLSI circuits. To assess the impact of the interconnects on timing and power in a real high performance microprocessor design in a quantitative manner, this article presents results from an extensive study carried out on RTL-to-layout synthesized blocks in a 45-nm technology core. The study shows that the interconnects in these blocks account for 30% of the cycle time, on an average, on the worst internal timing paths and contribute nearly one-third to the power dissipation. This points to severity of impact due to the interconnects in today's high performance designs. Long, J. ; Li, D. ; Memik, S.O. ; Ulgen, S. Theory and Analysis for Optimization of On-Chip Thermoelectric Cooling Systems http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6601030 We established a novel theoretical analysis framework for optimizing the cooling system configuration of chips employing thermoelectric cooling (TEC) elements by extending the theory of inverse-positive matrices and the eigenvalue/eigenvector theory in linear algebra. In this brief, we present a new theorem and its formal proof, which is the key enabler to achieving a provably optimal solution for configuring bias current levels of TEC devices. Zhang, C. ; Yu, W. Efficient Space Management Techniques for Large-Scale Interconnect Capacitance Extraction With Floating Random Walks http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6600918 In the capacitance extraction with the floating random walk (FRW) algorithm, the space management approach is required to facilitate finding the nearest conductor. The Octree and grid-based spatial structures have been used to decompose the whole domain into cells and to store information of local conductors. In this letter, the techniques with the distance limit of cell and only searching in cell's neighbor region are proposed to accelerate the construction of the spatial structures. A fast inquiry technique is proposed to fasten the nearest conductor query. We also propose a gridÐOctree hybrid structure, which has advantages over existing structures. Experiments on large very large scale integration structures with up to 484441 conductors have validated the efficiency of the proposed techniques. The improved FRW algorithm is faster than RWCap for thousands times while extracting a single net, and several to tens times while extracting 100 nets.