February 2012 Newsletter Placing you one click away from the best new CAD research! Plain-text version at http://www.umn.edu/~tcad/newsletter/2012-02.txt SPECIAL SECTION ON THE 2011 INTERNATIONAL SYMPOSIUM ON PHYSICAL DESIGN Hu, J. Koh, C. K. Guest Editorial http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132644 Yuan, K. Yu, B. Pan, D. Z. E-Beam Lithography Stencil Planning and Optimization With Overlapped Characters http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132658 Electronic beam lithography (EBL) is one of the promising emerging technologies in the sub-22 nm regime. In EBL, the desired circuit patterns are directly shot into the wafer, which overcomes the diffraction limit of light in the current optical lithography system. However, the low throughput becomes its key technical hurdle. In the conventional EBL system, each rectangle in the layout will be projected by one electronic shot through a variable shape beam (VSB). This could be extremely slow. As an improved EBL technology, character projection (CP) shoots complex shapes, so-called characters, in one time, by putting them into a predesigned stencil. However, only a limited number of characters can be employed, due to the area constraint. Those patterns, not contained by any character, are still required to be written by VSB. A key problem is how to select an optimal set of characters and pack them on the CP stencil to minimize total processing time. In this paper, we investigate a problem of electronic beam lithography stencil design with overlapped characters. Different from previous works, besides selecting appropriate characters, their placements on the stencil are also optimized in our framework. Specifically, we propose a Hamilton-path-based iterative algorithm to handle 1-D stencil design problem, and an effective simulated annealing framework for the generalized 2-D case with an efficient look-ahead sequence pair evaluation technique. The experimental results show that, compared to conventional stencil design methodology without overlapped characters, we are able to reduce total projection time by 51%. Wang, S.-H. Liang, Y.-Y. Kuo, T.-Y. Mak, W.-K. Power-Driven Flip-Flop Merging and Relocation http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132656 We propose a power-driven flip-flop (FF) merging and relocation approach that can be applied after conventional timing-driven placement and before clock network synthesis. It targets to reduce the clock network size and thus the clock power consumption while controlling the switching power of the nets connected to the FFs by selectively merging FFs into multibit FFs and relocating them under timing and placement density constraints. The experimental results are very encouraging. For a set of benchmarks, our approach reduced the switching capacitance of clock network by 36%Ð43% after gated clock tree synthesis. Finally, the total switching capacitance of clock network and nets connected to the FFs is reduced by 24%Ð29%. Jiang, I. H.-R. Chang, C.-L. Yang, Y.-M. INTEGRA: Fast Multibit Flip-Flop Clustering for Clock Power Saving http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132648 Clock power is the major contributor to dynamic power for modern integrated circuit design. A conventional single- bit flip-flop cell uses an inverter chain with a high drive strength to drive the clock signal. Clustering several such cells and forming a multibit flip-flop can share the drive strength, dynamic power, and area of the inverter chain, and can even save the clock network power and facilitate the skew control. Hence, in this paper, we focus on postplacement multibit flip-flop clustering to gain these benefits. Utilizing the properties of Manhattan distance and coordinate transformation, we model the problem instance by two interval graphs and use a pair of linear-sized sequences as our representation. Without enumerating all possible combinations, we identify only partial sequences that are necessary to cluster flip-flops, thus leading to an efficient clustering scheme. Moreover, our fast coordinate transformation also makes the execution of our algorithm very efficient. The experiments are conducted on industrial circuits. Our results show that concise representation delivers superior efficiency and effectiveness. Even under timing and placement density constraints, clock power saving via multibit flip-flop clustering can still be substantial at postplacement. Lee, D.-J. Markov, I. L. Obstacle-Aware Clock-Tree Shaping During Placement http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132650 Traditional integrated circuit (IC) design flows optimize clock networks before signal-net routing are limited by the quality of register placement. Existing publications also reflect this bias and focus mostly on clock routing. The few known techniques for register placement exhibit significant limitations and do not account for recent progress in large-scale placement and obstacle-aware clock-network synthesis. In this paper, we integrate clock network synthesis within global placement by optimizing register locations. We propose: 1) obstacle-aware virtual clock-tree synthesis; 2) arboreal clock-net contraction force with virtual-node insertion, which can handle multiple clock domains and gated clocks; and 3) an obstacle-avoidance force (OAF). Our work is validated on large benchmarks with numerous macroblocks. Experimental results indicate that our software implementation, called Lopper, prunes clock-tree branches to reduce their length by 30.0%Ð36.6% and average total dynamic power consumption by 6.8%Ð 11.6% versus conventional wirelength-driven approaches. SPICE-driven simulations show that our methods improve robustness of clock trees. Lu, J. Mao, X. Taskin, B. Integrated Clock Mesh Synthesis With Incremental Register Placement http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132652 A clock mesh planning and synthesis method is proposed which significantly reduces the power dissipation on the network while considering the power density and timing slack simultaneously. The proposed method is performed at the postplacement stage and consists of three major steps: 1) feasible moving region construction of each register considering timing slack; 2) mesh grid wire generation and placement; and 3) incremental register placement for stub wire minimization considering power density and timing slack. The advantages of the proposed method are the reduced power dissipationÑ28% on average on the benchmark circuitsÑthe optimized power density, and the guaranteed nonnegative timing slack. These advantages are possible through a decreased timing slack (1.1% of the clock period) and change in the logic wirelength $(+5.9%)$ on the benchmark circuits. Knechtel, J. Markov, I. L. Lienig, J. Assembling 2-D Blocks Into 3-D Chips http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132649 Despite numerous advantages of 3-D integrated circuits (ICs), their commercial success remains limited. In part, this is due to the wide availability of trustworthy intellectual property (IP) blocks developed for 2-D ICs and proven through repeated use. Block-based design reuse is imperative for heterogeneous 3-D ICs where memory, logic, analog, and microelectromechanical systems dies are manufactured at different technology nodes and circuit modules cannot be partitioned among several dies. In this paper, we show how to integrate 2-D IP blocks into 3-D chips without altering their layout. Experiments indicate that the overhead of proposed integration is small, which can help accelerate industry adoption of 3-D integration. Zhao, Y. Chakrabarty, K. Simultaneous Optimization of Droplet Routing and Control-Pin Mapping to Electrodes in Digital Microfluidic Biochips http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132659 The number of independent input pins used to control the electrodes in digital microfluidic ÒbiochipsÓ is an important cost-driver in the emerging market place, especially for disposable PCB devices that are being developed for clinical and point-of-care diagnostics. However, most prior work on pin-constrained biochip design considers droplet routing and the assignment of pins to electrodes as independent problems. In this paper, we propose optimization methods to solve the droplet routing and pin-constrained design problems concurrently. First, we formulate the co-optimization problem involving droplet routing and pin-mapping. Next, we present an integer linear programming-based optimization method to solve the droplet-routing and the pin-mapping design problems concurrently. The proposed co-optimization method minimizes the number of control pins. We also present an efficient heuristic approach to tackle the co-optimization problem. These methods overcome a major drawback of a recently proposed method, which leads to infeasible solutions involving conflicts in the mapping of pins to electrodes in different droplet-routing stages. The effectiveness of the proposed co-optimization method is demonstrated for two commercial biochips and an experimental university chip for multiplexed in-vitro diagnostics. Tian, H. Tang, W.-C. Young, E. F. Y. Sze, C. N. Postgrid Clock Routing for High Performance Microprocessor Designs http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132655 Designing a high-quality clock network is very important in very large-scale integrated designs today, as it is the clock network that synchronizes all the elements of a chip, and it is also a major source of power dissipation of a system. Early study by Pham in 2006 shows that about 18.1% of the total clock capacitance was due to this postgrid clock routing (i.e., lower mesh wires plus clock twig wires). In this paper, we proposed a partition-based path expansion algorithm to solve this postgrid clock routing problem effectively. Experimental results on industrial test cases show that our algorithm can improve over the latest work by Shelar on this problem significantly by reducing the wire capacitance by 24.6% and the wirelength by 23.6%. REGULAR PAPERS LOGIC SYNTHESIS Chen, Y.-C. Wang, C.-Y. Logic Restructuring Using Node Addition and Removal http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132646 This paper presents a logic restructuring technique named node addition and removal (NAR). It works by adding a node into a circuit to replace an existing node and then removing the replaced node. Previous node-merging techniques focus on replacing one node with an existing node in a circuit, but fail to replace a node that has no substitute node. To enhance the node-merging techniques on logic restructuring and optimization, we propose an NAR approach in this paper. We first present two sufficient conditions that state the requirements of added nodes for safely replacing a target node. Then, an NAR approach is proposed to quickly detect the added nodes by performing logic implications based on these conditions. We apply the NAR approach to circuit minimization together with two techniques: redundancy removal and mandatory assignment reuse. We also apply it to satisfiability (SAT)-based bounded sequential equivalence checking (BSEC) to reduce the computation complexity of SAT solving. The experimental results show that our approach can enhance our prior automatic test pattern generation-based node- merging approach. Additionally, our approach has a competitive capability of circuit minimization with 44 times speedup compared to a SAT-based node-merging approach. For BSEC, our approach can work together with other optimization technique to save a total of approximately 39-h verification time for all the benchmarks. MODELING AND SIMULATION Mangassarian, H. Veneris, A. Najm, F. N. Maximum Circuit Activity Estimation Using Pseudo-Boolean Satisfiability http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132653 With lower supply voltages, increased integration densities and higher operating frequencies, power grid verification has become a crucial step in the very large-scale integration design cycle. The accurate estimation of maximum instantaneous power dissipation aims at finding the worst-case scenario where excessive simultaneous switching could impose extreme current demands on the power grid. This problem is highly input-pattern dependent and is proven to be NP-hard. In this paper, we capitalize on the compelling advancements in satisfiability (SAT) solvers to propose a pseudo-Boolean SAT-based framework that reports the input patterns maximizing circuit activity, and consequently peak dynamic power, in combinational and sequential circuits. The proposed framework is enhanced to handle unit gate delays and output glitches. In order to disallow unrealistic input transitions, we show how to integrate input constraints in the formulation. Finally, a number of optimization techniques, such as the use of gate switching equivalence classes, are described to improve the scalability of the proposed method. An extensive suite of experiments on ISCAS85 and ISCAS89 circuits confirms the robustness of the approach compared to simulation- based techniques and encourages further research for low-power solutions using Boolean SAT. PHYSICAL DESIGN Yan, T. Wong, M. D. F. Correctly Model the Diagonal Capacity in Escape Routing http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132657 Escape routing for packages and printed circuit boards (PCBs) has been studied extensively in the past. Network flow is pervasively used to model this problem. However, none of the previous works correctly models the diagonal capacity, which is essential for 45 degree routing in most packages and PCBs. As a result, existing algorithms may either produce routing solutions that violate the diagonal capacity or fail to output a legal routing even though there exists one. In this paper, we propose a new network flow model that guarantees the correctness when diagonal capacity is taken into consideration. This model leads to the first optimal algorithm for escape routing. We also extend our model to handle missing pins. SYSTEM-LEVEL DESIGN Hernandez, C. Roca, A. Silla, F. Flich, J. Duato, J. On the Impact of Within-Die Process Variation in GALS-Based NoC Performance http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132647 Current integration scales allow designing chip multiprocessors (CMP), where cores are interconnected by means of a network-on-chip (NoC). Unfortunately, the small feature size of current integration scales causes some unpredictability in manufactured devices because of process variation. In NoCs, variability may affect links and routers causing them not to match the parameters established at design time. In this paper, we first analyze the way that manufacturing deviations affect the components of a NoC by applying a new comprehensive and detailed within-die variability model to 200 instances of an 8x8 mesh NoC synthesized using 45 nm technology. Later, we show that GALS-based NoCs present communication bottlenecks under process variation which cannot be avoided by using just device-level solutions but higher level architectural approaches are required. Therefore, to overcome this performance reduction, we draft a novel architectural approach, called performance domains, intended to reduce the negative impact of variability on application execution time. This mechanism is suitable when several applications are simultaneously running in the CMP chip. TEST Lin, Y.-T. Poku, O. Blanton, R. D. S. Nigh, P. Lloyd, P. Iyengar, V. Physically-Aware N-Detect Test http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132651 Physically-aware $N$-detect ($PAN$-detect) test improves defect coverage by exploiting defect locality. This paper presents physically-aware test selection (PATS) to efficiently generate $PAN$-detect tests for large industrial designs. Compared to traditional $N$-detect test, the quality resulting from PAN-detect is enhanced without any increase in test execution cost. Experiment results from an IBM in-production application-specific integrated circuit demonstrate the effectiveness of PATS in improving defect coverage. Moreover, utilizing novel test-metric evaluation, we compare the effectiveness of traditional N-detect and PAN-detect, and demonstrate the impact of automatic test pattern generation parameters on the effectiveness of PAN-detect. SHORT PAPERS Pomeranz, I. Multipattern Scan-Based Test Sets With Small Numbers of Primary Input Sequences http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132654 When a multipattern scan-based test is applied at-speed to detect delay defects, it is necessary to change the primary input vectors at-speed. However, tester limitations can make this infeasible. The solution where the primary input vectors are held constant during the test reduces the fault coverage. An alternative solution is to store the primary input sequences of a multipattern test set on-chip and apply them at-speed from an on-chip memory. To support such a solution, this paper describes a procedure for computing a multipattern test set that requires a small number of different primary input sequences. Experimental results for single stuck-at faults and for transition faults show that a multipattern test set that detects all the detectable faults requires a number of primary input sequences that is significantly smaller than the number of tests. Errata to "Accelerating FPGA Routing Through Parallelization and Engineering Enhancements Special Section on PAR-CAD 2010" http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6132645