TCAD Newsletter – February 2010 Issue
Placing you one click away from the best new CAD research!

Regular Papers
==============

Enhanced Double Via Insertion Using Wire Bending
Lee, K.-Y.; Lin, S.-T.; Wang, T.-C.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395744&isnumber=5395722&tag=1
Redundant via insertion is highly recommended for improving chip yield and reliability. In this paper, we study the problem of simultaneous double via insertion and wire bending (DVI/WB) in a postrouting stage, where a single via can have at most one redundant via inserted next to it. In addition, we are allowed to bend existing signal wires to enhance the insertion rate of double vias. The primary goal of the DVI/WB problem is to insert as many double vias as possible; the secondary objective is to minimize the amount of layout perturbation. We formulate the DVI/WB problem as that of finding a minimum-weight maximum independent set (mWMIS) on an enhanced conflict graph. We propose algorithms to perform wire bending and to construct the enhanced conflict graph from a given design. We also propose a zero-one integer linear program (0-1 ILP)-based approach to solve the mWMIS problem. Moreover, we study the problem of DVI/WB with the consideration of via density and extend our 0-1 ILP-based approach to solve it. Experimental results show that our approaches can improve the insertion rate by up to 6.34% at the expense of up to 1.29% wirelength increase when compared with the state-of-the-art double via insertion methods that do not consider wire bending. Moreover, when compared with an existing method that considers wire bending, our DVI/WB approach can insert 2% more double vias and produce a 32% lower wirelength increase rate on average.

Double Patterning Layout Decomposition for Simultaneous Conflict and Stitch Minimization
Yuan, K.; Yang, J.-S.; Pan, D. Z.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395742&isnumber=5395722
Double patterning lithography (DPL) is considered a most likely solution for 32 nm/22 nm technology. In DPL, the layout patterns are decomposed into two masks (colors) and manufactured through two exposure and etch steps. If the spacing between two features (polygons) is less than a certain minimum coloring distance, they have to be assigned opposite colors. However, a proper coloring is not always feasible because two neighboring patterns within the minimum distance may be in the same mask due to complex pattern configurations. In that case, a feature may need to be split into two parts to resolve the conflict, resulting in stitch insertion, which causes yield loss due to overlay and line-end effects. While previous layout decomposition approaches perform coloring and splitting separately, in this paper we propose a simultaneous conflict and stitch minimization algorithm with an integer linear programming (ILP) formulation. Since ILP is NP-hard, the algorithm includes three speed-up techniques: 1) grid merging; 2) independent component computation; and 3) layout partition. In addition, our algorithm can be extended to handle design rules such as overlap margin and minimum width for practical use, as well as off-grid layout. Our approach reduces stitches by 33% and conflicts by 87.6% compared with two-phase greedy decomposition.

Layout Generator for Transistor-Level High-Density Regular Circuits
Lin, Y.-W.; Marek-Sadowska, M.; Maly, W. P.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395731&isnumber=5395722
In this paper, we describe an automatic place and route strategy for a high-density, super-regular, double-gate, transistor-array-based layout. Interconnects on all metal layers are strictly parallel and can be manufactured by an optical-proximity-correction-free process.
Our objective is to achieve a circuit layout area equal to the transistor footprint. Such layout constraints limit routing flexibility and render traditional approaches impractical. Our tools automatically generate circuits with several tens of transistors. Experimental results demonstrate both the efficiency of the proposed algorithms and the high quality of the layouts produced.

Capturing Post-Silicon Variations Using a Representative Critical Path
Liu, Q.; Sapatnekar, S. S.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395738&isnumber=5395722
In nanoscale technologies that experience large levels of process variation, post-silicon adaptation is an important step in circuit design. These adaptation techniques are often based on measurements of a replica of the nominal critical path, whose variations are intended to reflect those of the entire circuit after manufacturing. For realistic circuits, where the number of critical paths can be large, the notion of using a single critical path is too simplistic. This paper overcomes this problem by introducing the idea of synthesizing a representative critical path (RCP), which captures these complexities of the variations. We first prove that the requirement on the RCP is that it should be highly correlated with the circuit delay. Next, we present three novel algorithms to automatically build the RCP. Our experimental results demonstrate that over a number of samples of manufactured circuits, the delay of the RCP captures the worst case delay of the manufactured circuit. The average prediction error of all circuits is shown to be below 2.8% for all three approaches. For both our approach and the critical path replica method, it is essential to guard-band the prediction to ensure pessimism: on average our approach requires a guard band 31% smaller than for the critical path replica method.
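The two ideas in the abstract above, a test path whose delay is highly correlated with the circuit delay and a guard band that keeps the prediction pessimistic, can be illustrated with a small numerical sketch. The delay samples are made-up numbers, not data from the paper, and the helper names (`pearson`, `fit_line`, `guard_band`) are hypothetical. The paper's three RCP-construction algorithms are not reproduced here; this only shows how one might check a candidate path against sampled circuit delays.

```python
from math import sqrt

# Hypothetical per-chip delay samples in picoseconds (illustrative only).
rcp_delay = [98.0, 102.0, 105.0, 110.0, 95.0, 101.0]        # measured test path
circuit_delay = [201.0, 207.0, 214.0, 223.0, 196.0, 205.0]  # actual circuit delay


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares fit ys ~ a * xs + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return a, my - a * mx


def guard_band(xs, ys, a, b):
    """Smallest margin that makes every prediction pessimistic (>= actual)."""
    return max(0.0, max(y - (a * x + b) for x, y in zip(xs, ys)))


r = pearson(rcp_delay, circuit_delay)
a, b = fit_line(rcp_delay, circuit_delay)
gb = guard_band(rcp_delay, circuit_delay, a, b)
print(f"correlation = {r:.3f}, guard band = {gb:.2f} ps")
```

In this toy setting, a path that tracks the circuit more closely leaves smaller residuals after the fit, so the margin needed to stay pessimistic tends to shrink as well.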

A New Algorithm for Simultaneous Gate Sizing and Threshold Voltage Assignment
Liu, Y.; Hu, J.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395732&isnumber=5395722
Gate sizing and threshold voltage (Vt) assignment are popular techniques for circuit timing and power optimization. Existing methods, by and large, are either sensitivity-driven heuristics or based on discretizing continuous optimization solutions. Sensitivity-driven heuristics are easily trapped in local optima, and the discretization may be subject to remarkable errors. In this paper, we propose a systematic combinatorial approach for simultaneous gate sizing and Vt assignment. The core idea of this approach is joint relaxation and restriction, which employs consistency relaxation and coupled bi-directional solution search. The process of joint relaxation and restriction is conducted iteratively to systematically improve solutions. Our algorithm is compared with a state-of-the-art previous work on benchmark circuits. The results from our algorithm can lead to about 22% less power dissipation subject to the same timing constraints.

Modeling the Overshooting Effect for CMOS Inverter Delay Analysis in Nanometer Technologies
Huang, Z.; Kurokawa, A.; Hashimoto, M.; Sato, T.; Jiang, M.; Inoue, Y.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395729&isnumber=5395722
With the scaling of complementary metal–oxide–semiconductor (CMOS) technology into the nanometer regime, the overshooting effect due to the input-to-output coupling capacitance has a more significant influence on CMOS gate analysis, especially on CMOS gate static timing analysis. In this paper, the overshooting effect is modeled for CMOS inverter delay analysis in nanometer technologies. The results produced by the proposed model are close to those of simulation program with integrated circuit emphasis (SPICE). Moreover, the influence of the overshooting effect on CMOS inverter analysis is discussed.
An analytical model is presented to calculate the CMOS inverter delay time based on the proposed overshooting effect model, which is verified to be in good agreement with SPICE results. Furthermore, the proposed model is used to improve the accuracy of the switch-resistor model for approximating the inverter output waveform.

Victim Alignment in Crosstalk-Aware Timing Analysis
Gandikota, R.; Chopra, K.; Blaauw, D.; Sylvester, D.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395735&isnumber=5395722
Modeling the effect of coupling noise on circuit delay is a key issue in static timing analysis and involves the victim–aggressor alignment problem. As delay noise strongly depends on the skew between the victim–aggressor driver input transitions, it is not possible to identify a priori the victim-driver input transition that results in the worst-case delay noise. Several approaches have been proposed in the literature which heuristically search for the worst-case victim–aggressor alignment. This paper presents an analytical result that obviates the need to search for the optimal victim-driver input transition, thereby simplifying the victim–aggressor alignment problem significantly. Using the properties of standard nonlinear complementary metal–oxide–semiconductor drivers, it is shown that for monotonic input transitions the worst-case victim-driver input transition is the one that switches at the latest point in its timing window. Similarly, the victim-driver input alignment at the earliest point in the timing window is optimal for early-mode analysis. Although this result has been empirically observed in the industry, to the best of our knowledge this is the first paper which provides a rigorous analysis and shows that the above result holds for both linear and nonlinear drivers.
It is also shown that the latest alignment of the victim-driver input transition results in the latest victim receiver output arrival time even for the cases where the victim is coupled to multiple aggressors. Finally, experimental results show that limiting the alignment of the victim to only the latest victim-driver input transition can significantly reduce the runtime of existing approaches with no loss of accuracy.

New Reconfigurable Architectures for Implementing FIR Filters with Low Complexity
Mahesh, R.; Vinod, A. P.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395736&isnumber=5395722+
Reconfigurability and low complexity are the two key requirements of finite impulse response (FIR) filters employed in multistandard wireless communication systems. In this paper, two new reconfigurable architectures of low complexity FIR filters are proposed, namely the constant shifts method and the programmable shifts method. The proposed FIR filter architecture is capable of operating for different wordlength filter coefficients without any overhead in the hardware circuitry. We show that dynamically reconfigurable filters can be efficiently implemented by using common subexpression elimination algorithms. The proposed architectures have been implemented and tested on a Virtex 2v3000ff1152-4 field-programmable gate array and synthesized on 0.18 µm complementary metal–oxide–semiconductor technology with a precision of 16 bits. Design examples show that the proposed architectures offer good area and power reductions and speed improvement compared to the best existing reconfigurable FIR filter implementations in the literature.

A Physical-Location-Aware X-Filling Method for IR-Drop Reduction in At-Speed Scan Test
Hsieh, W.-W.; Chen, S.-L.; Lin, I.-S.; Hwang, T. T.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395734&isnumber=5395722
The IR-drop problem during test mode exacerbates delay defects and results in false failures.
In this paper, we take the X-filling approach to reduce the IR-drop effect during an at-speed test. Our approach differs from previous X-filling approaches in two respects. First, we take spatial information into consideration. Second, we perform X-filling differently: we propose a backward-propagation technique instead of the forward-propagation approach taken in previous work. The experimental results show that our approach can reduce the maximum IR-drop by 21.1% in the best case and by 9.1% on average as compared to previous work.

Using Launch-on-Capture for Testing BIST Designs Containing Synchronous and Asynchronous Clock Domains
Wang, L.-T.; Wen, X.; Wu, S.; Furukawa, H.; Chao, H.-J.; Sheu, B.; Guo, J.; Jone, W.-B.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395739&isnumber=5395722
This paper presents a new at-speed logic built-in self-test (BIST) architecture supporting two launch-on-capture schemes, namely aligned double-capture and staggered double-capture, for testing multi-frequency synchronous and asynchronous clock domains in a scan-based BIST design. The proposed architecture also includes BIST debug and diagnosis circuitry to help locate BIST failures. The aligned scheme detects and allows diagnosis of structural and delay faults among all synchronous clock domains, whereas the staggered scheme detects and allows diagnosis of structural and delay faults among all asynchronous clock domains. Both schemes solve the longstanding problem of using the conventional one-hot scheme, which requires testing each clock domain one at a time, or the simultaneous scheme, which requires adding isolation logic to normal functional paths across interacting clock domains. Physical implementation is easily achieved by the proposed solution due to the use of a slow-speed, global scan enable signal and reduced timing-critical design requirements.
Application results for industrial designs demonstrate the effectiveness of the proposed architecture.

Special Section Short Papers
============================

A Routing Approach to Reduce Glitches in Low Power FPGAs
Dinh, Q.; Chen, D.; Wong, M. D. F.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395747&isnumber=5395722
This paper presents a novel approach to reduce dynamic power in field-programmable gate arrays (FPGAs) by reducing glitches during routing. It finds alternative routes for early-arriving signals so that signal arrival times at look-up tables are aligned. We developed an efficient algorithm to find routes with target delays and then built a glitch-aware router aiming at reducing dynamic power. To the best of our knowledge, this is the first glitch-aware routing algorithm for FPGAs. Experiments show that an average of 27% reduction in glitch power is achieved, which translates into an 11% reduction in dynamic power, compared to the glitch-unaware versatile place and route’s router.

A Metal-Only-ECO Solver for Input-Slew and Output-Loading Violations
Lu, C.-P.; Chao, M. C.-T.; Lo, C.-H.; Chang, C. W.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395740&isnumber=5395722
To reduce the time-to-market and photomask cost for advanced process technologies, metal-only engineering change order (ECO) has become a practical and attractive solution to handle incremental design changes. Due to limited spare cells in metal-only ECO, the newly added netlist may often violate the input-slew and output-loading constraints and, in turn, delay or even fail the timing closure. This paper presents a framework, named metal-only ECO slew/cap solver (MOESS), to resolve the input-slew and output-loading violations by connecting spare cells onto the violated nets as buffers. MOESS performs two buffer insertion schemes in a sequential manner to first minimize the number of inserted buffers and then resolve timing violations, if any.
The experimental results based on industrial designs demonstrate that MOESS can resolve more violations with fewer inserted buffers and less central processing unit runtime compared to an electronic design automation vendor’s solution.

Routing With Constraints for Post-Grid Clock Distribution in Microprocessors
Shelar, R. S.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395733&isnumber=5395722
Microprocessors typically employ a global grid followed by block-level buffered trees for clock distribution. The trees are connected to the grid by routing wires along reserved tracks. The routing of these clock wires, which present load to the grid, is constrained by delay/slope requirements at inputs of the block-level trees. This leads to a capacitance minimization problem during multiterminal routing, where routes use the reserved tracks and obey the constraints. This paper presents an algorithm that addresses the problem, improving wirelength by 14% over a competitive approach. The algorithm is employed for post-grid clock distribution in a 45 nm technology microprocessor.

Short Papers
============

Placement Optimization for Yield Improvement of Switched-Capacitor Analog Integrated Circuits
Chen, J.-E.; Luo, P.-W.; Wey, C.-L.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395737&isnumber=5395722
Capacitor mismatch can generally result from two sources of error: random mismatch and systematic mismatch. Random mismatch is caused by process variation, while systematic mismatch is mainly due to an asymmetrical layout and processing gradients. A common centroid structure may be used to reduce systematic mismatch errors, but not random mismatch errors. Based on the spatial correlation model, this paper formulates the placement optimization problem of analog circuits using switched-capacitor techniques. A placement with higher correlation coefficients of the unit capacitors results in a higher acceptance rate, or chip yield.
This paper proposes a heuristic algorithm that quickly and automatically derives the placement of the unit capacitors with the highest, or near-highest, correlation coefficients for yield improvement. Results show that the placement derived from the proposed algorithm achieves better yield improvement than that from a common centroid approach. The proposed heuristic algorithm can be applied to arbitrary capacitor ratios, i.e., to more than two capacitors.

Optimal Double Via Insertion with On-Track Preference
Lee, K.-Y.; Wang, T.-C.; Koh, C.-K.; Chao, K.-Y.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5395741&isnumber=5395722
As on-track double vias take less routing resources and have better electrical characteristics, we study in this paper the problem of double via insertion with a preference for on-track double vias (DVI/ON) in a postrouting stage. The primary goal is to insert as many double vias as possible, and maximizing the number of on-track double vias is a secondary objective. We present a zero-one integer linear program-based approach to optimally solve the DVI/ON problem. We also discuss a special case of the DVI/ON problem and present a maximum weighted bipartite matching-based optimal approach. Experimental results indicate that our approaches outperform existing algorithms in terms of solution quality.
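
The two-level objective of the DVI/ON abstract above (first as many double vias as possible, then as many on-track ones as possible) can be encoded in a single maximum weighted bipartite matching. The sketch below is an illustration under stated assumptions, not the authors' implementation. The abstract does not say what the special case is, so the sketch assumes the only conflicts are two vias competing for the same candidate location. The input format, the function names, and the weighting are all hypothetical. Each feasible via/location pair gets weight `big + 1` if on-track and `big` otherwise, with `big` larger than the number of vias, so one extra double via always outweighs any number of on-track bonuses.

```python
def min_cost_assignment(cost):
    """Hungarian algorithm for a cost matrix with rows <= columns.

    Returns `match`, where match[i] is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0] * (n + 1)
    v = [0] * (m + 1)
    p = [0] * (m + 1)    # p[j]: 1-based row currently assigned to column j
    way = [0] * (m + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip assignments back along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [None] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def insert_double_vias(candidates):
    """candidates: {via: {location: is_on_track}} (hypothetical input format).

    Returns {via: (location, is_on_track)} for every via that gets a double via.
    """
    vias = sorted(candidates)
    slots = sorted({s for opts in candidates.values() for s in opts})
    big = len(vias) + 1  # one more double via beats all on-track bonuses
    # Real locations first, then one zero-cost dummy column per via,
    # which stands for "no double via inserted".
    cost = [[0] * (len(slots) + len(vias)) for _ in vias]
    for i, via in enumerate(vias):
        for j, slot in enumerate(slots):
            if slot in candidates[via]:
                cost[i][j] = -(big + (1 if candidates[via][slot] else 0))
    match = min_cost_assignment(cost)
    result = {}
    for i, via in enumerate(vias):
        j = match[i]
        if j < len(slots) and slots[j] in candidates[via]:
            result[via] = (slots[j], candidates[via][slots[j]])
    return result


# Toy instance: A could go on-track at s1, but B can only use s1.
candidates = {
    "A": {"s1": True, "s2": False},
    "B": {"s1": False},
    "C": {"s3": True, "s4": False},
}
result = insert_double_vias(candidates)
print(result)
```

In the toy instance, greedily giving via A its on-track location s1 would leave via B without a double via. The weighting instead moves A to the off-track location s2 so that all three vias are doubled, and only then prefers the on-track location for C.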