TCAD Newsletter - Feb 2011 Issue Placing you one click away from the best new CAD research! SPECIAL SECTION ON THE 2010 INTERNATIONAL SYMPOSIUM ON PHYSICAL DESIGN ============== Guest Editorial Saxena, P.  Chang, Y.-W. Guest Editorial The six selected papers in the Special Section exemplify the broad scope of modern physical design (PD), covering system-level physical synthesis, layout optimization, analog circuit placement, beyond-die routing, biochip design automation, and emerging device performance optimization. Wang, R.  Zhang, Y.  Chou, N.-C.  Young, E. F. Y.  Cheng, C.-K.  Graham, R. “Bus Matrix Synthesis Based on Steiner Graphs for Power Efficient System-on-Chip Communications” Power consumption and the thermal wall have become the major factors limiting the speed of very-large-scale integration (VLSI) circuits, while interconnect is becoming a primary power consumer. These factors bring new demands on the communication architecture of system-on-chips (SoCs). High bandwidth is desired to enhance parallelism for better performance, and the power efficiency on this bandwidth is critical to the overall SoC power consumption. Current bus architectures such as AMBA, Coreconnect, and Avalon are convenient for designers but not efficient on power. This paper proposes a physical synthesis scheme for on-chip buses and bus matrices to minimize the power consumption, without changing the interface or arbitration protocols. By using a bus gating technique, data transactions can take shortest paths on chip, reducing the power consumption of bus wires to minimal. Routing resource and bandwidth capacity are also optimized by the construction of a shortest-path Steiner graph, wire sharing among multiple data transactions, and wire reduction heuristics on the Steiner graph. Experiments indicate that the gated bus from our synthesis flow can save more than 90% dynamic power on average data transactions in current AMBA bus systems, which is about 5–10% of total SoC power consumption, based on comparable amount of chip area and routing resources. Eick, M.  Strasser, M.  Lu, K.  Schlichtmann, U.  Graeb, H. E. “Comprehensive Generation of Hierarchical Placement Rules for Analog Integrated Circuits” This paper presents a new method to automatically generate hierarchical placement rules, which are crucial for a successful analog placement. The method is based on a novel symmetry computation method, introducing the structural signal flow graph. Five types of proximity, matching and symmetry constraints are determined. According to the priority of the constraint types, a constraint requirement graph and a hierarchical partitioning of the circuit into matching, proximity and symmetry groups is then automatically computed. Based on experimental results with a state-of-the-art placement tool, we show that the new approach generates more placement rules and can lead to better circuit performance and parametric yield according to post-layout simulation. Ajwani, G.  Chu, C.  Mak, W.-K. “FOARS: FLUTE Based Obstacle-Avoiding Rectilinear Steiner Tree Construction” In this paper, we present an algorithm called FOARS for obstacle-avoiding rectilinear Steiner minimal tree (OARSMT) construction. FOARS applies a top-down approach which first partitions the set of pins into several subsets uncluttered by obstacles. Then an obstacle-avoiding Steiner tree is generated for each subset by an obstacle aware version of the rectilinear Steiner minimal tree algorithm FLUTE. Finally, the trees are merged and refined to form the OARSMT. To guide the partitioning of pins, we propose a novel algorithm to construct a linear-sized obstacle-avoiding spanning graph which guarantees to contain a rectilinear minimum spanning tree if there is no obstacle. Experimental results show that FOARS is among the best algorithms in terms of both wirelength and runtime for testcases both with and without obstacles. Luo, L.  Yan, T.  Ma, Q.  Wong, M. D. F.  Shibuya, T. “A New Strategy for Simultaneous Escape Based on Boundary Routing” Simultaneous escape routing on dense circuit boards is a very challenging task and a great amount of manual effort is still needed in order to achieve high routability. In this paper, we present a new simultaneous escape routing algorithm which is based upon a novel boundary routing approach. Our algorithm can solve complicated escape problems in a very short time. For a set of industrial escape problems, our algorithm successfully solved all of them while Cadence Allegro PCB router was only able to complete the routing of half of the problems. In addition, we propose a clustering strategy targeting at large escape routing problems. Experimental results show that this clustering strategy can significantly cut down the runtime of our router when solving large problems. Huang, T.-W.  Ho, T.-Y. “A Two-Stage Integer Linear Programming-Based Droplet Routing Algorithm for Pin-Constrained Digital Microfluidic Biochips” With the increasing design complexities, the design of pin-constrained digital microfluidic biochips (PDMFBs) is of practical importance for the emerging marketplace. However, solutions of current pin-count reduction are inevitably limited by simply adopting it after the droplet routing stage. In this paper, we propose the first droplet routing algorithm for PDMFBs that can integrate pin-count reduction with droplet routing stage. Furthermore, our algorithm is capable of minimizing the number of control pins, the number of used cells, and the droplet routing time. We first present a basic integer linear programming (ILP) formulation to optimally solve the droplet routing problem for PDMFBs with simultaneous multiobjective optimization. Due to the complexity of this ILP formulation, we also propose a two-stage technique of global routing followed by incremental ILP-based routing to reduce the solution space. To further reduce the runtime, we present a deterministic ILP formulation that casts the original routing optimization problem into a decision problem, and solve it by a binary solution search method that searches in logarithmic time. Extensive experiments demonstrate that in terms of the number of the control pins, the number of the used cells, and the routing time, we obtain much better achievement than all the state-of-the-art algorithms in any aspect. Lin, Y.-W.  Marek-Sadowska, M.  Maly, W. P. “On Cell Layout-Performance Relationships in VeSFET-Based, High-Density Regular Circuits” In this paper, we study circuits implemented using high-density arrays composed of vertical slit field effect transistors. This layout style could dramatically increase transistor density and, therefore, reduce fabrication cost. However, its geometrical restrictions, imposed by the super-regular transistor arrangement and strictly parallel metal tracks, pose new design challenges. Our experiments reveal that very dense cell-level interconnect pattern may be responsible for unnecessary 15% increase of the circuit level, critical path delays. We demonstrate that these extra delays can be avoided by constructing appropriate cell interconnect layouts and by more flexible usage of available metal layers for intra-cell routing. To balance the performance and metal layer usage, we propose a linear programming-based technique for critical net re-routing. REGULAR PAPERS EMERGING TECHNOLOGIES Ben-Jamaa, M. H.  Mohanram, K.  De Micheli, G. “An Efficient Gate Library for Ambipolar CNTFET Logic” Recently, several emerging technologies have been reported as potential candidates for controllable ambipolar devices. Controllable ambipolarity is a desirable property that enables the on-line configurability of n-type and p-type device polarity. In this paper, we introduce a new design methodology for logic gates based on controllable ambipolar devices, with an emphasis on carbon nanotubes as the candidate technology. Our technique results in ambipolar gates with a higher expressive power than conventional complementary metal-oxide-semiconductor (CMOS) libraries. We propose a library of static ambipolar carbon nanotube field effect transistor (CNTFET) gates based on generalized NOR-NAND-AOI-OAI primitives, which efficiently implements XOR-based functions. Technology mapping of several multi-level logic benchmarks that extensively use the XOR function, including multipliers, adders, and linear circuits, with ambipolar CNTFET logic gates indicates that on average, it is possible to reduce the number of logic levels by 42%, the delay by 26%, and the power consumption by 32%, resulting in a energy-delay-product (EDP) reduction of 59% over the same circuits mapped with unipolar CNTFET logic gates. Based on the projections in , where it is stated that defect-free CNTFETs will provide a 5times performance improvement over metal-oxide-semiconductor field effect transistors, the ambipolar library provides a performance improvement of 7times, a 57% reduction in power consumption, and a 20times improvement in EDP over the CMOS library. MODELING AND SIMULATION Rizzoli, V.  Masotti, D.  Mastri, F.  Montanari, E. “System-Oriented Harmonic-Balance Algorithms for Circuit-Level Simulation” This paper discusses a self-consistent set of modern computational concepts providing an effective approach to the circuit-level harmonic-balance (HB) simulation of nonlinear microwave systems of complex topology. The system is automatically split into the interconnection of a near-optimal number of nonlinear blocks at run time. The resulting structure is then exploited by the domain-partitioning concept. A block-wise constant spectrum is used rather than a common spectrum by considering for each block only the set of spectral lines that are relevant to its electrical function, which leads to a very significant reduction in the number of problem unknowns. System simulation under digitally modulated RF drive is reduced to a sequence of modified multitone HB analyses that are backward coupled through the envelope dynamics. Besides providing high numerical efficiency, this set of techniques opens the way to an effective co-simulation of RF and baseband transceiver sections. Gong, M.  Zhou, H.  Li, L.  Tao, J.  Zeng, X. “Binning Optimization for Transparently-Latched Circuits” With increasing process variation, binning has become an important technique to improve the values of fabricated chips, especially in high performance microprocessors where transparent latches are widely used. In this paper, we formulate and solve the binning optimization problem that decides the bin boundaries and their testing order to maximize the profit (considering the test cost) for a transparently-latched circuit. The problem is decomposed into four sub-problems. First, to compute the clock period distribution of the transparently-latched circuit, a sample-based statistical static timing analysis (SSTA) approach is developed which is based on the generalized stochastic collocation method with the sparse grid technique. The minimal clock period on each sample point is found by solving a minimal cycle ratio problem in the constraint graph. Second, a greedy method is proposed to maximize profit considering both the sales revenue and the test cost by iteratively assigning each boundary to its optimal position. Third, an optimal algorithm of O(nlog n) runtime is used to generate the optimal testing order to minimize the test cost, based on alphabetic tree. Last, a simple approach is presented to decide the optimal number of bins, which helps to complete the whole binning scheme with maximal profit. Experiments on all the ISCAS'89 sequential benchmarks with 65 nm technology show 10.68% profit improvement in average. Some comparisons with other methods suggest the advantage of our method. The results also demonstrate that the proposed SSTA method achieves an error of 0.70% and speedup of 110 X in average compared with the Monte Carlo simulation. PHYSICAL DESIGN Hsu, C.-H.  Chang, Y.-W.  Nassif, S. R. “Simultaneous Layout Migration and Decomposition for Double Patterning Technology” Double patterning technology (DPT) and layout migration (LM) are two closely related problems on design for manufacturability in the nanometer era. DPT decomposes a layout into two masks and applies double exposure patterning to increase pitch size and, thus, printability. In this paper, we present the first algorithm in the literature for the simultaneous layout migration and decomposition (SMD) problem. Our algorithm first constructs a potential conflict graph and DPT-aware constraint graphs, and then applies integer linear programming (ILP) corresponding to the graphs to obtain a decomposed and migrated layout. We further present an effective graph-based reduction technique to prune the ILP solution space, which maintains the same DPT conflicts. We also present a new DPT-aware objective for the SMD problem to minimize the difference between the original and migrated layouts while considering the DPT effects. In addition, we present an approach to generate DPT-aware standard cells by considering the DPT effects on the cell boundaries; this technique improves the layout printability and facilitates electronic design automation tools to consider DPT. Experimental results show that our algorithms can effectively generate conflict-free migrated layouts with 11% smaller layout areas and 21% smaller layout changes, compared with the traditional method of layout decomposition followed by LM. In particular, our reduction technique reduces the ILP variables by 45.7%, the ILP constraints by 58.5%, and the DPT edges by 79.9% over the basic ILP formulation, leading to a substantial speedup. For example, it can reduce the runtimes for the test cases from more than one day to only seconds. SYSTEM-LEVEL DESIGN Wang, F.  Chen, Y.  Nicopoulos, C.  Wu, X.  Xie, Y.  Vijaykrishnan, N. “Variation-Aware Task and Communication Mapping for MPSoC Architecture” As technology scales, the delay uncertainty caused by process variations has become increasingly pronounced in deep submicrometer designs. As a result, a paradigm shift from deterministic to statistical design methodology at all levels of the design hierarchy is inevitable. In this paper, we propose a variation-aware task and communication mapping methodology for multiprocessor system-on-chips that uses network-on-chip communication architecture so that the impact of parameter variations can be mitigated. Our mapping scheme accounts for variability in both the processing cores and the communication links to ensure a complete and accurate model of the entire system. A new design metric, called performance yield and defined as the probability of the assigned schedule meeting the predefined performance constraints, is used to guide both the task scheduling and the routing path allocation procedure. An efficient yield computation method for this mapping complements and significantly improves the effectiveness of the proposed variation-aware mapping algorithm. Experimental results show that our variation-aware mapper achieves significant yield improvements over worst-case and nominal-case deterministic mapper. SHORT PAPERS Zhou, T. Y.  Liu, H.  Zhou, D.  Tarim, T. “A Fast Analog Circuit Analysis Algorithm for Design Modification and Verification” This paper presents a fast analog circuit analysis algorithm, fundamental circuit-based circuit analysis, for circuits being repeatedly modified and verified in product development. The algorithm reuses previous circuit simulation result on successive changed circuit analysis to achieve simulation operation reduction. The algorithm is implemented with SPICE simulator on linear and nonlinear circuit applications with the proposed device delta models. The experiments show that the algorithm increases the speed of the circuit simulation five to ten times over directly simulations under the same simulation accuracy. Yu, G.  Li, P. “Hierarchical Analog/Mixed-Signal Circuit Optimization Under Process Variations and Tuning” A hierarchical optimization methodology is presented to achieve robust analog/mixed-signal circuit design with consideration of process variations. Hierarchical optimization using building circuit block Pareto models is an efficient approach for optimizing nominal performances of large analog circuits. However, yield-aware system optimization, as dictated by the need for safeguarding chip manufacturability in scaled technologies, is completely nontrivial. Two fundamental difficulties are addressed for achieving such a methodology: yield-aware Pareto performance characterization at the building block level and yield-aware optimization problem formulation at the system level. In addition, postsilicon tuning in complex mixed-signal system designs is investigated and the proposed optimization framework is extended for such systems. The presented methodology is demonstrated by hierarchical optimization of a phased-locked loop consisting of multiple building blocks and self-tuning function blocks. Yao, C.  Saluja, K. K.  Ramanathan, P. “Power and Thermal Constrained Test Scheduling Under Deep Submicron Technologies” Conventional power constrained test scheduling methods do not guarantee a thermal-safe solution. In this paper, we propose a test scheduling algorithm that satisfies the resource, power, and thermal constraints. First, in contrast to existing schemes, the proposed algorithm exploits superposition principle to perform fast and accurate thermal simulation, which, in turn, allows the algorithm to search for solutions which introduce cooling periods between tests to reduce the overall test length. Second, we propose a test partition-based method to further improve the performance of the test scheduling. We apply our test scheduling algorithm to ITC'02 SoC benchmarks and the results show considerable improvement in the total test length over existing methods.