TCAD Newsletter – January 2010 Issue
Placing you one click away from the best new CAD research!

Regular Papers
==============

Analysis of SRAM and eDRAM Cache Memories Under Spatial Temperature Variations
Meterelliyoz, M.; Kulkarni, J. P.; Roy, K.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5356299&isnumber=5356275
In scaled technologies, cache memories, which are traditionally known as “cold” sections of the chip, are expected to occupy a larger die area. Hence, different sections of a cache memory may experience different temperature profiles depending on their proximity to active logic units such as the execution unit. In this paper, we performed thermal analysis of cache memories under the influence of hot spots. In particular, 6-transistor (T) static random access memory (SRAM), 8-T SRAM, and embedded dynamic random access memory (eDRAM) cache memories were investigated. Thermal maps of the entire caches were generated using hierarchical compact thermal models while solving leakage and temperature self-consistently. The 6-T and 8-T SRAM bitcells were investigated in terms of stability, noise immunity, and performance under temperature variations for various technology nodes. The 3-T micro sense amplifier used in eDRAM cache memories was investigated for its robustness. Thermal-aware circuit design techniques were explored to improve cache stability under thermal gradients. Results show that, for all cache memories, spatial temperature variations have to be considered to achieve an optimal memory design.

A New Approach to Modeling Multiport Systems from Frequency-Domain Data
Lefteriu, S.; Antoulas, A. C.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5356286&isnumber=5356275
This paper addresses the problem of modeling systems from measurements of their frequency response. For multiport devices, currently available techniques are expensive.
We propose a new approach based on a system-theoretic tool: the Loewner matrix pencil constructed in the context of tangential interpolation. Several implementations are presented. They are fast and accurate, build low-order models, and are especially designed for devices with a large number of terminals. Moreover, they identify the underlying system rather than merely fitting the measurements. Numerical results show that our algorithms yield smaller models in less time when compared to vector fitting.

Efficient Methods for Large Resistor Networks
Rommes, J.; Schilders, W. H. A.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5356296&isnumber=5356275
Large resistor networks arise during the design of very-large-scale integration chips as a result of parasitic extraction and electrostatic discharge analysis. Simulating these large parasitic resistor networks is of vital importance, since it gives insight into the functional and physical performance of the chip. However, due to the increasing amount of interconnect and metal layers, these networks may contain millions of resistors and nodes, making accurate simulation time-consuming or even infeasible. We propose efficient algorithms for three types of analysis of large resistor networks: 1) computation of path resistances; 2) computation of resistor currents; and 3) reduction of resistor networks. The algorithms are exact, orders of magnitude faster than conventional approaches, and enable simulation of very large networks.

Predictive Formulae for OPC With Applications to Lithography-Friendly Routing
Chen, T.-C.; Liao, G.-W.; Chang, Y.-W.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5356285&isnumber=5356275
Due to subwavelength lithography, manufacturing sub-90-nm feature sizes requires intensive use of resolution enhancement techniques, among which optical proximity correction (OPC) is the most popular in industry.
Considering OPC effects during routing can significantly reduce the cost of postlayout OPC operations. In this paper, we present an efficient, accurate, and economical analytical formula for intensity computation and develop the first model of postlayout OPC based on a quasi-inverse lithography technique. The technique provides key insights into a new direction for postlayout OPC modeling during routing. Extensive simulations with SPLAT, the golden lithography simulator in academia and industry, show that our intensity formula has high fidelity. By incorporating the OPC costs computed with the quasi-inverse lithography technique into a router, the router can be guided to maximize the effects of the correction. Compared with a rule-based OPC method, experimental results show that our approach achieves 15% and 16% reductions in the maximum and average layout distortions, respectively.

Performance-Based Optical Proximity Correction Methodology
Teh, S.-H.; Heng, C.-H.; Tay, A.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5356284&isnumber=5356275
The rapid reduction in the critical dimension of integrated circuits has led to substantial mask data expansion for mask design based on traditional model-based full-chip optical proximity correction (OPC). Conventional OPC is mainly based on edge placement error (EPE), without consideration of its effect on circuit performance, often resulting in an overcorrected OPC mask that yields little improvement in circuit performance at much higher cost. In this paper, a performance-based OPC (PB-OPC) methodology is proposed that takes both performance and cost into account. A less complex mask is generated based on performance-matching criteria. The framework exploits the in situ estimated postlithography performance deviation error to drive the customized mask design algorithm.
In particular, device PB-OPC (DPB-OPC) was deployed to systematically synthesize both polysilicon and diffusion masks, using mean drive current deviation as the controlled performance index. The proposed approach is validated via detailed simulation using 65-nm foundry libraries and IEEE International Symposium on Circuits and Systems 1985 (ISCAS’85) benchmark circuits. Compared to the conventional performance-aware EPE-OPC approach, the proposed DPB-OPC achieved a 34% average reduction in mask size and up to a 13.5% reduction in mean drive current deviation within reasonable run time.

Comparison Study of Performance of Parallel Steady State Solver on Different Computer Architectures
Soveiko, N.; Nakhla, M. S.; Achar, R.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5356288&isnumber=5356275
A parallel solver for steady-state analysis of nonlinear circuits is presented. The solver uses the Message Passing Interface specification for communications and is suitable for steady-state simulation of very large nonlinear circuits. Performance of the solver is investigated on symmetric multiprocessing, non-uniform memory access, and distributed memory computer systems. The impact of memory subsystem constraints on solver efficiency is evaluated.

Run-Time Task Allocation Considering User Behavior in Embedded Multiprocessor Networks-on-Chip
Chou, C.-L.; Marculescu, R.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5356293&isnumber=5356275
In this paper, we propose a run-time strategy for allocating application tasks to embedded multiprocessor system-on-chip platforms where communication happens via the network-on-chip approach. As a novel contribution, we incorporate user behavior information in the resource allocation process; this allows the system to better respond to real-time changes and to adapt dynamically to different user needs.
Several algorithms are proposed for solving the task allocation problem while minimizing communication energy consumption and network contention. When user behavior is taken into consideration, we observe more than 70% communication energy savings (with negligible energy and run-time overhead) compared to an arbitrary contiguous task allocation strategy.

Verification and Codesign of the Package and Die Power Delivery System Using Wavelets
Ferzli, I. A.; Chiprout, E.; Najm, F. N.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5356294&isnumber=5356275
As part of the design of large integrated circuits, one must verify that the power delivery network provides supply and ground voltages to the circuit that are within specified ranges. We introduce the concept of a time–frequency description of circuit currents using wavelets, and use it to set up an optimization framework that finds the worst-case supply/ground voltage fluctuations. This framework allows quick determination of the impact of either the package or the die on the worst-case behavior, which enables their codesign. The approach has been applied to an industrial microprocessor design, yielding realistic and nonobvious worst-case waveforms.

A Flexible Parallel Simulator for Networks-on-Chip with Error Control
Yu, Q.; Ampadu, P.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5356290&isnumber=5356275
This paper presents a flexible parallel simulator to evaluate the impact of different error control methods on the performance and energy consumption of networks-on-chip (NoCs). Various error control schemes can be inserted into the simulator in a plug-and-play manner for evaluation. Moreover, a highly tunable fault injection feature is developed for modeling various fault injection scenarios, including different fault injection rates, fault types, fault injection locations, and faulty flit types.
Case studies performed in the proposed flexible simulation environment demonstrate the impact of a set of error control schemes on NoC performance and energy in different noise scenarios. This paper also uses the simulator to provide design guidelines for NoCs with error control capabilities.

On Compaction Utilizing Inter and Intra-Correlation of Unknown States
Czysz, D.; Mrugalski, G.; Mukherjee, N.; Rajski, J.; Tyszer, J.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5356297&isnumber=5356275
Unknown (X) states are increasingly often identified as capable of rendering semiconductor tests useless. One of the key requirements for a reliable test response compactor is, therefore, to preserve observability of any scan cell for a wide range of X-profiles while maintaining very high compaction ratios, providing the ability to detect a variety of failures found in real silicon, and assuring design simplicity. We propose a fully X-tolerant test response compaction scheme based on a flexible scan chain selection mechanism. The new approach delivers extremely high compression of test results by exploiting the observation that X states are typically not randomly distributed in test responses. Identical or similar patterns of correlated X states let the proposed scheme reduce the size of the scan chain selector and the amount of test data used to control it. Moreover, it handles a wide range of unknown-state profiles such that all X states, including those that are clustered or of high density, are suppressed in a per-cycle mode without compromising test quality.

A Scalable Test Structure for Multicore Chip
Das, S.; Sikdar, B. K.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5356291&isnumber=5356275
This paper reports an efficient synthesis scheme for pseudorandom pattern generators (PRPGs) of arbitrary length.
The n-bit PRPG, synthesized in linear time (O(n)), generates high-quality pseudorandom patterns, leading to highly efficient test logic for very-large-scale integration (VLSI) circuits. The cascadable structure of the proposed n-cell PRPG is utilized to construct the (n + 1)-cell PRPG, in two time steps, without sacrificing pseudorandomness quality. This eases the design of on-chip test pattern generators for systems-on-a-chip implementing multiple cores. It avoids the need for disparate test hardware for different cores and thereby ensures a drastic reduction in the cost of test logic. The effective characterization of nonlinear cellular automata (CA) provides the foundation of such a design. Extensive experimentation confirms the better efficiency of the proposed test structure compared to that of conventional designs, developed around maximal-length CA/linear feedback shift registers of O(n³) complexity.

Increasing the Efficiency of Simulation-Based Functional Verification Through Unsupervised Support Vector Analysis
Guzey, O.; Wang, L.-C.; Levitt, J. R.; Foster, H.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5356289&isnumber=5356275
The success of simulation-based functional verification depends on the quality and diversity of the verification tests that are simulated. The objective of test generation methods is to generate tests that exercise as much of the hardware design’s functionality as possible. In this paper, we propose a novel methodology that generates a model of the verification tests in a given test set using unsupervised support vector analysis. One potential application is to use this model to select tests that are likely to exercise functionality that has not been tested so far. Since this selection can be done before simulation, it can be used to filter redundant tests and reduce the required simulation cycles.
Our methodology can be combined with a test generation method such as constrained-random test generation to increase its effectiveness without making fundamental changes to the verification flow. Experimental results from applying the proposed methodology to the OpenSPARC T1 processor are reported to demonstrate the practicality of our approach.

Short Papers
==============

Formalization of a Parameterized Parallel Adder within the Coq Theorem Prover
Chen, G.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5356295&isnumber=5356275
This paper describes a new advance in theorem-proving-based formal verification: a formalization of a parameterized parallel prefix adder developed in the proof assistant Coq.

Design Space Exploration Acceleration Through Operation Clustering
Schafer, B. C.; Wakabayashi, K.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5356292&isnumber=5356275
This paper presents a clustering method called clustering design space exploration (CDS-ExpA) to accelerate the architectural exploration of behavioral descriptions in C and SystemC. The trade-off between faster exploration and optimality of results is investigated. Two variations of CDS-ExpA were developed: CDS-ExpA(min) and CDS-ExpA(max). CDS-ExpA(min) builds the smallest possible clusters, while CDS-ExpA(max) builds the largest possible ones, further reducing the design space. Results show that CDS-ExpA(min) and CDS-ExpA(max) explore the design space 90% and 92% faster on average than a previously developed annealer-based exploration method, at the expense of missing 36% and 47% of the Pareto-optimal designs; the smallest design found is, on average, 7% and 9% larger, and the fastest design 28% and 32% slower, respectively.

Effective Corner-Based Techniques for Variation-Aware IC Timing Verification
Silva, L. G.; Phillips, J.; Silveira, L. M.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5356298&isnumber=5356275
Traditional integrated circuit timing sign-off consists of verifying a design for a set of carefully chosen combinations of process and operating parameter extremes, referred to as corners. Such corners are usually chosen based on the knowledge of designers and process engineers, and are expected to cover the worst-case fabrication and operating scenarios. With increasingly detailed attention to variability, the number of potential conditions to examine can be exponentially large, more than is possible to handle with straightforward exhaustive analysis. This paper presents efficient yet exact techniques for computing worst-delay and worst-slack corners of combinational and sequential digital integrated circuits. Results show that the proposed techniques enable efficient and accurate detection of failing conditions while accounting for timing variability due to process variations.
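The corner-explosion problem described in the last abstract can be illustrated with a toy model. If a path delay is affine in its process parameters, the worst-case corner follows directly from the signs of the sensitivities, with no need to enumerate all 2^n corners. This is only a minimal sketch of why structured corner search can beat exhaustive analysis; it is not the paper's actual algorithm, and all names here are hypothetical:

```python
# Toy illustration (hypothetical, not the paper's method): worst-case
# delay of an affine delay model d(p) = d0 + sum_i a_i * p_i, where each
# normalized process parameter p_i lies in [-1, 1]. The maximizing
# corner sets p_i = sign(a_i), so the worst delay is d0 + sum_i |a_i|,
# found in O(n) instead of enumerating all 2^n corners.

def worst_delay_corner(d0, sensitivities):
    """Return (worst_delay, corner) for an affine delay model."""
    corner = [1.0 if a >= 0 else -1.0 for a in sensitivities]
    worst = d0 + sum(abs(a) for a in sensitivities)
    return worst, corner

# Example: nominal delay 100 ps, three parameters with sensitivities
# 5, -3, and 2 ps per normalized unit.
worst, corner = worst_delay_corner(100.0, [5.0, -3.0, 2.0])
print(worst)   # 110.0
print(corner)  # [1.0, -1.0, 1.0]
```

Real sign-off is harder than this sketch suggests: delays are not exactly affine, and slack involves max/min operations over many paths, which is why exact corner-finding techniques such as those in the paper are needed.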