March 2012 Newsletter
Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2012-03.txt

CALL FOR PAPERS: Special Section on Three-dimensional Integrated Circuits and Microarchitectures (Deadline: March 15, 2012)
Call for papers available at http://www.umn.edu/~tcad/special_sections/TCAD-3D-CFP.pdf
Guest Editors:
Yuan Xie, Pennsylvania State University, yuanxie@cse.psu.edu
Gabriel H. Loh, AMD Research, Gabe.loh@amd.com

REGULAR PAPERS

EMBEDDED SYSTEMS

Ejlali, A.; Al-Hashimi, B. M.; Eles, P.
Low-Energy Standby-Sparing for Hard Real-Time Systems
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152774
Time-redundancy techniques are commonly used in real-time systems to achieve fault tolerance without incurring high energy overhead. However, the reliability requirements of hard real-time systems used in safety-critical applications are so stringent that time-redundancy techniques are sometimes unable to meet them. Standby sparing, a hardware-redundancy technique, can be used to meet the high reliability requirements of safety-critical applications. However, conventional standby-sparing techniques are not suitable for low-energy hard real-time systems, as they either impose considerable energy overheads or are unsuited to hard timing constraints. In this paper, we provide a technique for using standby sparing in hard real-time systems with limited energy budgets. The principal contribution of this paper is an online energy-management technique that is specifically developed for standby-sparing systems used in hard real-time applications. This technique operates at runtime and exploits dynamic slack to reduce energy consumption while guaranteeing hard deadlines. We compared the low-energy standby-sparing (LESS) system with a low-energy time-redundancy system from previous work.
The results show that, for relaxed time constraints, the LESS system is more reliable and provides about 26% energy savings compared to the time-redundancy system. For tight deadlines, where the time-redundancy system is not sufficiently reliable for safety-critical applications, the LESS system preserves its reliability but consumes about 49% more energy.

HIGH-LEVEL SYNTHESIS

Sarbishei, O.; Radecka, K.; Zilic, Z.
Analytical Optimization of Bit-Widths in Fixed-Point LTI Systems
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152780
Analyses of range and precision are important for high-level synthesis and verification of fixed-point circuits. Conventional range and precision analysis methods mostly focus on combinational arithmetic circuits and suffer from major inefficiencies when dealing with sequential linear time-invariant circuits. These problems mainly include the inability to analyze precision when quantization of constant coefficients is taken into account, and the lack of efficient word-length optimization algorithms that handle both variables and constants while satisfying the error metrics. The algorithms presented in this paper solve these problems. Experiments illustrate the efficiency and robustness of our algorithms.

MODELING AND SIMULATION

Gao, M.; Ye, Z.; Wang, Y.; Yu, Z.
Efficient Full-Chip Statistical Leakage Analysis Based on Fast Matrix Vector Product
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152775
Power consumption has become a major concern since the integrated circuit industry entered the nanometer design regime. Due to increasing process variation, deterministic leakage power analysis has become inadequate, and statistical analysis is therefore required.
The challenges of statistical leakage analysis are that the huge number of random variables makes straightforward computation of the variance in O(N^2) time impractical for realistic designs, and that knowing only the first two moments is not sufficient to obtain the distribution of the full-chip leakage. In this paper, we introduce efficient linear-time algorithms for statistical leakage analysis. A fast matrix-vector product technique is crucial to these algorithms: it is applied not only to compute the second moment of the total leakage but also, combined with a comonotonic approximation, to estimate the distribution function of the total leakage power. The computational complexity of the proposed algorithms is provably O(N), and the experimental results, presented with detailed discussion, indicate promising improvements in accuracy.

Wei, C.-J.; Chen, H.; Chen, S.-J.
Design and Implementation of Block-Based Partitioning for Parallel Flip-Chip Power-Grid Analysis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152783
Power-grid analysis is one of the critical design steps to ensure circuit reliability and achieve performance targets for very-large-scale integration systems. With each new technology generation, the circuit size has decreased and the power density has increased. Consequently, power-grid analysis has become ever more complex, with greater CPU runtime and memory usage requirements. For a state-of-the-art power-grid design with more than 100 million nodes, it is often desirable to partition the power grid into smaller regions and analyze them in parallel by exploiting the locality of flip-chip packages. However, the traditional area-based partitioning strategy may not be best suited to analyzing the DC current and ohmic IR voltage drop of a design that has irregular power rails and nonuniform power consumption, because such nonuniformity affects the locality of the power supply network and the accuracy of the analysis.
In this paper, we present the analysis of a flip-chip design with 136 million nodes and propose a block-based partitioning scheme to improve the accuracy of parallel power-grid analysis.

Lee, J.; Chen, D.; Balakrishnan, V.; Koh, C.-K.; Jiao, D.
A Quadratic Eigenvalue Solver of Linear Complexity for 3-D Electromagnetics-Based Analysis of Large-Scale Integrated Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152776
It is of critical importance to efficiently and accurately predict the global resonances of a 3-D integrated circuit system that involves arbitrarily shaped lossy conductors and inhomogeneous materials. A quadratic eigenvalue solver of linear complexity and electromagnetic accuracy is developed in this paper to fulfill this task. Without sacrificing accuracy, the proposed eigenvalue solver shows a clear CPU-time advantage over state-of-the-art eigenvalue solvers. It successfully solves a quadratic eigenvalue problem of over 2.5 million unknowns, associated with a large-scale 3-D on-chip circuit embedded in inhomogeneous materials, in 40 min on a single 3-GHz AMD Opteron 8222SE processor.

PHYSICAL DESIGN

Seomun, J.; Shin, I.; Shin, Y.
Synthesis of Active-Mode Power-Gating Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152781
Active leakage is transient and can be suppressed by design techniques such as dual-Vt. Active-mode power-gating (AMPG) can further reduce active leakage by power-gating groups of gates whose computed results are not loaded because of clock gating. AMPG involves several challenges: the grouping of gates must take circuit timing into account, and current switches must be sized to preserve both power network integrity and circuit timing. We propose solutions to these problems in the context of the entire process of synthesizing AMPG circuits.
The physical design of AMPG circuits is also difficult due to the large number of virtual ground rails that must be mutually isolated. We address these issues by integrating placement with power network synthesis. Experiments on several test circuits implemented in 45-nm technology demonstrate the effectiveness of AMPG in the circuits that we synthesized, in terms of power consumption, area, wirelength, and timing.

SYSTEM-LEVEL DESIGN

Kahng, A. B.; Kang, S.; Kumar, R.; Sartori, J.
Recovery-Driven Design: Exploiting Error Resilience in Design of Energy-Efficient Processors
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152777
Conventional computer-aided design (CAD) methodologies optimize a processor module for correct operation and prohibit timing violations during nominal operation. We propose recovery-driven design, a design approach that optimizes a processor module for a target timing error rate (ER) instead of correct operation. The target ER is chosen based on how many errors can be gainfully tolerated by a hardware or software error resilience mechanism. We show that significant power benefits are possible from a recovery-driven design approach that deliberately allows errors caused by voltage overscaling to occur during nominal operation, while relying on an error resilience technique to tolerate these errors. We present a detailed evaluation and analysis of such a CAD methodology that minimizes the power of a processor module for a target ER. We show how this design-level methodology can be extended to design recovery-driven processors, that is, processors optimized to take advantage of hardware or software error resilience. We also discuss a gradual slack recovery-driven design approach that optimizes for a range of ERs to create soft processors, that is, processors that have graceful failure characteristics and the ability to trade throughput or output quality for additional energy savings over a range of ERs.
We demonstrate significant power benefits over conventional design: 11.8% on average over all modules and ER targets, and up to 29.1% for individual modules. Processor-level benefits were 19.0% on average. Benefits increase when recovery-driven design is coupled with an error resilience mechanism or when the number of available voltage domains increases.

Kakoulli, E.; Soteriou, V.; Theocharides, T.
Intelligent Hotspot Prediction for Network-on-Chip-Based Multicore Systems
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152782
Hotspots are network-on-chip (NoC) routers or modules in multicore systems that occasionally receive packetized data from other networked element producers at a rate higher than they can consume it. This adverse phenomenon may greatly reduce the performance of NoCs, especially when wormhole flow control is employed, as backpressure can cause the buffers of neighboring routers to fill up quickly, spreading congestion spatially. This can cause the network to saturate prematurely and, in the worst case, render the NoC unrecoverable. Thus, a hotspot prevention mechanism can be greatly beneficial, as it can enable the interconnection system to adjust its behavior and prevent potential hotspots from forming, thereby sustaining NoC performance. In an NoC-based general-purpose multicore system such as a chip multiprocessor, traffic patterns are inherently uneven because applications have diverse and unpredictable access patterns. This unevenness produces hotspots whose appearance cannot be known a priori, since application demands are not predetermined, which makes hotspot prediction, and subsequently prevention, difficult. In this paper, we present an artificial neural network (ANN)-based hotspot prediction mechanism that can potentially be used in tandem with a hotspot avoidance or congestion-control mechanism to handle unforeseen hotspot formations efficiently.
The ANN uses online statistical data to dynamically monitor the interconnect fabric and reactively predicts the location of hotspots that are about to form, allowing enough time for the multicore system to react to these potential hotspots. Evaluation results indicate that a relatively lightweight ANN-based predictor can forecast hotspot formation with an accuracy ranging from 65% to 92%.

SHORT PAPERS

Maffezzoni, P.
Stochastic Analysis of Switched-Capacitor Circuits for Sampled Data Converters
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152779
This paper describes an original simulation-based method to derive the stochastic properties of the output noise of switched-capacitor circuits used in sampled-data converters. The method relies on a linear time-varying approximation of the large-signal transient response of the switched circuits. It is shown how switched-capacitor circuit noise and quantization noise, due to the presence of harsh comparators, can be analyzed in a unified framework in which the data converter is modeled as a discrete-time system.

Li, Z.; Zhou, Y. N.; Shi, W.
O(mn) Time Algorithm for Optimal Buffer Insertion of Nets With m Sinks
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152778
Buffer insertion is an effective technique to reduce interconnect delay. In this paper, we give a simple O(mn) time algorithm for optimal buffer insertion, where m is the number of sinks and n is the number of buffer positions. When m is small, our algorithm is a significant improvement over the recent O(n log^2 n) time algorithm by Shi and Li and the O(n^2) time algorithm of van Ginneken. For b buffer types, our algorithm runs in O(b^2 n + bmn) time, an improvement over the recent O(bn^2) algorithm by Li and Shi. The improvement is made possible by an innovative linked list that can perform the addition of a wire and the addition of a buffer in amortized O(1) time, together with a smart design of pointers.
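To make the van Ginneken baseline cited in this abstract concrete, here is a minimal sketch of that classic dynamic program. It is not the authors' O(mn) linked-list algorithm. It assumes a single-sink net modeled as a chain of wire segments, one buffer type, and the Elmore delay model; the names `Buffer`, `prune`, and `van_ginneken_path` are hypothetical. Each candidate is a (downstream load, required arrival time) pair, and dominated candidates are discarded at every buffer position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    r_out: float  # buffer output (drive) resistance
    c_in: float   # buffer input capacitance
    delay: float  # buffer intrinsic delay


def prune(cands):
    """Drop dominated (load, q) candidates: a candidate is useless if another
    one has no larger load and no smaller required arrival time q."""
    kept = []
    best_q = float("-inf")
    # Ascending load; for equal loads, the largest q comes first.
    for load, q in sorted(cands, key=lambda cand: (cand[0], -cand[1])):
        if q > best_q:
            kept.append((load, q))
            best_q = q
    return kept


def van_ginneken_path(segments, sink_cap, sink_rat, buf, driver_r):
    """Best required arrival time at the driver for a single-sink wire.

    segments: (r, c) wire pieces listed from the sink toward the driver; the
    upstream end of each piece is a candidate buffer position.
    """
    cands = [(sink_cap, sink_rat)]  # (downstream load, required arrival time)
    for r, c in segments:
        # Add the wire piece: Elmore delay r * (c/2 + downstream load).
        cands = [(load + c, q - r * (c / 2 + load)) for load, q in cands]
        # Option: insert a buffer here, driving the best current candidate.
        buffered_q = max(q - buf.delay - buf.r_out * load for load, q in cands)
        cands = prune(cands + [(buf.c_in, buffered_q)])
    # The driver sees the load of whichever candidate it drives.
    return max(q - driver_r * load for load, q in cands)


if __name__ == "__main__":
    wire = [(1.0, 1.0)] * 10  # ten unit-RC segments
    fast = Buffer(r_out=1.0, c_in=1.0, delay=1.0)
    print(van_ginneken_path(wire, 1.0, 0.0, fast, 1.0))
```

For clarity, this sketch re-sorts the candidate list at every position, so it runs in O(n^2 log n) rather than van Ginneken's O(n^2); a faster implementation would instead maintain the sorted order directly as candidates are updated.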
We then present an extension of our algorithm to the buffer cost minimization problem, which improves on the previous best algorithm. On industrial test cases, the new algorithms are faster than the previous best algorithms by an order of magnitude.

Yang, J.-S.; Touba, N. A.
Efficient Trace Signal Selection for Silicon Debug by Error Transmission Analysis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152784
In this paper, a technique is presented for selecting signals to observe during silicon debug. Internal signals are used to analyze, understand, and debug circuit misbehavior. An automated procedure for selecting which signals to observe is proposed to facilitate early detection of circuit malfunction and to improve the utilization of hardware storage resources. In sequential circuits, the signals that are most often sensitized to possible errors are observed. Given a functional input vector set, an error transmission matrix is generated by analyzing which flip-flops are sensitized to other flip-flops. Relatively independent flip-flops are identified, and a set of signals that maximally covers the possible error sites under the given constraints is selected through integer linear programming. Experimental results show that the proposed approach can rapidly and precisely identify nonconforming chip behavior and can thereby speed up the post-silicon debug process.

Das, S.; Banerjee, A.; Dasgupta, P.
Early Analysis of Critical Faults: An Approach to Test Generation From Formal Specifications
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6152773
This paper presents a formal methodology for test generation from formal specifications. Our method can be used to generate tests for critical faults in component-based designs. Test generation for critical faults is done entirely from formal specifications; therefore, the theory inherently guarantees that a generated test will be applicable to any implementation of the specifications.
The theory makes fault analysis possible at an abstract level of design where the complete logic is not specified.
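In the Yang and Touba entry above, the final step, choosing a limited number of flip-flops to trace so that they cover as many possible error sites as possible, is a maximum-coverage problem. The authors solve it with integer linear programming. The sketch below substitutes the standard greedy heuristic, which is simpler but not guaranteed to be optimal, and it skips their step of first identifying relatively independent flip-flops. The input format (a dict mapping each flip-flop to the set of error sites that propagate to it) and the name `select_trace_signals` are hypothetical stand-ins for their error transmission matrix and tool flow.

```python
def select_trace_signals(transmission, k):
    """Greedy max-coverage selection of up to k flip-flops to trace.

    transmission maps each flip-flop name to the set of error sites
    (flip-flops) whose errors propagate to it for the given input vectors,
    i.e. one row of a (hypothetical) error transmission matrix.
    """
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for ff, sites in transmission.items():
            if ff in chosen:
                continue
            gain = len(sites - covered)  # error sites this choice newly covers
            if gain > best_gain:
                best, best_gain = ff, gain
        if best is None:  # no remaining flip-flop adds coverage
            break
        chosen.append(best)
        covered |= transmission[best]
    return chosen, covered


if __name__ == "__main__":
    matrix = {
        "ff0": {"ff0", "ff1", "ff2"},
        "ff1": {"ff2", "ff3"},
        "ff2": {"ff3", "ff4"},
        "ff3": {"ff0"},
    }
    print(select_trace_signals(matrix, 2))
```

Greedy selection carries the well-known (1 - 1/e) approximation guarantee for maximum coverage; an ILP formulation, as used by the authors, can instead return an optimal set and can also encode extra constraints such as trace-buffer width.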