October 2011 Newsletter
Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2011-10.txt

KEYNOTE PAPER

Reddi, V. J.; Brooks, D.
Resilient Architectures via Collaborative Design: Maximizing Commodity Processor Performance in the Presence of Variations
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022007

Unintended variations in circuit lithography and undesirable fluctuations in operating parameters such as supply voltage and temperature threaten the technology scaling on which microprocessor evolution relies. Although circuit-level solutions to some variation problems may be possible, they are prohibitively expensive and impractical for commodity processors, on which not only the consumer market but also a growing segment of the business market now depends. Solutions at the microarchitecture and even the software level, on the other hand, overcome some of these circuit-level challenges without significantly raising costs or lowering performance. Using examples drawn from our Alarms Project and related work, we illustrate how collaborative design that spans circuits, architecture, and chip-resident software leads to a cost-effective solution for inductive voltage noise, sometimes called the dI/dt problem. The strategy we use to assure correctness while preserving performance can be extended to other variation problems.

REGULAR PAPERS

ANALOG, MIXED-SIGNAL, AND RF CIRCUITS

Mukherjee, S.; Dasgupta, P.; Mukhopadhyay, S.
Auxiliary Specifications for Context-Sensitive Monitoring of AMS Assertions
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022013

As research on assertion languages for the analog and mixed-signal (AMS) domain gains momentum, it is increasingly felt that extensions of existing assertion languages, such as the Property Specification Language and SystemVerilog Assertions, into the AMS domain are inadequate for expressing analog design intent. This is largely because the intricacy of analog behavioral intent cannot be captured purely in terms of logic. In this paper, we show that by using auxiliary forms of formal specification, namely abstract state machines and real-valued functions (called auxiliary functions), as references for AMS assertions, it becomes possible to model complex AMS behavioral properties. In addition, we present complexity results for the satisfiability problem of such specifications. This approach leverages the growing adoption of AMS behavioral modeling in industry. We also show that auxiliary state machines allow the scopes of different analog assertions to be separated, leading to significant reductions in assertion-checking overhead.

Liu, B.; Zhao, D.; Reynaert, P.; Gielen, G. E.
Synthesis of Integrated Passive Components for High-Frequency RF ICs Based on Evolutionary Computation and Machine Learning Techniques
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022011

State-of-the-art synthesis methods for microwave passive components suffer from one of two drawbacks: they either are efficient but depend heavily on the accuracy of equivalent circuit models, which may fail the synthesis at high frequencies, or they rely entirely on electromagnetic (EM) simulation, which yields high solution quality but is too time consuming.
To combine high solution quality with good efficiency, a new method, called memetic machine-learning-based differential evolution (MMLDE), is presented. The key idea of MMLDE is an online surrogate-model-based memetic evolutionary optimization mechanism whose training data are generated adaptively during the optimization process. In particular, by using the differential evolution algorithm as the optimization kernel and EM simulation as the performance evaluation method, high-quality solutions can be obtained. By constructing surrogate models online with Gaussian processes and artificial neural networks, the proposed search mechanism predicts performances and thereby avoids many expensive EM simulations. Compared with the available methods of best solution quality, MMLDE obtains comparable results with roughly a tenfold improvement in computational efficiency, which makes the computation time for optimized component synthesis acceptable. Moreover, unlike many available methods, MMLDE needs neither equivalent circuit models nor coarse-mesh EM models. Experiments on 60 GHz syntheses and comparisons with state-of-the-art methods provide evidence of the important advantages of MMLDE.

MODELING AND SIMULATION

Xiong, X.; Wang, J.
Dual Algorithms for Vectorless Power Grid Verification Under Linear Current Constraints
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022016

Vectorless power grid verification makes it possible to evaluate worst-case voltage noise without enumerating possible current waveforms. Under linear current constraints, the problem can be formulated and solved as a linear program (LP). However, previous approaches suffer from long runtimes due to the large problem size. In this paper, we design two dual algorithms that efficiently evaluate voltage noise in a resistor/capacitor power grid.
Our algorithms combine novel dual approaches to the LP with a preconditioned conjugate gradient power grid analyzer. The first algorithm exploits the structure of the problem to simplify its dual into a convex problem, which is then solved by the cutting-plane method. The second algorithm formulates a reduced-size LP for each node to compute an upper bound on its voltage noise. Experimental results show that both algorithms are highly efficient.

Bhat, H. S.; Osting, B.
2-D Inductor-Capacitor Lattice Synthesis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022033

We consider a general class of 2-D passive propagation media, represented as a planar graph whose nodes are capacitors connected to a common ground and whose edges are inductors. Capacitances and inductances are fixed in time but vary in space. Kirchhoff's laws give the time dynamics of voltage and current in the system. By harmonically forcing input nodes and collecting the resulting steady-state signal at output nodes, we obtain a linear analog device that transforms the inputs to outputs. We pose the lattice synthesis problem: given a linear transformation, find the inductances and capacitances of an inductor-capacitor circuit that performs this transformation. Formulating this as an optimization problem, we numerically demonstrate its solvability using gradient-based methods. By solving the lattice synthesis problem for various desired transformations, we design several devices that can be used for signal processing and filtering.

Ghasemazar, M.; Pedram, M.
Optimizing the Power-Delay Product of a Linear Pipeline by Opportunistic Time Borrowing
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022006

In this paper, we present and solve the problem of power-delay optimal soft linear pipeline design.
The key idea is to use soft-edge flip-flops to allow time borrowing among consecutive pipeline stages, giving timing-critical stages more time and trading that slack for power savings. We formulate the problem of optimally designing the soft-edge flip-flops and setting the clock frequency and supply voltage so as to minimize the power-delay product of a linear pipeline under different scenarios, using both deterministic and statistical static delay models. In our first formulation, timing violations are avoided by respecting deterministic worst-case path delay bounds. Next, the same problem is formulated for a scenario in which stage delays are random variables, and we minimize the power-delay product while keeping the probability of timing violations bounded. The soft-edge flip-flops are equipped with dynamic error detection (and correction) circuitry to detect and fix errors that may arise from over-clocking. Although the system can recover from errors, there is a tradeoff between performance and power savings, which our third formulation exploits to further minimize the power-delay product of the pipeline. Experimental results demonstrate the efficacy of the proposed algorithms on each of these problems.

SYSTEM-LEVEL DESIGN

Chan, J.; Hendry, G.; Bergman, K.; Carloni, L. P.
Physical-Layer Modeling and System-Level Design of Chip-Scale Photonic Interconnection Networks
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022004

Photonic technology is becoming an increasingly attractive solution to the problems facing today's electronic chip-scale interconnection networks. Recent progress in silicon photonics research has enabled the demonstration of all the optical building blocks necessary to create extremely high-bandwidth-density, energy-efficient links for on-chip and off-chip communication.
From a feasibility and architecture perspective, however, photonics represents a dramatic paradigm shift from traditional electronic network design because of fundamental differences in how electronics and photonics function and behave. As a result, new modeling and analysis methods must be employed to realize a functional photonic chip-scale interconnect. In this paper, we present a methodology for characterizing and modeling fundamental photonic building blocks that can subsequently be combined into full photonic network architectures. We also describe a set of tools for assessing the physical-layer and system-level performance properties of a photonic network. The models and tools are integrated in a novel open-source design and simulation environment. We present a case study of two different photonic networks-on-chip to demonstrate how improved understanding and modeling of the physical-layer details of photonic communication can clarify the system-level performance impact.

Jang, W.; Pan, D. Z.
Application-Aware NoC Design for Efficient SDRAM Access
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022008

In systems-on-chip (SoCs), a microprocessor demands guaranteed synchronous dynamic random access memory (SDRAM) latency, whereas most other cores are served on a best-effort basis. However, strict priority service for the guaranteed-latency traffic critically degrades the SDRAM utilization and latency of the overall system. In addition, the data sizes requested by various cores often do not match the SDRAM access granularity, which deteriorates utilization and latency further. In this paper, we propose an application-aware network-on-chip (NoC) design for efficient SDRAM access that accounts for the memory latency demands and memory access granularities of different applications.
To provide short latency for priority memory requests with few penalties, memory request packets are scheduled by our guaranteed-SDRAM-service router, which includes a hybrid flow controller combining priority-first and priority-equal algorithms. In addition, our access-granularity-matching NoC design further improves memory performance by splitting a memory request packet into several short request packets and controlling them with a partially open-page mode and an auto-precharge operation in the memory subsystem. Experimental results show that, compared to a state-of-the-art SDRAM-aware NoC design, our cost-effective application-aware design improves memory latency for latency-sensitive cores by up to 32.8%, overall memory latency by up to 7.8%, and memory utilization by up to 3.4%.

TEST

Lee, K.-J.; Lien, W.-C.; Hsieh, T.-Y.
Test Response Compaction via Output Bit Selection
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022010

Conventional output compaction methods based on XOR networks and/or linear feedback shift registers may suffer from aliasing, unknown values, and/or poor diagnosability. In this paper, we present an alternative, the output-bit-selection method, to address the test compaction problem. By observing only a subset of output responses, this method can effectively deal with all of the above problems. We develop efficient algorithms that identify near-optimum subsets of output bits covering all detectable faults in very large circuits. Experimental results show that less than 10% of the output response bits of an already very compact test set suffice to achieve 100% single stuck-at fault coverage for most ISCAS benchmark circuits. Even better results are obtained for the ITC'99 benchmark circuits, where less than 3% of the output bits cover all stuck-at faults.
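At its heart, output bit selection is a covering problem: choose few output bits such that every detectable fault flips at least one chosen bit under some test. The sketch below illustrates this with a simple greedy heuristic over an invented fault-detection table (the bit names, fault names, and detection data are all hypothetical; the paper's near-optimum algorithms are considerably more sophisticated and scale to very large circuits):

```python
# Greedy output-bit selection as set cover (illustrative sketch only).
# detects[b] = set of faults whose effect is observable at output bit b
# (bit "o0_t2" means output o0 observed under test t2).
detects = {
    "o0_t1": {"f1", "f3"},
    "o1_t1": {"f2"},
    "o0_t2": {"f1", "f2", "f4"},
    "o2_t2": {"f3", "f5"},
    "o1_t3": {"f4", "f5"},
}

def select_bits(detects):
    """Pick output bits greedily until every detectable fault is covered."""
    uncovered = set().union(*detects.values())
    chosen = []
    while uncovered:
        # Take the bit that covers the most still-uncovered faults.
        best = max(detects, key=lambda b: len(detects[b] & uncovered))
        chosen.append(best)
        uncovered -= detects[best]
    return chosen

print(select_bits(detects))  # -> ['o0_t2', 'o2_t2']
```

Two of the five observed bits suffice here; the paper reports the analogous effect at scale, with under 10% of response bits preserving full stuck-at coverage.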
The increase in the number of selected bits needed to cover other fault types is quite small if those faults are taken into account during automatic test pattern generation. Furthermore, the diagnostic resolution of this method is almost the same as that achieved by observing all output response bits.

Li, M.; Hsiao, M. S.
3-D Parallel Fault Simulation With GPGPU
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022009

General-purpose computing on graphics processing units (GPGPU) is a paradigm shift in computing that promises a dramatic increase in performance, but it also brings an unprecedented level of complexity to algorithm design and software development. In this paper, we present an efficient parallel fault simulator, FSimGP^2, that exploits the high degree of parallelism supported by a state-of-the-art graphics processing unit (GPU) with the NVIDIA compute unified device architecture. A novel 3-D parallel fault simulation technique is proposed to achieve extremely high computational efficiency on the GPU, and global communication is minimized by concentrating as much work as possible in local device memory. Results on an NVIDIA GPU platform (a GeForce GTX 285 graphics card) demonstrate speedups of up to 63x and 4x over two other GPU-based fault simulators, and of up to 95x over a state-of-the-art algorithm on conventional processor architectures.

VERIFICATION

Shen, S.; Qin, Y.; Xiao, L.; Wang, K.; Zhang, J.; Li, S.
A Halting Algorithm to Determine the Existence of the Decoder
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022015

Complementary synthesis automatically synthesizes the decoder circuit of an encoder. It determines the existence of the decoder by checking whether the encoder's input can be uniquely determined by its output. However, this check will not halt if the decoder does not exist. To solve this problem, a novel halting algorithm is proposed.
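The uniqueness check at the core of this formulation can be sketched for a toy memoryless encoder, where a decoder exists exactly when the encoding map is injective (the 2-bit-to-3-bit table below is invented for illustration; the paper's contribution is making this check halt for sequential encoders, where paths must be unfolded over time):

```python
# Decoder existence for a toy memoryless encoder: a decoder exists
# iff the encoding map is injective, i.e., the input is uniquely
# determined by the output.
from itertools import product

def encode(bits):
    """Hypothetical 2-bit to 3-bit encoding table (illustration only)."""
    table = {(0, 0): (0, 0, 1), (0, 1): (0, 1, 0),
             (1, 0): (1, 0, 0), (1, 1): (1, 1, 1)}
    return table[bits]

def decoder_exists(encode, width):
    seen = {}
    for x in product((0, 1), repeat=width):
        y = encode(x)
        if y in seen and seen[y] != x:
            return False  # two inputs share one output: no decoder
        seen[y] = x
    return True

print(decoder_exists(encode, 2))  # -> True (the table is injective)
```

For a sequential encoder, the same question is asked per path of bounded length, which is why a naive check can run forever when no decoder exists.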
For every path of the encoder, the algorithm first checks whether the encoder's input can be uniquely determined by its output. If so, the decoder exists; otherwise, the algorithm checks whether the path contains loops that can be further unfolded to prove the non-existence of the decoder for all longer paths. To illustrate its usefulness and efficiency, the algorithm has been run on several complex encoders, including PCI-E and Ethernet. Experimental results indicate that it always halts, correctly distinguishing correct encoders from incorrect ones, and that it is more than three times faster than previous algorithms.

SHORT PAPERS

Yuan, L.; Leventhal, S. R.; Gu, J.; Qu, G.
TALk: A Temperature-Aware Leakage Minimization Technique for Real-Time Systems
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022017

Due to rising chip temperatures and the strong dependence of leakage power on temperature, thermal-aware power management has recently received considerable attention in energy-efficient system design. In this paper, we propose a temperature-aware intra-task scheduling algorithm to minimize leakage energy in real-time systems. The basic idea is to run tasks at full speed when the chip temperature is low or the work is urgent, and to switch the processor to a low-power state when the chip temperature is high or the workload is light. Our offline algorithm achieves optimal leakage reduction for a given task with worst-case execution time, while the online algorithm has very low runtime complexity. Simulation results show that the online algorithm reduces total leakage energy by 34% on average across both real-world and artificial benchmarks. Finally, we demonstrate how to combine our algorithm with existing dynamic voltage scaling techniques to optimize overall energy consumption.

Wei, T.; Chen, X.; Hu, S.
Reliability-Driven Energy-Efficient Task Scheduling for Multiprocessor Real-Time Systems
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022005

This paper proposes a reliability-driven task scheduling scheme for multiprocessor real-time embedded systems that optimizes system energy consumption under stochastic fault occurrences. The task scheduling problem is formulated as an integer linear program in which a novel fault adaptation variable models the uncertainty of fault occurrences. The proposed scheme considers both dynamic and leakage power, handles the scheduling of independent tasks as well as tasks with precedence constraints, and can schedule tasks with varying deadlines. Experimental results demonstrate that the proposed reliability-driven parallel scheduling scheme achieves energy savings of more than 15% compared to designing for the corner case of fault occurrences.

Maffezzoni, P.; D'Amore, D.
Analysis of Phase Diffusion Process in Oscillators Due to White and Colored-Noise Sources
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022012

This paper presents an original phase-domain analysis of the diffusion process that occurs in oscillators due to small-signal white and colored-noise sources. The proposed analysis is shown to correctly predict the oscillator's power spectrum, even very close to the fundamental harmonic. The correctness of the analysis is verified through comparisons with commercially available tools.

Pomeranz, I.
Subsets of Primary Input Vectors in Sequential Test Generation for Single Stuck-at Faults
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6022014

The complexity of deterministic sequential test generation for a target fault in a circuit with n primary inputs is determined by the need to explore a search space of 2^n primary input vectors at every time unit.
This paper studies the possibility of reducing this complexity by using subsets of primary input vectors of limited size during test generation for target faults. It considers a test generation procedure that uses subsets of size N, for increasing values of N starting with N = 1. The subsets consist of primary input vectors from the test sequence already generated, together with random primary input vectors. The results indicate that all or most of the detectable single stuck-at faults in benchmark circuits can be detected using small subsets of primary input vectors.
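To make the growing-subset idea concrete, here is a toy sketch: stuck-at faults on the inputs of a tiny combinational circuit are targeted using a pool of only N input vectors, with N increased until every fault is detected. Everything here is invented for illustration (the circuit, the fault list, the random pool), and the paper itself addresses sequential circuits with deterministic test generation, where the savings come from shrinking the 2^n-per-time-unit search space:

```python
import random

# Toy circuit: y = (a AND b) OR c, with single stuck-at faults on inputs.
def circuit(a, b, c, fault=None):
    """fault = (line, value) forces that input line to a stuck value."""
    vals = {"a": a, "b": b, "c": c}
    if fault:
        line, v = fault
        vals[line] = v
    return (vals["a"] & vals["b"]) | vals["c"]

FAULTS = [(line, v) for line in "abc" for v in (0, 1)]

def detected(vec, fault):
    """A vector detects a fault if good and faulty outputs differ."""
    return circuit(*vec) != circuit(*vec, fault=fault)

def generate_with_subset(n, seed=0):
    """Detect faults using a pool of only n input vectors.

    The pool is drawn at random here, loosely mimicking the paper's
    strategy of trying subset sizes N = 1, 2, ... until coverage stops
    improving.
    """
    rng = random.Random(seed)
    all_vecs = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    pool = rng.sample(all_vecs, n)
    covered = {f for f in FAULTS if any(detected(v, f) for v in pool)}
    return pool, covered

for n in range(1, 9):
    pool, covered = generate_with_subset(n)
    if len(covered) == len(FAULTS):
        print(f"all {len(FAULTS)} faults detected with a pool of {n} vectors")
        break
```

Even in this toy case, a pool far smaller than the full 2^3 vector space typically covers every detectable fault, which is the effect the paper observes on benchmark circuits.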