Oct 2012 Newsletter

Placing you one click away from the best new CAD research!

Plain-text version at http://www.umn.edu/~tcad/newsletter/2012-10.txt

REGULAR PAPERS

KEYNOTE PAPER

Pedram, M.
Energy-Efficient Datacenters
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303941
Pervasive use of cloud computing and the resulting rise in the number of datacenters and hosting centers (that provide platform or software services to clients who do not have the means to set up and operate their own computing facilities) have brought forth many concerns, including the electrical energy cost, peak power dissipation, cooling, and carbon emission. With power consumption becoming an increasingly important issue for the operation and maintenance of the hosting centers, corporate and business owners are becoming increasingly concerned. Furthermore, provisioning resources in a cost-optimal manner so as to meet different performance criteria, such as throughput or response time, has become a critical challenge. The goal of this paper is to provide an introduction to resource provisioning and power or thermal management problems in datacenters, and to review strategies that maximize the datacenter energy efficiency subject to peak or total power consumption and thermal constraints, while meeting stipulated service level agreements in terms of task throughput and/or response time.

ANALOG, MIXED-SIGNAL, AND RF CIRCUITS

Singh, A. K.; Ragab, K.; Lok, M.; Caramanis, C.; Orshansky, M.
Predictable Equation-Based Analog Optimization Based on Explicit Capture of Modeling Error Statistics
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303944
Equation-based optimization using geometric programming (GP) for automated synthesis of analog circuits has recently gained broader adoption. A major outstanding challenge is the inaccuracy resulting from fitting the complex behavior of scaled transistors to posynomial functions.
In this paper, we advance a novel optimization strategy that explicitly handles the error of the model in the course of optimization. The innovation is in enabling the successive refinement of transistor models within gradually reducing ranges of operating conditions and dimensions. Refining by brute force requires exponential complexity. The key contribution is the development of a framework that optimizes efficient convex formulations, while using SPICE as a feasibility oracle to identify solutions that are feasible with respect to the accurate behavior rather than the fitted model. Due to the poor posynomial fit, standard GP can return grossly infeasible solutions. Our approach dramatically improves feasibility. We accomplish this by introducing robust modeling of the fitting error's sample distribution information explicitly within the optimization. To address cases of highly stringent constraints, we introduce an automated method for identifying a true feasible solution through minimal relaxation of design targets. We demonstrate the effectiveness of our algorithm on two benchmarks: a two-stage CMOS operational amplifier and a voltage-controlled oscillator designed in TSMC 0.18um CMOS technology. Our algorithm is able to identify superior solution points producing uniformly better power and area values under a gain constraint with improvements of up to 50% in power and 10% in area for the amplifier design. Moreover, whereas standard GP methods produced solutions with constraint violations as large as 45%, our method finds feasible solutions.

Levantino, S.; Maffezzoni, P.
Computing the Perturbation Projection Vector of Oscillators via Frequency Domain Analysis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303938
This paper describes an original computational procedure to extract the perturbation projection vector of oscillators by means of a frequency domain technique.
A key feature of the method is that it relies on the periodic transfer function analysis, which is available in most circuit simulators, and thus it can easily be exploited by oscillator designers. The accuracy of the proposed extraction procedure is verified for two relevant oscillator topologies.

HIGH-LEVEL SYNTHESIS

Sinha, R.; Patel, H. D.
synASM: A High-Level Synthesis Framework With Support for Parallel and Timed Constructs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303940
This paper presents a high-level synthesis framework called synASM that synthesizes abstract state machines (ASMs) to VHDL for field-programmable gate arrays (FPGAs). In particular, this paper focuses on the specification, scheduling, and synthesis of parallel and timed constructs. ASMs possess well-defined formal semantics for sequential and parallel computation, and their composition. We extend ASMs to support the specification of timing requirements, which we call timed constructs. We also describe the composition of timed constructs with sequential and parallel computation. A key contribution of this paper is the extension of the force-directed scheduling algorithm to support both parallel and timed constructs. We implement the synthesis back-end in synASM that targets FPGAs. Our experiments show improvements of up to 52% in lookup table usage and 34% in total area for certain examples.

MODELING AND SIMULATION

Priyadarshi, S.; Saunders, C. S.; Kriplani, N. M.; Demircioglu, H.; Davis, W. R.; Franzon, P. D.; Steer, M. B.
Parallel Transient Simulation of Multiphysics Circuits Using Delay-Based Partitioning
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303942
A parallel transient simulation technique for multiphysics circuits is presented. The technique develops partitions utilizing the inherent delay present within a circuit and between physical domains.
A state-variable-based circuit delay element is presented, which implements the coupling between two spatially or temporally isolated circuit partitions. A parallel delay-based iterative approach for interfacing delay-partitioned subcircuits is applied, which achieves the reasonable accuracy of nonparallel circuit simulation if both incorporate the same interblock delay. The partitioned subcircuits are distributed to different cores of a shared-memory multicore processor and solved in parallel. A multithreaded implementation of the methodology using OpenMP is presented. Examples showing superlinear speedup compared to unpartitioned single-core simulation using the direct method are presented. This paper also discusses the impact of load balancing and absolute delay on simulation speedup.

Kim, D.; Kim, H.; Eo, Y.
Analytical Eye-Diagram Determination for the Efficient and Accurate Signal Integrity Verification of Single Interconnect Lines
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303934
In this paper, a new efficient and accurate analytical eye-diagram determination technique for interconnect lines is presented. The simplest input test signal model for the intersymbol interference analysis of high-speed data links is mathematically formulated. Since input test patterns for eye boundaries are determined analytically, the technique is very convenient and efficient. The proposed technique shows excellent agreement with the SPICE-based simulation in both eye height and jitter, i.e., within 5% error for nondiscontinuous data paths and 10% error for discontinuous data paths. The method is more computation-time-efficient than the pseudorandom bit sequence-based SPICE simulation by an order of magnitude.

PHYSICAL DESIGN

Hsu, K.-T.; Sinha, S.; Pi, Y.-C.; Ho, T.-Y.
A Hierarchy-Based Distributed Algorithm for Layout Geometry Operations
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303936
This paper introduces a novel distributed algorithm for performing the layout geometry operations usually found in design rule checking, layout verification, and mask synthesis. A large number of machines are typically available to the user during the mask synthesis flow. As multiple machines or cores become more ubiquitous, even designers using layout verification tools will have access to a large set of machines. Therefore, an efficient and scalable distributed algorithm for performing sequences of layout geometry operations will be of great value to both designers and mask synthesis engineers. Given a layout and a sequence of layout geometry operations, the proposed algorithm divides the layout into several partitions. The given sequence of layout geometry operations is executed in parallel on different partitions. New partitions are derived from the original set of partitions and the sequence of geometry operations is repeated on larger partitions with far fewer polygons. This process continues until it produces a partition that covers the entire layout area. A key feature of the proposed algorithm is that it is correct-by-construction, i.e., each partition is guaranteed to generate a subset of the correct results. Complete and correct results are generated for each layout geometry operation for the entire layout when the operation completes execution on all the partitions. The proposed algorithm was implemented in Gearman, an open-source distributed framework. Results on large industrial layouts show good performance and scalability.

Ozdal, M. M.; Burns, S.; Hu, J.
Algorithms for Gate Sizing and Device Parameter Selection for High-Performance Designs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303939
It is becoming increasingly important to design high-performance circuits with as low power as possible.
In this paper, we study the gate sizing and device parameter selection problem for today's industrial designs. We first outline the typical practical problems that make it difficult to use traditional algorithms on high-performance industrial designs. Then, we propose a Lagrangian relaxation-based formulation that decouples timing analysis from optimization without a resulting loss in accuracy. We also propose a graph model that accurately captures discrete cell-type characteristics based on library data. We model the relaxed Lagrangian subproblem as a graph problem and propose algorithms to solve it. In our experiments, we demonstrate the importance of using the signoff timing engine to guide the optimization. We also show the benefit of the graph model we propose to solve the discrete optimization problem. Compared to a state-of-the-art industrial optimization flow, we show that our algorithms can obtain up to 38% leakage power reductions and better overall timing for real high-performance microprocessor blocks.

SYSTEM-LEVEL DESIGN

Gladigau, J.; Haubelt, C.; Teich, J.
Model-Based Virtual Prototype Acceleration
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303935
Today, virtual prototypes are often employed for software development early in the design flow. There, high simulation speed may support fast development. So, the acceleration of virtual prototype simulation is important in the early phases of design. To accelerate virtual prototypes, complex prototype simulation can be prevented by exploiting model-specific knowledge. We replace complex event-driven interaction with execution of predefined traces. In particular, we show that, for many dataflow-dominated application models, such accelerating traces may be efficiently determined. Trace determination is based on a novel symbolic search technique. We show that virtual prototypes exploiting such traces may lead to a significant simulation time reduction.
The benefits are quantified for the prototype of a SystemC/TLM network packet filter, where traces result in up to 30% simulation acceleration.

TEST

Fang, H.; Chakrabarty, K.; Wang, Z.; Gu, X.
Diagnosis of Board-Level Functional Failures Under Uncertainty Using Dempster-Shafer Theory
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303933
Despite recent advances in structural test methods, the diagnosis of the root cause of board-level failures for functional tests remains a major challenge. A promising approach to address this problem is to carry out fault diagnosis in two phases: suspect faulty components on the board or modules within components (together referred to as blocks in this paper) are first identified and ranked, and then fine-grained diagnosis is used to target the suspect blocks in a ranked order. We propose a new method based on dataflow analysis and Dempster-Shafer (DS) theory for ranking faulty blocks in the first phase of diagnosis. The proposed approach transforms the information derived from one functional test failure into multiple-stage failures by partitioning the given functional test into multiple stages. A measure of "belief" is then assigned to each block based on the knowledge of each failing stage, and the DS theory is subsequently used to aggregate the beliefs from multiple failing stages. Blocks with higher beliefs are ranked on the top of the candidate list. Simulations on an industry design for a network interface application as well as on an open source system-on-a-chip show that the proposed method can provide accurate ranking for most board-level functional failures.

Huang, Y.-J.; Li, J.-F.
Built-In Self-Repair Scheme for the TSVs in 3-D ICs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303937
3-D integration using through-silicon-via (TSV) has been widely acknowledged as one future integrated-circuit (IC) technology. Test and yield are two big issues for volume production of 3-D ICs.
In this paper, we propose a built-in self-repair (BISR) scheme to test and repair TSVs in 3-D ICs. The BISR scheme, arranging the TSVs into arrays similar to memories, can effectively enhance the yield of TSVs in a 3-D IC such that the yield of the 3-D IC is boosted. Furthermore, a global fusing methodology is proposed to reduce the requirement of fuses. Simulation and analysis results show that the proposed BISR scheme can drastically reduce the area cost and test time in comparison with an existing TSV repair scheme for the same final yield of TSVs under repair. For a 3-D wide-IO DRAM with 512 TSVs, for example, the proposed repair scheme can achieve 32.4% area reduction and 73.4% test time reduction.

Yu, X.; Blanton, R. D.
Improving Diagnosis Through Failing Behavior Identification
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6303946
Logic diagnosis analyzes the observed failing circuit responses to derive the potential defect sites. This paper describes a method for improving diagnosis through failing behavior identification (FBI). FBI captures defect behavior (i.e., activation conditions of the defect) by identifying the signal lines related to defect activation. This additional information allows the root cause to be estimated in order to improve yield, design quality, and test quality, as well as guide physical failure analysis (PFA) to perform faster defect localization. FBI is accomplished by: 1) deriving the neighborhood states of the defect site, i.e., the actual values on the signal lines within logical or physical proximity to the defect site, and 2) identifying the signal lines that are most relevant to defect activation. The efficacy of FBI is validated using circuit-level and logic-level simulation experiments. The results show that FBI achieves an average accuracy of 94% in identifying signal lines that are relevant to defect activation, a 28% improvement over an existing approach.
Moreover, by analyzing the neighborhood states of each defect site reported by logic diagnosis, sites that are not likely to be defective can be eliminated, which leads to improvement in diagnosis resolution. Experimental results show that, with little influence on diagnosis accuracy, the number of incorrect defective sites reported by logic diagnosis can be reduced by 64% on average.