September 2011 Newsletter

Placing you one click away from the best new CAD research!

Plain-text version at http://www.umn.edu/~tcad/newsletter/2011-09.txt

REGULAR PAPERS

EMBEDDED SYSTEMS

Kinsman, A.B. Nicolici, N.N.
Automated Range and Precision Bit-Width Allocation for Iterative Computations
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989989
As scientific computing becomes more widespread in environments where form-factor considerations necessitate hardware acceleration, the problem of selecting numerical data representations (bit-width allocation), key to accelerator design, is faced with shortcomings in the existing techniques. To address this problem for scientific computing dataflows, we propose a methodology for determining custom hybrid fixed/floating-point data representations for iterative computations.

LOGIC SYNTHESIS

Qian W. Riedel, M.D. Zhou H. Bruck, J.
Transforming Probabilities With Combinational Logic
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989992
Schemes for probabilistic computation can exploit physical sources to generate random values in the form of bit streams. Generally, each source has a fixed bias and so provides bits with a specific probability of being one. If many different probability values are required, it can be expensive to generate all of these directly from physical sources. This paper demonstrates novel techniques for synthesizing combinational logic that transforms source probabilities into different target probabilities. We consider three scenarios in terms of whether the source probabilities are specified and whether they can be duplicated. In the case that the source probabilities are not specified and can be duplicated, we provide a specific choice, the set {0.4, 0.5}; we show how to synthesize logic that transforms probabilities from this set into arbitrary decimal probabilities.
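As a rough illustration of the idea (not the paper's synthesis procedure), the sketch below computes the probability that a small combinational circuit outputs one when its inputs are independent random bits drawn from the set {0.4, 0.5}. The helper name `output_probability` and the example gates are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product


def output_probability(gate, input_probs):
    """Return P(gate(...) == 1) for independent input bits.

    input_probs[i] is the probability that input i is one.
    The result is obtained by enumerating the full truth table.
    """
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(input_probs)):
        # Probability of seeing exactly this input combination.
        weight = Fraction(1)
        for bit, p in zip(bits, input_probs):
            weight *= p if bit else 1 - p
        if gate(*bits):
            total += weight
    return total


# Source probabilities from the set {0.4, 0.5}, kept exact as fractions.
p04, p05 = Fraction(2, 5), Fraction(1, 2)

# AND gate: 0.4 * 0.5 = 0.2
and_prob = output_probability(lambda a, b: a & b, [p04, p05])

# Inverter: 1 - 0.4 = 0.6
not_prob = output_probability(lambda a: 1 - a, [p04])

# Inverted 0.4 input ANDed with a 0.5 input: 0.6 * 0.5 = 0.3
and_not_prob = output_probability(lambda a, b: (1 - a) & b, [p04, p05])
```

Here an AND gate turns the sources 0.4 and 0.5 into 0.2, an inverter turns 0.4 into 0.6, and combining the two yields 0.3, which hints at how a few gates can reach decimal probabilities that no single source provides.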
Further, we show that for any integer n ≥ 2, there exists a single probability that can be transformed into arbitrary base-n fractional probabilities. In the case that the source probabilities are specified and cannot be duplicated, we provide two methods for synthesizing logic to transform them into target probabilities. In the case that the source probabilities are not specified, but once chosen cannot be duplicated, we provide an optimal choice.

MODELING AND SIMULATION

Mizunuma, H. Lu Y.-C. Yang C.-L.
Thermal Modeling and Analysis for 3-D ICs With Integrated Microchannel Cooling
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989994
Integrated microchannel liquid-cooling technology is envisioned as a viable solution to alleviate an increasing thermal stress imposed by 3-D stacked ICs. Thermal modeling for microchannel cooling is challenging due to its complicated thermal-wake effect, a localized temperature wake phenomenon downstream of a heated source in the flow. This paper presents a fast and accurate thermal-wake aware thermal model for integrated microchannel 3-D ICs. A combination of the microchannel thermal-wake function and the channel merging technique achieves more than 3300× speedup with less than 5% error in comparison with a commercial numerical finite volume simulation tool. With the proposed model, we characterize thermal behaviors of microchannel-cooled 3-D ICs and compare them with the case of conventional air-cooled 3-D ICs. We also demonstrate thermal-aware placements using our thermal model. It shows that the proposed model can be used to reduce peak temperatures, which is considered important for 3-D IC designs.

Gu C.
QLMOR: A Projection-Based Nonlinear Model Order Reduction Approach Using Quadratic-Linear Representation of Nonlinear Systems
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5991229
We present a projection-based nonlinear model order reduction method, named model order reduction via quadratic-linear systems (QLMOR). QLMOR employs two novel ideas: 1) we show that nonlinear ordinary differential equations, and more generally differential-algebraic equations (DAEs) with many commonly encountered nonlinear kernels can be rewritten equivalently in a special representation, quadratic-linear differential algebraic equations (QLDAEs), and 2) we perform a Volterra analysis to derive the Volterra kernels, and we adapt the moment-matching reduction technique of nonlinear model order reduction method (NORM) to reduce these QLDAEs into QLDAEs of much smaller size. Because of the generality of the QLDAE representation, QLMOR has significantly broader applicability than Taylor-expansion-based methods since there is no approximation involved in the transformation from original DAEs to QLDAEs. Because the reduced model has only quadratic nonlinearities, its computational complexity is less than that of similar prior methods. In addition, QLMOR, unlike NORM, totally avoids explicit moment calculations, hence it has improved numerical stability properties as well. We compare QLMOR against prior methods on a circuit and a biochemical reaction-like system, and demonstrate that QLMOR-reduced models retain accuracy over a significantly wider range of excitation than Taylor-expansion-based methods. QLMOR, therefore, demonstrates that Volterra-kernel based nonlinear MOR techniques can in fact have far broader applicability than previously suspected, possibly being competitive with trajectory-based methods (e.g., trajectory piece-wise linear reduced order modeling) and nonlinear-projection based methods (e.g., maniMOR).

Zhuo C. Chopra, K. Sylvester, D. Blaauw, D.
Process Variation and Temperature-Aware Full Chip Oxide Breakdown Reliability Analysis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989995
Gate oxide breakdown (OBD) is a key factor limiting the useful lifetime of an integrated circuit. Unfortunately, the conventional approach for full chip OBD reliability analysis assumes a uniform oxide thickness and worst-case temperature for all devices. In practice, however, gate oxide thickness varies from die to die and within a die, and hence may cause different reliability for different devices and even for different chips. Moreover, due to the increased across-die temperature variation, such differences may be exacerbated. Thus, as the precision of variation control worsens, an alternative reliability analysis approach is needed. In this paper, we propose a statistical framework for chip-level gate OBD reliability analysis while considering both die-to-die and within-die components of thickness variations as well as the across-die temperature variation. The thickness of each device is modeled as a distinct random variable and thus the full chip reliability estimation problem is defined on a huge sample space of several million devices. We observe that the chip-level OBD reliability function is independent of the relative location of the individual devices. This enables us to transform the problem such that the resulting representation can be expressed in terms of far fewer random variables. Using this transformation, we present a computationally efficient and accurate approach for estimating the full chip reliability while considering spatial correlations of gate oxide thickness as well as temperature variation. We show that, compared to Monte Carlo simulation, the proposed method incurs an error of only around 1% while improving the runtime by more than three orders of magnitude.

PHYSICAL DESIGN

Lin Y.-H. Chang S.-H. Li Y.-L.
Critical-Trunk-Based Obstacle-Avoiding Rectilinear Steiner Tree Routings and Buffer Insertion for Delay and Slack Optimization
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989988
For modern designs, delay optimization significantly facilitates design closure because delay is a more realistic routing metric than wirelength. Obstacle-avoiding rectilinear Steiner tree (OARST) construction is an essential routing problem. With the trends toward intellectual property (IP)-block-based system-on-chip designs, OARST with buffer insertion has been surveyed to diminish the delay of long wires. Previous works on performance-driven (PD) OARST without and with buffer insertion can only handle small circuits. This paper develops a novel routing algorithm in obstacle-avoiding spanning graph to construct OARST with optimized delay efficiently. The proposed multisource single-target maze routing is first employed to identify the critical trunks, and the critical-trunk-based tree growth mechanism connects the unconnected pins to critical trunks under delay constraints of every sink. We apply the proposed critical-trunk-based tree growth mechanism to solve PD and slack-driven (SD) OARST problems. The proposed algorithms are extended to consider buffer insertion during PD and SD OARST constructions. Experimental results demonstrate that the proposed algorithms achieve an average 25.84% improvement in the maximum delay over obstacle-avoiding rectilinear Steiner minimal tree in the PD OARST problem and successfully resolve 66.67% of worst negative slack violations in the SD OARST problem. Compared to the simultaneous routing and buffer insertion approach, the proposed buffer-aware (BA) algorithm generates satisfactory timing results with almost identical wire length (WL). Moreover, the proposed BA SD OARST algorithm utilizes less WL than the BA rectilinear Steiner tree construction does by 17.99% on average.
The runtime comparison with previous works shows the efficiency and scalability of the proposed algorithms.

Tolbert, J.R. Zhao X. Lim S. K. Mukhopadhyay, S.
Analysis and Design of Energy and Slew Aware Subthreshold Clock Systems
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989993
In this paper, we analyze the effect of clock slew in subthreshold circuits. Specifically, we address the issue that variations in clock slew at the register control can cause serious timing violations. We show that clock slew variations can cause frequency targets to deviate by as much as 28% from the design goals. Based on these observations, we recognize the importance of clock slew control in subthreshold circuits. We propose a systematic approach to design the clock tree for subthreshold circuits to reduce the clock slew variations while minimizing the energy dissipation in the tree. The combined approach, including the wire sizing and dynamic nodal capacitance control, can achieve better slew control (and better timing control) at lower energy in subthreshold circuits.

SYSTEM-LEVEL DESIGN

Ayoub, R. Indukuri, K. Rosing, T.S.
Temperature Aware Dynamic Workload Scheduling in Multisocket CPU Servers
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989984
In this paper, we propose a multitier approach for significantly lowering the cooling costs associated with fan subsystems without compromising the system performance. Our technique manages the fan speed by intelligently allocating the workload at the core level as well as at the CPU socket level. At the core level we propose a proactive dynamic thermal management scheme. We introduce a new predictor that utilizes the band-limited property of the temperature frequency spectrum. A big advantage of our predictor is that it does not require a costly training phase and still maintains high accuracy.
At the socket level, we use a control-theoretic approach to develop a stable scheduler that reduces the cooling costs further by providing a better thermal distribution. Our thermal management scheme incorporates runtime workload characterization to perform efficient thermally aware scheduling. The experimental results show that our approach delivers an average cooling energy savings of 80% compared to the state-of-the-art techniques. The reported results also show that our formal technique maintains stability while heuristic solutions fail in this aspect.

Thong, J. Nicolici, N.
An Optimal and Practical Approach to Single Constant Multiplication
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989990
Existing optimal algorithms are limited to constants of up to 19 bits. Our algorithm requires less than 10 s on average to find a solution for a 32-bit constant. Optimality is guaranteed via an exhaustive search. We analyze two common SCM frameworks and the corresponding search strategies that each framework facilitates. Combining the strengths of both frameworks, we obtain highly aggressive pruning. The various strategies used in our algorithm and their underlying intuition are discussed extensively in this paper.

Gebhardt, D. You J. Stevens, K.S.
Design of an Energy-Efficient Asynchronous NoC and Its Optimization Tools for Heterogeneous SoCs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989987
The energy usage of on-chip interconnects is a concern for many systems-on-chip targeting portable battery-powered devices. We have designed and evaluated a network-on-chip (NoC) for such an application, including tools to optimize for power and communication latency. Our asynchronous (clockless) network operates with efficient two-phase bundled-data links and four-phase routers. The topology and router floorplan are determined by our tool, ANetGen, which optimizes the network for energy and latency using simulated annealing and force-directed placement methods.
We compare our solutions against a traditional synchronous NoC as specified by the COSI-2.0 framework and ORION 2.0 router and wire energy models. Traffic is simulated with SystemC functional models, and messages are generated with a "bursty" self-similar b-model. Results indicate our asynchronous network was more energy-efficient, lower in area, and provided comparable or superior message latency.

TEST

Bounceur, A. Mir, S. Stratigopoulos, H.-G.
Estimation of Analog Parametric Test Metrics Using Copulas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5991228
A new technique for the estimation of analog parametric test metrics at the design stage is presented in this paper. This technique employs the copulas theory to estimate the distribution between random variables that represent the performances and the test measurements of the circuit under test (CUT). A copulas-based model separates the dependencies between these random variables from their marginal distributions, providing a complete and scale-free description of dependence that is more suitable to be modeled using well-known multivariate parametric laws. The model can be readily used for the generation of an arbitrarily large sample of CUT instances. This sample is thereafter used for estimating parametric test metrics such as defect level (or test escapes) and yield loss. We demonstrate the usefulness of the proposed technique to evaluate a built-in-test technique for a radio frequency low noise amplifier and to set test limits that result in a desired tradeoff between test metrics. In addition, we compare the proposed technique with previous ones that rely on direct density estimation.

SHORT PAPERS

Eggersgluss, S. Drechsler, R.
Efficient Data Structures and Methodologies for SAT-Based ATPG Providing High Fault Coverage in Industrial Application
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989985
Automatic test pattern generation (ATPG) based on Boolean satisfiability (SAT) has turned out to be a robust alternative to classical structural ATPG algorithms, performing very well especially for hard-to-detect faults but suffering from overhead for easy-to-detect faults. In this letter, we propose new efficient data structures and methodologies for SAT-based ATPG. The novel incremental SAT solving technique, dynamic clause activation, which makes use of structural information using dedicated data structures, forms the core of a new flexible SAT-based ATPG approach. Experimental results on large industrial circuits show a significant performance gain and a removal of the limitations. At the same time, the robustness of SAT-based ATPG can even be strengthened, resulting in very high fault efficiency and increased fault coverage for transition faults.

Pomeranz, I.
Scan Shift Power of Functional Broadside Tests
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989991
The power dissipation during the application of scan-based tests can be significantly higher than during functional operation. An exception is the second, fast functional capture cycles of functional broadside tests, where it is guaranteed that the power dissipation will not exceed that possible during functional operation. The power dissipation during the other clock cycles of functional broadside tests is studied here for the first time. The clock cycles under consideration are referred to as scan shift cycles. This paper describes a test generation procedure that limits the power dissipation during scan shift cycles of functional broadside tests.
Experimental results for benchmark circuits demonstrate the extent to which the power dissipation during scan shift cycles can be limited without affecting the transition fault coverage.

Erdogan, E. S. Ozev, S.
A Multi-Site Test Solution for Quadrature Modulation RF Transceivers
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5989986
In this letter, we present a 2x-site test solution for radio frequency transceivers using only baseband signals for analysis. We perform all operations on communication standard-compliant signal packets, thereby putting the device under normal operating conditions. The transmitter on one device under test (DUT) is coupled with a receiver on another DUT to form a complete transmitter-to-receiver path. Parameters of the two devices are decoupled from one another by carefully modeling the system and using signal processing techniques. Simulation as well as measurement results confirm the high accuracy of the proposed technique.