December 2012 Newsletter
Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2012-12.txt

REGULAR PAPERS

ANALOG, MIXED-SIGNAL, AND RF CIRCUITS

Lin, C.-W.; Lin, J.-M.; Chiu, Y.-C.; Huang, C.-P.; Chang, S.-J.
Mismatch-Aware Common-Centroid Placement for Arbitrary-Ratio Capacitor Arrays Considering Dummy Capacitors
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349434
Switched capacitors are commonly used in analog circuits to increase the accuracy of analog signal processing and lower power consumption. To take full advantage of switched capacitors, it is very important to achieve accurate capacitance ratios in the layout of the capacitor arrays, which are affected by systematic and random mismatches. A good capacitor placement should have a common-centroid structure with the highest possible degree of dispersion to mitigate mismatches. Several dummy units should be inserted to make the placement shape more square and compact. This paper proposes a simulated-annealing-based approach for mismatch-aware common-centroid placement under the above constraints. A pair-sequence representation is used to record a placement, and a couple of associated operations are developed to find better solutions. The experimental results show that the proposed placements achieve smaller oxide-gradient-induced mismatch and larger overall correlation coefficients (i.e., higher degree of dispersion) than those of previous works.

FPGAS AND RECONFIGURABLE COMPUTING

Ansaloni, G.; Tanimura, K.; Pozzi, L.; Dutt, N.
Integrated Kernel Partitioning and Scheduling for Coarse-Grained Reconfigurable Arrays
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349426
Coarse-grained reconfigurable arrays (CGRAs) are a promising class of architectures conjugating flexibility and efficiency.
Devising effective methodologies to map applications onto CGRAs is a challenging task, due to their parallel execution paradigm and constrained hardware resources. In order to handle complex applications, it is important to devise efficient strategies to partition a kernel into pieces that obey resource constraints and methodologies to schedule them on the underlying hardware. In this paper, we tackle these problems by proposing algorithms to address partitioning based on recursive searches over abstract trees. A novel scheduling strategy is also described that, leveraging differences in delays of various operations, is able to efficiently map operations on CGRA architectures. Experimental evidence on kernels derived from a diverse set of data flow graphs and EEMBC benchmarks demonstrates the efficacy of the described methods, which, when combined, achieve a higher runtime performance on a given mesh size than state-of-the-art approaches (as much as 38% for the benchmark applications considered).

HIGH-LEVEL SYNTHESIS

Del Barrio, A. A.; Hermida, R.; Memik, S. O.; Mendias, J. M.; Molina, M. C.
Multispeculative Addition Applied to Datapath Synthesis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349430
Addition is the key arithmetic operation in most digital circuits and processors. Therefore, their performance and other parameters, such as area and power consumption, are highly dependent on the adders' features. In this paper, we present multispeculation as a way of increasing adders' performance with a low area penalty. In our proposed design, dividing an adder into several fragments and predicting the carry-in of each fragment enables computing every addition in two very short cycles at the most, with 99% or higher probability.
Furthermore, based on multispeculation principles, we propose a new strategy for implementing addition chains and hiding most of the penalty cycles due to mispredictions, while keeping at the same time the resource sharing capabilities that are sought in high-level synthesis. Our results show that it is possible to build linear and logarithmic adders more than 4.7x and 1.7x faster than the nonspeculative case, respectively. Moreover, this is achieved with a low area penalty (38% for linear adders) or even an area reduction (-8% for logarithmic adders). Finally, applying multispeculation principles to signal processing benchmarks that use addition chains will result in 25% execution time reduction, with an additional 3% decrease in datapath area with respect to implementations with logarithmic fast adders.

MODELING AND SIMULATION

Sun, S.; Feng, Y.; Dong, C.; Li, X.
Efficient SRAM Failure Rate Prediction via Gibbs Sampling
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349433
Statistical analysis of SRAM has emerged as a challenging issue because the failure rate of SRAM cells is extremely small. In this paper, we develop an efficient importance sampling algorithm to capture the rare failure event of SRAM cells. In particular, we adapt the Gibbs sampling technique from the statistics community to find the optimal probability distribution for importance sampling with a low computational cost (i.e., a small number of transistor-level simulations). The proposed Gibbs sampling method applies an integrated optimization engine to adaptively explore the failure region in a Cartesian or spherical coordinate system by sampling a sequence of 1-D probability distributions. Several implementation issues such as 1-D random sampling and starting point selection are carefully studied to make the Gibbs sampling method efficient and accurate for SRAM failure rate prediction.
Our experimental results of a 90 nm SRAM cell demonstrate that the proposed Gibbs sampling method achieves 1.4–4.9x runtime speedup over other state-of-the-art techniques when a high prediction accuracy is required (e.g., the relative error defined by the 99% confidence interval reaches 5%). In addition, we further demonstrate an important example for which the proposed Gibbs sampling algorithm accurately estimates the correct failure probability, while the traditional techniques fail to work.

PHYSICAL DESIGN

Lak, Z.; Nicolici, N.
On Using On-Chip Clock Tuning Elements to Address Delay Degradation Due to Circuit Aging
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349436
Lifetime performance of digital integrated circuits degrades as a consequence of circuit aging. In the past few years, there has been extensive research to reduce the impact of aging by different design techniques, or to predict the degradation and adapt the circuit accordingly. In this paper, we explore a novel perspective to this problem by exploiting the presence of clock tuning elements in high-performance designs. By combining on-chip sensors to predict setup or hold-time violations with the clock tuning elements, we provide an effective self-tuning mechanism for each circuit sample. The proposed method can operate in-system to prolong the circuit's maximum performance in its unique operating environment.

Chang, H.-Y.; Jiang, I. H.-R.; Chang, Y.-W.
Timing ECO Optimization Via Bézier Curve Smoothing and Fixability Identification
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349427
Due to the rapidly increasing design complexity in modern integrated circuit design, more and more timing failures are detected at late stages. Without deferring time-to-market, metal-only engineering change order (ECO) is an economical technique to correct these late-found failures.
Typically, a design might need to undergo many ECO runs in design houses; consequently, the usage of spare cells for ECO is of significant importance. In this paper, we aim at timing ECO by using as few spare cells as possible. We observe that a path with good timing is desired to be geometrically smooth. Unlike negative slack and gate delay used in most prior work, we propose a new metric of timing criticality, fixability, by considering the smoothness of timing violating paths. To measure the smoothness of a path, we use the Bézier curve as the golden path. Furthermore, in order to concurrently fix timing violations, we derive a propagation property to divide violating paths into independent segments. Based on Bézier curve smoothing, fixability identification, and the propagation property, we develop an efficient algorithm to fix timing violations. Experimental results show that we can effectively resolve all timing violations with significant speedups over the state-of-the-art works.

SYSTEM-LEVEL DESIGN

Srinivasan, S.; Ganeshpure, K. P.; Kundu, S.
A Wavelet-Based Spatio-Temporal Heat Dissipation Model for Reordering of Program Phases to Produce Temperature Extremes in a Chip
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349437
Localized heating leads to generation of thermal hotspots that affect the performance and reliability of an integrated circuit (IC). Functional workloads determine the locations and temperatures of hotspots on a die. In this paper, we present a systematic approach for developing a synthetic workload to maximize the temperature of a target hotspot. Our approach is based on the observation that hotspot temperature is determined not only by the current activity in that region, but also by the past activities in the surrounding regions.
Accordingly, we develop a wavelet-based canonical spatio-temporal heat dissipation model for program traces, and use a novel integer linear programming formulation to rearrange program phases to generate target worst case hotspot temperature. Program phase behavior is rooted in the static structure of programs. In this case, the initial set of program phases is extracted from the SPEC 2000 benchmark. We apply this formulation to target another well-known problem of maximizing the temperature between a pair of coordinates in an IC. Experimental results show that by taking the spatio-temporal effect into account, we can raise the temperature of a hotspot higher than what is otherwise possible. Hotspot temperature maximization is important in design verification and testing.

TEST

Constantin, N. G.; Kwok, K. H.; Shao, H.; Cismaru, C.; Zampardi, P. J.
Formulations and a Computer-Aided Test Method for the Estimation of IMD Levels in an Envelope Feedback RFIC Power Amplifier
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349429
This paper presents new formulations, together with an efficient computer-aided test approach intended for radio frequency integrated circuit power amplifiers (PAs), allowing the estimation of linearity requirements for the circuit blocks typically found in the error signal path of an envelope feedback amplifier. The formulations are based on a three-tone excitation, allowing analysis of intermodulation distortion (IMD) within the feedback system using parameterized peak-to-average envelope voltage. They are also based on a fifth-degree representation, and may be extended to higher degrees of nonlinearities in the RF PA block, enabling IMD analysis of envelope feedback amplifiers at low power. The approach proposed in this paper circumvents the difficulty of measuring error signals during closed-loop operation for troubleshooting purposes.
This approach is also very useful for computer-aided test setups intended for development work independent of the often idealized circuit simulation environment.

Janicki, J.; Kassab, M.; Mrugalski, G.; Mukherjee, N.; Rajski, J.; Tyszer, J.
EDT Bandwidth Management in SoC Designs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349438
This paper presents preemptive test application schemes for system-on-a-chip (SoC) designs with embedded deterministic test-based compression. The schemes seamlessly combine new test data reduction techniques with test scheduling algorithms and novel test access mechanisms devised for both input and output sides. In particular, they allow cores to interface with automatic test equipment through an optimized number of channels. They are well suited for SoC devices comprising both nonisolated cores, i.e., blocks that occasionally need to be tested simultaneously, and completely wrapped modules. Experimental results obtained for large industrial SoC designs illustrate feasibility of the proposed test application schemes and are reported herein.

Kochte, M. A.; Elm, M.; Wunderlich, H.-J.
Accurate X-Propagation for Test Applications by SAT-Based Reasoning
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349431
Unknown or X-values during test applications may originate from uncontrolled sequential cells or macros, from clock or A/D boundaries, or from tristate logic. The exact identification of X-value propagation paths in logic circuits is crucial in logic simulation and fault simulation. In the first case, it enables the proper assessment of expected responses and the effective and efficient handling of X-values during test response compaction. In the second case, it is important for a proper assessment of fault coverage of a given test set and consequently influences the efficiency of test pattern generation.
The commonly employed n-valued logic simulation evaluates the propagation of X-values only pessimistically, i.e., the X-propagation paths found by n-valued logic simulation are a superset of the actual propagation paths. This paper presents an efficient method for overcoming this pessimism and for determining accurately the set of signals that carry an X-value for an input pattern. As examples, it investigates the influence of this pessimism on the two applications, X-masking and stuck-at fault coverage assessment. The experimental results on benchmark and industrial circuits assess the pessimism of classic algorithms and show that these algorithms significantly overestimate the signals with X-values. The experiments show that overmasking of test data during test compression can be reduced by an accurate analysis. In stuck-at fault simulation, the coverage of the test set is increased by the proposed algorithm without incurring any overhead.

SHORT PAPERS

Viraraghavan, J.; Pandharpure, S. J.; Watts, J.
Statistical Compact Model Extraction: A Neural Network Approach
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349439
A technique for extracting statistical compact model parameters using artificial neural networks (ANNs) is proposed. ANNs can model a much higher degree of nonlinearity compared to existing quadratic polynomial models and, hence, can even be used in sub-100-nm technologies to model leakage current that exponentially depends on process parameters. Existing techniques cannot be extended to handle such exponential functions. Additionally, ANNs can handle multiple input multiple output relations very effectively. The concept applied to CMOS devices improves the efficiency and accuracy of model extraction. Results from the ANN match the ones obtained from SPICE simulators within 1%.

Maffezzoni, P.; Levantino, S.
Phase-Noise Analysis and Simulation of LC Oscillator-Based Injection-Locked Frequency Dividers
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349435
This letter proposes a phase-noise analysis of injection-locked frequency dividers that are based on LC oscillators. The proposed analysis method relies on the concept of the perturbation projection vector and allows one to investigate how the output phase noise is affected by the amplitude and the frequency of the input signal. Closed-form expressions for the output phase-noise spectrum under different injection conditions are provided and validated against the periodic noise analysis of a commercial circuit simulator. Indeed, the proposed semianalytical method can provide insights and guidelines to improve the circuit design.

Li, K. S.-M.; Liao, Y.-Y.
Layout-Aware Multiple Scan Tree Synthesis for 3-D SoCs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349432
An interconnect-driven layout-aware multiple scan tree (MST) synthesis methodology for 3-D integrated circuits (ICs) is proposed. MSTs, also known as scan forest, greatly reduce test data volume and test application time in system-on-a-chip testing. Previous studies on layout-aware scan tree synthesis only address 2-D layouts, so they cannot be directly applied to 3-D ICs. The proposed algorithm effectively optimizes both test compression rate and routing length under 3-D IC-induced constraints, and produces better results than all previous known methods.

Chung, J.; Abraham, J. A.
On Computing Criticality in Refactored Timing Graphs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6349428
The maximum operator in statistical static timing analysis (SSTA) is a decent approximation for timing sign-off, but often causes significant error in SSTA applications. This paper presents a timing criticality computation method based on non-maximum analytic operators in a parameterized SSTA.
After an SSTA run, the proposed method computes the criticality for all edges and nodes in a single graph traversal. Although we do not employ the max operator in the computation process, the error in the maximum operator still degrades the accuracy of the computed criticality because the criticality is a joint probability of expressions, including arrival times, which are computed by the maximum operator during SSTA. To address this issue, we employ the refactoring technique, which was recently proposed to reduce common path pessimism in combinational circuits. This paper shows that refactoring is also very useful in reducing the maximum-induced error in arrival times, and how existing graph-based algorithms can be geared toward refactoring. Our experimental results show that the proposed method reduces the error of the criticality significantly compared to the conventional cutset-based method.
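
For readers unfamiliar with the maximum operator mentioned in the last abstract, the following is a minimal sketch, not the authors' method. It assumes two jointly Gaussian arrival times with correlation rho and uses the textbook moment-matching (Clark) approximation. The function names tightness and clark_max are hypothetical. The tightness probability P(A >= B) is the simplest two-input form of a criticality, and the refit of max(A, B) to a Gaussian is where approximation error can enter.

```python
import math


def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def spread(sigma_a, sigma_b, rho):
    """Standard deviation of A - B for correlated Gaussians A and B."""
    var = sigma_a ** 2 + sigma_b ** 2 - 2.0 * rho * sigma_a * sigma_b
    return math.sqrt(max(0.0, var))


def tightness(mu_a, sigma_a, mu_b, sigma_b, rho):
    """P(A >= B): probability that arrival time A dominates B."""
    theta = spread(sigma_a, sigma_b, rho)
    if theta == 0.0:  # A - B is deterministic (assumed tie-break: A wins)
        return 1.0 if mu_a >= mu_b else 0.0
    return cdf((mu_a - mu_b) / theta)


def clark_max(mu_a, sigma_a, mu_b, sigma_b, rho):
    """Mean and variance of max(A, B) by Clark's moment matching."""
    theta = spread(sigma_a, sigma_b, rho)
    if theta == 0.0:
        return (mu_a, sigma_a ** 2) if mu_a >= mu_b else (mu_b, sigma_b ** 2)
    alpha = (mu_a - mu_b) / theta
    t = cdf(alpha)
    mean = mu_a * t + mu_b * (1.0 - t) + theta * pdf(alpha)
    second = ((mu_a ** 2 + sigma_a ** 2) * t
              + (mu_b ** 2 + sigma_b ** 2) * (1.0 - t)
              + (mu_a + mu_b) * theta * pdf(alpha))
    return mean, second - mean ** 2
```

The two moments are exact for jointly Gaussian inputs, but max(A, B) itself is not Gaussian, so treating the result as Gaussian in later operations is an approximation.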