January 2013 Newsletter
Placing you one click away from the best new CAD research!

EDITORIAL

Sapatnekar, S. S.
Editorial
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387704

ANNUAL LIST OF REVIEWERS
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387673

KEYNOTE PAPER

Gupta, P.; Agarwal, Y.; Dolecek, L.; Dutt, N.; Gupta, R. K.; Kumar, R.; Mitra, S.; Nicolau, A.; Rosing, T. S.; Srivastava, M. B.; Swanson, S.; Sylvester, D.
Underdesigned and Opportunistic Computing in Presence of Hardware Variability
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387697
Microelectronic circuits exhibit increasing variations in performance, power consumption, and reliability parameters across the manufactured parts and across use of these parts over time in the field. These variations have led to increasing use of overdesign and guardbands in design and test to ensure yield and reliability with respect to a rigid set of datasheet specifications. This paper explores the possibility of constructing computing machines that purposely expose hardware variations to various layers of the system stack including software. This leads to the vision of underdesigned hardware that utilizes a software stack that opportunistically adapts to a sensed or modeled hardware. The envisioned underdesigned and opportunistic computing (UnO) machines face a number of challenges related to the sensing infrastructure and software interfaces that can effectively utilize the sensory data. In this paper, we outline specific sensing mechanisms that we have developed and their potential use in building UnO machines.

REGULAR PAPERS

ANALOG, MIXED-SIGNAL, AND RF CIRCUITS

Gong, F.; Basir-Kazeruni, S.; He, L.; Yu, H.
Stochastic Behavioral Modeling and Analysis for Analog/Mixed-Signal Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387696
It has become increasingly challenging to model the stochastic behavior of analog/mixed-signal (AMS) circuits under large-scale process variations. In this paper, a novel moment-matching-based method is proposed to accurately extract the probabilistic behavioral distributions of AMS circuits. This method first utilizes Latin hypercube sampling coupled with a correlation control technique to generate a few samples (e.g., the sample size is linear in the number of variable parameters) and then analytically evaluates the high-order moments of the circuit behavior with high accuracy. In this way, arbitrary probabilistic distributions of the circuit behavior can be extracted using the moment-matching method. More importantly, the proposed method has been successfully applied to high-dimensional problems with linear complexity. The experiments demonstrate that the proposed method can provide up to 1666X speedup over the crude Monte Carlo method for the same accuracy.

EMBEDDED SYSTEMS

Ahn, J.; Choi, K.
Isomorphism-Aware Identification of Custom Instructions With I/O Serialization
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387693
Extensible processors have been widely used to meet the conflicting demands for performance improvement, low power consumption, and flexibility. As extensible processors have become more popular, several algorithms have been proposed for automatically identifying instruction-set extensions in order to reduce the effort of manual design and verification. However, most of them focus on finding large and complex instructions that are used only once, rather than repeatedly used ones. Moreover, some other approaches that consider recurrence are limited to finding small instructions.
This paper proposes a novel algorithm that considers instruction reusability as well as input/output (I/O) serialization. In order to overcome the high complexity of the problem, we develop a canonical-form construction algorithm for fast isomorphism detection on directed acyclic graphs and an incremental template generation algorithm that identifies the best custom instruction in terms of a user-defined fitness function. Moreover, our algorithm serializes I/O operations so that the numbers of inputs and outputs of custom instructions are not limited by the microarchitecture. This paper also proposes an algorithm for multiple custom instructions utilizing a well-known iterative selection algorithm. Last, it presents a hybrid algorithm composed of our algorithm and the previous algorithm that does not consider reusability. Experimental results show that our isomorphism-aware algorithm achieves significant improvement over previous approaches in terms of algorithm runtime, as well as performance gain obtained by custom instructions.

EMERGING TECHNOLOGIES

Bhoj, A. N.; Joshi, R. V.; Jha, N. K.
Efficient Methodologies for 3-D TCAD Modeling of Emerging Devices and Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387694
Over the past decade, 3-D process simulation, which is central to the 3-D Technology Computer-Aided Design (3-D TCAD) approach, has severely limited the scope and applicability of TCAD to circuits with a small number of field-effect transistors, owing to its prohibitively high computational costs for large layouts. Due to rapidly changing process recipes and shorter production cycles in the industry, design-time optimization and iterative layout-3-D TCAD exploration for yield-critical or yield-characterizing circuits, such as static random-access memories (SRAMs), ring oscillators, and others, are currently impossible in a practical time frame.
In this paper, we architect a novel layout/process/device-independent TCAD methodology in the Sentaurus tool suite to overcome the process simulation barrier for accurate 3-D TCAD structure generation. We adopt an automated structure synthesis (SS) approach, thereby bypassing the need for repetitive 3-D process simulations for different layouts or different versions of the same layout. Results for 32-nm bulk process simulations versus SS and 32-nm silicon-on-insulator (SOI) hardware measurements versus corresponding synthesized structures indicate that the method is an excellent substitute for 3-D process simulation of large layouts, with extremely favorable time and memory scaling behavior. Finally, the robustness and scalability of the proposed abstractions are highlighted through the synthesis of 22-nm SOI 6T FinFET SRAMs and ring oscillator structures.

Luo, Y.; Chakrabarty, K.; Ho, T.-Y.
Error Recovery in Cyberphysical Digital Microfluidic Biochips
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387703
Droplet-based digital microfluidics technology has now come of age, and software-controlled biochips for healthcare applications are starting to emerge. However, today's digital microfluidic biochips suffer from the drawback that there is no feedback to the control software from the underlying hardware platform. Due to the lack of precision inherent in biochemical experiments, errors are likely during droplet manipulation; error recovery based on the repetition of experiments leads to wastage of expensive reagents and hard-to-prepare samples. By exploiting recent advances in the integration of optical detectors (sensors) into a digital microfluidics biochip, we present a physical-aware system reconfiguration technique that uses sensor data at intermediate checkpoints to dynamically reconfigure the biochip.
A cyberphysical resynthesis technique is used to recompute electrode-actuation sequences, thereby deriving new schedules, module placement, and droplet routing pathways, with minimum impact on the time-to-response.

MODELING AND SIMULATION

Aadithya, K. V.; Demir, A.; Venugopalan, S.; Roychowdhury, J.
Accurate Prediction of Random Telegraph Noise Effects in SRAMs and DRAMs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387692
With aggressive technology scaling and heightened variability, circuits such as SRAMs and DRAMs have become vulnerable to random telegraph noise (RTN). The bias dependence (i.e., non-stationarity), bi-directional coupling, and high inter-device variability of RTN present significant challenges to understanding its circuit-level effects. In this paper, we present two computer-aided design (CAD) tools, SAMURAI and MUSTARD, for accurately estimating the impact of non-stationary RTN on SRAMs and DRAMs. While traditional (stationary) analysis is often overly pessimistic (e.g., it overestimates RTN-induced SRAM failure rates), the predictions made by SAMURAI and MUSTARD are more reliable by virtue of non-stationary analysis.

Lin, I.-C.; Lin, C.-H.; Li, K.-H.
Leakage and Aging Optimization Using Transmission Gate-Based Technique
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387701
Negative bias temperature instability (NBTI), which can degrade the switching speed of PMOS transistors, has become a major reliability challenge. Reducing leakage consumption is one of the major design goals. The gate replacement (GR) technique is an effective way to reduce both the NBTI effect and leakage. This technique, however, has less flexibility because the replaced gate can only produce one output value and careful algorithms are needed to decide the output value of the replaced gate. In this paper, we propose a novel transmission gate-based technique to minimize NBTI-induced degradation and leakage.
This technique, which can offer logic 1 for NBTI mitigation and logic 0 for leakage reduction, provides higher flexibility than the GR technique. Simulation results show that our proposed technique achieves up to 20x, and on average 2.16x, improvement in NBTI-induced degradation with comparable leakage power reduction. With a 19.19% area penalty, combining our technique and GR can reduce total leakage power by 17.92% and NBTI-induced circuit degradation by 32.36%.

Firouzi, F.; Kiamehr, S.; Tahoori, M. B.
Power-Aware Minimum NBTI Vector Selection Using a Linear Programming Approach
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387695
Transistor aging is a major reliability concern for nanoscale CMOS technology that can significantly reduce the operation lifetime of very large-scale integration chips. Negative bias temperature instability (NBTI) is a major contributor to transistor aging that affects pMOS transistors. On the other hand, leakage power is becoming a dominant factor of the total power with successive technology scaling. Since the input combinations applied to a logic core have a significant impact on both NBTI and leakage power, input vector control can be used to optimize both phenomena during idle cycles. In this paper, we present an efficient input vector selection technique based on linear programming for co-optimizing the NBTI-induced delay degradation and leakage power consumption during standby mode. Since the NBTI-induced delay degradation and leakage power are not affected by the input vector in the same direction, we provide a Pareto curve based on both phenomena. A suitable point from such a Pareto curve is chosen based on circuit conditions and requirements during runtime.

Jung, J.; Kim, T.
Statistical Viability Analysis for Detecting False Paths Under Delay Variation
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387698
How long does an integrated circuit take to produce its result?
To answer the question, we must first tackle the difficult and complex false path detection problem. Viability analysis is one of the most sophisticated approaches to the false path detection problem. On the other hand, as technology scales down, gate delay variation has come to have a significant impact on circuit reliability. Nevertheless, previous timing analyzers have invariably used the worst-case gate delay in their false path detection algorithms, missing some important false or true path timing behavior. In this paper, we propose a solid method of viability analysis to solve the false path detection problem under delay variation, which has never been addressed by prior work on timing analysis. In addition to the thorough theoretical results, to cope with the runtime problem in evaluating the viability for large circuits in practice, we propose an efficient viability evaluation technique that is able to reduce the complexity arising from the number of input vectors. We tested the proposed method on ISCAS benchmark circuits and carry bypass adders under delay variation, and showed its effectiveness and usefulness in false-path-aware statistical timing analysis.

SYSTEM-LEVEL DESIGN

Gupta, V.; Mohapatra, D.; Raghunathan, A.; Roy, K.
Low-Power Digital Signal Processing Using Approximate Adders
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387646
Low power is an imperative requirement for portable multimedia devices employing various signal processing algorithms and architectures. In most multimedia applications, human beings can gather useful information from slightly erroneous outputs. Therefore, we do not need to produce exactly correct numerical outputs. Previous research in this context exploits error resiliency primarily through voltage overscaling, utilizing algorithmic and architectural techniques to mitigate the resulting errors.
In this paper, we propose logic complexity reduction at the transistor level as an alternative approach to take advantage of the relaxation of numerical accuracy. We demonstrate this concept by proposing various imprecise or approximate full adder cells with reduced complexity at the transistor level, and utilize them to design approximate multi-bit adders. In addition to the inherent reduction in switched capacitance, our techniques result in significantly shorter critical paths, enabling voltage scaling. We design architectures for video and image compression algorithms using the proposed approximate arithmetic units and evaluate them to demonstrate the efficacy of our approach. We also derive simple mathematical models for error and power consumption of these approximate adders. Furthermore, we demonstrate the utility of these approximate adders in two digital signal processing architectures (discrete cosine transform and finite impulse response filter) with specific quality constraints. Simulation results indicate up to 69% power savings using the proposed approximate adders, when compared to existing implementations using accurate adders.

TEST

Kim, T.-Y.; Kim, T.
Resource Allocation and Design Techniques of Prebond Testable 3-D Clock Tree
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387699
In 3-D stacked integrated circuit (IC) manufacturing, to achieve an acceptably high yield, it is essential to stack only known good dies by testing the individual dies at the prebond stage. While the postbonded 3-D IC is operated by a low power 3-D clock tree, the prebond testing requires a 2-D clock tree on each die. The previous work provided a prebond testable 3-D clock tree synthesis solution by allocating through-silicon via (TSV) buffers and redundant trees with transmission gates. However, optimization of the allocation and design of these resources has not been addressed.
In this paper, we propose practically viable clock tree optimization techniques under prebond testability: 1) TSV-buffer-aware topology generation techniques that enable an economical buffer allocation by preventing (potentially "bad") TSV buffers; 2) a delay-locked loop (DLL)-based 2-D clock network design method that offers a diverse exploration of 2-D clock tree synthesis and resource allocation for prebond die testing; and 3) a new circuit design technique for transmission gates that completely removes their control line. Compared to the existing topology generation algorithms, our proposed TSV-buffer-aware topology generation uses 68%-88% fewer TSVs, 36%-58% less wire resource, and 35%-69% fewer buffers while consuming 17%-43% less clock power for the benchmark circuits, and our proposed method of clock tree exploration provides many alternative structures of a 2-D clock tree, considering the resource balance between DLLs and wires. In addition, the use of our self-controlled clock transmission gate enables a drastic reduction of the total wirelength, which amounts to 18% on average.

Lien, W.-C.; Lee, K.-J.; Hsieh, T.-Y.; Chakrabarty, K.; Wu, Y.-H.
Counter-Based Output Selection for Test Response Compaction
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387700
Output selection is a recently proposed test response compaction method, where only a subset of output response bits is selected for observation. It can achieve zero aliasing, full X-tolerance, and high diagnosability. One critical issue for output selection is how to implement the selection hardware. In this paper, we present a counter-based output selection scheme that employs only a counter and a multiplexer, hence involving very small area overhead and simple test control. The proposed scheme is ATPG-independent and thus can easily be incorporated into a typical design flow.
Two efficient output selection algorithms are presented to determine the desired output responses, one using a single counter operation for simpler test control and the other using more counter operations for achieving a better test-response reduction ratio. Experimental results show that for stuck-at faults in large ISCAS'89 and ITC'99 benchmark circuits, 48%-90% reduction ratios on test responses can be achieved with only one counter and one multiplexer employed. Even better results, i.e., 76%-95% reductions, can be obtained for transition faults. It is also shown that the diagnostic resolution of this method is almost the same as that achieved by observing all output responses.

SHORT PAPERS

Lu, S.-K.; Huang, H.-H.; Huang, J.-L.; Ning, P.
Synergistic Reliability and Yield Enhancement Techniques for Embedded SRAMs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6387702
Single isolated faults (SIFs) account for about 60%-70% of the total number of defects and are rather redundancy hungry, since a spare row or column is required for repairing each SIF. Therefore, manufacturing yield will decrease if we do not allocate sufficient spare resources. In this paper, instead of the traditional fault replacement techniques, synergistic techniques that integrate both fault replacement and fault masking techniques are proposed. With our approaches, SIFs are masked rather than repaired by traditional replacement. For other minor fault types (e.g., faulty rows and faulty columns), the fault replacement technique is used as usual. According to simulation results, repair rates can be improved significantly. The proposed techniques can be integrated with the conventional built-in self-repair with nearly negligible hardware overhead.