August 2013 Newsletter

Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2013-08.txt

NEW EDITORIAL TEAM TO TAKE THE HELM IN 2014

CEDA is pleased to announce that Professor Vijaykrishnan Narayanan of Penn State University has been appointed as the next Editor-in-Chief of the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) for a two-year term starting January 2014. Professor Narayanan is an expert in the areas of power-aware and reliable systems, embedded systems, reconfigurable architectures, nanoarchitectures, and computer architecture. He has published over 350 papers with an h-index of 55, has received multiple awards for his research, and is a Fellow of the IEEE.

Dr. Charles J. Alpert, Manager of the Design Productivity group at the IBM Austin Research Laboratory, will also serve on the leadership team of TCAD as Deputy Editor-in-Chief. Dr. Alpert has published extensively in the area of physical design. He holds nearly 60 patents and has received three DAC Best Paper awards. He currently serves as Technical Program Cochair for DAC and as Chair of the Design Automation Technical Committee, and is a Fellow of the IEEE.

Interested in volunteering for TCAD? The incoming leadership team will be formulating the new editorial board over the next few months. If you are interested in serving as an Associate Editor, please send a recent CV to Professor Narayanan at vijay@cse.psu.edu before October 1, 2013.

REGULAR PAPERS

EMBEDDED SYSTEMS

Hsieh, J.-W.; Zheng, Y.-C.; Peng, Y.-S.; Yeh, P.-H.
VAST: Virtually Associative Sector Translation for MLC Storage Systems
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6559102

In recent years, multilevel cell (MLC) flash memory, which stores two or more bits per cell, has gradually replaced single-level cell flash memory due to its lower cost and higher density.
However, MLC also brings new constraints to flash management: partial programming is prohibited, and pages within a block must be written sequentially. This paper proposes a virtual log-block-based hybrid-mapping scheme, referred to as virtually associative sector translation (VAST), for MLC storage systems. Unlike traditional hybrid-mapping schemes, VAST combines block-level and segment-level mappings and manages log blocks in a flexible manner. The goals of our research are to avoid timeouts by decreasing dummy-page writes, to improve response time by decreasing live-page copies, and to prolong the life span of flash memory by decreasing total block erasures. Our trace-driven simulation shows that VAST can reduce up to 90% of dummy-page writes, 22%–52% of live-page copies, and 55%–83% of block erasures, compared to well-known hybrid-mapping schemes.

EMERGING TECHNOLOGIES

Chen, Y.-H.; Hsu, C.-L.; Tsai, L.-C.; Huang, T.-W.; Ho, T.-Y.
A Reliability-Oriented Placement Algorithm for Reconfigurable Digital Microfluidic Biochips Using 3-D Deferred Decision Making Technique
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6559249

In recent studies, digital microfluidic biochips (DMFBs) have emerged as a promising platform for lab-on-a-chip and bioassay experiments because of their application flexibility and low fabrication cost. However, reliability is a critical issue in guaranteeing correct DMFB operation. The reliability of a DMFB decreases when its electrodes are excessively actuated, preventing droplets from being controlled successfully. Because bioassay placement on a DMFB is a key step in generating the corresponding actuation signals, reliability must be considered during biochip placement to avoid excessive actuation. Although researchers have proposed several DMFB placement algorithms, they have failed to consider the reliability issue.
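The notion of "excessive actuation" is easy to make concrete. A toy bookkeeping sketch (illustration only, not this paper's placement algorithm; the actuation limit and droplet routes are hypothetical): count how often each electrode in the grid is actuated across a set of droplet routes and flag the ones that exceed a reliability limit.

```python
from collections import Counter

MAX_ACTUATIONS = 3   # hypothetical reliability limit per electrode


def over_actuated(routes, limit=MAX_ACTUATIONS):
    """Return electrodes actuated more than `limit` times by the given routes."""
    counts = Counter(cell for route in routes for cell in route)
    return sorted(cell for cell, n in counts.items() if n > limit)


# Four hypothetical droplet routes over a small electrode grid:
routes = [[(0, 0), (0, 1), (0, 2)],
          [(1, 0), (0, 1), (0, 2)],
          [(2, 2), (0, 1), (1, 2)],
          [(0, 1), (1, 1), (1, 2)]]
print(over_actuated(routes))   # [(0, 1)]: actuated 4 times, over the limit
```

A reliability-oriented placer would steer routes so that this set stays empty.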
In addition, previous algorithms were all based on the simulated-annealing (SA) method, which is time consuming and does not guarantee an optimal solution. This paper proposes the first reliability-oriented non-SA placement algorithm for DMFBs. The approach considers the reliability problem during placement and uses the 3-D deferred decision making (3D-DDM) technique to enumerate only feasible placement solutions. Large-scale DMFB placements can be synthesized efficiently by partitioning the operation sequencing graph of the bioassays. Experimental results demonstrate that the proposed technique achieves reliability-oriented placement without excessive actuation of any electrode, while optimizing bioassay completion time.

LOGIC SYNTHESIS

Choudhury, M.R.; Mohanram, K.
Low Cost Concurrent Error Masking Using Approximate Logic Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6558854

With technology scaling, logical errors arising from single-event upsets and timing errors arising from dynamic variability effects are increasing in logic circuits. Existing techniques for online resilience to logical and timing errors are limited to detection, and often incur a significant performance penalty and high area/power overhead. This paper proposes approximate logic circuits as a design approach for low cost concurrent error masking. An approximate logic circuit predicts the value of the outputs of a given logic circuit for a specified portion of the input space, and can indicate uncertainty about the outputs over the rest of the input space. By choosing the portions of the input space that are most vulnerable to errors as the specified input space, we show that approximate logic circuits can provide low overhead concurrent error masking for a given logic circuit. We describe efficient algorithms for synthesizing approximate circuits for concurrent masking of logical and timing errors.
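To make the masking idea concrete, here is a minimal behavioral sketch (a toy majority circuit with a hypothetical "vulnerable" input subspace, not the authors' synthesis flow): the approximate copy predicts the exact output on its specified subspace and flags uncertainty elsewhere; where it is certain, it overrides an erroneous primary output.

```python
def exact_circuit(a, b, c):
    # The protected circuit: 3-input majority function.
    return (a & b) | (b & c) | (a & c)


def approx_circuit(a, b, c):
    """Approximate copy, trusted only on the subspace a == b (hypothetical
    choice of the 'most vulnerable' inputs).  There, majority(a, a, c) == a,
    so the prediction is exact."""
    if a == b:
        return a, True        # (prediction, certain)
    return None, False        # uncertain: no masking possible


def masked_output(a, b, c, raw_output):
    """Mask an error in raw_output whenever the approximate copy is certain."""
    pred, certain = approx_circuit(a, b, c)
    return pred if certain else raw_output


# A single-event upset flips the primary output while a == b == 1:
faulty = exact_circuit(1, 1, 0) ^ 1          # erroneous value 0
print(masked_output(1, 1, 0, faulty))        # masked back to 1
```

Outside the specified subspace the raw output passes through unchanged, which is why the choice of subspace (the error-vulnerable inputs) matters.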
Results indicate that concurrent error masking based on approximate logic circuits can mask 88% of targeted logical errors for 34% area overhead and 17% power overhead; 100% of timing errors on all timing paths within 10% of the critical path delay for 23% area overhead and 8% power overhead; and 100% of timing errors on all timing paths within 20% of the critical path delay for 42% area overhead and 26% power overhead.

MODELING AND SIMULATION

Burke, D.; Smy, T.
Thermal Models for Optical Circuit Simulation Using a Finite Cloud Method and Model Reduction Techniques
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6558888

This paper presents a procedure for creating versatile and powerful thermal compact models of integrated optical devices and demonstrates their use in an optical circuit-level simulator. A detailed 3-D model of the device is first built using a meshless finite cloud method, producing a large sparse set of linear equations. This model is then reduced to a compact representation using a Krylov subspace model reduction (MR) technique. The reduced model is described by small dense matrices, yet reproduces the original temperature distribution within acceptable error. Three devices are used to demonstrate the technique: a microdisc laser and two microring-based devices, a modulator and an optical switch. All three devices are built on a silicon-on-oxide platform. Using MR, the linear systems describing these models are reduced from thousands of unknowns to systems with fewer than 100 reduced variables. It is then demonstrated how the reduced compact models can be linked together to describe a complete optical system with solution errors below 1%.
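The reduction step can be illustrated with a generic Arnoldi projection (a standard Krylov MR recipe applied to a random symmetric test system, not the paper's finite-cloud models): project a large conductance system G T = b onto a small Krylov subspace and compare the full and reduced steady-state solutions. Because the first Krylov vector is G⁻¹b, the DC response is matched by construction.

```python
import numpy as np


def arnoldi(A, b, m):
    """Orthonormal basis V of the order-m Krylov subspace K_m(A, b)."""
    n = len(b)
    V = np.zeros((n, m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m - 1):
        w = A @ V[:, j]
        w -= V[:, : j + 1] @ (V[:, : j + 1].T @ w)   # orthogonalize
        V[:, j + 1] = w / np.linalg.norm(w)
    return V


rng = np.random.default_rng(0)
n = 500
M = np.diag(rng.uniform(1.0, 2.0, n)) + 0.01 * rng.standard_normal((n, n))
G = M @ M.T                     # symmetric positive definite "conductance"
b = rng.standard_normal(n)      # heat-source vector

Ginv = np.linalg.inv(G)
V = arnoldi(Ginv, Ginv @ b, 20)
G_r = V.T @ G @ V               # 20x20 reduced system (from 500 unknowns)
b_r = V.T @ b

full = np.linalg.solve(G, b)                # full steady-state solution
reduced = V @ np.linalg.solve(G_r, b_r)     # lifted reduced solution
print(np.linalg.norm(full - reduced) / np.linalg.norm(full))  # tiny error
```

Real MR flows (e.g., PRIMA-style passive reduction) avoid the explicit inverse and also match higher-order moments for transient accuracy; this sketch only shows the projection mechanics.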
Finally, it is shown how this reduced thermal model can be utilized in a circuit-level opto-electronic circuit simulator, and simulations are presented demonstrating the effectiveness of the reduced models in speeding up simulation or enabling otherwise intractable problems.

Ardestani, E.K.; Mesa-Martinez, F.J.; Southern, G.; Ebrahimi, E.; Renau, J.
Sampling in Thermal Simulation of Processors: Measurement, Characterization, and Evaluation
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6560454

Power densities in modern processors induce thermal issues that limit performance. Power and thermal models add complexity to architectural simulators, limiting the depth of analysis. Prohibitive execution-time overheads may be circumvented using sampling techniques. While these approaches work well when characterizing processor performance, they introduce new challenges when applied to the thermal domain. This paper aims to improve the accuracy and performance of sampled thermal simulation at the architectural level. To the best of our knowledge, this paper is the first to evaluate the impact of statistical sampling on thermal metrics through direct temperature measurements performed at runtime. Experiments confirm that sampling can accurately estimate certain thermal metrics. However, extra care is needed to preserve the accuracy of temperature estimation in a sampled simulation, mainly because thermal phases are, on average, much longer than performance phases. Based on these insights, we introduce a framework that extends the statistical sampling techniques used at the performance and power stages to the thermal domain. The resulting technique yields an integrated performance, power, and temperature simulator that maintains accuracy while reducing simulation time by orders of magnitude. In particular, this paper shows how dynamic voltage and frequency adaptations can be evaluated in a statistically sampled simulation.
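Why thermal sampling needs extra care can be seen with a first-order RC toy model (illustrative only; the time constant and power numbers are made up, and this is not the paper's framework): temperature integrates power over long horizons, so a sample simulated from a cold start misses the thermal history accumulated before it.

```python
def thermal_trace(power, t0=40.0, r=2.0, tau=0.999):
    """First-order RC model: each step, T relaxes toward t0 + r * p.
    tau close to 1 gives a thermal time constant of ~1000 steps."""
    T, out = t0, []
    for p in power:
        T = tau * T + (1 - tau) * (t0 + r * p)
        out.append(T)
    return out


power = [10.0] * 5000 + [40.0] * 1000   # a long cool phase, then a hot burst
full = thermal_trace(power)

# Simulating only the sampled hot burst from the ambient temperature
# underestimates the end temperature, because the warm-up history is lost:
cold_start = thermal_trace(power[5000:])
print(round(full[-1], 1), round(cold_start[-1], 1))
```

The gap between the two final temperatures is exactly the kind of sampling error that requires carrying thermal state (or a warm-up estimate) between samples.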
We conclude by showing how the increased simulation speed benefits architects in design space exploration.

Yakopcic, C.; Taha, T.M.; Subramanyam, G.; Pino, R.E.
Generalized Memristive Device SPICE Model and its Application in Circuit Design
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6558886

This paper presents a SPICE model for memristive devices. It builds on existing models and is correlated against published characterization data for several devices, with an average error of 6.04%. Compared to existing alternatives, the proposed model can accurately simulate a wider range of published memristors. The model is also tested in large circuits with up to 256 memristors, and was less likely to cause convergence errors than other models. We show that the model can be used to study the impact of memristive device variation within a circuit. We examine the impact of nonuniformity in device state-variable dynamics and conductivity on individual memristors as well as on a four-memristor read/write circuit. These studies show that the model can be used to predict how variation across a memristor wafer may impact circuit performance.

PHYSICAL DESIGN

Brenner, U.
BonnPlace Legalization: Minimizing Movement by Iterative Augmentation
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6559150

We describe BonnPlaceLegal, an algorithm for VLSI placement legalization. Based on a minimum-cost flow algorithm that iteratively augments flows along paths, our approach considers only augmentations that can be realized exactly by cell movements. Hence, the method avoids the realization problems inherent to previous flow-based legalization algorithms. As a result, it combines the global perspective of minimum-cost flow approaches with the efficiency of local search algorithms.
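For readers new to legalization, the task itself (remove cell overlaps while staying near the global-placement positions) can be shown with a deliberately naive single-row sketch. This is nothing like BonnPlaceLegal's min-cost flow formulation: it keeps the left-to-right order and only pushes cells right, so it is legal but not movement-optimal.

```python
def legalize_row(cells):
    """cells: list of (desired_x, width), already ordered left to right.
    Returns legal (non-overlapping) x positions, pushing right as needed."""
    placed, cursor = [], 0.0
    for x, w in cells:
        x = max(x, cursor)     # shift right only if it overlaps the previous cell
        placed.append(x)
        cursor = x + w
    return placed


# Cells 1 and 2 overlap in the input; cell 2 is pushed right by one unit.
print(legalize_row([(0.0, 2.0), (1.0, 2.0), (5.0, 1.0)]))  # [0.0, 2.0, 5.0]
```

A flow-based legalizer instead trades off movements globally (including pushing earlier cells left), which is what lets it minimize total and maximum movement.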
The tool is mainly designed to minimize total and maximum cell movement, but it is flexible enough to optimize other objective functions, provided that the effect of single cell movements on them can be estimated efficiently. We compare our approach to legalization tools from industry and academia through experiments on recent dense real-world designs and public benchmarks. The results show that our tool is much faster and produces significantly better results in terms of average (linear and quadratic) and maximum movement than any other tool. The experiments also demonstrate that, by minimizing squared movement, we also produce a smaller increase in net length than the other tools.

Xiao, Z.; Du, Y.; Zhang, H.; Wong, M.D.F.
A Polynomial Time Exact Algorithm for Overlay-Resistant Self-Aligned Double Patterning (SADP) Layout Decomposition
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6558884

Double patterning lithography (DPL) technologies have become a must for today's sub-32 nm technology nodes. Currently, there are two leading DPL technologies: self-aligned double patterning (SADP) and litho-etch-litho-etch (LELE). SADP has a significant advantage over LELE in its ability to avoid overlay, making it the likely DPL candidate for the next technology node of 14 nm. In any DPL technology, layout decomposition is the key problem. While the layout decomposition problem for LELE has been well studied in the literature, only a few attempts have been made to address the SADP layout decomposition problem. In this paper, we present a polynomial-time exact (optimal) algorithm to determine whether a given layout has an SADP decomposition with no overlay at specified critical edges. Previous approaches tried to minimize the total overlay of a given layout, which may be a problematic objective. Furthermore, all previous exact algorithms were computationally expensive exponential-time algorithms based on SAT or ILP.
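For background, the simpler LELE flavor of layout decomposition is commonly modeled as two-coloring a conflict graph: features closer than the minimum pitch must go on different masks. A textbook sketch of that formulation (not the SADP algorithm this paper proposes, which must additionally reason about spacer-defined patterns and overlay):

```python
from collections import deque


def decompose(num_features, conflicts):
    """Return a mask assignment (0/1 per feature) or None if impossible.
    conflicts: pairs of feature indices that are too close for one mask."""
    color = [None] * num_features
    adj = [[] for _ in range(num_features)]
    for u, v in conflicts:
        adj[u].append(v)
        adj[v].append(u)
    for start in range(num_features):         # handle disconnected layouts
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]   # opposite mask
                    queue.append(v)
                elif color[v] == color[u]:
                    return None               # odd cycle: no 2-mask solution
    return color


print(decompose(4, [(0, 1), (1, 2), (2, 3)]))  # [0, 1, 0, 1]
print(decompose(3, [(0, 1), (1, 2), (2, 0)]))  # None (odd conflict cycle)
```

This bipartiteness check runs in linear time; the paper's contribution is showing that the harder, overlay-aware SADP variant also admits a polynomial-time exact algorithm.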
Other previous algorithms for the problem were heuristics with no guarantee of finding an overlay-free solution even when one exists.

SYSTEM-LEVEL DESIGN

van Stralen, P.; Pimentel, A.
Fitness Prediction Techniques for Scenario-Based Design Space Exploration
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6559130

Modern embedded systems are becoming increasingly multifunctional. The dynamism in multifunctional embedded systems manifests itself in more dynamic applications and in multiple applications executing on a single embedded system. This dynamism in the application workload must be taken into account during early system-level design space exploration (DSE) of multiprocessor system-on-chip (MPSoC)-based embedded systems. Scenario-based DSE utilizes the concept of application scenarios to search for optimal mappings of a multi-application workload onto an MPSoC. It uses a multi-objective genetic algorithm (GA) to identify the mapping with the best average quality over all the application scenarios in the workload. To keep the exploration efficient, fitness prediction is used to estimate the quality of a mapping. This fitness prediction is performed using a representative subset of application scenarios, obtained by co-exploring the scenario subset space. In this paper, multiple fitness prediction techniques are presented: stochastic, deterministic, and a hybrid combination. Results show that, for our test cases, accurate fitness prediction is already provided by subsets containing only 1–4% of the application scenarios. Larger subsets achieve similar accuracy, but the DSE requires more time to identify promising mappings that meet the requirements of multifunctional embedded systems.

TEST

Lien, W.-C.; Lee, K.-J.; Hsieh, T.-Y.; Ang, W.-L.
An Efficient On-Chip Test Generation Scheme Based on Programmable and Multiple Twisted-Ring Counters
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6559094

Twisted-ring counters (TRCs) have been used as built-in test pattern generators for high-performance circuits because of their small area overhead, low performance impact, and simple control circuitry. However, previous work based on a single, fixed-order TRC often requires long test times to achieve high fault coverage and large storage space for the required control data and TRC seeds. In this paper, a novel programmable multiple-TRC-based on-chip test generation scheme is proposed to minimize both the required test time and the test data volume. The scan path of a circuit under test is divided into multiple equal-length scan segments, each converted to a small TRC controlled by a programmable control logic unit. An efficient algorithm is developed to determine the required seeds and control vectors. Experimental results on ISCAS'89, ITC'99, and IWLS'05 benchmark circuits show that, on average, the proposed scheme using only a single programmable TRC design achieves 35.58%–98.73% reductions in the number of test application cycles with a smaller storage data volume than previous work. When more programmable TRC designs are used, 83.60%–99.59% reductions can be achieved with only a slight increase in test data volume.

Huang, S.-Y.; Lin, Y.-H.; Huang, L.-R.; Tsai, K.-H.; Cheng, W.-T.
Programmable Leakage Test and Binning for TSVs With Self-Timed Timing Control
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6559087

Leakage testing has been a challenge for through-silicon vias (TSVs) in 3-D ICs. Most existing methods are still inadequate in terms of the range of testable leakage currents. In this paper, we adapt the well-established IO-pin leakage test and enhance it with two features. First, we make it more suitable for a TSV, which has a much smaller capacitance than an IO pin.
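The capacitance remark matters because, in a charge-sharing style leakage test, the node capacitance and the wait time before sampling together set the smallest detectable current. A back-of-the-envelope sketch (generic first-order physics with hypothetical numbers, not the paper's programmable delay-line circuit):

```python
def detectable_leakage(cap_farads, delta_v, wait_seconds):
    """Smallest leakage current that discharges a precharged node by
    delta_v volts within the wait time (I = C * dV / dt)."""
    return cap_farads * delta_v / wait_seconds


# Hypothetical numbers: a 50 fF TSV and a 0.5 V detection threshold.
# A longer programmable wait time detects proportionally smaller leakage.
for wait_ns in (25, 250):
    i = detectable_leakage(50e-15, 0.5, wait_ns * 1e-9)
    print(f"wait {wait_ns} ns -> detects leakage above {i * 1e6:.2f} uA")
```

Making that wait time programmable is what gives a leakage-binning scheme its wide testable-current range.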
Second, we support a wide range of leakage currents (e.g., from 0.125 uA to 16 uA), thereby allowing flexible test-threshold setting and leakage characterization. To achieve this goal, we present two sets of techniques: 1) wait-time generation by a programmable delay line, and 2) wait-time propagation with a self-timed timing control scheme to overcome the timing skew caused by signal routing. We demonstrate that the entire scheme can be implemented using only logic gates, making it easy to integrate into a common design flow.

VERIFICATION

Mitra, S.; Banerjee, A.; Dasgupta, P.; Ghosh, P.; Kumar, H.
Formal Guarantees for Localized Bug Fixes
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6558891

Bug traces produced in simulation serve as the basis for patching RTL code to fix a bug. It is important to prove that the patch covers all instances of the bug scenario; otherwise, the bug may return with a different valuation of the variables involved in the bug scenario. For large circuits, formal methods do not scale well enough to comprehensively eliminate the bug, and achieving adequate coverage in simulation and regression testing becomes expensive. This paper proposes formal methods for analyzing the control trace leading to the observed manifestation of the bug and for verifying the robustness of the bug fix with respect to that control trace. We propose a classification of bug fixes based on the guarantee that our analysis can provide about the quality of the fix. Our method also prescribes the types of tests recommended to validate the bug fix on other types of scenarios. Since our methods are orders of magnitude more scalable than model checking the entire design, we believe the proposed formal methods hold immense promise for analyzing bug fixes in practice.

SHORT PAPERS

Kahng, A.B.; Kang, S.; Rosing, T.S.; Strong, R.
Many-Core Token-Based Adaptive Power Gating
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6559076

Among power dissipation components, leakage power has become more dominant with each successive technology node. Leakage energy waste can be reduced by power gating. In this paper, we extend token-based adaptive power gating (TAP), a technique that power gates an actively executing core during memory accesses, to many-core chip multiprocessors (CMPs). TAP tracks every system memory request and its estimated time of arrival so that a core may power gate itself without performance or energy loss. Previous work on TAP shows several benefits compared to earlier state-of-the-art techniques, including zero performance hit and 2.58 times average energy savings for out-of-order cores. We show that TAP can adapt to increasing memory contention, increasing power-gated time by 3.69 times compared to a low memory-pressure case. We also scale TAP to many-core architectures with a distributed wake-up controller that supports staggered wake-ups and can power gate each core for 99.07% of the time achieved by a non-scalable centralized scheme.
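The break-even intuition behind this style of power gating can be sketched in a few lines (the threshold and timings are hypothetical, and this is not the TAP controller itself): a core that has issued a memory request knows the estimated time of arrival (ETA) of the response, and it gates itself only when the remaining stall is long enough to amortize the energy cost of switching the power gate.

```python
BREAKEVEN_CYCLES = 50   # hypothetical cycles needed to amortize gating cost


def should_power_gate(now, response_eta, breakeven=BREAKEVEN_CYCLES):
    """Gate only if the predicted idle window exceeds the break-even time."""
    return response_eta - now > breakeven


# Under heavy memory contention the ETA moves out, so gating windows grow:
print(should_power_gate(now=0, response_eta=30))    # False: stall too short
print(should_power_gate(now=0, response_eta=200))   # True: gate the core
```

This is why the abstract's contention result makes sense: higher memory pressure lengthens the predicted stalls, which lets the controller keep cores gated for more of the time.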