September 2012 Newsletter Placing you one click away from the best new CAD research! Call for Nominations for Editor-in-Chief of the IEEE Transactions on CAD The IEEE Council on Design Automation (CEDA) invites nominations for the position of Editor-in-Chief (EiC) for the IEEE Transactions on Computer-Aided Design (TCAD) of Integrated Circuits and Systems. Self-nominations are permitted. The term of duty of the new EiC begins on January 1, 2014 and ends December 31, 2015. A period of approximately three months prior to the beginning of the term will be spent on enabling a smooth transition from the current EiC to the new EiC. A Selection Committee will evaluate all nominations and make recommendations to the CEDA Executive Committee for appointment. After the EiC is chosen, the Selection Committee will identify the Deputy EiC, in agreement with the newly appointed EiC. The EiC will then constitute an own Editorial Board of Associate Editors. Nominations should be sent electronically to Kristin Steen (admin@ieee-ceda.com) and should be received by June 27, 2013 to ensure full consideration. The nomination package should include: * Name, postal address, email address and telephone numbers of the and the publication for which the candidate is nominated. * Written confirmation that the candidate intends to accept a nomination for the EiC position. * Curriculum vitae of the nominee, including the track record with IEEE service and publications details, with particular reference to TCAD. * Position statement of the nominee indicating the view of the candidate on the future of IEEE TCAD, the opportunities and challenges, and any plans that the candidate wants to pursue. The nominee must be a member in good standing of the IEEE. Desirable qualities of the candidates include: An established track record of technical accomplishments, leadership, integrity and ethical standards, demonstrable organizational and management skills, and an energetic eagerness to continue moving the journal forward toward visibly higher levels of accomplishment and contributions to the relevant technical community. REGULAR PAPERS EMBEDDED SYSTEMS Guo, Y. ; Zhuge, Q. ; Hu, J. ; Yi, J. ; Qiu, M. ; Sha, E.H.-M. Data Placement and Duplication for Embedded Multicore Systems With Scratch Pad Memory http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516677 Scratch pad memories (SPM) are attractive alternatives for caches on multicore systems since caches are relatively expensive in terms of area and energy consumption. The key to effectively utilizing SPMs on multicore systems is the data placement algorithm. In this paper, two polynomial time algorithms, regional data placement for multicore (RDPM) and regional data placement for multicore with duplication (RDPM-DUP), have been proposed to generate near-optimal data placement with minimum total cost. There is only one copy for each data in RDPM, while RDPM- DUP allows data duplication. Experimental results show that the proposed RDPM algorithm alone can reduce the time cost of memory accesses by 32.68% on average compared with existing algorithms. With data duplication, the RDPM-DUP algorithm further reduces the time cost by 40.87%. In terms of energy consumption, the proposed RDPM algorithm with exclusive copy can reduce the total cost by 33.47% on average. When RDPM-DUP is applied, the improvement increases up to 38.15% on average. EMERGING TECHNOLOGIES Amy, M. ; Maslov, D. ; Mosca, M. ; Roetteler, M. A Meet-in-the-Middle Algorithm for Fast Synthesis of Depth-Optimal Quantum Circuits http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516700 We present an algorithm for computing depth-optimal decompositions of logical operations, leveraging a meet-in- the-middle technique to provide a significant speedup over simple brute force algorithms. As an illustration of our method, we implemented this algorithm and found factorizations of commonly used quantum logical operations into elementary gates in the Clifford+T set. In particular, we report a decomposition of the Toffoli gate over the set of Clifford and T gates. Our decomposition achieves a total T-depth of 3, thereby providing a 40% reduction over the previously best known decomposition for the Toffoli gate. Due to the size of the search space, the algorithm is only practical for small parameters, such as the number of qubits, and the number of gates in an optimal implementation. HIGH-LEVEL SYNTHESIS Sarbishei, O. ; Radecka, K. On the Fixed-Point Accuracy Analysis and Optimization of Polynomial Specifications http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516678 Fixed-point accuracy analysis and optimization of polynomial data-flow graphs with respect to a reference model is a challenging task in many digital signal processing applications. Range and precision analysis are two important steps of this process to assign suitable integer and fractional bit-widths to the fixed-point variables and constant coefficients in a design such that no overflow occurs and a given error bound on maximum mismatch (MM) or mean-square-error (MSE) and signal-to-quantization-noise ratio (SQNR) is satisfied. This paper explores efficient optimization algorithms based on robust analyses of MM and MSE/SQNR for fixed-point polynomial data-flow graphs. Experimental results illustrate the robustness of our analyses and the efficiency of the optimization algorithms compared to previous work. MODELING AND SIMULATION Kim, D. ; Ciesielski, M. ; Yang, S. MULTES: Multilevel Temporal-Parallel Event-Driven Simulation http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516614 Multilevel temporal-parallel event-driven simulation is a new radically different approach to simulation of designs described in Verilog HDL. It is based on a concept of time-parallel simulation applied to gate-level timing simulation. The simulation is performed in two steps: 1) fast reference simulation that runs on a higher, reference- level design model (typically RTL) and saves the design state at predetermined checkpoints; and 2) target simulation, which runs on a lower, gate-level model and distributes the simulation run slices to individual simulators. The paper addresses a number of important issues that make this approach practical: 1) finding initial state for each simulation slice; 2) resolving initial state mismatches; and 3) handling designs with multiple asynchronous clocks. Experimental results performed on industrial designs demonstrate the validity and efficiency of the method in terms of its performance and the debugging efficiency. Liu, H. ; Wong, N. Autonomous Volterra Algorithm for Steady-State Analysis of Nonlinear Circuits http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516712 We present a novel algorithm, named autonomous Volterra (AV), that achieves efficient steady-state analysis of nonlinear circuits. With elegant analytic forms and availability of efficient solvers, AV constitutes a competitive steady-state algorithm besides the two mainstreams, namely, shooting Newton (SN) and harmonic balance (HB). Nonlinear systems are first captured in nonlinear differential algebraic equations, followed by expansion into linear Volterra subsystems. A key step of steady-state analysis lies in modeling each Volterra subsystem with autonomous nonlinear inputs. The steady-state solution of these subsystems then proceeds with a series of Sylvester equation solves, completely avoiding the guesses of initial condition and time stepping as in SN, as well as the uncertain length of Fourier series as in HB. Error control in AV is also straightforward by monitoring the norms of the Sylvester equation solutions. We further demonstrate that AV is readily parallelizable with superior scalability toward large-scale problems. Abrishami, H. ; Hatami, S. ; Pedram, M. Design and Multicorner Optimization of the Energy-Delay Product of CMOS Flip-Flops Under the Negative Bias Temperature Instability Effect http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516601 With the CMOS transistors being scaled to 28 nm and lower, negative bias temperature instability (NBTI) has become a major concern due to its impact on pMOS transistor aging process and the corresponding reduction in the long-term reliability of CMOS circuits. This paper investigates the effect of NBTI phenomenon on the setup and hold times of CMOS flipÐflops. First, it is shown that the NBTI effect tightens the setup and hold timing constraints imposed on the flipÐflops in the design. Second, an efficient algorithm is introduced for characterizing codependent setup and hold time contours of the flipÐflops. Third, a multicorner optimization technique, which relies on mathematical programming to find the best transistor sizes, is presented to minimize the energy-delay product of the flipÐflops under the NBTI effect. Finally, the proposed optimization technique is applied to true single-phase clock flipÐflops to demonstrate its effectiveness. PHYSICAL DESIGN Huang, T. ; Young, E.F.Y. ObSteiner: An Exact Algorithm for the Construction of Rectilinear Steiner Minimum Trees in the Presence of Complex Rectilinear Obstacles http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516642 In this paper, we present ObSteiner, an exact algorithm for the construction of obstacle-avoiding rectilinear Steiner minimum trees (OARSMTs) among complex rectilinear obstacles. This is the first paper to propose a geometric approach to optimally solve the OARSMT problem among complex obstacles. The optimal solution is constructed by the concatenation of full Steiner trees among complex obstacles, which are proven to be of simple structures in this paper. ObSteiner is able to handle complex obstacles, including both convex and concave ones. Benchmarks with hundreds of terminals among a large number of obstacles are solved optimally in a reasonable amount of time. Kozik, A. Fully Dynamic Evaluation of Sequence Pair http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516638 In the electronic design automation field, as well as in other areas, problem instances and solutions are often subject to discrete changes. The foundational significance of efficient updates of the criterion value after dynamic updates, instead of recomputing it from scratch each time, has attracted a lot of research. In this paper, motivated by the significance of the sequence pair (SP) representation for floorplanning, we develop a fully dynamic algorithm of SP evaluation, that efficiently updates a criterion value after insertions and deletions of SP elements and after modifications of element weights. Our result is based on a new data structure for the predecessor problem, which maintains the whole history of its dynamic modifications, and the path-compression technique from the union-find problem, which efficiently support predecessor queries. Numerical experiments showed that our algorithm exhibits linear-time behavior and considerably reduces the time of SP evaluation, compared to other approaches. Athikulwongse, K. ; Yang, J.-S. ; Pan, D.Z. ; Lim, S.K. Impact of Mechanical Stress on the Full Chip Timing for Through-Silicon-Via-based 3-D ICs http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516720 In this paper, we study the impact of through-silicon-via (TSV) and shallow trench isolation (STI) stress on the timing variations of 3-D IC. We also propose the first systematic TSV-STI-stress-aware timing analysis and show how to optimize layouts for better performance. First, we generate a stress contour map with an analytical radial stress model for TSV. We also develop a stress model for STI from finite element analysis results. Then, depending on geometric relation between TSVs, STI, and transistors, the tensile and compressive stresses are converted to hole and electron mobility variations. Mobility-variation-aware cell library and netlist are generated and incorporated into an industrial engine for timing analysis of 3-D IC. We observe that TSV stress and STI stress interact with each other, and rise and fall time react differently to stress and relative locations with respect to both TSVs and STIs. Overall, TSV-STI-stress-induced timing variations can be as much as +/-15% at the cell level. Thus, as an application to layout optimization, we exploit the stress-induced mobility enhancement to improve performance of 3-D ICs. We show that stress-aware layout perturbation could reduce cell delay by up to 23.37% and critical path delay by 6.67% in our test case. Chakraborty, A. ; Pan, D.Z. Skew Management of NBTI Impacted Gated Clock Trees http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516611 Negative bias temperature instability (NBTI) has emerged as the dominant failure mechanism for PMOS devices in nanometer integrated circuit (IC) designs, thus limiting their lifetime. There are several existing research works that mitigate impact of NBTI on gate delay and reliability. However, its impact on one of the most important components of modern IC designÑthe clock treeÑhas not been researched enough. Clock gating impacts the extent of NBTI-induced Vth degradation of clock buffers leading to nonuniform NBTI degradation and, thus, increased clock skew. In this paper, we propose a practical design-time technique of modifying the clock gating implementation by selecting NAND or NOR gate as output stage of integrated clock gating cells with the objective of minimizing NBTI-induced clock skew. This selection intelligently modulates the signal probability and delay equations of clock signal paths with no extra hardware penalty. We formulate the skew minimization problem as an integer linear program which determines the optimal NAND or NOR assignment of clock gating buffer. Experimental results demonstrate the effectiveness of our method as the NBTI-induced clock skew is reduced by more than 74% compared to the traditional method. The impact of voltage and temperature variation on the proposed technique was analyzed and we observed reduced but still significant reduction in clock skew under variation as compared to the traditional clock gating technique. SYSTEM-LEVEL DESIGN Yeh, Y.-F. ; Lin, H.-C. ; Huang, C.-Y. An Ultrasynchronization Checking Method With Trace-Driven Simulation for Fast and Accurate MPSoC Virtual Platform Simulation http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516693 Efficiency and accuracy are two critical concerns in multiprocessor system on chip (MPSoC) virtual platform simulation. Traditional simulation approaches that achieve high efficiency usually sacrifice accuracy. On the other hand, the cycle-accurate simulation algorithms are generally slow in speed. In this paper, we propose an ultrasynchronization checking method with an efficient trace-driven simulation mechanism that can not only improve simulation speed, but also maintain simulation accuracy. We build a SystemC-based MPSoC virtual platform to evaluate the effectiveness of our approach. The experimental results show that our proposed simulation scheme can improve simulation speed up to 156x over the traditional clock-step simulation method. Furthermore, our proposed method can also guarantee the cycle-accurate simulation result. Jang, W. ; Pan, D.Z. Chemical-Mechanical Polishing-Aware Application-Specific 3D NoC Design http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516721 Three-dimensional (3D) integration with through-silicon vias (TSVs) is promising in the integration of many cores into a single chip. Network-on-chip (NoC) can efficiently manage the complicated 3D interconnections. However, irregular and dense TSV arrays used as vertical links in 3D NoC cause severe TSV height variation during silicon- thinning and chemical-mechanical polishing (CMP) processes. It may lead to TSV bonding failure between silicon layers. In this paper, we propose the first CMP-aware application-specific 3D NoC design that minimizes such TSV height variation and thus reduces the bonding failure, and meanwhile optimizes conventional NoC design objectives such as hop count, wirelength, power consumption, and area. Our 3D NoC design assigns cores to proper silicon layers, determines the 3D NoC topology, allocates routing paths, and then floorplans all cores, routers, and TSV arrays in a CMP-aware manner. The key idea behind this 3D NoC design flow is to determine the CMP-aware 3D NoC topology where TSV arrays with low and uniform metal density are inserted between adjacent layers. Experimental results show that our CMP-aware 3D NoC design achieves, on average, 17.9% lower TSV height variation, 15% lower hop count, 2.3% shorter total wirelength, and 7.8% lower power consumption than the previous state-of-the-art 3D NoC designs. VERIFICATION Hertz, S. ; Sheridan, D. ; Vasudevan, S. Mining Hardware Assertions With Guidance From Static Analysis http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516599 We present GoldMine, a methodology for generating assertions automatically in hardware. Our method involves a combination of data mining and static analysis of the register transfer level (RTL) design. The RTL design is first simulated to generate data about the design's dynamic behavior. The generated data is then mined for Òcandidate assertionsÓ that are likely to be invariants. The data mining algorithm is a decision-tree-based supervised learning algorithm. These candidate assertions are then passed through a formal verification engine to filter out the spurious candidates. The assertions that are attested as true by the formal engine are system invariants. These are then evaluated by a process of designer ranking that is provided as feedback to the data mining engine. We demonstrate the scalability of GoldMine by showing assertion generation of the RTL of Sun's OpenSparc T2 many-threaded processor. Our results show that GoldMine can generate complex, high coverage assertions for sequential as well as combinational designs in RTL, thereby minimizing human effort in this process. GoldMine assertions distill the random input stimulus space and can be used for calibrating directed tests. They can be used in a regression test suite of an evolving RTL. They are also useful in providing differing perspectives from the designer, as well as hints to designers for manually writing assertions. SHORT PAPERS Khlopotine, A.B. ; Jandhyala, V. ; Kirkpatrick, D. A Variant of Parallel Plane Sweep Algorithm for Multicore Systems http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516597 Parallel algorithms used in VLSI physical design bring significant challenges for their efficient and effective design and implementation. The rectangle intersection problem is a subset of the plane sweep problem, a topic of computational geometry and a component in design rule checking, parasitic resistanceÐ capacitance extraction, and mask processing flows. A variant of a plane sweep algorithm that is embarrassingly parallel and therefore easily scalable on multicore machines and clusters, while exceeding the best-known parallel plane sweep algorithms on real-world tests, is presented in this letter. Chang, C.-Y. ; Liao, K.-Y. ; Hsu, S.-C. ; Li, J.C.-M. ; Rau, J.-C. Compact Test Pattern Selection for Small Delay Defect http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516596 This letter proposes an algorithm that selects a small number of test patterns for small delay defects from a large N- detect test set. This algorithm uses static upper and lower bound analysis to quickly estimate the sensitized path length so that the central processing unit (CPU) time can be reduced. By ignoring easy faults, only a partial fault dictionary, instead of a complete fault dictionary, is built for test pattern selection. Experimental results on large International Test Conference benchmark circuits show that, with very similar quality, the selected test set is 46% smaller and the CPU time is 42% faster than that of timing-aware automated test pattern generation (ATPG). With the proposed selection algorithm, small delay defect test sets are no longer very expensive to apply.