December 2011 Newsletter

Placing you one click away from the best new CAD research!

Plain-text version at http://www.umn.edu/~tcad/newsletter/2011-12.txt

REGULAR PAPERS

EMBEDDED SYSTEMS

Yang, S. Khursheed, S. Al-Hashimi, B. M. Flynn, D. Idgunji, S.
Reliable State Retention-Based Embedded Processors Through Monitoring and Recovery
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071090
State retention power gating and voltage-scaled state retention are two effective design techniques, commonly employed in embedded processors, for reducing idle circuit leakage power. This paper presents a methodology for improving the reliability of embedded processors in the presence of power supply noise and soft errors. A key feature of the method is low cost, which is achieved through reuse of the scan chain for state monitoring, and it is effective because it can correct single and multiple bit errors through hardware and software, respectively. To validate the methodology, an ARM® Cortex™-M0 embedded microprocessor (provided by our industrial project partner) is implemented in a field-programmable gate array and further synthesized using 65-nm technology to quantify the cost in terms of area, latency, and energy. It is shown that the proposed methodology has a small area overhead (8.6%) with less than 4% worst-case increase in critical path and is capable of detecting and correcting both single bit and multibit errors for a wide range of fault rates.

EMERGING TECHNOLOGIES

Huang, T.-W. Yeh, S.-Y. Ho, T.-Y.
A Network-Flow Based Pin-Count Aware Routing Algorithm for Broadcast-Addressing EWOD Chips
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071082
Electrowetting-on-dielectric (EWOD) chips have emerged as the most widely used actuators for digital microfluidic (DMF) systems. These devices enable the electrical manipulation of microfluidics with various advantages, such as low power consumption, flexibility, accuracy, and efficiency.
In addressing the need for low-cost and practical fabrication, pin-count reduction has become a key problem for the large-scale integration of EWOD-chip designs. One of the major approaches, broadcast addressing, reduces the pin count by assigning a single control pin to multiple electrodes with mutually compatible control signals. Most previous studies utilize this addressing scheme by scheduling fluidic-level synthesis on pin-constrained chip arrays. However, the associated interconnect routing problem is still not addressed by currently available DMF automation tools, and thus the broadcast-addressing scheme cannot actually be realized. In this paper, we present the first network-flow based pin-count aware routing algorithm for EWOD-chip designs with a broadcast electrode-addressing scheme. Our algorithm simultaneously takes pin-count reduction and wirelength minimization into consideration for higher integration and better design performance. Experimental results show the effectiveness and scalability of our algorithm on a set of real-life chip applications.

FPGAs AND RECONFIGURABLE COMPUTING

Kim, K. Shin, S. Kang, S.-M.
Field Programmable Stateful Logic Array
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071085
Recently, researchers have demonstrated that memristive switches can be used to implement logic and latches as well as memory and programmable interconnects. In this paper, we propose a novel stateful logic pipeline architecture based on memristive switches. The proposed architecture mapped to the field programmable nanowire interconnect fabric produces a field programmable stateful logic array, in which general-purpose computation functions can be implemented by configuring only nonvolatile nanowire crossbar switches. CMOS control switches are used to isolate stateful logic units so that multiple operations can be executed in parallel.
Since the basic operation of stateful logic, namely, material implication, cannot fan out, a new basic AND operation which can duplicate output is proposed. The basic unit of the proposed architecture is designed to execute multiple basic operations concurrently in a step so that each basic unit implements a large fan-in OR or NOR gate. The fine-grain ultradeep constant-throughput pipeline properties pose new design automation problems. We address some of the issues, in particular logic representation using OR-inverter graphs, two-level optimization synthesis strategy, data synchronization with data forwarding, stall-free pipelined finite state machines, and constraints for synthesis and mapping onto the fabric.

MODELING AND SIMULATION

Zhang, W. Li, X. Liu, F. Acar, E. Rutenbar, R. A. Blanton, R. D.
Virtual Probe: A Statistical Framework for Low-Cost Silicon Characterization of Nanoscale Integrated Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071091
In this paper, we propose a new technique, referred to as virtual probe (VP), to efficiently measure, characterize, and monitor spatially-correlated inter-die and/or intra-die variations in nanoscale manufacturing processes. VP exploits recent breakthroughs in compressed sensing to accurately predict spatial variations from an exceptionally small set of measurement data, thereby reducing the cost of silicon characterization. By exploring the underlying sparse pattern in the spatial frequency domain, VP achieves a substantially lower sampling frequency than the well-known Nyquist rate. In addition, VP is formulated as a linear programming problem and, therefore, can be solved both robustly and efficiently. Our industrial measurement data demonstrate the superior accuracy of VP over several traditional methods, including 2-D interpolation, Kriging prediction, and k-LSE estimation.

Ionutiu, R. I. J. Rommes, J. Schilders, W. H. A.
SparseRC: Sparsity Preserving Model Reduction for RC Circuits With Many Terminals
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071083
A novel model order reduction (MOR) method, SparseRC, for multiterminal RC circuits is proposed. Specifically tailored to systems with many terminals, SparseRC employs graph-partitioning and fill-in reducing orderings to improve sparsity during model reduction, while maintaining accuracy via moment matching. The reduced models are easily converted to their circuit representation. These contain far fewer nodes and circuit elements than those obtained with conventional MOR techniques, allowing faster simulations with little loss of accuracy.

PHYSICAL DESIGN

Ozdal, M. M. Hentschke, R. F.
An Algorithmic Study of Exact Route Matching for Integrated Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071088
As system-on-chip designs are getting more popular, the importance of design automation for analog and mixed-signal integrated circuits is increasing. In this paper, we study the problem of exact route matching, which is an important physical design constraint commonly imposed on specific analog signals for the purpose of correct functionality. For this, we first propose a mathematical formulation that models the route matching problem exactly. Based on this formulation, we derive important theoretical conclusions, and propose dynamic-programming and heuristic search algorithms to solve the min-cost route matching problem. We also discuss various practical considerations related to this problem. Our experimental results show the effectiveness of our algorithms.

Chuang, Y.-L. Kim, S. Shin, Y. Chang, Y.-W.
Pulsed-Latch Aware Placement for Timing-Integrity Optimization
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071081
Utilizing pulsed-latches in circuit designs is one emerging solution to timing improvements.
Pulsed-latches, driven by a brief clock signal generated from pulse generators, possess superior design parameters over flip-flops. If the pulse generator and pulsed-latches are not placed properly, however, pulse-width degradations at pulsed-latches and thus timing violations might occur. In this paper, we present a unified placement framework for pulsed-latches to maintain the timing integrity. Our new placer has the following distinguishing features: 1) a multilevel analytical placement framework to effectively prevent the potential pulse-width distortion problem; 2) a physical-location aware pulse-generator insertion algorithm to identify each desired group of a pulse generator and latches; and 3) a new optimization gradient for global placement to consider the impact of load capacitance of generators. Experimental results show that our placement flow can effectively consider pulse-width integrity and thus achieve much smaller total/worst negative slacks with marginal wirelength overheads, compared to a leading commercial placement flow and an academic one.

Lin, M. P.-H. Hsu, C.-C. Chang, Y.-T.
Post-Placement Power Optimization With Multi-Bit Flip-Flops
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071086
Optimization for power is always one of the most important design objectives in modern nanometer integrated circuit design. Recent studies have shown the effectiveness of applying multi-bit flip-flops to reduce the power consumption of the clock network.
This paper presents: 1) a novel design methodology of applying multi-bit flip-flops at the post-placement stage, which can be seamlessly integrated in modern design flow; 2) a new problem formulation for post-placement optimization with multi-bit flip-flops; 3) flip-flop clustering and placement algorithms to simultaneously minimize flip-flop power consumption and interconnecting wirelength; and 4) a progressive window-based optimization technique to reduce placement deviation and improve runtime efficiency of our algorithms. Experimental results show that our algorithms are very effective in reducing not only flip-flop power consumption but also clock tree and signal net wirelength. Consequently, the power consumption of the clock network is minimized.

SYSTEM-LEVEL DESIGN

Sabry, M. M. Coskun, A. K. Atienza, D. Rosing, T. S. Brunschwiler, T.
Energy-Efficient Multiobjective Thermal Control for Liquid-Cooled 3-D Stacked Architectures
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071089
3-D stacked systems reduce communication delay in multiprocessor system-on-chips (MPSoCs) and enable heterogeneous integration of cores, memories, sensors, and RF devices. However, vertical integration of layers exacerbates temperature-induced problems such as reliability degradation. Liquid cooling is a highly efficient solution to overcome the accelerated thermal problems in 3-D architectures; however, it brings new challenges in modeling and run-time management for such 3-D MPSoCs with multitier liquid cooling. This paper proposes a novel design-time/run-time thermal management strategy. The design-time phase involves a rigorous thermal impact analysis of various thermal control variables. We then utilize this analysis to design a run-time fuzzy controller for improving energy efficiency in 3-D MPSoCs through liquid cooling management and dynamic voltage and frequency scaling (DVFS).
The fuzzy controller adjusts the liquid flow rate dynamically to match the cooling demand of the chip, preventing overcooling and maintaining a stable thermal profile. The DVFS decisions increase chip-level energy savings and help balance the temperature across the system. Our controller is used in conjunction with temperature-aware load balancing and dynamic power management strategies. Experimental results on 2-tier and 4-tier 3-D MPSoCs show that our strategy prevents the system from exceeding the given threshold temperature. At the same time, we reduce cooling energy by up to 63% and system-level energy by up to 21% in comparison to a static flow rate setting chosen to handle worst-case temperatures.

Kao, Y.-H. Yang, M. Artan, N. S. Chao, H. J.
CNoC: High-Radix Clos Network-on-Chip
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071084
Many high-radix network-on-chip (NoC) topologies have been proposed to improve network performance with an ever-growing number of processing elements (PEs) on a chip. We believe high-radix Clos network-on-chip (CNoC) is the most promising with its low average hop counts and good load-balancing characteristics. In this paper, we propose: 1) a high-radix router architecture with virtual output queue (VOQ) buffer structure and packet mode dual round-robin matching (PDRRM) scheduling algorithm to achieve high speed and high throughput in CNoC; 2) the design of hierarchical round-robin arbiter for high-radix high-speed NoC routers; and 3) a heuristic floor-planning algorithm to minimize the power consumption caused by the long wires. Experimental results show that the throughput of a 64-node three-stage CNoC under uniform traffic increases from 62% to 78% when the baseline virtual channel routers are replaced with PDRRM VOQ routers. We also compared the delay, power, and area performance of the 64-node CNoC with other NoC topologies under various synthetic traffic patterns and SPLASH-2 benchmark traces.
The simulation results show that in general CNoC improves the throughput, low-load delay, and energy efficiency over the compared NoC topologies.

TEST

Arumi, D. Rodriguez-Montanes, R. Figueras, J. Eichenberger, S. Hora, C. Kruseman, B.
Diagnosis of Interconnect Full Open Defects in the Presence of Fan-Out
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071080
The development of accurate diagnosis methodologies is important to identify process problems and achieve fast yield improvement. As open defects are becoming dominant in some CMOS technologies, their accurate diagnosis is key to improving the quality of new very large-scale integrated circuits. Widely used interconnect full open diagnosis procedures are based on the assumption that neighboring lines determine the voltage of the defective line. However, this assumption decreases the diagnosis efficiency for opens in interconnect lines with fan-out, where the influence of transistor capacitances is significant. This paper presents a diagnosis methodology for interconnect full open defects which considers and models the impact of transistor parasitic capacitances on the defective node accurately. The methodology is able to properly diagnose interconnect opens with fan-out even in the presence of Byzantine behavior. Diagnosis results for real defective devices from different technology nodes are also provided.

Ma, J. Tehranipoor, M.
Layout-Aware Critical Path Delay Test Under Maximum Power Supply Noise Effects
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071087
As technology shrinks, gate sensitivity to noise increases due to supply voltage scaling and limited scaling of the voltage threshold. As a result, power supply noise (PSN) plays a greater role in sub-100 nm technologies and creates signal integrity issues.
It is vital to consider supply voltage noise effects: 1) during design validation to apply sufficient guardbands to critical paths, and 2) during path delay test to ensure the performance and reliability of the chip. In this paper, a novel layout-aware pattern generation procedure is proposed to maximize PSN effects on critical paths considering the impact of local voltage drop. The proposed pattern generation and validation flow is implemented on the ITC'99 b19 benchmark. Experimental results for both wire-bond and flip-chip packaging styles are presented. Results demonstrate that our proposed method is fast, significantly increases switching around the functionally testable critical paths, and induces large voltage drop on cells placed on the critical paths, which results in increased path delay. The proposed method eliminates the very time-consuming pattern validation phase that is practised in industry.