June 2012 Newsletter

Placing you one click away from the best new CAD research!

Plain-text version at http://www.umn.edu/~tcad/newsletter/2012-06.txt

Announcement for a sister publication
IEEE Design & Test Magazine is looking for a new Editor-In-Chief
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6198427

REGULAR PAPERS

EMERGING TECHNOLOGIES

Zhao, Y.; Chakrabarty, K.
Cross-Contamination Avoidance for Droplet Routing in Digital Microfluidic Biochips
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200446
Recent advances in digital microfluidics have enabled droplet-based biochip devices for DNA sequencing, immunoassays, clinical chemistry, and protein crystallization. Since cross-contamination between droplets of different biomolecules can lead to erroneous outcomes for bioassays, the avoidance of cross-contamination during droplet routing is a key design challenge for biochips. We propose a droplet-routing method that avoids cross-contamination in the optimization of droplet flow paths. The proposed approach targets disjoint droplet routes and synchronizes wash-droplet routing with functional droplet routing, in order to reduce the duration of droplet routing while avoiding the cross-contamination between different droplet routes. In order to avoid cross-contamination between successive routing steps, an optimization technique is used to minimize the number of wash operations that must be used between successive routing steps. Two real-life biochemical applications are used to evaluate the proposed droplet-routing methods.

LOGIC SYNTHESIS

Zhu, X.-Y.; Basten, T.; Geilen, M.; Stuijk, S.
Efficient Retiming of Multirate DSP Algorithms
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200447
Multirate digital signal processing (DSP) algorithms are often modeled with synchronous dataflow graphs (SDFGs). A lower iteration period implies a faster execution of a DSP algorithm.
Retiming is a simple but efficient graph transformation technique for performance optimization, which can decrease the iteration period without affecting functionality. In this paper, we deal with two problems: feasible retiming, i.e., retiming an SDFG to meet a given iteration period constraint, and optimal retiming, i.e., retiming an SDFG to achieve the smallest iteration period. We present a novel algorithm for feasible retiming and, based on it, a new algorithm for optimal retiming, and prove their correctness. Both methods work directly on SDFGs, without explicitly converting them to their equivalent homogeneous SDFGs. Experimental results show that our methods give a significant improvement compared to the earlier methods.

MODELING AND SIMULATION

Lee, M.-S. M.; Liao, W.-T.; Liu, C.-N. J.
Levelized High-Level Current Model of Logic Blocks for Dynamic Supply Noise Analysis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200439
Since the problem of power integrity has become a critical issue that limits design performance, obtaining the supply current waveforms at early design stages is essential to achieve efficient reduction of supply noise. Therefore, a high-level current macro model is proposed by Bodapati and Najm for logic blocks to provide fast current waveform estimation at register-transfer level (RTL). However, due to the different arrival time of internal signals, modeling the supply current of the entire logic block accurately as specific fixed templates is difficult. This paper thus proposes a levelized high-level current model for logic blocks. By merging gates with similar arrival time as a super-gate and recording its current waveforms separately, obtaining more accurate supply current waveforms is possible by using a unified model, even for multipeak cases. This paper also proposes a frequency-domain waveform transformation method to consider the effects of nonideal supply resistance on the supply current waveform.
As shown in the experimental results, the peak error and waveform correlation of the proposed current model are significantly improved compared to the results of the single-stage current model. Using accurate supply current waveforms can also help obtain precise IR-drop estimation in RTL simulations for early system evaluation.

Badami, K. M. H.; Karmalkar, S.
Quasi-Static Compact Model for Coupling Between Aligned Contacts on Finite Substrates With Insulating or Conducting Backplanes
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200437
We derive a closed-form quasi-static model for the coupling impedance between aligned coplanar rectangular contacts on bulk and epitaxial semiconductor substrates by resolving the 3-D field lines into simpler components (vertical, lateral, fringing, 2-D, etc.). Both insulating and conducting (grounded or floating) backplane conditions are considered. Our model reflects all the geometry and process parameters, and its constants are process independent and universal. The model also gives the capacitive coupling via ambient, i.e., via the region outside the substrate, and specifies conditions under which a given thickness or lateral extension of the substrate can be regarded as infinite. Comparisons with technology computer-aided design simulations and measurements validate the model over a wide range of width/length and width/separation ratios of the contacts.

PHYSICAL DESIGN

Qian, H.; Restle, P. J.; Kozhaya, J. N.; Gunion, C. L.
Subtractive Router for Tree-Driven-Grid Clocks
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200441
A tree-driven clock grid has become the choice of clock delivery for most microprocessors, due to its ability to achieve lower skew and lower variability than clock trees, and is becoming the choice of clock delivery for certain high-end application-specific integrated circuit designs.
This paper reports on a clock routing tool that was used in designing multiple tree-driven clock grids in a 2.3 GHz processor system-on-chip, which achieved below 5 ps skew within 500 μm Manhattan distance and below 10 ps skew across each clock grid. This clock routing tool employs a nonsequential algorithm comprised of linear programming and combinatorial heuristics. Its robust length-matching capability enables flexible buffer placement, improved clock signal quality, and robustness to variations.

Lin, C.-W.; Lee, P.-W.; Chang, Y.-W.; Shen, C.-F.; Tseng, W.-C.
An Efficient Pre-Assignment Routing Algorithm for Flip-Chip Designs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200434
The flip-chip package is introduced for modern integrated circuit (IC) designs with higher integration density and larger I/O counts. In this paper, we consider the pre-assignment flip-chip routing problem with predefined connections between driver pads and bump pads. This problem has been shown to be much more difficult than the free-assignment one, but is more popular in real-world designs because the connections between driver pads and bump pads are typically predetermined by IC or packaging designers. Based on the concept of routing sequence exchange, we propose a very efficient approach to guide the global routing by computing the longest common subsequence and the maximum planar subset of chords for pre-assignment flip-chips. We observe that the existing work over-constrains the capacity of a routing tile, which might miss some critical solution space with a better routing solution (e.g., smaller wirelength), and provide a remedy for this insufficiency to identify a better solution in a more complete solution space. We also develop a constant-time routability analyzer to check if a given set of wires can pass through a tile.
Experimental results show that our router can achieve a 125× speedup with even better solution quality (same routability with slightly smaller wirelength), compared with a state-of-the-art flip-chip router based on integer linear programming.

SYSTEM-LEVEL DESIGN

Ren, P.; Lis, M.; Cho, M. H.; Shim, K. S.; Fletcher, C. W.; Khan, O.; Zheng, N.; Devadas, S.
HORNET: A Cycle-Level Multicore Simulator
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200443
We present HORNET, a parallel, highly configurable, cycle-level multicore simulator based on an ingress-queued wormhole router network-on-chip (NoC) architecture. The parallel simulation engine offers cycle-accurate as well as periodic synchronization; while preserving functional accuracy, this permits tradeoffs between perfect timing accuracy and high speed with very good accuracy. When run on six separate physical cores on a single die, speedups can exceed a factor of 5, and when run on a two-die 12-core system with 2-way hyperthreading, speedups exceed 12×. Most hardware parameters are configurable, including memory hierarchy, interconnect geometry, bandwidth, crossbar dimensions, parameters driving power, and thermal effects. A highly parametrized table-based NoC design allows a variety of routing and virtual channel allocation algorithms out of the box, ranging from simple dimension-ordered routing to complex Valiant, ROMM, O1Turn or PROM schemes, BSOR, and adaptive routing. Hornet can run in network-only mode using synthetic traffic or traces, or directly emulate a MIPS-based multicore. Hornet is freely available under the open-source MIT license at http://csg.csail.mit.edu/hornet/.

Cui, J.; Maskell, D. L.
A Fast High-Level Event-Driven Thermal Estimator for Dynamic Thermal Aware Scheduling
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200435
Thermal aware scheduling (TAS) is an important system level optimization for many-core systems.
A fast event driven thermal estimation method, which includes both the dynamic and leakage power models, for monitoring temperature and guiding dynamic TAS (DTAS) is proposed in this paper. The fast event driven thermal estimation is based upon a thermal map, with occasional thermal sensor-based calibration, which is updated only when a high level event occurs. To minimize the overhead, while maintaining the estimation accuracy, prebuilt look-up-tables and predefined leakage calibration parameters are used to speed up the thermal solution. Experimental results show our method is accurate, producing thermal estimations of similar quality to an existing open-source thermal simulator, while having a considerably reduced computational complexity. Based on this predictive approach, we take full advantage of a projected future thermal map to develop several heuristic policies for DTAS. We show that our proposed predictive policies are significantly better, in terms of minimizing average/peak temperature, reducing the dynamic thermal management overhead and improving other real-time features, than existing DTAS schedulers, making them highly suitable for heuristically guiding thermal aware task allocation and scheduling.

TEST

Tam, W. C.; Blanton, R. D.
SLIDER: Simulation of Layout-Injected Defects for Electrical Responses
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200444
Logic-level simulation has been the de facto method for simulating defect/faulty behavior for various testing tasks since it offers a good tradeoff between accuracy and speed. Unfortunately, by abstracting defect behavior to the logic level (i.e., a fault model), it also discards important information that inevitably results in inaccuracies. This paper describes a fast and accurate defect simulation framework called SLIDER (simulation of layout-injected defects for electrical responses).
SLIDER uses well-developed mixed-signal simulation technology that is conventionally used for design verification. There are three innovative aspects that distinguish SLIDER from prior work in this area: 1) accuracy resulting from defect injection taking place at the layout level; 2) speedup resulting from careful and automatic partitioning of the circuit into maximal digital and minimal analog domains for mixed-signal simulation; and 3) complete automation that includes defect generation, defect injection, design partitioning, netlist extraction, mixed-signal simulation, and test-data extraction. The virtual failure data created by SLIDER is useful in a variety of settings that include diagnosis resolution improvement, defect localization, fault model evaluation, and evaluation of yield/test learning techniques that are based on failure data analysis.

Chen, T.-J.; Li, J.-F.; Tseng, T.-W.
Cost-Efficient Built-In Redundancy Analysis With Optimal Repair Rate for RAMs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200438
Built-in self-repair (BISR) techniques are widely used for the repair of embedded memories. One of the key components of a BISR circuit is the built-in redundancy-analysis (BIRA) module, which allocates redundancies according to the designed redundancy analysis algorithm. Thus, the BIRA module affects the repair rate of the BISR circuit. Existing BIRA schemes for RAMs can provide the optimal repair rate (the ratio of the number of repaired RAMs to the number of defective RAMs), but they require either high area cost or multiple test runs. This paper proposes a BIRA scheme for RAMs, which can provide the optimal repair rate using very low area cost and single test run. Furthermore, the BIRA is designed as reconfigurable such that it can be shared by multiple RAMs. Experimental results show that the area cost for implementing the proposed BIRA scheme is much lower than that of existing BIRA schemes with optimal repair rate.
A test chip is also implemented to demonstrate the proposed BIRA scheme.

Harutyunyan, G.; Shoukourian, S.; Vardanian, V.; Zorian, Y.
A New Method for March Test Algorithm Generation and Its Application for Fault Detection in RAMs
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200436
In this paper, all linked and unlinked static and two-operation dynamic faults are considered. A classification for their description is introduced. Generating a test algorithm that detects all the considered faults is shown to be a nontrivial problem. For this purpose, a new structure-oriented method is developed. Based on the proposed method, an efficient test algorithm March LSD of complexity 75N is generated for the detection of the considered linked static and dynamic faults.

Rabenalt, T.; Richter, M.; Poehl, F.; Goessel, M.
Highly Efficient Test Response Compaction Using a Hierarchical X-Masking Technique
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200442
This paper presents a highly effective compactor architecture for processing test responses with a high percentage of x-values. The key component is a hierarchical configurable masking register, which allows the compactor to dynamically adapt to and provide excellent performance over a wide range of x-densities. A major contribution of this paper is a technique that enables the efficient loading of the x-masking data into the masking logic in a parallel fashion using the scan chains. A method for eliminating the requirement for dedicated mask control signals using automated test equipment timing flexibility is also presented. The proposed compactor is especially suited to multisite testing. Experiments with industrial designs show that the proposed compactor enables compaction ratios exceeding 200x.

Nassery, A.; Erol, O. E.; Ozev, S.; Verhelst, M.
Test Signal Development and Analysis for OFDM Systems RF Front-End Parameter Extraction
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200440
Testing radio frequency (RF) transceivers requires the measurement of a diverse set of specifications, requiring multiple testing setups. This complicates load board design, debug, and diagnosis, and results in long testing time. In this paper, we present a single setup testing solution for orthogonal frequency-division multiplexing systems RF front-ends based on a loop-around scheme. With this technique, it is possible to determine gain and phase mismatch, inphase-quadrature time skew, and dc offset. Linear gain and IIP3 decouple the transmitter parameters from the receiver parameters. Although loop-around has been used in many forms, the basic challenge is to determine what input conditions will lead to accurate measurement and what form of modeling will yield this accuracy. To this end, we develop test signal design and multistep extraction techniques. Experimental results indicate that IIP3 can be extracted with 0.6 dB maximum error while phase mismatch and gain mismatch can be extracted with 0.3° and 0.6% maximum error. Our method is able to de-embed the characteristics of transmitter from those of receiver while it requires the analysis of only low-frequency digital baseband signals (I and Q branches) and eliminates the need for RF testers.

Yilmaz, E.; Ozev, S.
Test Application for Analog/RF Circuits With Low Computational Burden
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6200445
In this paper, we propose an adaptive test strategy that tailors the test sequence with respect to the properties of each individual instance of a circuit. Reducing the test set by analyzing the dropout patterns during characterization and eliminating the unnecessary tests has always been the approach for high volume production in the analog domain.
However, once determined, the test set remains typically fixed for all devices. We propose to exploit the statistical diversity of the manufactured devices and adaptively eliminate tests that are determined to be unnecessary based on information obtained on the circuit under test. Test time information is incorporated in the method to yield short test time. The proposed methodology is computationally efficient and imposes very little overhead on the tester. We compare our results with other similar specification-based test reduction techniques for a low noise amplifier (LNA) circuit and an analog industrial circuit. Results show 85% test quality improvement for the same test time or 24% test time reduction for the same test quality for the LNA circuit. Moreover, near zero defective parts per million is achieved for the industrial circuit.