TCAD Newsletter - November 2010 Issue
Placing you one click away from the best new CAD research!

ANNOUNCEMENT: Call for Papers for an upcoming TCAD Special Section:
PAR-CAD: Parallel CAD Algorithms and CAD for Parallel Architectures/Systems
Submission deadline: Jan 30, 2011

Regular Papers
==============

ANALOG, MIXED-SIGNAL, AND RF CIRCUITS

Li, X., "Finding Deterministic Solution From Underdetermined Equation: Large-Scale Performance Variability Modeling of Analog/RF Circuits"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605327&isn...
Abstract: The aggressive scaling of integrated circuit technology results in high-dimensional, strongly nonlinear performance variability that cannot be efficiently captured by traditional modeling techniques. In this paper, we adapt a novel L0-norm regularization method to address this modeling challenge. Our goal is to solve for a large number of (e.g., 10^4-10^6) model coefficients from a small set of (e.g., 10^2-10^3) sampling points without over-fitting. This is facilitated by exploiting the underlying sparsity of the model coefficients: although numerous basis functions are needed to span the high-dimensional, strongly nonlinear variation space, only a few of them play an important role for a given performance of interest. An efficient orthogonal matching pursuit (OMP) algorithm is applied to automatically select these important basis functions based on a limited number of simulation samples. Several circuit examples designed in a commercial 65 nm process demonstrate that OMP achieves up to 25 times speedup compared to the traditional least-squares fitting method.

Ballicchia, M.; Orcioni, S., "Design and Modeling of Optimum Quality Spiral Inductors With Regularization and Debye Approximation"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605328&isn...
Abstract: This paper proposes a methodology, based on regularization theory, to find the planar spiral inductors with optimum quality factor in the whole space of geometric parameters. Moreover, a new technique, based on the Debye approximation, has been applied to extract an equivalent circuit model of these inductors. This model provides an accurate wideband frequency characterization up to 30 GHz.

EMERGING TECHNOLOGIES

Huang, T.-W.; Lin, C.-H.; Ho, T.-Y., "A Contamination Aware Droplet Routing Algorithm for the Synthesis of Digital Microfluidic Biochips"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605305&isn...
Abstract: Recent advances in digital microfluidic biochips (DMFBs) have revolutionized traditional laboratory procedures. By providing a droplet-based system, a DMFB can perform real-time biological analysis and safety-critical biomedical applications. However, the different droplets transported and manipulated on the DMFB may introduce a contamination problem caused by liquid residue between different biomolecules. To overcome this problem, a wash droplet is introduced to clean the contaminations on the surface of the microfluidic array. However, current wash-droplet scheduling does not restrict the extra cells used or the execution time of the bioassay, thereby degrading reliability and fault tolerance significantly. In this paper, we propose a contamination-aware droplet routing algorithm for DMFBs. To reduce the routing complexity and the number of cells used, we first construct preferred routing tracks, derived from the global moving vectors of the droplets, to guide the droplet routing. To cope with contaminations within one subproblem, we first apply a k-shortest-path routing technique to minimize the contaminated spots. Then, to take advantage of multiple wash droplets, we adopt a minimum cost circulation (MCC) algorithm for optimal wash-droplet routing that simultaneously minimizes the cells used and the cleaning time.
Since the droplet routing problem consists of several subproblems, a look-ahead prediction technique is further used to determine the contaminations between successive subproblems. After that, we can simultaneously clean both the contaminations within one subproblem and those between successive subproblems by using the MCC-based algorithm, reducing the execution time and the cells used significantly. On four widely used bioassays, our algorithm reduces the cells used and the execution time significantly compared with the state-of-the-art algorithm.

Roy, S.; Bhattacharya, B. B.; Chakrabarty, K., "Optimization of Dilution and Mixing of Biochemical Samples Using Digital Microfluidic Biochips"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605330&isn...
Abstract: The recent emergence of lab-on-a-chip (LoC) technology has led to a paradigm shift in many healthcare-related application areas, e.g., point-of-care clinical diagnostics, high-throughput sequencing, and proteomics. A promising category of LoCs is digital microfluidic (DMF)-based biochips, in which nanoliter-volume fluid droplets are manipulated on a 2-D electrode array. A key challenge in designing such chips and mapping lab-bench protocols to a LoC is to carry out the dilution process of biochemical samples efficiently. As an optimization and automation technique, we present a dilution/mixing algorithm that significantly reduces the production of waste droplets. The algorithm takes O(n) time to compute the at most n sequential mix/split operations required to achieve any given target concentration with an error in concentration factor of less than 1/2^n. To implement the algorithm, we design an architectural layout of a DMF-based LoC consisting of two O(n)-size rotary mixers and O(n) storage electrodes. Simulation results show that the proposed technique always yields nonnegative savings in the number of waste droplets, and also in the total number of input droplets, compared to earlier methods.
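The "n mix/split operations, error below 1/2^n" bound in the Roy et al. abstract comes from the standard bit-level 1:1 dilution scheme: round the target concentration to k/2^n and realize one bit of k per balanced mix/split step. A minimal sketch of that generic scheme (not the authors' waste-minimizing algorithm, whose droplet-reuse strategy is the paper's contribution):

```python
def mix_split_sequence(target, n):
    """Approximate a target concentration (in [0, 1]) as k / 2^n and return
    the n sequential 1:1 mix/split operations that realize it.

    Each step mixes the current droplet 1:1 with pure sample (concentration 1)
    or pure buffer (concentration 0), then splits, halving plus shifting the
    concentration. Processing the bits of k from LSB to MSB yields exactly
    k / 2^n after n steps.
    """
    k = round(target * 2 ** n)
    ops, conc = [], 0.0
    for i in range(n):
        bit = (k >> i) & 1          # i-th bit of k, least significant first
        conc = (conc + bit) / 2.0   # 1:1 mix with sample (bit=1) or buffer (bit=0)
        ops.append(("sample" if bit else "buffer", conc))
    return ops

# Example: target 0.7 with n = 4 rounds to 11/16 = 0.6875,
# within the guaranteed 1/2^4 error bound.
steps = mix_split_sequence(0.7, 4)
```

For target 0.7 and n = 4, k = 11 (binary 1011), so the sequence mixes with sample, sample, buffer, sample, reaching 0.5, 0.75, 0.375, and finally 0.6875.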
FPGAS AND RECONFIGURABLE COMPUTING

Chen, D.; Cong, J.; Dong, C.; He, L.; Li, F.; Peng, C.-C., "Technology Mapping and Clustering for FPGA Architectures With Dual Supply Voltages"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605304&isn...
Abstract: This paper presents a technology mapping algorithm for field-programmable gate array architectures with dual supply voltages (Vdds) for power optimization, with the guarantee that the mapping depth of the circuit will not increase compared to the single-Vdd circuit. This paper also presents an enhanced clustering algorithm that considers dual supply voltages and honors the dual-Vdd mapping solution. To carry out various comparisons, we first design a single-Vdd mapping algorithm, named SVmap-2, which achieves a 3.8% total power reduction (15.6% dynamic power reduction) over a previously published low-power mapping algorithm, Emap. We then show that our dual-Vdd mapping algorithm, named DVmap-2, can further improve total power savings by 12.8% over SVmap-2, with a 52.7% dynamic power reduction. Compared to the early single-Vdd version SVmap, DVmap-2 is 14.3% better for total power reduction. This is achieved through an ideal selection of the low-Vdd/high-Vdd ratio and the consideration of various voltage-changing scenarios during the mapping process.

Czajkowski, T. S.; Brown, S. D., "Decomposition-Based Vectorless Toggle Rate Computation for FPGA Circuits"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605306&isn...
Abstract: This paper presents a novel and accurate method of estimating the toggle rates of signals in field-programmable gate array (FPGA)-based logic circuits without the use of simulation vectors. Compared to previous vectorless techniques, our approach provides improved accuracy of results, especially for individual signals, which can be leveraged by computer-aided design (CAD) tools when performing power optimization of logic circuits.
Increased accuracy is achieved by using stochastic methods that estimate the transition densities at FPGA logic elements while accounting for both spatial and temporal correlation of logic signals. Spatial correlation is calculated by leveraging a unique XOR-based decomposition technique that provides both accurate results and fast computation times. We also consider the delay information of implemented circuits, providing a comprehensive treatment of glitches, including the effects of inertial limits on power dissipation. Our toggle-rate estimation approach has been tested on a commonly used set of Microelectronic Center of North Carolina circuits, as well as a set of industrial circuits targeted to Altera Stratix II FPGAs. Results show that our techniques provide a three times lower percent error, while maintaining a low processing time, when compared to two existing techniques: the vectorless estimation tool shipped with the commercial Quartus II 8.0 CAD tool, and the ACE v2.0 academic tool from the University of British Columbia, Vancouver, BC, Canada.

HIGH-LEVEL SYNTHESIS

Andriamisaina, C.; Coussy, P.; Casseau, E.; Chavet, C., "High-Level Synthesis for Designing Multimode Architectures"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605303&isn...
Abstract: This paper addresses the design of multimode architectures for digital signal and image processing applications. We present a dedicated design flow and its associated high-level synthesis tool, named GAUT. Given a unified description of a set of time-wise mutually exclusive tasks and their associated throughput constraints, a single register transfer level hardware architecture optimized in area is generated.
To reduce the complexity of the registers, the steering logic, and the controller, this paper proposes a joint scheduling algorithm that maximizes the similarities between control steps, together with dedicated binding approaches for both operators and storage elements that maximize the similarities between the datapaths. It is shown through a set of test cases that the proposed approach offers significant area savings with low performance penalties compared to both state-of-the-art techniques and dedicated mono-mode architectures.

MODELING AND SIMULATION

Cong, J.; Gupta, P.; Lee, J., "Evaluating Statistical Power Optimization"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605324&isn...
Abstract: In response to the increasing variations in integrated-circuit manufacturing, the current trend is to create designs that take these variations into account statistically. In this paper, we quantify the difference between the statistical and deterministic optima of leakage power while making no assumptions about the delay model. We develop a framework for deriving a theoretical upper bound on the suboptimality that is incurred by using the deterministic optimum as an approximation for the statistical optimum. For a suite of benchmark circuits in a 45 nm technology, we show that for the mean power measure the deterministic optimum is an excellent approximation, while for mean-plus-standard-deviation measures the optimality gap increases as the amount of inter-die variation grows. For large variations, we show that there are excellent linear approximations that can be used to approximate the effects of variation. Therefore, the need to develop special statistical power optimization algorithms is questionable.

Singhee, A.; Rutenbar, R. A., "Why Quasi-Monte Carlo is Better Than Monte Carlo or Latin Hypercube Sampling for Statistical Circuit Analysis"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605333&isn...
Abstract: At the nanoscale, no circuit parameters are truly deterministic; most quantities of practical interest present themselves as probability distributions. Thus, Monte Carlo techniques comprise the strategy of choice for statistical circuit analysis. There are many challenges in applying these techniques efficiently: circuit size, nonlinearity, simulation time, and required accuracy often conspire to make Monte Carlo analysis expensive and slow. Are we - the integrated circuit community - alone in facing such problems? As it turns out, the answer is "no". Problems in computational finance share many of these characteristics: high dimensionality, profound nonlinearity, stringent accuracy requirements, and expensive sample evaluation. We perform a detailed experimental study of how one celebrated technique from that domain - quasi-Monte Carlo (QMC) simulation - can be adapted effectively for fast statistical circuit analysis. In contrast to traditional pseudorandom Monte Carlo sampling, QMC uses a (shorter) sequence of deterministically chosen sample points. We perform rigorous comparisons with both Monte Carlo and Latin hypercube sampling across a set of digital and analog circuits, in 90 and 45 nm technologies, varying in size from 30 to 400 devices. We consistently see superior performance from QMC, giving 2 times to 8 times speedup over conventional Monte Carlo for roughly 1% accuracy levels. We present rigorous theoretical arguments that support and explain this superior performance. The arguments also reveal insights into the (low) latent dimensionality of these circuit problems; for example, we observe that over half of the variance in our test circuits comes from unidimensional behavior. This analysis provides quantitative support for the recent enthusiasm for dimensionality reduction of circuit problems.
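The contrast the Singhee-Rutenbar abstract draws between pseudorandom and deterministically chosen sample points can be seen on a toy integral. The sketch below uses a 2-D Halton sequence, one simple low-discrepancy construction (the paper's own point sets and circuit experiments are far more involved; this is illustrative only), to estimate E[xy] over the unit square, whose true value is 1/4:

```python
import random

def radical_inverse(i, base):
    """Van der Corput radical inverse: mirror the base-b digits of i
    around the radix point, giving a point in [0, 1)."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def halton(n):
    """First n points of the 2-D Halton sequence (bases 2 and 3)."""
    return [(radical_inverse(i, 2), radical_inverse(i, 3))
            for i in range(1, n + 1)]

N = 4096
# Deterministic low-discrepancy (quasi-Monte Carlo) estimate of E[xy].
qmc = sum(x * y for x, y in halton(N)) / N
# Conventional pseudorandom Monte Carlo estimate of the same quantity.
random.seed(0)
mc = sum(random.random() * random.random() for _ in range(N)) / N
```

On smooth, effectively low-dimensional integrands like this one, the Halton estimate typically sits much closer to 0.25 than a pseudorandom estimate of the same size, mirroring the low-latent-dimensionality argument in the abstract.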
SYSTEM-LEVEL DESIGN

Javaid, H.; Ignjatovic, A.; Parameswaran, S., "Rapid Design Space Exploration of Application Specific Heterogeneous Pipelined Multiprocessor Systems"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605323&isn...
Abstract: This paper describes a rapid design methodology to create a pipeline of processors to execute streaming applications. The methodology seeks the system with the smallest area whose runtime is within a specified runtime constraint. Initially, a heuristic is used to rapidly explore a large number of processor configurations to find the near-Pareto front of the design space, and then an exact integer linear programming (ILP) formulation (EIF) is used to find an optimal solution. A reduced ILP formulation (RIF) or the heuristic is used if the EIF does not find an optimal solution in a given time window. This design methodology was integrated into a commercial design flow and was evaluated on four benchmarks with design spaces containing up to 10^16 design points. For each benchmark, the near-Pareto front was found in less than 3 h using the heuristic, while EIF took up to 16 h. The results show that the average area error of the heuristic and RIF was within 2.25% and 1.25% of the optimal design points, respectively, for all the benchmarks. The heuristic is faster than RIF, while both are significantly faster than EIF.

VERIFICATION

Keng, B.; Safarpour, S.; Veneris, A., "Bounded Model Debugging"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605322&isn...
Abstract: Design debugging is a major bottleneck in modern very large scale integration design flows, as both the design size and the length of the error trace contribute to its inherent complexity. With typical design blocks exceeding half a million synthesized logic gates and error traces in the thousands of clock cycles, the complexity of the debugging problem poses a great challenge to automated debugging techniques.
This paper addresses this daunting challenge by introducing the bounded model debugging methodology, which iteratively analyzes bounded sequences of the error trace. Two techniques are introduced within this methodology. The first iteratively analyzes bounded subsequences of the error trace of increasing size until the error is found or the entire trace is analyzed. The second partitions the error trace into non-overlapping bounded sequences of clock cycles, each of which is analyzed separately. A discussion of the two techniques is presented, and a unified methodology that leverages the strengths of both is developed. Empirical results on real industrial designs show that, for large designs and long error traces, the proposed methodology finds the actual error in 79% of cases with the first technique and 100% of cases with the second; without the methodology, the actual error is found in only 21% of cases. These numbers confirm that the proposed methodology allows conventional automated debuggers to handle much larger real-life circuits.

Chen, Y.; Safarpour, S.; Marques-Silva, J.; Veneris, A., "Automated Design Debugging With Maximum Satisfiability"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605301&isn...
Abstract: As contemporary very large scale integration designs grow in complexity, design debugging has rapidly established itself as one of the largest bottlenecks in the design cycle. Automated debug solutions, such as those based on Boolean satisfiability (SAT), enable engineers to reduce the debug effort by localizing possible error sources in the design. Unfortunately, the adoption of these techniques in industrial designs is still limited by the performance and capacity of the underlying engines.
This paper presents a novel formulation of the debugging problem using maximum satisfiability (MaxSAT) to improve the performance and applicability of automated debuggers. Our technique not only identifies errors in the design but also indicates when the bug is excited in the error trace. MaxSAT allows for a simpler formulation of the debugging problem, reducing the problem size by 80% compared to a conventional SAT-based technique. Empirical results demonstrate the effectiveness of the proposed formulation, with run-time improvements of 4.5 times observed on average. This paper also introduces two performance improvements that further reduce the time required to find all error sources within the design by an order of magnitude.

Short Papers
============

Lucas, G.; Dong, C.; Chen, D., "Variation-Aware Placement With Multi-Cycle Statistical Timing Analysis for FPGAs"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605329&isn...
Abstract: Deep submicron processes have allowed field-programmable gate arrays (FPGAs) to grow in complexity and speed. However, such technology scaling has made FPGAs more susceptible to the effects of process variation. To obtain sufficient yield, it is now necessary to consider process variation during physical design. FPGA designs commonly contain multi-cycle paths to help increase performance, but current statistical static timing analysis (SSTA) techniques cannot support this type of timing constraint. In this paper, we propose an extension to block-based SSTA that considers multi-cycle paths. We then use this new SSTA in our tool VMC-Place to optimize FPGA placement for designs with multi-cycle paths. Experimental results show that our multi-cycle SSTA is accurate to 0.59% for the mean and 0.0024% for the standard deviation. Our results also show that VMC-Place reduces the 95% performance-yield clock period by 15.36% as compared to VPR.

Kim, J.; Vandenberghe, L.; Yang, C.-K. K., "Convex Piecewise-Linear Modeling Method for Circuit Optimization via Geometric Programming"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605325&isn...
Abstract: This paper presents a new method for fitting a convex piecewise-linear function to a given set of data, which can serve as an empirical modeling framework for circuit optimization via geometric programming. The method iteratively solves a series of linear optimization problems to minimize the fitting error. To reduce the fitting error in each iteration, an extra plane is added in the region where the largest error occurs. For verification, we apply the method to create transistor-level models in a 90-nm complementary metal-oxide-semiconductor technology. Numerical results indicate that the proposed method can generate process-dependent transistor-level models with reasonable modeling accuracy.

Chen, Y.-C.; Wang, C.-Y., "Fast Node Merging With Don't Cares Using Logic Implications"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605316&isn...
Abstract: Node merging is a popular and effective logic restructuring technique that has recently been applied to minimize logic circuits. However, previous satisfiability (SAT)-based methods search for node mergers by trial-and-error validity checking of a potentially large set of candidate mergers. Here, we propose a new method that directly identifies node mergers using logic implications, without any SAT solving calls. Although the efficiency benefits of the method come at the expense of quality, we further employ redundancy removal and wire replacement techniques to enhance its quality. The experimental results show that the proposed optimization method achieves approximately 46 times speedup while possessing a competitive capability of circuit minimization compared to the state-of-the-art method.

Pomeranz, I.; Reddy, S. M., "On Undetectable Faults and Fault Diagnosis"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605331&isn...
Abstract: The presence of an undetectable fault u_i may modify the response of a detectable fault d_j to a test set used for fault diagnosis. This may impact the accuracy of fault diagnosis based on the responses of single faults. Many state-of-the-art diagnosis processes are based on the responses of single stuck-at faults even though their goal is to diagnose defects (including multiple defects) that are different from stuck-at faults. Therefore, we study the effects of undetectable single stuck-at faults on the accuracy of fault diagnosis based on the responses of single stuck-at faults. For this purpose, we consider the cases where the response of a double stuck-at fault u_i&d_j, which consists of an undetectable fault u_i and a detectable fault d_j, is different from the response of the single fault d_j. We show that there are significant, yet manageable, numbers of such faults in benchmark circuits under test sets used for fault diagnosis. In all these cases, a fault diagnosis process based on single stuck-at faults may not identify the locations of d_j and u_i as candidate defect sites if a defect affects those sites. We conclude that it is important to consider u_i&d_j during fault diagnosis so as not to preclude the sites of d_j and u_i as candidate defect sites.

Chen, P.-L.; Huang, Y.-C.; Chang, T.-Y., "Fast Test Integration: Toward Plug-and-Play At-Speed Testing of Multiple Clock Domains Based on IEEE Standard 1500"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605317&isn...
Abstract: The rapid advance of semiconductor technology exposes multifrequency designs to severe reliability loss due to incomplete at-speed testing, which is induced by ignoring timing-related defects between clock domains.
However, the reduced testability caused by a core-based design strategy also aggravates the difficulty of applying on-chip at-speed testing. Although previous works were able to increase the quality of at-speed testing, the diversity of on-chip clock control schemes across different components may complicate test integration, increasing test costs. Therefore, to accelerate time-to-market and time-to-volume, the development of plug-and-play at-speed testing based on a well-defined test interface has become increasingly urgent. In this paper, a fast test integration approach for multi-clock-domain at-speed testing based on IEEE Standard 1500 is proposed. The proposed framework has been successfully integrated into an IEEE 1500-wrapped ultrawide-band design and a simple SoC design. Experimental results also confirm the feasibility of the proposed approach.

Li, J.-F.; Huang, Y.-J.; Hu, Y.-J., "Testing Random Defect and Process Variation Induced Comparison Faults of TCAMs With Asymmetric Cells"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5605326&isn...
Abstract: This paper presents a march-like test, T_AC-P, to cover comparison faults of ternary content addressable memories (TCAMs) with asymmetric cells. T_AC-P requires only 4N Write operations and (3N+2B) Compare operations for an N x B-bit TCAM with Hit and priority address encoder outputs. We show that the test can also cover search-time failures induced by process variation in the comparison circuit of a TCAM. Furthermore, a test T_MF for match failures induced by process variation in the comparison circuit of a TCAM is also presented.
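T_AC-P's structure is specific to TCAM Compare operations, but the general march-test idiom it builds on - sweep the address space in a fixed order, applying a fixed element of read/write operations at each address - can be illustrated on an ordinary RAM model. A sketch of a MATS+-style march test against a simulated memory with one injected stuck-at bit (the fault model and march elements here are generic illustrations, not those of T_AC-P):

```python
class FaultyMem:
    """Tiny word-level memory model with one optional stuck-at cell."""
    def __init__(self, n, stuck_addr=None, stuck_val=0):
        self.cells = [0] * n
        self.stuck_addr, self.stuck_val = stuck_addr, stuck_val

    def write(self, a, v):
        # A stuck-at cell ignores writes and holds its stuck value.
        self.cells[a] = self.stuck_val if a == self.stuck_addr else v

    def read(self, a):
        return self.cells[a]

def mats_plus(mem, n):
    """MATS+ march elements: up/down(w0); up(r0, w1); down(r1, w0).
    Returns the set of addresses whose reads mismatched."""
    fails = set()
    for a in range(n):                # initialize: write 0 everywhere
        mem.write(a, 0)
    for a in range(n):                # ascending: expect 0, then write 1
        if mem.read(a) != 0:
            fails.add(a)
        mem.write(a, 1)
    for a in reversed(range(n)):      # descending: expect 1, then write 0
        if mem.read(a) != 1:
            fails.add(a)
        mem.write(a, 0)
    return fails
```

A stuck-at-0 cell passes the ascending read-0 phase but fails the descending read-1 phase (its write of 1 never took), while a stuck-at-1 cell fails the ascending read-0 phase, so both polarities are caught. TCAM tests such as T_AC-P replace these read checks with Compare operations against the Hit and priority-encoder outputs.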