TCAD Newsletter - September 2010 Issue
Placing you one click away from the best new CAD research!

Regular Papers
==============

EMBEDDED SYSTEMS

Chou, C.L.; Marculescu, R., "Designing Heterogeneous Embedded Network-on-Chip Platforms With Users in Mind"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552223&isn...
Abstract: In this paper, we propose a user-centric design methodology targeting heterogeneous embedded systems-on-chip where communication happens via the network-on-chip approach. More precisely, in this new design methodology, we explicitly consider information about the user experience and apply machine learning techniques to develop a design flow that aims at minimizing the workload variance; this allows the system to better adapt to different types of user needs and workload variations. Our experimental results show that, by incorporating the user experience into the design-space exploration step, the system platforms generated by our approach achieve more than 30% energy savings, on average, compared to the single platform derived from the traditional design flow; this implies that each system configuration we generate is highly suitable for the targeted class of user and workload behaviors.

EMERGING TECHNOLOGIES

Lin, C. C.Y.; Chang, Y.W., "ILP-Based Pin-Count Aware Design Methodology for Microfluidic Biochips"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552222&isn...
Abstract: Digital microfluidic biochips have emerged as a popular alternative for laboratory experiments. To make biochips feasible for practical applications, pin-count reduction is key to higher-level integration of reactions on a biochip. Most previous works approach the problem by post-processing the placement and routing solutions to share compatible control signals; however, the quality of such sharing algorithms is inevitably limited by the placement and routing solutions. In this paper, we present a comprehensive pin-constrained biochip design flow that addresses the pin-count issue at all design stages. The proposed flow consists of three major stages: 1) pin-count aware stage assignment, which partitions the reactions in the given bioassay into execution stages; 2) pin-count aware device assignment, which determines the specific device used for each reaction; and 3) guided placement, routing, and pin assignment, which utilize the pin-count saving properties of the stage and device assignments to optimize the assay time and pin count. For both the stage and device assignments, basic integer linear programming formulations and effective solution-space reduction schemes are proposed to minimize the assay time and pin count. Experimental results show the efficiency of our methods and a 55-57% pin-count reduction over the state-of-the-art algorithms/flow.
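
To give a concrete picture of the "pin-count aware stage assignment" step in the entry above, the following is a minimal, hypothetical sketch of an ILP of that general shape, written in Python with the open-source PuLP modeler. The reaction names, precedence constraints, and objective (minimize the number of execution stages subject to bioassay precedence) are illustrative assumptions and are not the formulation from the paper, which additionally models pin-count saving properties and solution-space reduction schemes.

    from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

    # Toy bioassay: four reactions with simple precedence constraints (all illustrative).
    reactions = ["mix1", "mix2", "dilute", "detect"]
    precedes = [("mix1", "dilute"), ("mix2", "dilute"), ("dilute", "detect")]
    stages = range(1, len(reactions) + 1)

    prob = LpProblem("stage_assignment", LpMinimize)
    # x[r, s] = 1 if reaction r is executed in stage s.
    x = {(r, s): LpVariable(f"x_{r}_{s}", cat=LpBinary) for r in reactions for s in stages}
    makespan = LpVariable("makespan", lowBound=1)

    for r in reactions:                      # each reaction is assigned to exactly one stage
        prob += lpSum(x[r, s] for s in stages) == 1
    for a, b in precedes:                    # b must run strictly after a
        prob += lpSum(s * x[b, s] for s in stages) >= lpSum(s * x[a, s] for s in stages) + 1
    for r in reactions:                      # makespan bounds the last stage that is used
        prob += makespan >= lpSum(s * x[r, s] for s in stages)
    prob += makespan                         # objective: fewest execution stages

    prob.solve()
    print({r: next(s for s in stages if x[r, s].value() > 0.5) for r in reactions})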

MODELING AND SIMULATION

Bayrakci, A. A.; Demir, A.; Tasiran, S., "Fast Monte Carlo Estimation of Timing Yield With Importance Sampling and Transistor-Level Circuit Simulation"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552215&isn...
Abstract: Considerable effort has been expended in the electronic design automation community in trying to cope with the statistical timing problem. Most of this effort has been aimed at generalizing static timing analyzers to the statistical case. On the other hand, detailed transistor-level simulations of the critical paths in a circuit are usually performed at the final stage of performance verification. We describe a transistor-level Monte Carlo (MC) technique which makes final transistor-level timing verification practically feasible. The MC method is used as a golden reference in assessing the accuracy of other timing yield estimation techniques; however, it is generally believed that it cannot be used in practice because it requires too many costly transistor-level simulations. We present a novel approach to constructing an improved MC estimator for timing yield which provides the same accuracy as standard MC but at the cost of far fewer transistor-level simulations. This improved estimator is based on a unique combination of a variance reduction technique (importance sampling) and a cheap but approximate gate delay model. The results we present demonstrate that our improved yield estimator achieves the same accuracy as standard MC at a cost reduction reaching several orders of magnitude.

Ye, X.; Li, P.; Zhao, M.; Panda, R.; Hu, J., "Scalable Analysis of Mesh-Based Clock Distribution Networks Using Application-Specific Reduced Order Modeling"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552177&isn...
Abstract: Clock meshes possess inherently low clock skew and excellent immunity to process-voltage-temperature variations, and have increasingly found their way into high-performance integrated circuit designs. However, analysis of such massively coupled networks is significantly hindered by the sheer size of the network and the tight coupling between non-tree interconnects and large numbers of clock drivers. While SPICE simulation of large clock meshes is often intractable, standard interconnect model order reduction algorithms also fail due to the large number of input/output ports introduced by clock drivers. The presented approach is motivated by the key observation that clock networks operate in a periodic steady state, while its efficiency is facilitated by a new clock-mesh-specific harmonic-weighted model order reduction algorithm and by locality analysis via port sliding. The scalability of the analysis is significantly improved by eliminating the need to compute infeasible multi-port passive reduced-order interconnect models with large port counts and by decomposing the overall task into tractable, naturally parallelizable model generation and fast Fourier transform/inverse fast Fourier transform operations, all on a per-driver or per-sink basis. We demonstrate the approach by analyzing large clock meshes with excellent accuracy.

Reis, T.; Stykel, T., "PABTEC: Passivity-Preserving Balanced Truncation for Electrical Circuits"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552178&isn...
Abstract: We present a passivity-preserving balanced truncation model reduction method for differential-algebraic equations arising in circuit simulation. This method is based on balancing the solutions of projected Lur'e equations. By making use of the special structure of circuit equations, we can reduce the numerical effort for balanced truncation significantly. It is shown that the property of reciprocity is also preserved in the reduced-order model. Network topological interpretations of certain circuit effects are given. The presented model reduction method is illustrated by numerical examples.
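
For readers new to balanced truncation, the sketch below implements the classical square-root algorithm for a stable state-space system x' = Ax + Bu, y = Cx with NumPy/SciPy, as a baseline for intuition only: PABTEC itself operates on differential-algebraic circuit equations and balances the solutions of projected Lur'e equations so that passivity and reciprocity are preserved, which this plain Lyapunov-based version does not guarantee. The small tridiagonal test system is an assumed toy example.

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov, cholesky, svd

    def balanced_truncation(A, B, C, r):
        """Classical square-root balanced truncation of a stable LTI system, keeping r states."""
        # Gramians: A P + P A^T + B B^T = 0 and A^T Q + Q A + C^T C = 0
        P = solve_continuous_lyapunov(A, -B @ B.T)
        Q = solve_continuous_lyapunov(A.T, -C.T @ C)
        Zp = cholesky(P, lower=True)          # P = Zp Zp^T (requires P > 0, i.e., minimality)
        Zq = cholesky(Q, lower=True)          # Q = Zq Zq^T
        U, hsv, Vt = svd(Zq.T @ Zp)           # hsv = Hankel singular values, descending
        S = np.diag(hsv[:r] ** -0.5)
        W = Zq @ U[:, :r] @ S                 # left projection
        T = Zp @ Vt[:r, :].T @ S              # right projection, with W^T T = I
        return W.T @ A @ T, W.T @ B, C @ T, hsv

    # Toy stable RC-ladder-like system, reduced from 6 states to 2 (illustrative only).
    n = 6
    A = -np.eye(n) + 0.1 * np.diag(np.ones(n - 1), 1) + 0.1 * np.diag(np.ones(n - 1), -1)
    B = np.eye(n)[:, :1]
    C = B.T
    Ar, Br, Cr, hsv = balanced_truncation(A, B, C, r=2)
    print("Hankel singular values:", np.round(hsv, 4))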

Meraji, S.; Zhang, W.; Tropper, C., "On the Scalability and Dynamic Load-Balancing of Optimistic Gate Level Simulation"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552191&isn...
Abstract: As predicted by Moore's law, the size of integrated circuits has grown geometrically, resulting in simulation becoming the major bottleneck in the circuit design process. Parallel simulation provides us with a way to cope with this growth. In this paper, we describe an optimistic (time warp) parallel discrete event simulator which can simulate all synthesizable Verilog circuits. We investigate its scalability and describe a machine-learning-based dynamic load balancing algorithm for use with the simulator. We initially developed two dynamic load balancing algorithms to balance the load and the communication, respectively, during the course of a simulation. Making use of reinforcement learning (RL), we then created an algorithm which is an amalgam of these two algorithms. To the best of our knowledge, this is the first time that RL has been used for the dynamic load-balancing of time warp. We investigated the scalability and the effectiveness of the dynamic load balancing algorithms on gate-level simulations of several realistic very large scale integration (VLSI) circuits. Our experimental results showed that our simulator is indeed scalable. They also revealed an 88.6% improvement in simulation time through the use of our RL algorithm.

SYSTEM-LEVEL DESIGN

Kang, K.; Kim, J.; Yoo, S.; Kyung, C.M., "Temperature-Aware Integrated DVFS and Power Gating for Executing Tasks With Runtime Distribution"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552187&isn...
Abstract: At high operating temperatures, chip cooling is crucial due to the exponential temperature dependence of leakage current. However, traditional cooling methods, e.g., power/clock gating applied when a temperature threshold is reached, often cause excessive performance degradation. In this paper, we propose a method that delivers lower energy consumption by integrating cooling and execution in a temperature-aware manner without incurring a performance penalty. To further reduce energy consumption, we exploit the runtime distribution of each sub-segment of a task, called a "bin", in an analytical manner such that the cooling time budget for each bin is allocated in proportion to the probability of that bin's occurrence. We apply the proposed method to two realistic software programs, an H.264 decoder and a ray tracer, and a benchmark program, equake. The experimental results show that the proposed method yields an additional 19.4%-27.2% reduction in energy consumption compared with existing methods.
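
The bin-proportional cooling budget described in the Kang et al. abstract reduces, at its core, to a proportional split. The sketch below illustrates just that split in Python; the bin names, probabilities, and total budget are assumed values for illustration, whereas the paper derives the allocation analytically together with the DVFS and power gating decisions.

    def allocate_cooling_budget(bin_probs, total_cooling_time):
        """Split a total cooling-time budget across task sub-segments ('bins')
        in proportion to how often each bin occurs."""
        total = sum(bin_probs.values())
        return {b: total_cooling_time * p / total for b, p in bin_probs.items()}

    # e.g., a decoder-like task whose runtime falls into three bins (hypothetical numbers)
    budgets = allocate_cooling_budget({"short": 0.6, "typical": 0.3, "long": 0.1},
                                      total_cooling_time=2.0)   # units: ms, illustrative
    print(budgets)   # {'short': 1.2, 'typical': 0.6, 'long': 0.2}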

Jung, H.; Pedram, M., "Supervised Learning Based Power Management for Multicore Processors"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552184&isn...
Abstract: This paper presents a supervised-learning-based power management framework for a multiprocessor system, in which a power manager (PM) learns to predict the system performance state from readily available input features (such as the occupancy state of a global service queue) and then uses this predicted state to look up the optimal power management action (e.g., a voltage-frequency setting) in a precomputed policy table. The motivation for utilizing supervised learning, in the form of a Bayesian classifier, is to reduce the overhead of the PM, which must repeatedly determine and assign voltage-frequency settings for each processor core in the system. Experimental results demonstrate that the proposed supervised-learning-based power management technique ensures system-wide energy savings under rapidly and widely varying workloads.

TEST

Khursheed, S.; Al-Hashimi, B. M.; Chakrabarty, K.; Harrod, P., "Gate-Sizing-Based Single V_{dd} Test for Bridge Defects in Multivoltage Designs"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552193&isn...
Abstract: The use of multiple voltage settings for dynamic power management is an effective design technique. Recent research has shown that testing for resistive bridging faults in such designs requires more than one voltage setting for 100% fault coverage; however, switching between several supply voltage settings has a detrimental impact on the overall cost of test. This paper proposes an effective gate-sizing technique for reducing the test cost of multi-V_{dd} designs with bridge defects. Using synthesized ISCAS and ITC benchmarks and a parametric fault model, experimental results show that, for all the circuits, the proposed technique achieves a single-V_{dd} test without affecting the fault coverage of the original test. In addition, the proposed technique performs better in terms of timing, area, and power than the recently proposed test point insertion technique. This is the first reported work that achieves a single-V_{dd} test for resistive bridge defects without compromising fault coverage in multi-V_{dd} designs.

VERIFICATION

Alizadeh, B.; Fujita, M., "Modular Datapath Optimization and Verification Based on Modular-HED"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552210&isn...
Abstract: This paper proposes an automatic design flow for datapath-dominated applications which is able to deal with optimization and equivalence checking of multi-output polynomials over Z_{2^{n}}. The paper makes four main contributions: 1) proposing a complete design flow for modular equivalence checking, high-level synthesis, and optimization; 2) considering hidden monomials to factorize polynomials that do not have any common monomials; 3) combining and improving our previous optimization heuristics to eliminate multi-operand common sub-expressions as much as possible; and 4) implementing all algorithms on top of the modular Horner expansion diagram (HED) package. Experimental results show an average saving of 9.9% and 4.5% in the number of gates and critical path delay, respectively, after applying modular reduction over Z_{2^{n}} alone, with no other optimization techniques. Moreover, after applying our optimization techniques, comparisons with state-of-the-art techniques show an average area improvement of 19.5% and an average delay decrease of 16.7%. Compared with our previous papers, area and delay are improved by 13.3% and 15.5%, respectively.
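
As intuition for "modular reduction over Z_{2^{n}}" in the entry above: working modulo 2^n makes some terms vanish that would not cancel over the integers, which is one source of the reported gate savings. The sketch below is a toy brute-force equivalence check of two small polynomials over Z_{2^4} using Horner-form evaluation; it only stands in for the canonical Horner expansion diagram representation the paper uses, since exhaustive checking would not scale to realistic word lengths.

    def horner_eval(coeffs, x, n):
        """Evaluate c0 + c1*x + ... + ck*x^k in Horner form, modulo 2^n."""
        mask = (1 << n) - 1
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) & mask
        return acc

    def equivalent_over_Z2n(coeffs_a, coeffs_b, n):
        """Brute-force check that two polynomials agree for every n-bit input."""
        return all(horner_eval(coeffs_a, x, n) == horner_eval(coeffs_b, x, n)
                   for x in range(1 << n))

    # 8*x + 8*x^2 is identically zero over Z_{2^4}: x^2 + x is always even,
    # so 8*(x^2 + x) is a multiple of 16, a cancellation invisible over the integers.
    print(equivalent_over_Z2n([0, 8, 8], [0], n=4))   # True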

Morin-Allory, K.; Boule, M.; Borrione, D.; Zilic, Z., "Validating Assertion Language Rewrite Rules and Semantics With Automated Theorem Provers"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552218&isn...
Abstract: Modern assertion languages such as the property specification language (PSL) and SystemVerilog assertions include many language constructs. By far the most economical way to process the full languages in automated tools is to rewrite the majority of operators into a small set of base cases, which are then processed in an efficient way. Recent rewrite attempts in the literature have shown that the rules can be quite involved, sometimes counterintuitive, and that they can make a significant difference in the complexity of interpreting assertions. Ensuring that the rewrite rules are correct is therefore a major contribution toward ensuring that the tools are correct, and even that the semantics of the assertion languages are well founded. This paper outlines a methodology for computer-assisted proofs of several publicly known rewrite rules for PSL properties. We first present ways to express the PSL syntax and semantics in the prototype verification system (PVS) theorem prover, and then prove or disprove the correctness of over 50 rewrite rules published without proofs in various sources in the literature. In doing so, we also demonstrate how to circumvent known issues with PSL semantics regarding the "never" and "eventually!" operators, and offer our proposals on assertion language semantics.

Short Papers
============

Pomeranz, I.; Reddy, S. M., "Hazard-Based Detection Conditions for Improved Transition Path Delay Fault Coverage"
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5552192&isn...
Abstract: Transition path delay faults were defined to capture the behavior of both small and large delay defects in a single fault model. The number of detectable transition path delay faults, as defined earlier, is the same as or close to the number of conventional path delay faults that are detectable under the strong non-robust propagation conditions. When the weak non-robust propagation conditions are used, the number of detectable conventional path delay faults is significantly higher. Using what are called the hazard-based detection conditions for transition faults, we define detection conditions for transition path delay faults under which the number of detectable faults is the same as or close to the number of conventional path delay faults detectable under the weak non-robust propagation conditions. The fault model still captures the behavior of both small and large delay defects, but the number of detectable faults is significantly higher.
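
Returning to the Morin-Allory et al. entry above, the sketch below conveys what validating a rewrite rule means, using a deliberately simplified finite-trace semantics and a bounded exhaustive check in Python. The operators, traces, and the sample rule (rewriting "never p" as "always (not p)") are illustrative assumptions; the paper handles full PSL semantics and proves or disproves its rules mechanically in the PVS theorem prover rather than by enumeration.

    from itertools import product

    # Simplified finite-trace semantics for a few temporal operators (not full PSL).
    def always(p, trace):
        return all(p(trace[i:]) for i in range(len(trace)))

    def eventually(p, trace):
        return any(p(trace[i:]) for i in range(len(trace)))

    def never(p, trace):
        return not eventually(p, trace)

    def atom(trace):
        """The proposition "p holds at the current position"."""
        return trace[0]

    # Candidate rewrite rule: "never p"  ==  "always (not p)".
    # Check it exhaustively on every Boolean trace of length 1..6.
    holds = all(
        never(atom, trace) == always(lambda t: not atom(t), trace)
        for length in range(1, 7)
        for trace in product([False, True], repeat=length)
    )
    print("rewrite rule holds on all bounded traces:", holds)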