TCAD Newsletter - May 2009 Issue
Placing you one click away from the best new CAD research!

WINNER of the 2009 DONALD O. PEDERSON BEST PAPER AWARD!!!

Systematic and Automated Multiprocessor System Design, Programming, and Implementation
Authors: Nikolov, H.; Stefanov, T.; Deprettere, E.
Issue 3, March 2008, Page(s): 542-555
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4378214&isnumber=4454003
For modern embedded systems in the realm of high-throughput multimedia, imaging, and signal processing, the complexity of embedded applications has reached a point where the performance requirements of these applications can no longer be supported by embedded system architectures based on a single processor. Thus, the emerging embedded system-on-chip platforms are increasingly becoming multiprocessor architectures. As a consequence, two major problems emerge, namely how to design and how to program such multiprocessor platforms in a systematic and automated way in order to reduce the design time and to satisfy the performance needs of applications executed on such platforms. As an efficient solution to these two problems, this paper presents the methodology and techniques implemented in a tool called Embedded System-level Platform synthesis and Application Mapping (ESPAM) for automated multiprocessor system design, programming, and implementation. ESPAM moves the design specification and programming from the Register Transfer Level and low-level C to a higher system level of abstraction. The paper explains how, starting from system-level platform, application, and mapping specifications, a multiprocessor platform is synthesized, programmed, and implemented in a systematic and automated way.

Regular Papers
==============

Automated Design and Optimization of Low-Noise Oscillators
Vytyaz, I.; Lee, D. C.; Hanumolu, P. K.; Moon, U.-K.; Mayaram, K.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4838897&isnumber=4838810
This paper presents a technique for automated design and optimization of low-noise oscillators. A sensitivity analysis for oscillators guides the reduction of oscillator noise intensities and the improvement of the oscillator's immunity to noise. A design-oriented approach to circuit analysis efficiently handles design constraints and reduces the dimensionality of the optimization problem. The phase noise computation, based on the perturbation projection vector, makes the proposed optimization technique general and applicable to all types of oscillators, independent of circuit topology.

Regular Analog/RF Integrated Circuits Design Using Optimization With Recourse Including Ellipsoidal Uncertainty
Xu, Y.; Hsiung, K.-L.; Li, X.; Pileggi, L. T.; Boyd, S. P.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4838812&isnumber=4838810
Long design cycles due to the inability to predict silicon realities are a well-known problem that plagues analog/RF integrated circuit product development. This paper proposes ORACLE, a design methodology for regular analog/RF ICs using metal-mask configurability, which combines reuse and shared-use by formulating the synthesis problem as an optimization-with-recourse problem. Using a two-stage geometric programming with recourse approach, ORACLE solves for both the globally optimal shared and application-specific variables. Furthermore, robust optimization is proposed to treat the design-with-variability problem, further enhancing the ORACLE methodology by providing a yield bound for each configuration of regular designs.

Custom Floating-Point Unit Generation for Embedded Systems
Chong, Y. J.; Parameswaran, S.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4838813&isnumber=4838810
While Application Specific Instruction Set Processors (ASIPs) have allowed designers to create processors with custom instructions to target specific applications, floating-point units (FPUs) are still instantiated as non-customizable general-purpose units, which, if underutilized, waste area. This paper presents a method for generating application-specific FPUs for embedded systems. The technique involves determining the subset of floating-point operations that should be implemented in hardware and performing datapath merging on the hardware datapaths for the floating-point operations, while using a novel bit-alignment technique. Bit-alignment is necessary to allow the sharing of resources with different bit-widths.

Fast Unified Floorplan Topology Generation and Sizing on Heterogeneous FPGAs
Banerjee, P.; Sur-Kolay, S.; Bishnu, A.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4838899&isnumber=4838810
This paper presents a three-phase method, Heterofloorplan, for unified floorplan topology generation and sizing in FPGAs with heterogeneous resources of CLBs, RAMs, and MULs. The phases are recursive bipartitioning, generation of slicing topologies, and allocation of CLBs and RAM/MULs by a greedy heuristic and a min-cost max-flow method, respectively.

Partitioning and Scheduling of Task Graphs on Partially Dynamically Reconfigurable FPGAs
Cordone, R.; Redaelli, F.; Redaelli, M. A.; Santambrogio, M. D.; Sciuto, D.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4838817&isnumber=4838810
This paper proposes a new model for the partitioning and scheduling of specifications on partially dynamically reconfigurable hardware. Though this problem can be solved optimally only by tackling its sub-problems jointly, the excessive complexity of the task suggests a decomposition into two phases.
The partitioning phase is based on a new graph-theoretic approach, which aims to obtain near-optimality even if performed independently from the subsequent phase. For the scheduling phase, a new ILP formulation and a heuristic approach are developed.

System-Level Power Management Using Online Learning
Dhiman, G.; Rosing, T. S.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4838819&isnumber=4838810
This paper proposes a novel online learning algorithm for system-level power management. It formulates the dynamic power management (DPM) and dynamic voltage and frequency scaling (DVFS) problems as problems of workload characterization and selection, and solves them using this algorithm. The selection is done among a set of experts (a set of DPM policies and voltage-frequency settings), where the online learning algorithm guarantees fast convergence to the best-performing expert.

Voltage-Island Partitioning and Floorplanning Under Timing Constraints
Lee, W.-P.; Liu, H.-Y.; Chang, Y.-W.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4838821&isnumber=4838810
Power consumption is a crucial concern in nanometer chip design, and multiple supply voltage (MSV) is proposed as an effective method for power consumption reduction. The underlying idea behind MSV is the trade-off between power saving and performance. In this paper, an effective voltage assignment technique based on dynamic programming is presented. For circuits without re-convergent fanouts, an optimal solution for the voltage assignment is guaranteed; for circuits with re-convergent fanouts, a near-optimal solution is obtained. The authors then generate a level shifter for each net that connects two blocks in different voltage domains, and perform power-network aware floorplanning.

Exact Multiple-Control Toffoli Network Synthesis With SAT Techniques
Grosse, D.; Wille, R.; Dueck, G. W.; Drechsler, R.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4838823&isnumber=4838810
Synthesis of reversible logic has become a very important research area in recent years.
Applications can be found in the domains of low-power design, optical computing, and quantum computing. In the past, several approaches have been introduced that synthesize reversible networks with respect to a given function. Most of these methods only approximate a minimal network representation. In this work, exact algorithms for the synthesis of multiple-control Toffoli networks are presented, i.e., algorithms which guarantee to find a network with the minimal number of gates.

Scan-Chain Partition for High Test-Data Compressibility and Low Shift Power Under Routing Constraint
Wang, S.-J.; Li, K. S.-M.; Chen, S.-C.; Shiu, H.-Y.; Chu, Y.-L.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4838825&isnumber=4838810
The achievable test data compression rate depends not only on the compression scheme but also on the applied test data. The relationship between signal probability and test data entropy is explored in this paper, and the results show that the theoretical maximum compression can be increased through a partition of scan flip-flops. This approach simply puts similar scan flip-flops in adjacent parts of a scan chain, which also helps to reduce shift power. The intra-partition scan chain order has little impact on the compressibility; thus, it is easy to achieve higher test compression with low routing overhead.

Inferno: Streamlining Verification With Inferred Semantics
DeOrio, A.; Bauserman, A. B.; Bertacco, V.; Isaksen, B. C.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4838827&isnumber=4838810
To aid the understanding of designers' intentions and accurately verify a design, this paper presents Inferno, a novel solution capable of automatically extracting semantic information from simulation traces.
Transactions, that is, monolithic communication units which have typically been observed several times during the logic simulation, can be used to aid in the understanding of a design, encoded as assertions for random simulation, or presented graphically. When approved as correct, transactions can be used in a newly developed closed-loop verification methodology called Transactional Verification, which leverages a database of approved transactions to describe correct design behavior.

A Formal Approach for Debugging Arithmetic Circuits
Sarbishei, O.; Tabandeh, M.; Alizadeh, B.; Fujita, M.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4838829&isnumber=4838810
This paper presents an efficient debugging algorithm for a post-synthesis arithmetic circuit. The approach is robust under a wide variety of arithmetic circuit architectures and design optimizations. The debugging algorithm consists of three phases: partial product initialization, XOR extraction, and carry-signal mapping. The run-time complexity of the proposed algorithm is much better than that of conventional algorithms: the run-time complexity of conventional carry-signal mapping algorithms is exponential, whereas the proposed algorithm categorizes the extracted XORs into half/full-adders to make a very fast debugging algorithm.

Short Papers
============

Bit-Swapping LFSR and Scan-Chain Ordering: A Novel Technique for Peak- and Average-Power Reduction in Scan-Based BIST
Abu-Issa, A. S.; Quigley, S. F.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4838831&isnumber=4838810
This paper presents a novel low-transition LFSR. The proposed design, called bit-swapping LFSR (BS-LFSR), is composed of an LFSR and a 2x1 multiplexer. If used for scan-based BIST, the BS-LFSR reduces the number of transitions that occur at the scan chain input during scan shift operation by 50% when compared with a conventional LFSR.
The BS-LFSR is combined with a scan-chain ordering algorithm that orders the cells in a way that reduces the average and peak power.

A Tree Based Novel Representation for 3D-Block Packing
Fujiyoshi, K.; Kawai, H.; Ishihara, K.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4838833&isnumber=4838810
The three-dimensional (3D) packing problem consists of arranging non-overlapping rectangular boxes of given sizes in a rectangular box of minimum volume. As a representation of 3D packings, this paper proposes a novel encoding method called Double Tree and Sequence (DTS). The following are features of DTS: (1) It can represent any minimal packing. (2) It can be decoded into the corresponding 3D packing in $O(n^2)$ time. (3) The size of the solution space of DTS is significantly smaller than that of any conventional representation that can represent any packing.

Layout-Based Defect-Driven Diagnosis for Intracell Bridging Defects
Tzeng, C.-W.; Cheng, H.-C.; Huang, S.-Y.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4838835&isnumber=4838810
The paper presents a layout-based methodology to predict the exact physical location of a bridging defect inside a standard cell. It involves a number of techniques. First, the most likely intra-cell bridging defects are identified through layout analysis and then converted into equivalent logic models. Next, the authors use a new defect-oriented formulation to generate a test pattern for each candidate defect so as to further enhance the diagnostic resolution.

Modeling Approaches for Functional Verification of RF-SoCs: Limits and Future Requirements
Wang, Y.; Joeres, S.; Wunderlich, R.; Heinen, S.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4838837&isnumber=4838810
This paper gives an overview of current modeling approaches to handle functional verification of a design on the top level, prior to tape out.
The problems that arise from the different approaches such as baseband modeling and event-driven modeling are explained and the resulting effects on the simulated system specifications are presented. The necessity of the approaches for future systems, which are not simulatable with current methods, is presented and the needed extensions of the hardware description languages and simulators are proposed.
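
Editor's illustration (not taken from any of the papers above): the scan-chain partition abstract by Wang et al. relates signal probability to test data entropy. The sketch below is a deliberately simplified model of that idea, assuming each scan flip-flop emits independent bits with a fixed probability of being 1, and that a group of flip-flops is compressed as a single memoryless source at the group's average probability. The function names and the probabilities 0.1 and 0.9 are hypothetical values chosen for the example; the paper's actual entropy model and partitioning algorithm may differ.

```python
import math


def binary_entropy(p):
    """Shannon entropy, in bits per bit, of a source that emits 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def group_entropy(probs):
    """Entropy of one group modeled as a single source at the group's mean probability."""
    return binary_entropy(sum(probs) / len(probs))


def partitioned_entropy(groups):
    """Size-weighted average entropy when each group is modeled separately."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * group_entropy(g) for g in groups)


# Hypothetical signal probabilities: four flip-flops that are mostly 0,
# four that are mostly 1.
probs = [0.1] * 4 + [0.9] * 4

mixed = group_entropy(probs)                           # one unpartitioned chain
split = partitioned_entropy([probs[:4], probs[4:]])    # similar flip-flops grouped

print(f"unpartitioned: {mixed:.3f} bits per test bit")
print(f"partitioned:   {split:.3f} bits per test bit")
```

Under these assumptions the unpartitioned chain looks like a fair coin (about 1 bit of entropy per test bit), while grouping similar flip-flops brings the average down to roughly 0.47 bits per test bit. Because binary entropy is concave, this kind of grouping can never raise the modeled entropy, which matches the direction of the claim in the abstract that partitioning raises the theoretical maximum compression.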