August 2011 Newsletter

Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2011-08.txt

TCAD NEWS

Impact factor data

The latest impact factor information is out: according to data from the Journal Citation Reports, the two-year impact factor for the journal in 2010 is 1.252 (roughly level with the 1.230 recorded in 2009), the top rating among journals in the area of design automation.

Changes to the editorial board

Professor Li-C. Wang from the University of California-Santa Barbara has joined the editorial board as an Associate Editor in the area of test, replacing Professor Hank Walker from Texas A&M University. We thank Hank for his service and warmly welcome Li.

REGULAR PAPERS

ANALOG, MIXED-SIGNAL, AND RF CIRCUITS

Habal, H. Graeb, H.
Constraint-Based Layout-Driven Sizing of Analog Circuits
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956867

A flow is presented for the automatic synthesis of an analog circuit layout based on a schematic and a list of circuit design parameter values. The flow is driven by design, placement, and routing constraints; no layout template is necessary. Every possible layout for each device in the circuit is investigated; the layouts with the best geometric features and smallest quantization error (due to manufacturing grid alignment) are kept. For circuit placement, a complete enumeration of possible circuit placements, limited only by the usual constraints of symmetry, proximity, and common centroid, is performed. Out of this enumeration a final circuit placement is selected and routed. The new flow is integrated with a deterministic nonlinear optimization algorithm to perform layout-driven circuit sizing; layouts are synthesized during both gradient approximation and next-step determination. Layout-driven circuit sizing was applied to two example circuits. Sizing of the first circuit example took $8\times$ the CPU time needed for traditional circuit sizing, but remained feasible at 2.1 h of wall clock time on a contemporary workstation.

EMERGING TECHNOLOGIES

Zhang, J. Patil, N. P. Hazeghi, A. Wong, H.-S. P. Mitra, S.
Characterization and Design of Logic Circuits in the Presence of Carbon Nanotube Density Variations
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956874

Variations in the spatial density of carbon nanotubes (CNTs), resulting from the lack of precise control over CNT positioning during chemical synthesis, are a major hurdle to the scalability of carbon nanotube field effect transistor (CNFET) circuits. Such CNT density variations can lead to non-functional CNFET circuits. This paper presents a probabilistic framework for modeling the CNT count distribution contained in a CNFET of given width, and establishes the accuracy of the model using experimental data obtained from CNT growth. Using this model, we estimate the impact of CNT density variations on the yield of CNFET very large-scale integrated circuits. Our estimation results demonstrate that CNT density variations can significantly degrade the yield of CNFETs, and can be a major concern for scaled CNFET circuits. Finally, we analyze the impact of CNT correlation (i.e., correlation of CNT count between CNFETs) that exists in CNT growth, and demonstrate how the yield of a CNFET storage circuit (primarily limited by its noise immunity) can be significantly improved by taking advantage of such correlation.
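The paper derives its CNT count distribution from measured growth data; purely as an illustrative sketch (not the authors' model), the effect of count variations on yield can be roughed out by assuming a Poisson-distributed CNT count per device and treating a zero-count CNFET as non-functional:

    import math

    def p_functional(width_um, cnts_per_um):
        # Illustrative assumption only: the CNT count under the gate is Poisson
        # with mean width * density; the paper instead fits measured spacing data.
        mean_count = width_um * cnts_per_um
        return 1.0 - math.exp(-mean_count)      # P(at least one CNT)

    def circuit_yield(n_cnfets, width_um, cnts_per_um):
        # Circuit fails if any CNFET has zero CNTs; devices are assumed
        # independent here (the paper additionally studies correlated counts).
        return p_functional(width_um, cnts_per_um) ** n_cnfets

    # Hypothetical numbers: one million CNFETs, 5 CNTs/um average density.
    for width in (0.5, 1.0, 2.0, 4.0):
        print(width, circuit_yield(1_000_000, width, 5.0))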
LOGIC SYNTHESIS

Paik, S. Lee, S. Shin, Y.
Retiming Pulsed-Latch Circuits With Regulating Pulse Width
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956870

A pulsed-latch is an ideal sequencing element for high-performance application-specific integrated circuit designs due to its simple timing model and reduced sequencing overhead. The possibility of time-borrowing while a latch is transparent is deliberately ignored in pulsed-latch circuits to simplify the timing model. However, using more than one pulse width allows another form of time-borrowing which preserves the simple timing model. The associated problem of allocating pulse widths is called pulse width allocation (PWA); we combine it with retiming to achieve a shorter clock period in pulsed-latch circuits than can be obtained by retiming or PWA alone, while requiring fewer extra latches than standard retiming. An exact solution can be obtained by integer linear programming, but this is restricted to small circuits. We therefore introduce a practical heuristic, which performs clock skew scheduling to find the minimum clock period and then brings the clock skew as close to zero as possible by converting it to combined retiming and PWA. Experiments with 45 nm technology circuits suggest that the heuristic algorithm achieves a clock period that is close to the minimum in most circuits, with an average 23% reduction compared to the initial circuit, at an average cost of a 16% increase in the number of latches. On the same benchmarks, standard retiming achieves a 20% reduction in clock period with a 29% increase in the number of latches. The cost in extra area and energy is reasonable.

Hu, W. Oberg, J. Irturk, A. Tiwari, M. Sherwood, T. Mu, D. Kastner, R.
Theoretical Fundamentals of Gate Level Information Flow Tracking
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5948366

Information flow tracking is an effective tool in computer security for detecting unintended information flows. However, software-based information flow tracking implementations have drawbacks in preciseness and performance. As a result, researchers have begun to explore tracking information flow in hardware, and more specifically, understanding the interference of individual bits of information through logical functions. Such gate level information flow tracking (GLIFT) can track information flow in a system at the granularity of individual bits. However, the theoretical basis for GLIFT, which is essential to its adoption in real applications, has never been thoroughly studied. This paper provides a fundamental analysis of GLIFT, introducing definitions, properties, and the imprecision problem that arises with a commonly used shadow logic generation method. The paper also presents a solution to this imprecision problem and provides results showing that this imprecision can be tolerated for the benefit of lower area and delay.
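As a concrete illustration of bit-level tracking (a textbook GLIFT example rather than anything specific to this paper), the sketch below contrasts a conservative taint rule for a 2-input AND gate with precise shadow logic, which propagates taint only when a tainted input can actually change the output:

    def and_taint_conservative(a, at, b, bt):
        # Conservative shadow logic: output is tainted whenever any input is tainted.
        return at | bt

    def and_taint_precise(a, at, b, bt):
        # Precise shadow logic for a 2-input AND: a tainted input taints the
        # output only if it can change the output. If a = 0 and untainted, the
        # output is 0 no matter what b is, so taint on b does not propagate.
        return (at & bt) | (at & b) | (bt & a)

    # Enumerate all 0/1 combinations and report where the conservative rule
    # over-taints relative to the precise rule.
    for a in (0, 1):
        for b in (0, 1):
            for at in (0, 1):
                for bt in (0, 1):
                    cons = and_taint_conservative(a, at, b, bt)
                    prec = and_taint_precise(a, at, b, bt)
                    if cons != prec:
                        print("a=%d at=%d b=%d bt=%d conservative=%d precise=%d"
                              % (a, at, b, bt, cons, prec))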
MODELING AND SIMULATION

Villena, J. F. Silveira, L. M.
Multi-Dimensional Automatic Sampling Schemes for Multi-Point Modeling Methodologies
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956873

This paper presents a methodology for optimizing sample point selection in the context of model order reduction (MOR). The procedure iteratively selects samples from a large candidate set in order to identify a projection subspace that accurately captures system behavior. Samples are selected in an efficient and automatic manner based on their relevance, measured through an error estimator. Projection vectors are computed only for the best samples according to the given criteria, thus minimizing the number of expensive solves. The scheme makes no prior assumptions about the system behavior, is general and valid in single and multiple dimensions, and is applicable to linear and parameterized MOR methodologies. The proposed approach is integrated into a multi-point MOR algorithm, with automatic sample and order selection based on a transfer-function error estimate. Different implementations and improvements are proposed, and results on a variety of industrial examples demonstrate the accuracy and robustness of the methodology.

PHYSICAL DESIGN

Ma, Q. Qian, Z. Young, E. F. Y. Zhou, H.
MSV-Driven Floorplanning
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956869

Power consumption has become a crucial problem in modern circuit design. Multiple supply voltage (MSV) design is introduced to provide higher flexibility in controlling the power and performance tradeoff. One important requirement of MSV design is that the timing constraints of the circuit must be satisfied after voltage assignment of the cells. In this article, we develop two algorithms to solve the voltage assignment problem under timing constraints, namely min-cost flow (MCF) and value-oriented branch-and-bound (VOBB). In the MCF algorithm, the voltage assignment problem is formulated as a convex cost dual network flow problem, which can be solved optimally in polynomial time under certain conditions by calling an MCF solver. The VOBB algorithm, a branch-and-bound-based search method, solves the voltage assignment problem optimally in general cases by employing the MCF algorithm and a linear programming solver as subroutines. Finally, we propose an MSV-driven floorplanning framework that optimizes the power consumption and physical layout of a circuit simultaneously during the floorplanning stage, by embedding the MCF algorithm into a simulated annealing-based floorplanner and applying the VOBB algorithm as a postprocessing step. We compared our approach with the latest work on this problem, and the experimental results show that it achieves significantly greater power savings in much less running time, confirming the effectiveness and efficiency of our method.
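The MCF formulation itself is beyond the scope of an abstract, but the underlying problem is easy to state. The hypothetical sketch below solves a toy instance by brute force (one timing path, made-up voltage/delay/power options), purely to illustrate the voltage assignment problem that the paper solves far more efficiently:

    from itertools import product

    # Hypothetical cell options as (voltage, delay, power); the numbers are
    # invented purely for illustration.
    OPTIONS = [(1.2, 1.0, 4.0), (1.0, 1.4, 2.5), (0.8, 2.0, 1.5)]

    def assign_voltages(n_cells, max_path_delay):
        # Brute-force voltage assignment for a single timing path of n_cells:
        # choose one option per cell, meet the path delay constraint, and
        # minimize total power. (The paper's MCF/VOBB algorithms solve the
        # full circuit-level problem optimally and efficiently.)
        best = None
        for choice in product(OPTIONS, repeat=n_cells):
            delay = sum(c[1] for c in choice)
            power = sum(c[2] for c in choice)
            if delay <= max_path_delay and (best is None or power < best[0]):
                best = (power, delay, [c[0] for c in choice])
        return best  # (total power, total delay, voltage per cell) or None

    print(assign_voltages(5, 8.0))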
SYSTEM-LEVEL DESIGN

Kim, S. H. Mukhopadhyay, S. Wolf, M.
Modeling and Analysis of Image Dependence and Its Implications for Energy Savings in Error Tolerant Image Processing
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5958188

We present an analysis of the relationship between input images and energy consumption in error tolerant image processing. Under aggressive voltage scaling, the output image quality of image processing depends on the input images for two reasons: 1) error tolerance among images is naturally disparate in terms of perceptual image quality assessment, and 2) the error rate under aggressive voltage scaling varies by input image type. Based on both effects, the supply voltage can be optimized for a given quality requirement so as to achieve ultralow power/energy dissipation. Our analysis demonstrates the significance of accurate delay estimation, which depends not only on the combinational inputs but also on the previous state of the logic. We present a new sequential model for accurate error estimation. Based on the model, our experimental results demonstrate that different input image types lead to very different output quality. We also present the effect of process variation on the relationship between input image and output quality. The dependence of energy consumption on input images provides a new perspective for low-power multimedia and image processing system design.

Irturk, A. Matai, J. Oberg, J. Su, J. Kastner, R.
Simulate and Eliminate: A Top-to-Bottom Design Methodology for Automatic Generation of Application Specific Architectures
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956868

There is an increasing trend toward application specific processing, particularly in embedded computing devices that have stringent performance requirements. Meeting the desired area and throughput constraints requires careful tuning of the underlying architecture, and high-level design tools are gaining acceptance as a way to achieve this goal while decreasing design time. Most existing tools employ a bottom-to-top methodology, which pieces together functional units, interconnect, and control logic based on the given application; this tends to scale poorly. We developed a tool, simulate and eliminate (S&E), that is fundamentally different from existing high-level design tools in that it employs a top-to-bottom methodology. S&E provides automatic generation of a variety of general purpose processing cores with different parameterization options. The provided application(s) are then simulated on this general-purpose architecture and the unneeded functionality is eliminated, resulting in an application specific architecture. S&E generates fully synthesizable hardware description language code from input C and/or MATLAB code. S&E provides different design methods and parameterization options that enable the user to study area and performance tradeoffs over a large number of different architectures and find the optimum architecture for the desired objective.
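The simulate-then-eliminate idea (shown here as a toy sketch, not the S&E tool itself) amounts to profiling the application on a fully general datapath and stripping the functional units the profile never touches:

    # Hypothetical opcode-to-functional-unit map for a generic core model.
    UNIT_OF_OP = {"add": "adder", "sub": "adder", "mul": "multiplier",
                  "div": "divider", "shl": "shifter", "ld": "mem_port", "st": "mem_port"}
    ALL_UNITS = set(UNIT_OF_OP.values())

    def eliminate_unused_units(instruction_trace):
        # Profile the trace on the general-purpose core, then keep only the
        # functional units that were actually exercised.
        used = {UNIT_OF_OP[op] for op in instruction_trace if op in UNIT_OF_OP}
        return used, ALL_UNITS - used

    # A multiply-accumulate-style kernel never divides or shifts, so those units go.
    trace = ["ld", "ld", "mul", "add", "st"] * 1000
    kept, removed = eliminate_unused_units(trace)
    print("kept:", sorted(kept), "removed:", sorted(removed))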
Jou, J. M. Lee, Y.-L. Wu, S.-S.
Model-Driven Design and Generation of New Multi-Facet Arbiters: From the Design Model to the Hardware Synthesis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5958189

The design of arbiters has become increasingly important due to their wide use in areas such as multi-processor systems-on-a-chip and on-chip or off-chip high-speed crossbar switches and networks. In this paper, we propose a new systematic model-driven flow for designing new scalable multi-facet arbiters through a 3-phase process combined with a template-based modular design approach, comprising a design model derivation phase, an architecture model (or template) design phase, and an arbiter hardware implementation and generation phase. We first describe phase 1 of the design flow, showing in detail how an arbiter design model is derived through careful analysis of arbiter design issues and systematic exploration of the design space used to construct the model. We then discuss phase 2, in which an architecture model or template is derived using reusability, modularity, and expansibility techniques. With both the design and architecture models, designers can efficiently design, or at least understand and thereby choose among, many kinds of different and better arbiters. Finally, we have developed a scalable decentralized parallel tree structure and corresponding modular algorithms for efficient arbiter hardware implementation, which constitutes the final phase of the proposed 3-phase model-driven design flow. Moreover, based on this modular and reusable hardware implementation structure and the accompanying algorithms, we have designed a parametric arbiter generator that automatically generates various multi-facet arbiters. Using this hardware implementation design and the generator, not only a fast and small round-robin arbiter but also other types of arbiters were designed and generated on the fly. The hardware implementation algorithms, the generator, and the experimental results are all presented to verify their performance. To our knowledge, this is the first time that such a systematic model-driven design approach has been proposed for practical hardware design and that such a multi-facet arbiter has been designed.

Bogdan, P. Marculescu, R.
Hitting Time Analysis for Fault-Tolerant Communication at Nanoscale in Future Multiprocessor Platforms
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956865

This paper investigates on-chip stochastic communication and proposes an analytical model for computing its mean hitting time. Toward this end, we model the on-chip stochastic communication of any source-destination pair as a branching and annihilating random walk taking place on a finite mesh. The evolution of this branching process is studied via a master equation which helps us estimate the mean number of communication rounds needed to reach a destination node from a particular source node. Besides the probabilistic performance analysis, we also present experimental results for two concrete platforms and assess the potential of stochastic communication for future nanotechnologies.
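The paper computes the mean hitting time analytically from a master equation; as a purely illustrative cross-check, the same quantity for a simple probabilistic-forwarding model (ignoring message annihilation) can be estimated by Monte Carlo simulation on a mesh:

    import random

    def mean_hitting_time(n, src, dst, p_forward=0.7, trials=2000, max_rounds=10000):
        # Monte Carlo estimate: each node holding the message forwards it to
        # each mesh neighbor with probability p_forward per round; count rounds
        # until dst first receives it. (Illustrative model only; the paper's
        # random walk also includes annihilation and is solved analytically.)
        def neighbors(x, y):
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                if 0 <= x + dx < n and 0 <= y + dy < n:
                    yield (x + dx, y + dy)

        total = 0
        for _ in range(trials):
            holders = {src}
            rounds = 0
            while dst not in holders and rounds < max_rounds:
                new_holders = set(holders)
                for (x, y) in holders:
                    for nb in neighbors(x, y):
                        if random.random() < p_forward:
                            new_holders.add(nb)
                holders = new_holders
                rounds += 1
            total += rounds
        return total / trials

    # Mean rounds to reach the far corner of an 8 x 8 mesh from the origin.
    print(mean_hitting_time(8, (0, 0), (7, 7)))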
Beretta, I. Rana, V. Atienza, D. Sciuto, D.
A Mapping Flow for Dynamically Reconfigurable Multi-Core System-on-Chip Design
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956864

Nowadays, multi-core systems-on-chip (SoCs) are typically required to execute multiple complex applications, which demand a large set of heterogeneous hardware cores of different sizes. In this context, the popularity of dynamically reconfigurable platforms is growing, as they increase the ability of the initial design to adapt to future modifications. This paper presents a design flow to efficiently map multiple multi-core applications onto a dynamically reconfigurable SoC. The proposed methodology is tailored to a reconfigurable hardware architecture based on a flexible communication infrastructure, and exploits similarities among applications to obtain an effective mapping. We also introduce a run-time mapper that is able to introduce new applications that were not known at design time, while preserving the mapping of the original system. We apply our design flow to a real-world multimedia case study and to a set of synthetic benchmarks, showing that it is indeed able to exploit similarities among the applications: it achieves an average improvement of 29% in reconfiguration latency with respect to a communication-oriented approach, while preserving the same communication performance.

TEST

Czysz, D. Mrugalski, G. Mukherjee, N. Rajski, J. Szczerbicki, P. Tyszer, J.
Deterministic Clustering of Incompatible Test Cubes for Higher Power-Aware EDT Compression
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5958190

Embedded deterministic test (EDT)-based compression uses cube merging to reduce pattern count, the amount of test data, and test time. It gradually expands a test pattern by incorporating compatible test cubes. This paper demonstrates that compression ratios can be an order of magnitude higher if cube merging continues despite conflicts at certain positions. Our novel solution produces test clusters, each comprising a parent pattern and a number of its derivatives obtained by imposing extra bits on it. In order to load scan chains with patterns that feature the original test cubes, only the data necessary to recreate the parent patterns, together with the locations and values of the corresponding conflicting bits, are required. A test controller can then deliver tests by repeatedly applying the same parent pattern, each time using a different control pattern to decide whether a given scan chain receives data from the parent pattern or from another pattern that recovers the content of the original test cube. Compression of incompatible test cubes preserves all the benefits of continuous-flow decompression and offers compression ratios on the order of $1000\times$ with encoding efficiency much higher than 1.0. We also demonstrate that test clusters make it possible to deliver test patterns in a flexible, power-aware fashion. This framework achieves significant reductions in switching activity during scan loading, as well as additional reductions in test data volume due to the encoding algorithms employed to compress the parent and control vectors.

VERIFICATION

Fey, G. Sulflow, A. Frehse, S. Drechsler, R.
Effective Robustness Analysis Using Bounded Model Checking Techniques
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956866

Continuously shrinking feature sizes result in an increasing susceptibility of circuits to transient faults, e.g., due to environmental radiation. Approaches to implement fault tolerance are known, but assessing the fault tolerance of a given implementation is a hard verification problem. Here, we propose the use of formal methods to assess the robustness of a digital circuit with respect to transient faults. Our formal model uses a fixed bound in time and exploits fault detection circuitry to cope with the complexity of the underlying sequential equivalence check. As a result, a lower and an upper bound on the robustness are returned, together with the vulnerable components. The underlying algorithm and techniques for improving its efficiency are presented. In experiments, we evaluate the method on circuits with different fault detection mechanisms.

SHORT PAPERS

Pomeranz, I.
Generation of Multi-Cycle Broadside Tests
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956871

The use of multi-cycle (or multi-pattern) tests for delay faults can reduce the number of clock cycles required for test application and enhance the ability of a test set to detect delay defects. This is achieved by exercising the circuit in functional mode for several clock cycles as part of each test. This advantage is especially important for multi-pattern functional broadside tests, which guarantee normal functional operation conditions during the functional clock cycles of the test. This paper describes a procedure for generating multi-pattern broadside tests. The procedure extends a two-pattern test set gradually to increase the number of patterns included in each test while reducing the number of tests. Experimental results demonstrate that significant reductions in the number of clock cycles are possible with the proposed procedure for both functional and arbitrary broadside test sets.
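A broadside test scans in a state and then clocks the circuit in functional mode; a multi-cycle test simply applies more functional cycles, so one scan operation yields several consecutive patterns. The toy sketch below (a hypothetical 3-bit next-state function, not from the paper) shows how those patterns follow from the scanned-in state:

    def broadside_patterns(scan_in_state, primary_inputs, next_state, k_cycles):
        # States applied by a k-cycle broadside test: the scanned-in state
        # followed by k functionally reached successors. Consecutive pairs act
        # as two-pattern tests for delay faults.
        states = [scan_in_state]
        for cycle in range(k_cycles):
            states.append(next_state(states[-1], primary_inputs[cycle]))
        return states

    def toy_next_state(state, pi):
        # Hypothetical 3-bit next-state function (rotate left, then XOR the
        # primary inputs); stands in for a real circuit's state transition logic.
        rotated = ((state << 1) | (state >> 2)) & 0b111
        return rotated ^ pi

    print([format(s, "03b") for s in
           broadside_patterns(0b101, [0b001, 0b010, 0b100], toy_next_state, 3)])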
Shebaita, A. Das, D. Petranovic, D. Ismail, Y.
A Novel Moment Based Framework for Accurate and Efficient Static Timing Analysis
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5956872

A novel methodology for accurate and efficient static timing analysis is presented in this paper. Our methodology uses the traditional cell library table structure with one modification: the cell library tables are filled with the gate output signal moments instead of the gate output 50% delay and output slew. Using only a few moments gives much better accuracy and visibility into the gate output waveform than using time-domain information. Simple convolution of the gate output moments with the interconnect moments yields the signal moments at the stage output. The parameters of the gate input signal, which are used for the table access of the successive stage, are computed directly from the predecessor stage's output moments using closed-form expressions, without having to explicitly transform the frequency-domain moments to the time domain. Thus, the interconnects and the gates are treated in a unified, homogeneous moment-based framework. The proposed approach inherits the efficiency of the classical cell library table approach with even lower computational complexity. Compared to the classical cell library table approach, the proposed approach accounts for the increasingly nonlinear and non-monotonic waveform shapes that are prohibitively difficult to represent in the classical approaches. In contrast to the classical approaches, increasing the accuracy of the novel approach is flexible and can be achieved by simply using more moments. To illustrate the concept and prove its merits, multiple examples are presented with 2-3 moments, which maintain accuracy within 1%-3% of SPICE.
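The "simple convolution" of moments follows from the fact that cascading two linear blocks multiplies their transfer functions, so the moment series of the cascade is the Cauchy (polynomial) product of the two moment sequences. The sketch below illustrates this with made-up moment values and recovers a crude Elmore-style delay from the first moment; the paper's closed-form delay and slew expressions go beyond this single-moment metric:

    import numpy as np

    def cascade_moments(gate_moments, wire_moments):
        # If each block's transfer function is written as a moment series
        # H(s) = m0 + m1*s + m2*s^2 + ..., the cascade H_gate(s) * H_wire(s)
        # has moments equal to the polynomial (Cauchy) product of the two
        # sequences, truncated here to the order kept in the library.
        n = min(len(gate_moments), len(wire_moments))
        return np.convolve(gate_moments, wire_moments)[:n]

    def elmore_delay(moments):
        # Crude one-moment delay metric: after normalizing so that m0 = 1,
        # the Elmore delay is -m1. (Metrics using m2, m3, ... are more
        # accurate; this only shows that delay falls out of the moments.)
        m = np.asarray(moments, dtype=float)
        m = m / m[0]
        return -m[1]

    # Hypothetical moments (m0, m1, m2) for a gate output and an interconnect.
    gate = [1.0, -12e-12, 90e-24]
    wire = [1.0, -8e-12, 50e-24]
    stage = cascade_moments(gate, wire)
    print("stage moments:", stage)
    print("Elmore-style stage delay (s):", elmore_delay(stage))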