August 2012 Newsletter Placing you one click away from the best new CAD research! Plain-text version at http://www.umn.edu/~tcad/newsletter/2012-08.txt REGULAR PAPERS ANALOG, MIXED-SIGNAL, AND RF CIRCUITS Eick, M.; Graeb, H. E. MARS: Matching-Driven Analog Sizing http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238393 This paper presents a new approach for automatic computation of matching constraints for analog sizing. The method automatically analyzes a circuit netlist to generate sizing constraints. These sizing constraints are considered during a subsequent numerical optimization. It is the first method that computes symmetry constraints for sizing, based on the hierarchical structure of the circuit and a qualitative signal flow. As a unique feature, the method is capable of handling multiple signal and feedback paths, leading to multiple symmetry axes. It can be applied to linear and nonlinear circuits and any available sizing algorithm. Experimental results show that the runtime of analog sizing, as well as of result quality, are greatly improved. EMBEDDED SYSTEMS Qin, X.; Wang, W.; Mishra, P. TCEC: Temperature and Energy-Constrained Scheduling in Real-Time Multitasking Systems http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238400 The ongoing scaling of semiconductor technology is causing severe increase of on-chip power density and temperature in microprocessors. This urgently requires both power and thermal management during system design. In this paper, we propose a model checking-based technique using extended timed automata to solve the processor frequency assignment problem in a temperature and energy-constrained multitasking system. We also develop a polynomial time-approximation algorithm to address the state-space explosion problem caused by symbolic model checker. Our approximation scheme is guaranteed to not generate any false-positive answer, while it may return false-negative answer in rare cases. Our method is universally applicable since it is independent of any system and task characteristics. Experimental results demonstrate the usefulness of our approach. HIGH-LEVEL SYNTHESIS Kong, B. Y.; Park, I.-C. FIR Filter Synthesis Based on Interleaved Processing of Coefficient Generation and Multiplier-Block Synthesis http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238399 An efficient filter synthesis algorithm is proposed to minimize the number of adders required in the design of finite- impulse response filters. Given a specification, a filter can be synthesized by conducting two main steps: coefficient generation and multiplier-block synthesis. While most of previous works have focused on only one of the steps, the proposed algorithm integrates the two steps in an interleaved manner so as to take into account the effect of multiplier-block synthesis in generating coefficients. In addition, the concept of sensitivity is developed to reduce the complexity of computing the variable ranges of coefficients. Experimental results show that the proposed algorithm outperforms previous algorithms in terms of adder cost and takes a relatively short computation time. MODELING AND SIMULATION Weng, S.-H.; Chen, Q.; Cheng, C.-K. Time-Domain Analysis of Large-Scale Circuits by Matrix Exponential Method With Adaptive Control http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238402 We propose an explicit numerical integration method based on matrix exponential operator for transient analysis of large-scale circuits. Solving the differential equation analytically, the limiting factor of maximum time step changes largely from the stability and Taylor truncation error to the error in computing the matrix exponential operator. We utilize Krylov subspace projection to reduce the computation complexity of matrix exponential operator. We also devise a prediction-correction scheme tailored for the matrix exponential approach to dynamically adjust the step size and the order of Krylov subspace approximation. Numerical experiments show the advantages of the proposed method compared with the implicit trapezoidal method. PHYSICAL DESIGN Jung, M.; Mitra, J.; Pan, D. Z.; Lim, S. K. TSV Stress-Aware Full-Chip Mechanical Reliability Analysis and Optimization for 3-D IC http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238396 In this paper, we propose an efficient and accurate full-chip thermomechanical stress and reliability analysis tool and design optimization methodology to alleviate mechanical reliability issues in 3-D integrated circuits (ICs). First, we analyze detailed thermomechanical stress induced by through-silicon vias in conjunction with various associated structures such as landing pad and dielectric liner. Then, we explore and validate the linear superposition principle of stress tensors and demonstrate the accuracy of this method against detailed finite element analysis simulations. Next, we apply this linear superposition method to full-chip stress simulation and a reliability metric named the von Mises yield criterion. Finally, we propose a design optimization methodology to mitigate the mechanical reliability problems in 3-D ICs. Our numerical experimental results demonstrate the effectiveness of the proposed methodology. Chen, J.; Zhu, W. An Analytical Placer for VLSI Standard Cell Placement http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238405 Placement is the process of determining the exact locations of circuit elements within a chip. It is a crucial step in very large scale integration (VLSI) physical design, because it affects routability, performance, and power consumption of a design. In this paper, we develop a new analytical placer to solve the VLSI standard cell placement problem. The placer consists of two phases, multilevel global placement (GP) and detailed cell placement (DP). In the stage of GP, during the clustering stage, we use a nonlinear programming technique and a best-choice clustering algorithm to take a global view of the whole netlist and placement information, and then use an iterative local refinement technique during the declustering stage to further distribute the cells and reduce the wirelength. In the stage of DP, we develop a fast legalization algorithm to make the solution by global placement legal and use a cell order polishing to improve the legal solution. The proposed algorithm is tested on the IBM standard cell benchmark circuits and Peko suites. Experimental results show that our placer obtains high-quality results in a reasonable running time. Zhao, X.; Tolbert, J. R.; Mukhopadhyay, S.; Lim, S. K. Variation-Aware Clock Network Design Methodology for Ultralow Voltage (ULV) Circuits http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238404 This paper presents a design methodology for robust and low-energy clock networks for ultralow voltage (ULV) circuits. We show that both clock slew and skew play important roles in achieving high maximum operating frequency Fmax and low clock energy in ULV circuits. In addition, clock networks in ULV circuits are highly sensitive to process variations. We propose a variation-aware methodology that controls both clock skew and slew to maximize Fmax and minimize clock power. In addition, we implement dynamic programming (DP)-based ULV clock routing and buffering methods (deferred merging and embedding) for deterministic and statistical conditions. Experimental results show that our clock network design method achieves lower energy (more than 20% savings) at comparable or even higher Fmax compared with the existing methods. SYSTEM-LEVEL DESIGN Vitkovskiy, A.; Soteriou, V.; Nicopoulos, C. A Dynamically Adjusting Gracefully Degrading Link-Level Fault-Tolerant Mechanism for NoCs http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238398 The rapid scaling of silicon technology has enabled massive transistor integration densities. Nanometer feature sizes, however, are marred by increasing variability and susceptibility to wear-out. Billion-transistor designs, such as chip multiprocessors (CMPs), are especially vulnerable to defects. CMPs rely on a network-on-chip for all their communication needs. A single link failure within this on-chip fabric can impede, halt, or even deadlock, intertile communication, which can render the entire chip multiprocessor useless. In this paper, we present a technique capable of handling very large numbers of permanent wire failures that occur in parallel links either at manufacture- time or at runtime (dynamically). As opposed to marking an entire parallel link as faulty, whenever some wires fail, the proposed methodology employs these partially-faulty links (PFLs) to continue the transfer of informationÑ albeit at a gracefully degraded modeÑin order to maintain network connectivity. Furthermore, the presented technique can designate PFLs as fully-faulty when several wires fail, by utilizing appropriate routing algorithms that bypass nonoperational links, while still maintaining load-balance in the vicinity of PFLs. The proposed scheme employs architectural support within the on-chip routers to detect link failures and enable reconfiguration at the granularity of individual wires. Hardware synthesis confirms the low-cost nature of the proposed architecture, and full-system simulations using both synthetic network traffic and real workloads demonstrate its efficacy. Le, H. M.; Grosse, D.; Drechsler, R. Automatic TLM Fault Localization for SystemC http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238395 To meet today's time-to-market demands, catching bugs as early as possible during the design of a system is essential. In electronic system level design where SystemC has become the de-facto standard due to transaction level modeling (TLM), many approaches for verification have been developed. They determine an error trace that demonstrates the difference between the required and the actual behavior of the system. However, the subsequent debugging process is very time-consuming, in particular due to TLM-related faults caused by complex process synchronization and concurrency. In this paper, we present an automatic fault localization approach for SystemC TLM designs. We target typical TLM faults, such as accidentally swapped blocking and nonblocking transactions, erroneous event notification, or incorrect transaction data. The approach determines parts of the design that can be changed such that the intended behavior of the design is obtained by removing the contradiction given by the error trace. Single, as well as multiple faults, is considered. Techniques based on bounded model checking are used to find the faulty parts. We demonstrate the quality of our approach by several experiments. As shown in the experiments, the fault locations are identified very fast and hence a significant acceleration for the design of SystemC TLM models is achieved. TEST Liu, X.; Xu, Q. On Signal Selection for Visibility Enhancement in Trace-Based Post-Silicon Validation http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238403 Today's complex integrated circuit designs increasingly rely on post-silicon validation to eliminate bugs that escape from pre-silicon verification. One effective silicon debug technique is to monitor and trace the behaviors of the circuit during its normal operation. However, due to the associated overhead, designers can only afford to trace a small number of signals in the design. Selecting which signals to trace is therefore a crucial issue for the effectiveness of this technique. This paper proposes an automated trace signal selection strategy with a new probability-based evaluation metric, which is able to dramatically enhance the visibility in post-silicon validation. Experimental results on benchmark circuits show that the proposed technique is more effective than existing solutions. Chung, J.; Xiong, J.; Zolotov, V.; Abraham, J. A. Testability-Driven Statistical Path Selection http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238392 In the face of large-scale process variations, statistical timing methodology has advanced significantly over the last few years, and statistical path selection takes advantage of it in at-speed testing. In deterministic path selection, the separation of path selection and test generation is known to require time consuming iteration between the two processes. This paper shows that in statistical path selection, this is not only the case, but also the quality of results can be severely degraded even after the iteration. To deal with this issue, we consider testability in the first place by integrating a satisfiability (SAT) solver, and this necessitates a new statistical path selection method. We integrate the SAT solver in a novel way that leverages the conflict analysis of modern SAT solvers, which provides more than 4X speedup without special optimizations of the SAT solver for this particular application. Our proposed method is based on a generalized path criticality metric whose properties allow efficient pruning. Our experimental results show that the proposed method achieves 47% better quality of results on average, and up to 361X speedup compared to statistical path selection followed by test generation. SHORT PAPERS Shen, S.; Qin, Y.; Wang, K.; Pang, Z.; Zhang, J.; Li, S. Inferring Assertion for Complementary Synthesis http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238401 Complementary synthesis can automatically synthesize the decoder circuit of an encoder. However, its user needs to manually specify an assertion on some configuration pins to prevent the encoder from reaching the nonworking states. To avoid this tedious task, this paper propose an automatic approach to infer this assertion, by iteratively discovering and removing cases without decoders. To discover all decoders that may exist simultaneously under this assertion, a second algorithm based on functional dependency is proposed to decompose $BBR$, the Boolean relation that uniquely determines the encoder's input, into all possible decoders. To help the user select the correct decoder, a third algorithm is proposed to infer each decoder's precondition formula, which represents those cases that lead to this decoder's existence. Experimental results on several complex encoders indicate that our algorithm can always infer assertions and generate decoders for them. Moreover, when multiple decoders exist simultaneously, the user can easily select the correct one by inspecting their precondition formulas. Foreman, E. A.; Habitz, P. A.; Cheng, M.-C.; Visweswariah, C. A Novel Method for Reducing Metal Variation With Statistical Static Timing Analysis http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238394 Process variation continues to increase with new technologies. With the advent of statistical static timing analysis (SSTA), multiple independent sources of variation can be modeled. This paper proposes a novel technique to reduce variability of metal process variation in SSTA. This novel method maximizes sensitivity cancellation to minimize variability. The developed methodology is simulated with SSTA in 65-nm technology and shows a reduction in variability. Adir, A.; Nahir, A.; Ziv, A. Concurrent Generation of Concurrent Programs for Post-Silicon Validation http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238397 The continuing trend toward increased parallelism in processor design can be seen in both the growing number of processor cores per system and in on-core hardware mechanisms that assist parallelism, such as multithreading and cache hierarchies. This complexity exacerbates the problem of ensuring the functional correctness of such hardware systems. The growing importance of post-silicon validation is leading to an emerging type of parallel application, namely, the hardware exerciser. We describe a method for exercising parallel hardware by generating pseudorandom concurrent test programs. The test generation is carried out on the tested parallel platform and thus the generator itself is also a concurrent program. We describe the challenges associated with this technology and the approach used by the Threadmill hardware exerciser, a tool developed for the post-silicon validation of the IBM POWER7 processor.