TCAD Newsletter - December 2010 Issue Placing you one click away from the best new CAD research! Regular Papers ANALOG, MIXED-SIGNAL, AND RF CIRCUITS Maffezzoni, P., "Computing the Synchronization Regions of Injection-Locked Strongly Nonlinear Oscillators for Frequency Division Applications" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621039&isn... Abstract: This paper describes an original method to compute the synchronization regions of injection-locked oscillators that exhibit a strong nonlinear response, such as relaxation or ring architectures. The proposed method is based on the theory of controllably periodically forced oscillators and is implemented within the frame of the time-domain periodic steady-state analysis. The novel method is able to predict, in an accurate way, the conditions for which the correct frequency division occurs and can identify the transitions to more complex regimes that limit the exploitable synchronization region. FPGAS AND RECONFIGURABLE COMPUTING Lin, M.; Wawrzynek, J., "Improving FPGA Placement With Dynamically Adaptive Stochastic Tunneling" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621041&isn... Abstract: This paper develops a dynamically adaptive stochastic tunneling (DAST) algorithm to avoid the "freezing" problem commonly found when using simulated annealing for circuit placement on field-programmable gate arrays (FPGAs). The main objective is to reduce the placement runtime and improve the quality of final placement. We achieve this by allowing the DAST placer to tunnel energetically inaccessible regions of the potential solution space, adjusting the stochastic tunneling schedule adaptively by performing detrended fluctuation analysis, and selecting move types dynamically by a multi-modal scheme based on Gibbs sampling. A prototype annealing-based placer, called DAST, was developed as part of this paper. It targets the same computer-aided design flow as the standard versatile placement and routing (VPR) but replaces its original annealer with the DAST algorithm. Our experimental results using the benchmark suite and FPGA architecture file which comes with the Toronto VPR5 software package have shown a 18.3% reduction in runtime and a 7.2% improvement in critical-path delay over that of conventional VPR. LOGIC SYNTHESIS Maidee, P.; Bazargan, K., "Improvements on Efficiency and Efficacy of SPFD-Based Rewiring for LUT-Based Circuits" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621036&isn... Abstract: This paper proposes two set-of-pairs-of-functions-to-be-distinguished (SPFD)-based rewiring algorithms to be used in a multi-tier rewiring framework, which employs multiple rewiring techniques. The first algorithm has two unique features: 1) a satisfiability problem (SAT) instance was devised so that an unsuccessful rewiring can be identified very quickly, and 2) unlike binary decision diagram-based methods that require all pairs of SPFD, our algorithm uses a few SAT instances to perform rewiring for a given wire without explicitly enumerating all SPFDs. Experimental results show that the runtime of our algorithm is about three times faster than that of a conventional one under a simulated setting of such a framework and it scales well with the number of candidate wires considered. The efficacy of the framework can be further improved by the second proposed algorithm. The algorithm relies on a theory presented herein to allow adding a new wire outside of the restricted set of dominator nodes, a feature common in automatic-test-pattern-generation-based rewiring, but absent in existing SPFD-based ones. Although this algorithm may suffer from long runtimes in the same way conventional SPFD-based techniques do, experiments show that the number of wires which can be rewired increases 13% on average and the number of alternative wires also increases. MODELING AND SIMULATION Maricau, E.; Gielen, G., "Efficient Variability-Aware NBTI and Hot Carrier Circuit Reliability Analysis" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621034&isn... Abstract: This paper discusses an efficient method to analyze the spatial and temporal reliability of analog and digital circuits. First, a SPICE-based reliability simulator with automatic step-size control is proposed. Both hot carrier degradation and negative bias temperature instability are included in the simulator. Next, a method to analyze the interaction between process variability effects and circuit aging is introduced. This method is based on a screening experimental design (DoE) succeeded by a set of regression DoEs, resulting in a good speed-accuracy tradeoff with a nearly linear complexity for all circuits under test. Finally, based on the DoE analysis, a circuit response surface model (RSM) is derived. The RSM is used for further circuit reliability analysis such as circuit weak spot detection and yield calculation as a function of circuit lifetime. The proposed method is validated over a broad range of both analog and digital circuits. Yield simulation time is reduced with up to three orders of magnitude, when compared to standard Monte Carlo-based techniques and while still maintaining simulation accuracy. Jaffari, J.; Anis, M., "Advanced Variance Reduction and Sampling Techniques for Efficient Statistical Timing Analysis" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621042&isn... Abstract: The Monte-Carlo (MC) technique is a traditional solution for a reliable statistical analysis, and in contrast to probabilistic methods, it can account for any complicate model. However, a precise analysis that involves a traditional MC-based technique requires many simulation iterations, especially for the extreme quantile points. In this paper, advanced sampling and variance reduction techniques, along with applications for efficient digital circuit timing yield analysis, are studied. Three techniques are proposed: 1) an enhanced quasi-MC-based sampling which generates optimally low-discrepancy samples suitable for yield estimation of digital circuits; 2) an order-statistics based control-variate technique that improves the quality of the yield estimations, when a moderate number of samples is needed; and 3) a classical control-variate technique utilized for a variance-reduced critical delay's statistical moment estimation. This solution is shown to be effective even for a very low number of samples. Wang, J.; Singhee, A.; Rutenbar, R. A.; Calhoun, B. H., "Two Fast Methods for Estimating the Minimum Standby Supply Voltage for Large SRAMs" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621031&isn... Abstract: The data retention voltage (DRV) defines the minimum supply voltage for an SRAM cell to hold its state. Intra-die variation causes a statistical distribution of DRV for individual cells in a memory array. We present two fast and accurate methods to estimate the tail of the DRV distribution. The first method uses a new analytical model based on the relationship between DRV and static noise margin. The second method extends the statistical blockade technique to a recursive formulation. It uses conditional sampling for rapid statistical simulation and fits the results to a generalized Pareto distribution (GPD) model. Both the analytical DRV model and the generic GPD model show a good match with Monte Carlo simulation results and offer speedups of up to four or five orders of magnitude over Monte Carlo at the 6 sigma point. In addition, the two models show a very close agreement with each other at the tail up to 8 sigma. For error within 5% with a confidence of 95%, the analytical DRV model and the GPD model can predict DRV quantiles out to 8 sigma and 6.6 sigma respectively; and for the mean of the estimate, both models offer within 1% error relative to Monte Carlo at the 4 sigma point. PHYSICAL DESIGN Su, Y.-S.; Hon, W.-K.; Yang, C.-C.; Chang, S.-C.; Chang, Y.-J., "Clock Skew Minimization in Multi-Voltage Mode Designs Using Adjustable Delay Buffers" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621030&isn... Abstract: In synchronous circuit designs, clock skew is difficult to minimize because a single physical layout of a clock tree must satisfy multiple constraints in a complicated power mode environment where certain modules may operate with different voltages. In this paper, we use adjustable delay buffers (ADB) whose delays can be tuned or adjusted to minimize clock skew under different power modes. Assuming that the positions of k ADBs are already determined, we first propose a linear-time optimal algorithm which assigns the values of ADBs so that the skew is optimal among all possible ADB assignments with a possibility of latency penalty. Then, we propose a modified optimal algorithm without latency penalty. We also propose an efficient heuristic to determine good positions for ADBs. Our results show significant improvement when compared to cases without ADBs. Chang, Y.-J.; Lee, Y.-T.; Gao, J.-R.; Wu, P.-C.; Wang, T.-C., "NTHU-Route 2.0: A Robust Global Router for Modern Designs" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621046&isn... Abstract: This paper presents a robust global router called NTHU-Route 2.0 that improves the solution quality and runtime of NTHU-Route by the following enhancements: 1) a new history based cost function; 2) new ordering methods for congested region identification and rip-up and reroute; and 3) two implementation techniques. We report convincing experimental results to show the effectiveness of each individual enhancement. With all these enhancements together, NTHU-Route 2.0 solves all ISPD98 benchmarks with very good quality. Moreover, NTHU-Route 2.0 routes 7 of 8 ISPD07 benchmarks and 12 of 16 ISPD08 benchmarks without any overflow. Compared with other state-of-the-art global routers, NTHU-Route 2.0 is able to produce better solution quality and/or run more efficiently. Rajaram, A.; Pan, D. Z., "MeshWorks: A Comprehensive Framework for Optimized Clock Mesh Network Synthesis" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621032&isn... Abstract: Clock mesh networks are well known for their variation tolerance. But their usage is limited to high-end designs due to the significantly high resource requirements compared to clock trees and the lack of automatic mesh synthesis tools. Most existing works on clock mesh networks either deal with semi-custom design or perform optimizations on a given clock mesh. However, the problem of obtaining a good initial clock mesh has not been addressed. Also, the problem of achieving a smooth tradeoff between variation tolerance and resource requirements has not been addressed adequately. In this paper, we present our MeshWorks framework, the first comprehensive automated framework for planning, synthesis, and optimization of clock mesh networks that addresses the above issues. Experimental results suggest that our algorithms can achieve an additional reduction of 31% in buffer area, 21% in wirelength, and 23% in power, compared to the best previous work, with similar worst case maximum frequency. We also demonstrate the effectiveness of our framework under several practical issues such as blockages, multiple clocks, uneven load distribution, and electromigration violations. SYSTEM-LEVEL DESIGN Dong, A.; Zhao, J.; Xie, Y., "Fabrication Cost Analysis and Cost-Aware Design Space Exploration for 3-D ICs" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621044&isn... Abstract: 3-D integration technology is emerging as an attractive alternative to increase the transistor count for future chips. The majority of the existing 3-D integrated circuit (IC) research is focused on the performance, power, density, and heterogeneous integration benefits offered by 3-D integration. All such advantages, however, ultimately have to translate into cost evaluation when a design strategy has to be decided. Consequently, system-level cost analysis at early design stages is imperative to decide on whether 3-D integration should be adopted. This paper presents a cost estimation method for 3-D ICs at early design stages and proposes a set of cost models that include wafer cost, 3-D bonding cost, package cost, and cooling cost. The proposed 3-D IC cost estimation method can help designers analyze the cost implication for 3-D ICs during the design space exploration at the early stage, and it enables a cost-driven 3-D IC design flow that can guide the design choice toward a cost-effective direction. Based on the proposed cost estimation method, this paper demonstrates two case studies that explore the cost benefits of 3-D integration for application-specific integrated circuit designs and many-core microprocessor designs style, respectively. Finally, this paper suggests the optimum partitioning strategy for future 3-D IC designs. Jafari, F.; Lu, Z.; Jantsch, A.; Yaghmaee, M. H., "Buffer Optimization in Network-on-Chip Through Flow Regulation" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621045&isn... Abstract: For network-on-chip (NoC) designs, optimizing buffers is an essential task since buffers are a major source of cost and power consumption. This paper proposes flow regulation and has defined a regulation spectrum as a means for system-on-chip architects to control delay and backlog bounds. The regulation is performed per flow for its peak rate and burstiness. However, many flows may have conflicting regulation requirements due to interferences with each other. Based on the regulation spectrum, this paper optimizes the regulation parameters aiming for buffer optimization. Three timing-constrained buffer optimization problems are formulated, namely, buffer size minimization, buffer variance minimization, and multiobjective optimization, which has both buffer size and variance as minimization objectives. Minimizing buffer variance is also important because it affects the modularity of routers and network interfaces. A realistic case study exhibits 62.8% reduction of total buffers, 84.3% reduction of total latency, and 94.4% reduction on the sum of variances of buffers. Likewise, the experimental results demonstrate similar improvements in the case of synthetic traffic patterns. The optimization algorithm has low run-time complexity, enabling quick exploration of large design spaces. This paper concludes that optimal flow regulation can be a highly valuable instrument for buffer optimization in NoC designs. Seiculescu, C.; Murali, S.; Benini, L.; De Micheli, G., "SunFloor 3D: A Tool for Networks on Chip Topology Synthesis for 3-D Systems on Chips" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621033&isn... Abstract: Three-dimensional integrated circuits (3D-ICs) are a promising approach to address the integration challenges faced by current systems on chips (SoCs). Designing an efficient network on chip (NoC) interconnect for a 3-D SoC that meets not only the application performance constraints but also the constraints imposed by the 3-D technology is a significant challenge. In this paper, we present a design tool, SunFloor 3D, to synthesize application-specific 3-D NoCs. The proposed tool determines the best NoC topology for the application, finds paths for the communication flows, assigns the network components to the 3-D layers, and places them in each layer. We perform experiments on several SoC benchmarks and present a comparative study between 3-D and 2-D NoC designs. Our studies show large improvements in interconnect power consumption (average of 38%) and delay (average of 13%) for the 3-D NoC when compared to the corresponding 2-D implementation. Our studies also show that the synthesized topologies result in large power (average of 54%) and delay savings (average of 21%) when compared to standard topologies. Ogras, U. Y.; Bogdan, P.; Marculescu, R., "An Analytical Approach for Network-on-Chip Performance Analysis" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621037&isn... Abstract: Networks-on-chip (NoCs) have recently emerged as a scalable alternative to classical bus and point-to-point architectures. To date, performance evaluation of NoC designs is largely based on simulation which, besides being extremely slow, provides little insight on how different design parameters affect the actual network performance. Therefore, it is practically impossible to use simulation for optimization purposes. In this paper, we present a mathematical model for on-chip routers and utilize this new model for NoC performance analysis. The proposed model can be used not only to obtain fast and accurate performance estimates, but also to guide the NoC design process within an optimization loop. The accuracy of our approach and its practical use is illustrated through extensive simulation results. TEST Jeong, W.; Lee, J.; Han, T.; Lee, K.; Kang, S., "An Advanced BIRA for Memories With an Optimal Repair Rate and Fast Analysis Speed by Using a Branch Analyzer" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621043&isn... Abstract: As memory capacity and density grow, a corresponding increase in the number of defects decreases the yield and quality of embedded memories for systems-on-chip as well as commodity memories. For embedded memories, built-in redundancy analysis (BIRA) is widely used to solve quality and yield issues by replacing faulty cells with healthy redundant cells. Many BIRA approaches require extra hardware overhead in order to achieve optimal repair rates, or they suffer a loss of repair rate in minimizing the hardware overhead. An innovative BIRA approach is proposed to achieve optimal repair rates, lower area overhead, and increase analysis speed. The proposed BIRA minimizes area overhead by eliminating some storage coverage for only must-repair faulty information. The proposed BIRA analyzes redundancies quickly and efficiently by evaluating all nodes of a branch in parallel with a new analyzer which is simple and easy-to-implement. Experimental results show that the proposed BIRA allows for a much faster analysis speed than that of the state-of-the-art BIRA, as well as the optimal repair rate, and relatively small area overhead. VERIFICATION Nocco, S.; Quer, S., "A Novel SAT-Based Approach to the Task Graph Cost-Optimal Scheduling Problem" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621035&isn... Abstract: The task graph cost-optimal scheduling problem consists in scheduling a certain number of interdependent tasks onto a set of heterogeneous processors (characterized by idle and running rates per time unit), minimizing the cost of the entire process. This paper provides a novel formulation for this scheduling puzzle, in which an optimal solution is computed through a sequence of binate covering problems, hinged within a bounded model checking paradigm. In this approach, each covering instance, providing a min-cost trace for a given schedule depth, can be solved with several strategies, resorting to minimum-cost satisfiability solvers or pseudo-Boolean optimization tools. Unfortunately, all direct resolution methods show very low efficiency and scalability. As a consequence, we introduce a specialized method to solve the same sequence of problems, based on a traditional all-solution SAT solver. This approach follows the "circuit cofactoring" strategy, as it exploits a powerful technique to capture a large set of solutions for any new SAT counter-example. The overall method is completed with a branch-and-bound heuristic which evaluates lower and upper bounds of the schedule length, to reduce the state space that has to be visited. Our results show that the proposed strategy significantly improves the blind binate covering schema, and it outperforms general purpose state-of-the-art tools. Short Papers ============ Jang, Y.; Kim, J.; Kyung, C.-M., "Topology Synthesis for Low Power Cascaded Crossbar Switches" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621040&isn... Abstract: Crossbar switch network has an increasing impact on such critical measures as throughput, latency, area, and power consumption of complex system-on-chip as technology scales to deep submicrometer. With high clock frequency crossbar switch network design, global wire delay and pipeline registers inserted for throughput are important because they affect area, frequency, and power consumption of the chip. The traffic congestion is also a very important factor because it leads to the lowering of throughput of the crossbar switch network. In this paper, we present a topology synthesis method for the low-power cascaded crossbar switch network satisfying the given bandwidth, latency, frequency, and area constraints. Unlike previous papers, our paper considers wire delay, traffic congestion, and pipeline register insertion at the same time. Experimental results show that the topologies optimized for power consumption for a given clock frequency consume less power than existing methods by up to 38.04%. Chen, F.-W.; Liu, Y.-Y., "Performance-Driven Dual-Rail Routing Architecture for Structured ASIC Design Style" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621038&isn... Abstract: In recent years, structured application-specific integrated circuit (ASIC) design style has lessened the importance of mask cost. Multiple structured ASIC chip designs share the same pre-fabricated device and wire masks. Nevertheless, the interconnection delay in a pre-fabricated wire slows down circuit performance as a result of high capacitive load. We propose a dual-rail routing architecture that reduces wire delay by 10% to 15% compared to the original routing architecture. Furthermore, we propose a dual-rail insertion algorithm to reduce routing area overhead. The experimental results demonstrate that our dual-rail technique reduces wire delay by 9.8% with 4.8% routing area overhead and improves overall circuit performance by 7.0%. Chun, S.; Orailoglu, A., "DiSC: A New Diagnosis Method for Multiple Scan Chain Failures" URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5621047&isn... Abstract: In scan-based testing environments, identifying the scan chain failures can be of significant help in guiding the failure analysis process for yield improvement. In this paper, we propose an efficient scan chain diagnosis method using a symbolic fault simulation to achieve high diagnostic resolution and small candidate list for single and multiple defects in scan chains. The main ideas of the proposed scan chain diagnosis method are twofold: 1) the reduction of the candidate scan cells through the analysis of the symbolic simulation responses, and 2) the identification of final candidate scan cells using the backward tracing method with the symbolic simulation responses. Experimental results show the effectiveness.