TCAD Newsletter - May 2011 Issue Placing you one click away from the best new CAD research! REGULAR PAPERS HIGH-LEVEL SYNTHESIS Lee, G. Choi, K. Dutt, N. D. Mapping Multi-Domain Applications Onto Coarse-Grained Reconfigurable Architectures http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752428 Coarse-grained reconfigurable architectures (CGRAs) have drawn increasing attention due to their performance and flexibility. However, their applications have been restricted to domains based on integer arithmetic since typical CGRAs support only integer arithmetic or logical operations. This paper introduces approaches to mapping applications onto CGRAs supporting both integer and floating-point arithmetic. After presenting an optimal formulation using integer linear programming, we present a fast heuristic mapping algorithm. Our experiments on randomly generated examples generate optimal mapping results using our heuristic algorithm for 97% of the examples within a few seconds. We observe similar results for practical examples from multimedia and 3-D graphics benchmarks. The applications mapped on a CGRA show up to 120 times performance improvement compared to software implementations, demonstrating the potential for application acceleration on CGRAs supporting floating- point operations. LOGIC SYNTHESIS Yang, Y.-S. Sinha, S. Veneris, A. Brayton, R. K. Automating Logic Transformations With Approximate SPFDs http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752413 During the very large scale integration design process, a synthesized design is often required to be modified in order to accommodate different goals. To preserve the engineering effort already invested, designers seek small logic structural transformations to achieve these logic restructuring goals. This paper proposes a systematic methodology to devise such transformations automatically. It first presents a simulation-based formulation to approximate sets of pairs of functions to be distinguished and avoid the memory/time explosion issue inherent with the original representation. Then, it uses this new data structure to devise the required transformations dynamically without the need of a static dictionary model. The methodology is applied to both combinational and sequential designs with transformations at a single or multiple locations. An extensive suite of experiments documents the benefits of the proposed methodology when compared to existing practices. Gowda, T. Vrudhula, S. Kulkarni, N. Berezowski, K. Identification of Threshold Functions and Synthesis of Threshold Networks http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752412 This paper presents a new and efficient heuristic procedure for determining whether or not a given Boolean function is a threshold function, when the Boolean function is given in the form of a decision diagram. The decision diagram based method is significantly different from earlier methods that are based on solving linear inequalities in Boolean variables that derived from truth tables. This method's success depends on the ordering of the variables in the binary decision diagram (BDD). An alternative data structure, and one that is more compact than a BDD, called a max literal factor tree (MLFT) is introduced. An MLFT is a particular type of factoring tree and was found to be more efficient than a BDD for identifying threshold functions. The threshold identification procedure is applied to the MCNC benchmark circuits to synthesize threshold gate networks. Cui, A. Chang, C.-H. Tahar, S. Abdel-Hamid, A. T. A Robust FSM Watermarking Scheme for IP Protection of Sequential Circuit Design http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752433 Finite state machines (FSMs) are the backbone of sequential circuit design. In this paper, a new FSM watermarking scheme is proposed by making the authorship information a non-redundant property of the FSM. To overcome the vulnerability to state removal attack and minimize the design overhead, the watermark bits are seamlessly interwoven into the outputs of the existing and free transitions of state transition graph (STG). Unlike other transition-based STG watermarking, pseudo input variables have been reduced and made functionally indiscernible by the notion of reserved free literal. The assignment of reserved literals is exploited to minimize the overhead of watermarking and make the watermarked FSM fallible upon removal of any pseudo input variable. A direct and convenient detection scheme is also proposed to allow the watermark on the FSM to be publicly detectable. Experimental results on the watermarked circuits from the ISCAS'89 and IWLS'93 benchmark sets show lower or acceptably low overheads with higher tamper resilience and stronger authorship proof in comparison with related watermarking schemes for sequential functions. MODELING AND SIMULATION Ghani, N. H. A. Najm, F. N. Fast Vectorless Power Grid Verification Under an RLC Model http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752432 As part of early system design, one must verify that the power grid provides the underlying logic circuitry with voltage levels that are within specified ranges. In this paper, we describe a vectorless verification approach that can be applied early in the design process. We adopt an RLC model of the grid in the framework of current constraints that capture uncertainty about circuit details and activity. With just a few linear programs and one linear system solve, our proposed approach provides tight conservative bounds on the maximum and minimum worst-case voltage drops at every node of the grid. Results show the accuracy and speed of our technique thus making it practical and scalable. PHYSICAL DESIGN Papadopoulou, E. Net-Aware Critical Area Extraction for Opens in VLSI Circuits Via Higher- Order Voronoi Diagrams http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752411 We address the problem of computing critical area for open faults (opens) in a circuit layout in the presence of multilayer loops and redundant interconnects. The extraction of critical area is the main computational bottleneck in predicting the yield loss of a very large scale integrated design due to random manufacturing defects. We first model the problem as a geometric graph problem and we solve it efficiently by exploiting its geometric nature. To model open faults, we formulate a new geometric version of the classic min-cut problem in graphs, termed the geometric min-cut problem. Then the critical area extraction problem gets reduced to the construction of a generalized Voronoi diagram for open faults, based on concepts of higher order Voronoi diagrams. The approach expands the Voronoi critical area computation paradigm with the ability to accurately compute critical area for missing material defects even in the presence of loops and redundant interconnects spanning over multiple layers. The generalized Voronoi diagrams used in the solution are combinatorial structures of independent interest. Huang, T. Li, L. Young, E. F. Y. On the Construction of Optimal Obstacle-Avoiding Rectilinear Steiner Minimum Trees http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752407 This paper presents an efficient method to solve the obstacle-avoiding rectilinear Steiner tree (OARSMT) problem optimally. Our work is developed based on the GeoSteiner approach in which full Steiner trees (FSTs) are first constructed and then combined into a rectilinear Steiner minimum tree (RSMT). We modify and extend the algorithm to allow obstacles in the routing region. For each routing obstacle, we first introduce four virtual terminals located at its four corners. We then give the definition of FSTs with blockages and prove that they will follow some very simple structures. Based on these observations, a two-phase approach is developed for the construction of OARSMTs. In the first phase, we generate a set of FSTs with blockages. In the second phase, the FSTs generated in the first phase are used to construct an OARSMT. Finally, experiments on several benchmarks are conducted. Results show that the proposed method is able to handle problems with hundreds of terminals in the presence of multiple obstacles, generating an optimal solution in a reasonable amount of time. Zhao, X. Lewis, D. L. Lee, H.-H. S. Lim, S. K. Low-Power Clock Tree Design for Pre-Bond Testing of 3-D Stacked ICs http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752408 Pre-bond testing of 3-D stacked integrated circuits (ICs) involves testing each individual die before bonding. The overall yield of 3-D ICs improves with pre-bond testability because manufacturers can avoid stacking defective dies with good ones. However, pre-bond testability presents unique challenges to 3-D clock tree design. First, each die needs a complete 2-D clock tree to enable pre-bond test. Second, the entire 3-D stack needs a complete 3-D clock tree for post-bond test and operation. In the case of a two-die stack, a straightforward solution is to have two complete 2-D clock trees connected with a single through- silicon-via (TSV). We show that this solution suffers from long wirelength (WL) and high clock power consumption. Our algorithm improves on this solution, minimizes the overall WL and clock power consumption, and provides both pre-bond testability and post-bond operability with minimum skew and constrained slew. Compared with the single-TSV solution, SPICE simulation results show that our multi- TSV approach significantly reduces the clock power by up to 15.9% for two-die and 29.7% for four-die stacks. In addition, the WL is reduced by up to 24.4% and 42.0%. Ren, H. Dutt, S. Effective Power Optimization Under Timing and Voltage-Island Constraints Via Simultaneous Vdd, Vth Assignments, Gate Sizing, and Placement http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752436 We present a physical-synthesis based power optimization technique that simultaneously explores the four transforms, multiple Vdd, multiple Vth , cell sizing, and placement, to find a minimum power solution under timing and other constraints. The optimal selection of the design options of all transforms for all cells in the circuit is solved using a new optimization technique called discretized network flow that we have recently developed. Among the constraints we consider, timing and the voltage-island constraints are the two most important and complex ones. The voltage-island constraint specifies the maximum allowed number of voltage islands in the layout, and the requirement that each island be rectangular. We develop an approach that along with the option selection process can simultaneously determine the voltage islands needed, as well as satisfy all given constraints. Experimental results on ISCAS'89 and Faraday benchmarks show that compared to an initial wire length (WL)-optimized placement with high supply and low threshold voltage levels, we obtain a power reduction by up to 42% and an average of 30% for the same delay as that of the initial design. These improvements are also 44? 50% relatively better than the improvements yielded by sequentially applying the four power reduction transforms, which is the currently standard method for applying multiple transforms. Finally, compared to an industry tool Synopsys IC Compiler (ICC) that also applies all four transforms, our method reduces power by an additional amount of up to 19%, and an average of 16%. SYSTEM-LEVEL DESIGN Mintarno, E. Skaf, J. Zheng, R. Velamala, J. B. Cao, Y. Boyd, S. Dutton, R. W. Mitra, S. Self- Tuning for Maximized Lifetime Energy-Efficiency in the Presence of Circuit Aging http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752409 This paper presents an integrated framework, together with control policies, for optimizing dynamic control of self-tuning parameters of a digital system over its lifetime in the presence of circuit aging. A variety of self-tuning parameters such as supply voltage, operating clock frequency, and dynamic cooling are considered, and jointly optimized using efficient algorithms described in this paper. Our optimized self-tuning approach satisfies performance constraints at all times, and maximizes a lifetime computational power efficiency (LCPE) metric, which is defined as the total number of clock cycles achieved over lifetime divided by the total energy consumed over lifetime. We present three control policies: 1) progressive-worst-case-aging (PWCA), which assumes worst-case aging at all times; 2) progressive-on-state-aging (POSA), which estimates aging by tracking active/sleep modes, and then assumes worst-case aging in active mode and long recovery effects in sleep mode; and 3) progressive- real-time-aging-assisted (PRTA), which acquires real-time information and initiates optimized control actions. Various flavors of these control policies for systems with dynamic voltage and frequency scaling (DVFS) are also analyzed. Simulation results on benchmark circuits, using aging models validated by 45 nm measurements, demonstrate the effectiveness and practicality of our approach in significantly improving LCPE and/or lifetime compared to traditional one-time worst-case guardbanding. We also derive system design guidelines to maximize self-tuning benefits. Palesi, M. Ascia, G. Fazzino, F. Catania, V. Data Encoding Schemes in Networks on Chip http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752410 An ever more significant fraction of the overall power dissipation of a network-on-chip (NoC) based system-on-chip (SoC) is due to the interconnection system. In fact, as technology shrinks, the power contribute of NoC links starts to compete with that of NoC routers. In this paper, we propose the use of data encoding techniques as a viable way to reduce both power dissipation and energy consumption of NoC links. The proposed encoding scheme exploits the wormhole switching techniques and works on an end-to-end basis. That is, flits are encoded by the network interface (NI) before they are injected in the network and are decoded by the destination NI. This makes the scheme transparent to the underlying network since the encoder and decoder logic is integrated in the NI and no modification of the routers architecture is required. We assess the proposed encoding scheme on a set of representative data streams (both synthetic and extracted from real applications) showing that it is possible to reduce the power contribution of both the self-switching activity and the coupling switching activity in inter-routers links. As results, we obtain a reduction in total power dissipation and energy consumption up to 37% and 18%, respectively, without any significant degradation in terms of both performance and silicon area. SHORT PAPERS Kavousianos, X. Chakrabarty, K. Generation of Compact Stuck-At Test Sets Targeting Unmodeled Defects http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5752435 This letter presents a new method to generate compact stuck-at test sets that offer high defect coverage. The proposed method first selects the most effective patterns from a large $N$-detect repository, by using a new output deviation-based metric. Then it embeds complete coverage of stuck-at faults within these patterns, and uses the proposed metric to further improve their defect coverage. Results show that the proposed method outperforms a recently proposed competing approach in terms of unmodeled defect coverage. In many cases, higher defect coverage is obtained even than much larger N-detect test sets for several values of N. Finally, results provide the insight that, instead of using N-detect testing with as large N as possible, it is more efficient to combine the output deviations metric with multi-detect testing to get high-quality, compact test sets.