November 2013 Newsletter Placing you one click away from the best new CAD research! Plain-text version at http://www.umn.edu/~tcad/newsletter/2013-11.txt REGULAR PAPERS ANALOG, MIXED-SIGNAL, AND RF CIRCUITS Martins, R. ; Lourenco, N. ; Horta, N. LAYGEN II-Automatic Layout Generation of Analog Integrated Circuits http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634591 This paper describes an innovative design automation tool, LAYGEN II, for analog integrated circuit (IC) layout generation based on template descriptions and on evolutionary computation techniques. LAYGEN II was developed giving special emphasis to the reusability of expert knowledge and to the efficiency of retargeting operations. The designer specifies the sized circuit-level structure, the required technology and also, the layout template consisting of technology and specification independent high-level layout guidelines. For placement, the topological relations present in the template are extracted to a nonslicing B*-tree layout representation, and the tool automatically merges devices and improves the floorplan quality. For routing an optimization kernel consisting of a tailored version of the multiobjective multiconstraint evolutionary algorithm NSGA-II is used. The Router optimizes all nets simultaneously and uses a built-in engine to evaluate each of the layout solutions. The automatic layout generation is demonstrated here using the LAYGEN II tool for typical analog circuit structures, and the results in GDSII format were validated using the industrial grade verification tool Calibre¨. EMERGING TECHNOLOGIES Chang, J.-W. ; Yeh, S.-H. ; Huang, T.-W. ; Ho, T.-Y. An ILP-Based Routing Algorithm for Pin-Constrained EWOD Chips With Obstacle Avoidance http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634553 Electrowetting-on-dielectric (EWOD) chips have become the most popular actuators, particularly for droplet-based digital microfluidic biochip (DMFB) systems. In order to enable the electrical manipulations, wire routing is a key problem in designing EWOD chips. Unlike traditional very-large-scale-integration (VLSI) routing problems, in addition to routing-path establishment on signal pins, the pin-constrained EWOD-chip routing problem must address the issue of signal sharing for pin-count reduction under a practical constraint posed by a limited pin-count supply. Moreover, EWOD-chip designs might incur several obstacles in the routing region due to embedded devices for specific fluidic protocols. However, no existing work considers the EWOD-chip routing with obstacles and, therefore, lots of manual design efforts are involved. To remedy this insufficiency, we propose in this paper the first routing algorithm for pin-constrained EWOD chips with obstacle avoidance. The proposed algorithm, based on effective integer-linear-programming (ILP) formulation as well as efficient routing framework, can achieve high routability with a low design complexity. Experimental results based on real-life chips with obstacles demonstrate the high routability of proposed algorithm for pin-constrained EWOD chips with obstacle avoidance. FPGAS AND RECONFIGURABLE COMPUTING Mehta, G. ; Patel, K.K. ; Parde, N. ; Pollard, N.S. Data-Driven Mapping Using Local Patterns http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634546 The problem of mapping a data flow graph onto a reconfigurable architecture has been difficult to solve quickly and optimally. Anytime algorithms have the potential to meet both goals by generating a good solution quickly and improving that solution over time, but they have not been shown to be practical for mapping. The key insight into this paper is that mapping algorithms based on search trees can be accelerated using a database of examples of high quality mappings. The depth of the search tree is reduced by placing patterns of nodes rather than single nodes at each level. The branching factor is reduced by placing patterns only in arrangements present in a dictionary constructed from examples. We present two anytime algorithms that make use of patterns and dictionaries: Anytime A* and Anytime Multiline Tree Rollup. We compare these algorithms to simulated annealing and to results from human mappers playing the online game UNTANGLED. The anytime algorithms outperform simulated annealing and the best game players in the majority of cases, and the combined results from all algorithms provide an informative comparison between architecture choices. MODELING AND SIMULATION Brachtendorf, H.G. ; Bittner, K. Grid Size Adapted Multistep Methods for High Oscillators http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634587 The numerical calculation of the limit cycle of oscillators with resonators exhibiting a high-quality factor Q such as quartz crystals is a difficult task in the time domain. Time domain integration formulas, when not carefully selected, introduce numerical damping that leads to erroneous limit cycles or spurious oscillations. A novel class of adaptive multistep integration formulas based on finite difference (FD) schemes is derived, which circumvent the aforementioned problems. The results are compared with the well-known harmonic balance (HB) technique. Moreover, the range of absolute stability is derived for these methods. The resulting discretized system by FD methods is sparser than that of HB and, therefore, easier to solve and easier to implement. Jung, M. ; Pan, D.Z. ; Lim, S.K. Chip/Package Mechanical Stress Impact on 3-D IC Reliability and Mobility Variations http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634536 In this paper, we propose a fast and accurate chip/package thermomechanical stress co-analysis tool for through- silicon-via (TSV)-based 3-D ICs. We use our tool for full-stack mechanical reliability as well as stress-aware timing analyses. First, we analyze the stress induced by chip/package interconnect elements, i.e., TSV, micro-bump, and package bump. Second, we explore and validate the principle of lateral and vertical linear superposition of stress tensors (LVLS), considering all chip/package elements. The proposed LVLS method greatly reduces the complexity of stress calculation compared with the conventional finite element analysis method with high enough accuracy for full-chip/package-scale stress simulations and reliability analysis. In addition, we build hole and electron mobility variation maps based on LVLS. Finally, we study the mechanical reliability issues and provide full-stack timing analysis results in practical 3-D chip/package designs including wide-I/O and block-level 3-D ICs. PHYSICAL DESIGN Cortadella, J. Area-Optimal Transistor Folding for 1-D Gridded Cell Design http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634571 The 1-D design style with gridded design rules is gaining ground for addressing the printability issues in subwavelength photolithography. One of the synthesis problems in cell generation is transistor folding, which consists of breaking large transistors into smaller ones (legs) that can be placed in the active area of the cell. In the 1-D style, diffusion sharing between differently sized transistors is not allowed, thus implying a significant area overhead when active areas with different sizes are required. This paper presents a new formulation of the transistor folding problem in the context of 1-D design style and a mathematical model that delivers area-optimal solutions. The mathematical model can be customized for different variants of the problem, considering flexible transistor sizes and multiple-height cells. An innovative feature of the method is that area optimality can be guaranteed without calculating the actual location of the transistors. The model can also be enhanced to deliver solutions with good routability properties. Chang, H.-Y. ; Jiang, I.H.-R. ; Chang, Y.-W. ECO Optimization Using Metal-Configurable Gate-Array Spare Cells http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634578 Due to the rapidly increasing design complexity in modern IC designs, metal-only engineering change order (ECO) becomes inevitable to achieve design closure with a low respin cost. Traditionally, preplaced redundant standard cells are regarded as spare cells. However, these cells are limited by predefined functionalities and locations, and they always consume leakage power despite their inputs being tied off. To overcome the inflexibility and power overhead, a new type of spare cells, called metal-configurable gate-array spare cells, are introduced. In this paper, we address a new ECO problem, which performs design changes using metal-configurable gate-array spare cells. We first study the properties of this new ECO problem and propose a new cost metric, aliveness, to model the capability of a spare gate array. Based on aliveness and routability, we then develop two ECO optimization frameworks, one for timing ECO and the other for functional ECO. Experimental results show that our approach delivers superior efficiency and effectiveness. Sai, M.P.D. ; Yu, H. ; Shang, Y. ; Tan, C.S. ; Lim, S.K. Reliable 3-D Clock-Tree Synthesis Considering Nonlinear Capacitive TSV Model With Electrical-Thermal-Mechanical Coupling http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634568 A robust physical design of 3-D IC requires investigation on through-silicon via (TSV). The large temperatures and stress gradients can severely affect TSV delay with large variation. The traditional physical model treats TSV as a resistor with linear electrical-thermal dependence, which ignores the fundamental device physics. In this paper, a physics-based electrical-thermalÐmechanical delay model is developed for signal TSVs in 3-D IC. With consideration of liner material and also stress, a nonlinear model is established between electrical delay with temperature and stress. Moreover, sensitivity analysis is performed to relate the reduction of temperature and stress gradients with respect to dummy TSVs insertion. Taking the design of 3-D clock tree as a case study, we have formulated a nonlinear optimization problem for clock-skew reduction. By allocating dummy TSVs to reduce the temperature and stress gradients, the clock skew introduced by signal TSVs and drivers can be minimized. A number of 3-D clock-tree benchmarks are utilized in experiments. We have observed that with the use of dummy TSV insertion, clock skew can be reduced by 61.3% on average when the accurate nonlinear electrical-thermalÐ mechanical delay model is applied. SYSTEM-LEVEL DESIGN Lee, J. ; Chung, M.-K. ; Cho, Y.-G. ; Ryu, S. ; Ahn, J.H. ; Choi, K. Mapping and Scheduling of Tasks and Communications on Many-Core SoC Under Local Memory Constraint http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634549 There has been extensive research on mapping and scheduling tasks on a many-core SoC. However, none considers the optimization of communication types, which can significantly affect performance, energy consumption, and local memory usage of the SoC. This paper presents an approach to automatic mapping and scheduling of tasks and communications on a many-core SoC. The key idea is to decide the type of each communication between message passing and shared memory when we do the mapping and scheduling. By assigning a proper type to each communication, we can optimize the energy consumption, performance, or energy-delay product. To solve the optimization problem, the approach adopts a probabilistic algorithm coupled with some heuristics. To enhance throughput of the system, it performs software pipelined scheduling of the tasks using a modified iterative modulo scheduling technique. Experiments show that our algorithm achieves on average 50.1% lower energy consumption, 21.0% higher throughput, and 64.9% lower energy- delay product, compared to shared memory only communication. TEST Sismanoglou, P. ; Nikolos, D. Input Test Data Compression Based on the Reuse of Parts of Dictionary Entries: Static and Dynamic Approaches http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634582 In this paper, we present a new test data compression method for intellectual property (IP) cores testing, based on the reuse of parts of dictionary entries. Two approaches are investigated: the static and the dynamic. In the static approach, the content of the dictionary is constant during the testing of a core, while in the dynamic approach the testing of a core consists of several test sessions and the content of the dictionary is different during each test session. The efficiency of the proposed method is supported with extensive simulation results and comparisons to already known test data compression methods suitable for IP cores testing. Janicki, J. ; Kassab, M. ; Mrugalski, G. ; Mukherjee, N. ; Rajski, J. ; Tyszer, J. Test Time Reduction in EDT Bandwidth Management for SoC Designs http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634561 This paper presents novel methods of reducing test time and enhancing test compression for system-on-chip (SoC) designs armed with embedded deterministic test (EDT)-based compression logic. The ability of the proposed scheme to improve the encoding efficiency and test compression, while reducing test application time, is accomplished by appropriate selecting and laying out automatic test equipment channel injectors of every single core EDT-based decompressor as well as appropriate bandwidth management of the entire test procedure combined with new control data optimization techniques. The efficacy of the proposed scheme is validated through experiments on several industrial SoC designs and is reported herein. VERIFICATION Karfa, C. ; Banerjee, K. ; Sarkar, D. ; Mandal, C. Verification of Loop and Arithmetic Transformations of Array-Intensive Behaviors http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634544 Loop transformation techniques along with arithmetic transformations are applied extensively on array and loop intensive behaviors in design of area/energy efficient systems in the domain of multimedia and signal processing applications. Ensuring correctness of such transformations is crucial for the reliability of the designed systems. In this paper, array data dependence graphs (ADDGs) are used to represent both the input and the transformed behaviors and the correctness of the transformations is ensured by proving equivalence of the two ADDGs. A slice- based equivalence checking method of ADDGs is proposed for this purpose. The method relies on the normalization of arithmetic expressions and some simplification rules to handle arithmetic transformations. Unlike many other reported techniques, our method is strong enough to handle several arithmetic transformations along with all kinds of loop transformations. Correctness and complexity of the method have been dealt with. Experimental results on several test cases demonstrate the effectiveness of the method. Hazra, A. ; Mukherjee, R. ; Dasgupta, P. ; Pal, A. ; Harer, K.M. ; Banerjee, A. ; Mukherjee, S. POWER-TRUCTOR: An Integrated Tool Flow for Formal Verification and Coverage of Architectural Power Intent http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634589 With the growing complexity and gradually shrinking power requirements in the system-on-chip designs, sophisticated global power management policies (which orchestrate the switching between power states of multiple power domains) are commonplace. Recent research has paved some novel ways to verify the sophisticated on-chip architectural power management decisions and analyze the verification coverage. However, one of the primary challenges in verifying such power management architectures stems from the mixed implementation of such strategies, where the local power controllers are in hardware and the global power management is implemented in software/firmware. There has been lack of effort to build a unified and automated framework for power intent verification and coverage analysis for generic power management logics. This paper tries to develop an end-to-end automated framework enabled by a tool named POWER-TRUCTOR for power intent validation. SHORT PAPERS Huang, L. ; Wang, Z. ; Xiao, N. ; Wang, Y. ; Dou, Q. Dynamic Streamization Model Execution for SIMD Engines on Multicore Architectures http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634570 This paper proposes dynamic streamization model execution (DSME), a dynamic vectorization technique for single instruction multiple data (SIMD) engines on multicore architectures. The technique uses stream model as intermediate representation for programs to optimize the combination of computation and memory accesses of SIMD engines in general-purpose (GP) designs. DSME allows the dynamic placement of computations on different cores when they are not in use to utilize multiple SIMD engines. This study also discusses hardware extensions to existing GP processor designs as well as related compiler extensions that use the special hardware components. Our extensive experiments demonstrate that performance gains of DSME can be achieved. Prakash, A. ; Patel, H.D. An Instruction Scratchpad Memory Allocation for the Precision Timed Architecture http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634576 This paper presents a static instruction scratchpad memory allocation scheme for the precision timed architecture (PRET). Since PRET provides timing instructions to control the temporal execution of programs, the objective of the allocation scheme is to ensure that the explicitly specified temporal requirements are met. Furthermore, this allocation incorporates the timing requirements from the multiple hardware threads of the PRET architecture. We formulate the allocation problem as an integer-linear programming problem, and we implement a tool that takes compiled ARMv4 binaries, constructs a timing-requirements-aware control-flow graph, performs a WCET analysis and SPM allocation, and rewrites the binaries with the allocation. We evaluate our approach using a modified version of the Malardalen benchmarks to show the benefits of the proposed approach. We also present a UAV benchmark derived from the PapaBench benchmark. Kundu, S. ; Pal, S. ; Chattopadhyay, S. ; Sengupta, I. ; Kapur, R. A Metric for Test Set Characterization and Customization Toward Fault Diagnosis http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634573 This paper introduces a new metric to characterize test sets in terms of their diagnostic power. Our method uses much less space compared to the existing ones and is quite accurate. The metric can be utilized to increase the diagnosability of incompletely specified test sets via don't care filling. The X-filling approach can be integrated with test pattern generation tools to aid in better diagnostic pattern set generation. Teodorovic, P. ; Dautovic, S. ; Malbasa, V. Recursive Boolean Formula Minimization Algorithms for Implication Logic http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6634562 In this paper are given two novel algorithms for minimization of recursive Boolean formula (RBF), which is adequate for implementation of N-input 1-output Boolean functions (BFs) over basis imply, false using two memristors. Both of our algorithms are direct consequence of necessary and sufficient conditions related to regular ordering of positive product terms within recursive formula. The results demonstrate how developed algorithms provide up to 26% gain in average number of implications and shorter recursive Boolean formula length in up to 77% of problem instances than previously published algorithms, tested on the set of all 4-input 1-output BFs.