December 2011 Newsletter 
Placing you one click away from the best new CAD research!
Plain-text version at http://www.umn.edu/~tcad/newsletter/2011-12.txt 

REGULAR PAPERS

EMBEDDED SYSTEMS

Yang, S.  Khursheed, S.  Al-Hashimi, B. M.  Flynn, D.  Idgunji, S. Reliable
State Retention-Based Embedded Processors Through Monitoring and Recovery 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071090

State retention power gating and voltage-scaled state retention are two
effective design techniques, commonly employed in embedded processors, for
reducing idle circuit leakage power. This paper presents a methodology for
improving the reliability of embedded processors in the presence of power
supply noise and soft errors. A key feature of the method is low cost, which is
achieved through reuse of the scan chain for state monitoring, and it is
effective because it can correct single and multiple bit errors through
hardware and software, respectively. To validate the methodology, an ARM®
Cortex™-M0 embedded microprocessor (provided by our industrial project partner)
is implemented in a field-programmable gate array and further synthesized using
65-nm technology to quantify the cost in terms of area, latency, and energy. It
is shown that the proposed methodology has a small area overhead (8.6%) with
less than 4% worst-case increase in critical path and is capable of detecting
and correcting both single bit and multibit errors for a wide range of fault
rates.

EMERGING TECHNOLOGIES

Huang, T.-W.  Yeh, S.-Y.  Ho, T.-Y. A Network-Flow Based Pin-Count Aware
Routing Algorithm for Broadcast-Addressing EWOD Chips 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071082

Electrowetting-on-dielectric (EWOD) chips have emerged as the most widely used
actuators for digital microfluidic (DMF) systems. These devices enable the
electrical manipulation of microfluidics with various advantages, such as low
power consumption, flexibility, accuracy, and efficiency. In addressing the
need for low-cost and practical fabrication, pin-count reduction has become a
key problem for the large-scale integration of EWOD-chip designs. One of the
major approaches, broadcast addressing, reduces the pin count by assigning a
single control pin to multiple electrodes with mutually compatible control
signals. Most previous studies utilize this addressing scheme by scheduling
fluidic-level synthesis on pin-constrained chip arrays. However, the associated
interconnect routing problem is still not addressed by currently available DMF
automation tools, and thus the broadcast-addressing scheme cannot actually be
realized. In this paper, we present the first network-flow based pin-count
aware routing algorithm for EWOD-chip designs with a broadcast
electrode-addressing scheme. Our algorithm simultaneously takes pin-count
reduction and wirelength minimization into consideration for higher integration
and better design performance. Experimental results show the effectiveness and
scalability of our algorithm on a set of real-life chip applications.
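
As an illustration of the compatibility idea behind broadcast addressing (not
the paper's network-flow routing algorithm), the Python sketch below assumes
each electrode's control signal is a string over '1' (activate), '0'
(deactivate), and 'X' (don't care), and treats two signals as compatible when
they never demand opposite values at the same time step. The signal encoding,
the function names, the electrode names, and the greedy grouping are all
assumptions made for this example.

```python
def compatible(a, b):
    """True if signals a and b never require opposite values at the same step."""
    return len(a) == len(b) and all(
        x == y or x == "X" or y == "X" for x, y in zip(a, b)
    )


def merge(a, b):
    """Combine two compatible signals into the one signal a shared pin must drive."""
    return "".join(y if x == "X" else x for x, y in zip(a, b))


def greedy_pin_assignment(signals):
    """Assign electrodes to pins greedily; returns a list of (pin_signal, electrodes).

    `signals` maps a hypothetical electrode name to its control-signal string.
    Greedy grouping is a simple heuristic and does not guarantee a minimum pin count.
    """
    pins = []  # each entry: [merged signal, list of electrode names]
    for name, sig in signals.items():
        for pin in pins:
            if compatible(pin[0], sig):
                pin[0] = merge(pin[0], sig)
                pin[1].append(name)
                break
        else:
            # No existing pin can drive this electrode, so open a new pin.
            pins.append([sig, [name]])
    return [(sig, names) for sig, names in pins]


# Hypothetical four-electrode example.
example = {
    "e1": "1X0X",
    "e2": "1X00",
    "e3": "01XX",
    "e4": "X10X",
}
pins = greedy_pin_assignment(example)
```

In this made-up example, four electrodes end up sharing two control pins; the
wiring needed to connect each pin to its electrodes is the routing problem the
paper addresses and is not modeled here.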

FPGAs AND RECONFIGURABLE COMPUTING

Kim, K.  Shin, S.  Kang, S.-M. Field Programmable Stateful Logic Array 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071085

Recently, researchers have demonstrated that memristive switches can be used to
implement logic and latches as well as memory and programmable interconnects.
In this paper, we propose a novel stateful logic pipeline architecture based on
memristive switches. The proposed architecture, mapped to the field programmable
nanowire interconnect fabric, produces a field programmable stateful logic
array, in which general-purpose computation functions can be implemented by
configuring only nonvolatile nanowire crossbar switches. CMOS control switches
are used to isolate stateful logic units so that multiple operations can be
executed in parallel. Since the basic operation of stateful logic, namely,
material implication, cannot fan out, a new basic AND operation that can
duplicate its output is proposed. The basic unit of the proposed architecture is
designed to execute multiple basic operations concurrently in a step so that
each basic unit implements a large fan-in OR or NOR gate. The fine-grain
ultradeep constant-throughput pipeline properties pose new design automation
problems. We address some of the issues, in particular logic representation
using OR-inverter graphs, two-level optimization synthesis strategy, data
synchronization with data forwarding, stall-free pipelined finite state
machines, and constraints for synthesis and mapping onto the fabric.
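
For readers unfamiliar with material implication (IMP), the stateful-logic
primitive named above, the Python sketch below models it on Boolean values:
applying IMP with inputs p and q yields (NOT p) OR q, and the result overwrites
q, which is why the operation is called stateful. The NAND construction from a
cleared work bit and two IMP steps is a commonly cited one and is shown only
for illustration; the function names are invented here, and the paper's new AND
operation and its OR/NOR basic units are not modeled.

```python
from itertools import product


def imp(p, q):
    """Material implication: returns the new value stored in q, i.e. (NOT p) OR q."""
    return (not p) or q


def nand_via_imp(p, q):
    """NAND built from one cleared work bit and two IMP steps."""
    s = False       # step 1: clear the work bit
    s = imp(p, s)   # step 2: s = NOT p
    s = imp(q, s)   # step 3: s = (NOT q) OR (NOT p) = NAND(p, q)
    return s


# Exhaustive truth table over both inputs.
truth_table = {
    (p, q): nand_via_imp(p, q) for p, q in product([False, True], repeat=2)
}
```

Each step overwrites the work bit in place, so a result has a single home; this
is one way to picture why fan-out needs the special treatment the abstract
mentions.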

MODELING AND SIMULATION

Zhang, W.  Li, X.  Liu, F.  Acar, E.  Rutenbar, R. A.  Blanton, R. D. Virtual
Probe: A Statistical Framework for Low-Cost Silicon Characterization of
Nanoscale Integrated Circuits 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071091

In this paper, we propose a new technique, referred to as virtual probe (VP),
to efficiently measure, characterize, and monitor spatially-correlated
inter-die and/or intra-die variations in nanoscale manufacturing processes. VP
exploits recent breakthroughs in compressed sensing to accurately predict
spatial variations from an exceptionally small set of measurement data, thereby
reducing the cost of silicon characterization. By exploring the underlying
sparse pattern in the spatial frequency domain, VP achieves a substantially lower
sampling frequency than the well-known Nyquist rate. In addition, VP is
formulated as a linear programming problem and, therefore, can be solved both
robustly and efficiently. Our industrial measurement data demonstrate the
superior accuracy of VP over several traditional methods, including 2-D
interpolation, Kriging prediction, and k-LSE estimation.

Ionutiu, R. I. J.  Rommes, J.  Schilders, W. H. A. SparseRC: Sparsity
Preserving Model Reduction for RC Circuits With Many Terminals 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071083

A novel model order reduction (MOR) method, SparseRC, for multiterminal RC
circuits is proposed. Specifically tailored to systems with many terminals,
SparseRC employs graph-partitioning and fill-in reducing orderings to improve
sparsity during model reduction, while maintaining accuracy via moment
matching. The reduced models are easily converted to their circuit
representation. These contain far fewer nodes and circuit elements than
those obtained with conventional MOR techniques, allowing faster
simulations with little loss of accuracy.

PHYSICAL DESIGN

Ozdal, M. M.  Hentschke, R. F. An Algorithmic Study of Exact Route Matching for
Integrated Circuits 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071088

As system-on-chip designs are getting more popular, the importance of design
automation for analog and mixed-signal integrated circuits is increasing. In
this paper, we study the problem of exact route matching, which is an important
physical design constraint commonly imposed on specific analog signals for the
purpose of correct functionality. For this, we first propose a mathematical
formulation that models the route matching problem exactly. Based on this
formulation, we derive important theoretical conclusions, and propose
dynamic-programming and heuristic search algorithms to solve the min-cost route
matching problem. We also discuss various practical considerations related to
this problem. Our experimental results show the effectiveness of our
algorithms.

Chuang, Y.-L.  Kim, S.  Shin, Y.  Chang, Y.-W. Pulsed-Latch Aware Placement for
Timing-Integrity Optimization 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071081

Utilizing pulsed-latches in circuit designs is an emerging approach to improving
timing. Pulsed-latches, driven by a brief clock signal generated from
pulse generators, possess superior design parameters over flip-flops. If the
pulse generator and pulsed-latches are not placed properly, however,
pulse-width degradations at pulsed-latches and thus timing violations might
occur. In this paper, we present a unified placement framework for
pulsed-latches to maintain timing integrity. Our new placer has the
following distinguishing features: 1) a multilevel analytical placement
framework to effectively prevent the potential pulse-width distortion problem;
2) a physical-location aware pulse-generator insertion algorithm to identify
each desired group of a pulse generator and latches; and 3) a new optimization
gradient for global placement to consider the impact of load capacitance of
generators. Experimental results show that our placement flow can effectively
consider pulse-width integrity and thus achieve much smaller total/worst
negative slacks with marginal wirelength overheads, compared to a leading
commercial placement flow and an academic placement flow.

Lin, M. P.-H.  Hsu, C.-C.  Chang, Y.-T. Post-Placement Power Optimization With
Multi-Bit Flip-Flops 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071086

Optimization for power is always one of the most important design objectives in
modern nanometer integrated circuit design. Recent studies have shown the
effectiveness of applying multi-bit flip-flops to reduce the power consumption of
the clock network. This paper presents: 1) a novel design methodology of
applying multi-bit flip-flops at the post-placement stage, which can be
seamlessly integrated into modern design flows; 2) a new problem formulation for
post-placement optimization with multi-bit flip-flops; 3) flip-flop clustering
and placement algorithms to simultaneously minimize flip-flop power consumption
and interconnecting wirelength; and 4) a progressive window-based optimization
technique to reduce placement deviation and improve runtime efficiency of our
algorithms. Experimental results show that our algorithms are very effective in
reducing not only flip-flop power consumption but also clock tree and signal
net wirelength. Consequently, the power consumption of the clock network is
minimized.

SYSTEM-LEVEL DESIGN

Sabry, M. M.  Coskun, A. K.  Atienza, D.  Rosing, T. S.  Brunschwiler, T.
Energy-Efficient Multiobjective Thermal Control for Liquid-Cooled 3-D Stacked
Architectures 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071089

3-D stacked systems reduce communication delay in multiprocessor
system-on-chips (MPSoCs) and enable heterogeneous integration of cores,
memories, sensors, and RF devices. However, vertical integration of layers
exacerbates temperature-induced problems such as reliability degradation.
Liquid cooling is a highly efficient solution to overcome the accelerated
thermal problems in 3-D architectures; however, it brings new challenges in
modeling and run-time management for such 3-D MPSoCs with multitier liquid
cooling. This paper proposes a novel design-time/run-time thermal management
strategy. The design-time phase involves a rigorous thermal impact analysis of
various thermal control variables. We then utilize this analysis to design a
run-time fuzzy controller for improving energy efficiency in 3-D MPSoCs through
liquid cooling management and dynamic voltage and frequency scaling (DVFS). The
fuzzy controller adjusts the liquid flow rate dynamically to match the cooling
demand of the chip for preventing overcooling and for maintaining a stable
thermal profile. The DVFS decisions increase chip-level energy savings and help
balance the temperature across the system. Our controller is used in
conjunction with temperature-aware load balancing and dynamic power management
strategies. Experimental results on 2-tier and 4-tier 3-D MPSoCs show that our
strategy prevents the system from exceeding the given threshold temperature. At
the same time, we reduce cooling energy by up to 63% and system-level energy by
up to 21% in comparison to a static flow rate set to handle
worst-case temperatures.

Kao, Y.-H.  Yang, M.  Artan, N. S.  Chao, H. J. CNoC: High-Radix Clos
Network-on-Chip 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071084

Many high-radix network-on-chip (NoC) topologies have been proposed to improve
network performance with an ever-growing number of processing elements (PEs) on
a chip. We believe the high-radix Clos network-on-chip (CNoC) is the most promising,
given its low average hop count and good load-balancing characteristics. In
this paper, we propose: 1) a high-radix router architecture with a virtual output
queue (VOQ) buffer structure and a packet mode dual round-robin matching (PDRRM)
scheduling algorithm to achieve high speed and high throughput in CNoC; 2) the
design of a hierarchical round-robin arbiter for high-radix high-speed NoC
routers; and 3) a heuristic floor-planning algorithm to minimize the power
consumption caused by the long wires. Experimental results show that the
throughput of a 64-node three-stage CNoC under uniform traffic increases from
62% to 78% by replacing the baseline virtual channel routers with PDRRM VOQ
routers. We also compared the delay, power, and area performance of the 64-node
CNoC with other NoC topologies under various synthetic traffic patterns and
SPLASH-2 benchmark traces. The simulation results show that in general CNoC
improves the throughput, low-load delay, and energy efficiency over the
compared NoC topologies.

TEST

Arumi, D.  Rodriguez-Montanes, R.  Figueras, J.  Eichenberger, S.  Hora, C.
Kruseman, B. Diagnosis of Interconnect Full Open Defects in the Presence of
Fan-Out 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071080

The development of accurate diagnosis methodologies is important to identify
process problems and achieve fast yield improvement. As open defects are
becoming dominant in some CMOS technologies, their accurate diagnosis is key to
improving the quality of new very large-scale integrated circuits. Widely used
interconnect full open diagnosis procedures are based on the assumption that
neighboring lines determine the voltage of the defective line. However, this
assumption decreases the diagnosis efficiency for opens in interconnect lines
with fan-out, where the influence of transistor capacitances is significant.
This paper presents a diagnosis methodology for interconnect full open defects
which considers and models the impact of transistor parasitic capacitances on
the defective node accurately. The methodology is able to properly diagnose
interconnect opens with fan-out even in the presence of Byzantine behavior.
Diagnosis results for real defective devices from different technology nodes
are also provided.

Ma, J.  Tehranipoor, M. Layout-Aware Critical Path Delay Test Under Maximum
Power Supply Noise Effects 
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6071087

As technology shrinks, gate sensitivity to noise increases due to supply
voltage scaling and limited scaling of the threshold voltage. As a result,
power supply noise (PSN) plays a greater role in sub-100 nm technologies and
creates signal integrity issues. It is vital to consider supply voltage noise
effects: 1) during design validation to apply sufficient guardbands to critical
paths, and 2) during path delay test to ensure the performance and reliability
of the chip. In this paper, a novel layout-aware pattern generation procedure
is proposed to maximize PSN effects on critical paths considering the impact of
local voltage drop. The proposed pattern generation and validation flow is
implemented on the ITC'99 b19 benchmark. Experimental results for both
wire-bond and flip-chip packaging styles are presented. Results demonstrate
that our proposed method is fast, significantly increases switching around the
functionally testable critical paths, and induces a large voltage drop on cells
placed on the critical paths, which results in increased path delay. The
proposed method eliminates the very time-consuming pattern validation phase
that is practised in industry.