# Optimized standard cells for all-spin logic

MEGHNA G. MANKALALE, University of Minnesota
SACHIN S. SAPATNEKAR, University of Minnesota

All-Spin Logic (ASL) devices provide a promising spintronics-based alternative for Boolean logic implementations in the post-CMOS era. In principle, any logic functionality can be implemented in ASL. In practice, the performance of an ASL gate is significantly affected by layout choices, but such implications have not been adequately explored in the past. This paper proposes a systematic approach for building standard cells in ASL, which are a basic building block in an overall design methodology for implementing large ASL-based circuits. We first propose a new technique to reduce the magnet count for an ASL majority gate while still ensuring correct functioning through layout optimization methods. Building upon physics-based analysis, we then build a standard cell library with diverse functionality and characterize the library for delay, energy, and area. We perform delay-optimized technology mapping on ISCAS85 benchmark circuits using our library. Our approach results in circuits that are 12.90% faster, consume 26.16% less energy, and are 33.56% more area efficient than those mapped to a standard cell library that does not incorporate our layout-based optimization techniques.

CCS Concepts: •Hardware  $\rightarrow$  Physical design (EDA); Spintronics and magnetic technologies;

Additional Key Words and Phrases: All-Spin Logic, Emerging technologies, Standard cell design, Computer aided design

#### **ACM Reference Format:**

Meghna G. Mankalale and Sachin S. Sapatnekar, 2015. Optimized standard cells for all-spin logic. ACM J. Emerg. Technol. Comput. Syst. V, N, Article A (January YYYY), 21 pages. DOI: http://dx.doi.org/10.1145/0000000.0000000

### **1. INTRODUCTION**

Spin-based technologies are a promising candidate for the post-CMOS era [Nikonov and Young 2013; Kang et al. 2015]. Early implementations of spin-based logic circuits required logic states, stored in the spin domain, to be transmitted using charge currents: this translation between spin and charge domains incurs large overheads. All-Spin Logic (ASL) [Behin-Aein et al. 2010] overcomes this problem through a scheme that transmits information using spin currents along an interconnect, and is an effective platform for spintronic logic.

The ASL structure naturally performs a majority operation, and can be used to implement any logic function since majority logic is universal. Prior work on ASL circuits [Calayir et al. 2014; Sharad et al. 2013; Augustine et al. 2011; Nikonov and Young 2013; Su et al. 2015a; Su et al. 2015b; An et al. 2015] has focused on constructing individual blocks such as gates and adders, with larger circuits being composed manually. A high-level study of ASL was conducted in [Kim et al. 2015]. The recent work in [Pajouhi et al. 2015] concentrates on building large circuits; the emphasis is on devising circuit optimization techniques rather than the optimization of the standard cell library.


ACM Journal on Emerging Technologies in Computing Systems, Vol. V, No. N, Article A, Pub. date: January YYYY.

This work was supported in part by C-SPIN, one of the six SRC STARnet Centers, sponsored by MARCO and DARPA.

Author's addresses: Meghna G. Mankalale and Sachin S. Sapatnekar, Department of Electrical and Computer Engineering, 200 Union Street SE, University of Minnesota, Minneapolis MN-55455.


We propose a systematic approach to build compact and functionally correct standard cells in ASL. Although such cells are critical to enabling the design of large ASL-based circuits, the problem has received scant attention in the literature. The design of majority gates in ASL is predicated on a very careful balance between the injected spin currents, and cell layout choices can significantly impact logical correctness. We develop an understanding of the impact of the length of the interconnect segments connecting the magnets, and the effect of magnet dimensions, on the functionality and performance of the gate. Key features of our work include:

- We systematically study the limits on magnet spacing using OOMMF-based micromagnetic simulations [Donahue and Porter 1999].
- We employ a single fixed magnet, instead of n-1 fixed magnets in the conventional scheme [Kong et al. 2010], to build compact *n*-input (N)AND/(N)OR gates. When compared to an implementation of a conventional ASL AND3 gate [Behin-Aein et al. 2010], our approach results in compact standard cells, and also improves power and delay.
- We ensure logical correctness of the layout by formulating the constraints on spin currents, and
- We use our optimized standard cells to perform delay-optimized technology mapping on benchmark circuits. Our approach results in circuits that are 12.90% faster, consume 26.16% less energy, and are 33.56% more area efficient than those built from a library without our layout-based optimizations.

### 2. ASL-BASED MAJORITY LOGIC



Fig. 1. ASL implementations of (a) a three-input majority gate (b) an inverter.

#### 2.1. ASL fundamentals

The fundamental building block for ASL is the majority gate, conceptually shown in Fig. 1(a). Each input magnet of the gate, $M_i$, has a certain binary value of spin ("up" or "down"), thus enabling binary encoding through spin directions. This spin information is transmitted via a nonmagnetic interconnect to the output magnet through a spin current contribution (up-spin or down-spin) from each magnet. These contributions are summed up on the way to the output magnet, $M_o$: the majority spin current contribution "wins," and switches $M_o$ using the spin-transfer torque phenomenon. Hence this structure works very naturally as a majority gate. A degenerate case is the ASL buffer, where the spin current injected into the interconnect from the input is transmitted to the output magnet, and sets its state to be identical to the input.

The mechanism for spin injection into the interconnect is based on passing a DC charge current through the input magnet as shown in an ASL inverter in Fig. 1(b). For the  $V_{dd}/Gnd$  ( $Gnd/V_{dd}$ ) configuration,  $M_i$  acts as a spin filter and sends into the interconnect of length L only the spins that are in opposite (same) direction as those in the magnet: therefore, generating inversions in logic functionality is a simple matter of reversing the supply voltage [Srinivasan et al. 2011]. The spin current diffuses through the interconnect and exerts a spin torque on  $M_o$  that aligns its spin with the spin current. A key metric of the ability of the input current to switch the output is the spin injection efficiency. Quantitatively, this is the ratio between the spin current at the output end of the interconnect and the charge current injected at the input end.

Cascadability of this device is realized by passing a DC charge current through $M_o$, which then transmits a spin current to the next stage through the interconnect. An isolation layer beneath $M_o$ is used to separate the present stage from the next stage, as seen in Fig. 1(b). In spite of the isolation layer, this process could potentially send a spin current in the reverse direction, i.e., in the direction of the input magnet. The magnitude of such a reverse spin current is very small compared to the current in the forward direction because the Gnd terminal is present only at the input end. This asymmetry in the placement of Gnd between the input and the output ensures that the reverse current sees a high-resistance path. In this work, we model the reverse current by assuming its value to be 20% of that of the forward current.

We now analyze the following two factors that enable us to build compact optimized ASL standard cells:

- (1) Spin current modulation using magnet and interconnect layout parameters, and
- (2) Rethinking the structure of an ASL majority gate to produce compact layouts using a single fixed magnet.

#### 2.2. Spin current modulation

We consider the impact of geometry when an input magnet drives an output magnet through an interconnect. We use the inverter structure from Fig. 1(b) for illustration, but the results are extendable to any type of ASL majority gate.



Fig. 2. Spin current at the output end of the interconnect in an inverter as a function of (a) the interconnect length (b) the input magnet size.

**Impact of interconnect length:** At a 10nm technology node, we set the length of each magnet in an ASL inverter to 30nm, width to 10nm and thickness to 3nm. We will show later in Section 3.1 that this is a minimum-sized inverter at the 10nm technology node. We vary the interconnect length, L, from 40nm to 400nm. Our simulations are performed using the model in Section 4, with the parameters in Table I. We consider magnets with perpendicular magnetic anisotropy (PMA), where the magnetization is normal to the plane of the magnet. Fig. 2(a) shows that the magnitude of the spin current at the output end of the interconnect, which induces spin torque switching at the output, *decreases nonlinearly with interconnect length* due to the lossy nature of the interconnect.

**Impact of magnet width:** The reduction in the spin injection efficiency, which is constant over the above experiment, with interconnect length may be overcome by using a larger input magnet that injects more charge current into the input magnet, and hence more spin current at the input end of the interconnect. This in turn translates to more spin current at  $M_o$  for a fixed interconnect length. This is illustrated in Fig. 2(b), where the length of the interconnect is fixed at 40nm. The length and the thickness of  $M_i$  are set at 30nm and 3nm, respectively and its width is varied from 10nm to 200nm, while the dimensions of the output magnet remain unchanged from the previous plot. We see that a stronger input magnet boosts the spin current at  $M_o$ .

| Parameter                                     | Value                                        |
|-----------------------------------------------|----------------------------------------------|
| Polarization factor                           | 0.8                                          |
| Resistivity of magnet, $\rho_F$               | 170 Ω·nm                                     |
| Resistivity of interconnect, $\rho_N$         | 7 Ω·nm                                       |
| Spin flip length of magnet, $\lambda_F$       | 5 nm                                         |
| Spin flip length of interconnect, $\lambda_N$ | 500 nm                                       |
| Thickness of all magnets                      | 3 nm                                         |
| Thickness of ground lead                      | 3 nm                                         |
| Width of interconnect                         | 10 nm                                        |
| Thickness of interconnect                     | 10 nm                                        |
| Bohr magneton, $\mu_B$                        | $9.274 \times 10^{-24}\ \mathrm{J\,T^{-1}}$  |
| Saturation magnetization, $M_s$               | 780 emu/cc                                   |
| Charge of an electron                         | $1.6 \times 10^{-19}\ \mathrm{C}$            |
| Operating voltage                             | 100 mV                                       |

Table I. Parameters used for simulation in this work.

#### 2.3. Compact majority logic gate

The conventional realization of an n-input basic logic primitive such as an AND (OR) gate using majority logic requires a total of 2n - 1 inputs, with n - 1 magnets with fixed polarities of 0 (1) to realize the desired functionality. This ensures that the output will only evaluate to logic 1 (0) when all n inputs to AND (OR) are at logic 1 (0), and only then can the spin contribution of the n - 1 fixed magnets be overcome. Fig. 3(a) shows the example of a three-input AND gate (AND3) where two fixed magnets ( $M_f$ ) with 0 polarity compete with the three inputs ( $M_i$ ) of the gate. The output magnet is set to logic 1 only when all the inputs are at logic 1. NAND (NOR) gates can be realized in the same way as AND (OR) gates, but with the $V_{dd}$ and Gnd polarities reversed, as explained in Section 2.1.
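The conventional construction can be checked with a small truth-table sketch. It abstracts every magnet to an equally weighted 0/1 vote and ignores all physical effects; the helper names (`majority`, `and_via_majority`, `or_via_majority`, `check`) are illustrative and do not come from the paper.

```python
from itertools import product


def majority(bits):
    """Return 1 if strictly more than half of the votes are 1, else 0."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def and_via_majority(inputs):
    """n-input AND as a (2n-1)-input majority with n-1 magnets fixed at 0."""
    n = len(inputs)
    fixed = [0] * (n - 1)
    return majority(list(inputs) + fixed)


def or_via_majority(inputs):
    """n-input OR as a (2n-1)-input majority with n-1 magnets fixed at 1."""
    n = len(inputs)
    fixed = [1] * (n - 1)
    return majority(list(inputs) + fixed)


def check(n):
    """Compare both constructions with all()/any() over every input pattern."""
    for bits in product((0, 1), repeat=n):
        if and_via_majority(bits) != int(all(bits)):
            return False
        if or_via_majority(bits) != int(any(bits)):
            return False
    return True


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, check(n))
```

For AND3 this is a five-vote majority: three inputs plus two fixed zeros, so the output is 1 only when all three inputs are 1.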

The additional n-1 fixed magnets lead to

- Increased layout complexity: The interconnects require careful layout (e.g., length matching) so that each magnet is weighted identically in the majority function.
- Degraded area, performance, and power: The larger number of magnets can result in a large cell area and hence longer interconnect lengths. Since the current transmitted to the output magnet weakens with interconnect length (Fig. 2(a)), this affects performance and power.

In this work, we show that it is possible to physically implement such a majority gate in ASL with just one fixed magnet rather than n-1 magnets. The key observation, based on the requirement for the majority function, is that the *spin current* from the single fixed magnet should be equivalent to the spin current injected by n-1 fixed magnets.



Fig. 3. An ASL AND3 gate using (a) conventional design (b) our scheme.

Therefore, if a single magnet of fixed polarity is placed close to the primary output, then its size and its distance from the output can be modulated, as indicated by the trends in Fig. 2, to inject the same spin current. This idea is shown in Fig. 3(b) for an AND3. This improves upon the conventional ASL cell in terms of

- area, since we save n-2 magnets;
- power, since we must now drive charge current into n-2 fewer magnets, and
- delay, since we create compact cells, thus reducing the interconnect length, which is shown to provide a larger switching current to the output magnet in Fig. 2(a).

### 3. ASL STANDARD CELL DESIGN

Let $M_i$, $M_f$, and $M_o$ denote the input, fixed, and output magnet(s), respectively (Fig. 3(b)). Let $\mathbf{L}$ denote the vector of interconnect lengths from the input magnets to the output magnet, $\mathbf{W}$ the vector of input magnet widths, $L_f$ the length of the interconnect segment from the fixed magnet to the output magnet, and $W_f$ the width of the fixed magnet. The layout of the magnets and interconnects can be formulated as an optimization problem of minimizing the bounding box area of the cell, $f_{Area}$, under a set of constraints:

$$Minimize \quad f_{Area}(\mathbf{L}, \mathbf{W}, W_f) \tag{1}$$

subject to

- (1) Geometrical constraints on interconnect segment lengths to
  - (a) obey design rules based on feature size constraints,
  - (b) avoid dipolar coupling between magnets, and
  - (c) ensure a constant standard cell row height.
- (2) Functionality constraints to balance the spin currents driven by
  - (a) input magnets, and
  - (b) the fixed magnet, relative to input magnets.

While building the layout for ASL circuits, the following additional points need to be addressed:

**Multiple drive strengths:** The layout of an ASL circuit differs from its CMOS counterpart in that it has two stages: (1) within a cell, each $M_i$ drives an $M_o$, and spin currents are carefully balanced between all $M_i-M_o$ paths to realize the majority logic function; (2) outside a cell, the input magnet of the next stage is then driven by $M_o$. In principle, it may sometimes be possible to eliminate the output magnet [Kim et al. 2015] and drive the next gate directly, but since the bottleneck of ASL delay is well known to be in the interconnects and vias, it is preferable to have a separate driver for the majority function and for the output load. A possible exception arises when the load is placed right next to the current cell: the output magnet may then be eliminated as a post-processing optimization after layout, and the standard cell delay model will continue to be valid since the output load is the same. Circuit delay optimization involves a balance between the internal cell delay ( $M_i$ driving $M_o$ ) and the external delay ( $M_o$ driving the load). For this reason, we build standard cells considering all combinations of input magnet and output magnet sizes (1X, 2X, 4X and 8X) for each logic function. We then analyze these standard cells with respect to delay, energy, and area, and prune the suboptimal ones from the library.

**Interconnect segment routing:** Intra-cell interconnects are routed in a single layer because vias introduce additional variability that makes it harder to meet the current-balancing requirements of Constraint 2. Cell-to-cell connections may use vias since the magnet $M_o$ drives the next cell through an interconnect, and is effectively a buffer structure that does not require spin current balancing.

#### 3.1. Geometrical constraints

**Constraint 1(a) – Design rules:** In this work, we assume that the minimum drawn length and spacing are 10nm.

**Constraint 1(b) – Dipolar coupling effects:** Each magnet can be viewed as a single magnetic dipole, which interacts with other magnetic dipoles in the same cell as well as those in the neighboring one. Dipolar coupling affects the circuit performance in the following ways:

- (1) Impact on  $M_o$ : It can provide an opposing field that slows down the switching of  $M_o$ .
- (2) Impact on  $M_i$ : It can weaken the spin current injected by  $M_i$  into the interconnect.
- (3) *Impact on majority logic gate*: It can introduce nonuniformities in spin injection since one of the input magnets may have larger dipolar coupling than another.

The constraints on the inter-magnet spacing should be adequately set to render these ineffective. In order to study these effects in more detail, we use the micro-magnetics simulation software, OOMMF [Donahue and Porter 1999], which takes the magnetic material parameters and configurations as an input and numerically computes the resultant magnetization dynamics by solving the Landau-Lifshitz-Gilbert-Slonczewski (LLGS) equation [Slonczewski 1996], incorporating the impact of all fields, including dipolar coupling effects.

$$\frac{d\vec{m}}{dt} = -\gamma \left[\vec{m} \times H_{eff}\right] + \alpha \left[\vec{m} \times \frac{d\vec{m}}{dt}\right] + \tau \tag{2}$$

Here,  $\gamma$ ,  $H_{eff}$ ,  $\alpha$ , and  $\tau$  are, respectively, the gyromagnetic ratio, the effective magnetic field (including dipolar coupling effects), the damping constant, and the effective spin torque, and  $\vec{m}$  is the normalized magnetization of the switching magnet.

The minimum separation distance between magnets is derived for the case of magnets with in-plane magnetic anisotropy (IMA), where the effect of dipolar coupling between the magnets is much stronger as compared to those with PMA as a result of the dominance of shape anisotropy [Johnson et al. 1996]. In the case of PMA magnets, the uniaxial anisotropy dominates over the shape anisotropy rendering the dipolar coupling effect negligible. We demonstrate the effect of dipolar coupling on IMA magnets by running a set of simulations using the parameters listed in Table I, with the exception of the magnet thickness being kept at 5nm.

**Impact of dipolar coupling on** $M_o$: To consider the worst-case dipolar coupling, we consider a scenario where a victim magnet $M_o$ is surrounded by aggressor magnets. We consider the case where the aggressor magnets are as large as possible (8X) and the victim magnet is as small as possible (1X), and determine the minimum safe distance D between magnets. We supply this topology to OOMMF and simulate the worst-case scenario where the magnetization of the magnets (the aggressors and victim $M_o$ ) all point in one direction, and pass a spin torque current through $M_o$ to switch it in the opposite direction, while being resisted by dipolar coupling from the aggressors. For various distances $D \in [1nm,50nm]$ between the victim and aggressors, we determine the minimum switching current, $I_{s0}$, required to switch the magnet. The curve of $I_{s0}$ vs. D, shown in Fig. 4, reflects the impact of dipolar coupling.

The figure also shows the  $I_{s0,lone}$  bound for the case of a lone magnet with no aggressors (i.e.,  $D \to \infty$ ). It can be seen that  $I_{s0}$  converges to  $I_{s0,lone}$  at around D = 10nm, implying that  $M_o$  essentially experiences no dipolar coupling beyond 10nm.

**Impact on**  $M_i$ : The impact of dipolar coupling on an input magnet may be to alter the ability of the magnet to polarize the charge current and convert it to spin current at the input end of the interconnect. This occurs because the dipole moments may realign themselves to rest at a nonzero angle to the easy axis. We can observe this in OOMMF for an input magnet  $M_i$  with an aggressor spacing of D = 1, 5, and 10nm, as shown in Fig. 5. Several dipoles are at an angle to the easy axis for lower values of D, but as D increases, the departures from the easy axis are minor, and negligible at about D = 10nm, allowing better polarization at the input magnet. In fact, this implies that such coupling is a non-issue since this constraint is subsumed by the design rule (Constraint 1(a)) at 10nm.



Fig. 4. Critical switching current vs. magnet separation.



Fig. 5. Steady-state magnetization for a separation distance of (a) 1nm, (b) 5nm, and (c) 10nm.

Thus, a minimum magnet separation distance of 10nm ensures no dipolar coupling between the magnets in a layout. The layouts and the results derived thereafter in this work consider PMA magnets with the same 10nm separation distance derived for the IMA magnets.

**Constraint 1(c) – Standard cell row heights:** The standard cell row height constraint follows directly from Constraint 1(b). The most stringent constraint, which determines the uniform height for all standard cells, corresponds to the most compact layout of the largest cell in the library. We first look at the factors that decide the choice of the magnet and interconnect dimensions for a chosen drive strength, which would directly lead us to the choice of the row height for the standard cells.

**Choice of magnet and interconnect dimensions:** A magnified side view of the layout of $M_i$ with 1X drive strength is shown in Fig. 6(a). It consists of a magnet with a contact layer on top of it. Beneath $M_i$, the sections of interconnect from the previous stage and the present stage are separated by an isolation layer. The isolation layer ensures that the input-output isolation is maintained as described in Section 2.1. A contact to Gnd lies below the interconnect to provide the asymmetrical ground on the input side in the present stage. Constraint 1(a) implies that the length and the width of the magnet, the interconnect and the isolation layer should each be at least 10nm. It therefore follows that when each section of the interconnect and the isolation layer beneath $M_i$ is drawn at its minimum length of 10nm, the length of $M_i$ must be at least 30nm, the sum of the lengths of the interconnect sections on either side and of the isolation layer beneath $M_i$. Therefore, in this work we set a unit drive strength $M_i$ to have the dimensions $30 \times 10 \times 3\ \mathrm{nm}^3$. The dimensions of 2X, 4X and 8X $M_i$ are chosen such that the area enclosed by the length and the width of the magnet is, respectively, 2X, 4X and 8X that of a 1X $M_i$. These dimensions are shown in Table II. The thickness of the magnet, being a process parameter, is kept constant at 3nm for all the magnets.

Table II. Dimensions of the magnet corresponding to the different drive strengths.

| Drive strength | Dimensions (Length × Width × Thickness)    |
|----------------|--------------------------------------------|
| 1X             | $30$ nm $\times 10$ nm $\times 3$ nm       |
| 2X             | $30$ nm $\times 20$ nm $\times 3$ nm       |
| 4X             | $40$ nm $\times 30$ nm $\times 3$ nm       |
| 8X             | $80$ nm $\times 30$ nm $\times 3$ nm       |
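As a quick consistency check on Table II, the sketch below recomputes the length-by-width footprint of each magnet and its ratio to the 1X footprint. The dictionary and function names are illustrative; only the dimensions are taken from the table.

```python
# Table II dimensions in nm: (length, width, thickness)
DIMENSIONS_NM = {
    "1X": (30, 10, 3),
    "2X": (30, 20, 3),
    "4X": (40, 30, 3),
    "8X": (80, 30, 3),
}


def footprint_nm2(drive):
    """Area enclosed by the length and width of the magnet, in nm^2."""
    length, width, _thickness = DIMENSIONS_NM[drive]
    return length * width


def relative_strength(drive):
    """Footprint of the given magnet relative to the 1X magnet."""
    return footprint_nm2(drive) / footprint_nm2("1X")


if __name__ == "__main__":
    for name in DIMENSIONS_NM:
        print(name, footprint_nm2(name), relative_strength(name))
```

Every entry also keeps its length at or above the 30nm minimum derived from Constraint 1(a).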



Fig. 6. (a) Magnified side view of the layout of a 1X magnet, and (b) top view of the layout of MAJ3\_8X\_8X.

The largest cell in the library, which decides the standard cell row height, is the one in which all $M_i$ and $M_o$ are set to 8X drive strength. The layout of such a three-input majority gate is shown in Fig. 6(b). We illustrate our design choices for each standard cell using this figure.

- Signal interconnect: The visible portion of the interconnect segment length, L, from each $M_i$ to the junction point T, where these segments meet, is set to 30nm. This interconnect length allows for a separation distance of 10nm between the magnets as required by Constraint 1(b).
- Magnet orientation: The orientation of the magnet with respect to its interconnect segment length, is such that
  - the shorter dimension of the magnet (width) is along the interconnect length. This ensures that the total length of the interconnect segment which extends until the isolation layer beneath the magnet is the minimum possible.
  - the required symmetry of the input magnet and the interconnect structure for the majority logic is maintained, such that there is equal weighting of the spin currents from each  $M_i$  at  $M_o$ .
  - the isolation layer abuts the interconnect beneath the magnet as shown in Fig. 6(a).
- Supply rails: The supply rails for  $V_{dd}$  and Gnd, drawn with different metal layers, are drawn along the width of the standard cell at the top, bottom and the centre to allow for easy abutment of the cells. The contact layer on top of the magnets shown in Fig. 6(a) is omitted here for convenience.

With this structure, the standard cell row height is thus derived to be 130nm. However, we will see later in Section 6 that some standard cells dominate others in area, delay, and energy. For this reason, the standard cell row heights would eventually not be 130nm, but instead could be set to 80nm.

#### 3.2. Functional Constraints

**Constraint 2(a) – Input magnet spin currents:** For a majority logic function, all input magnets must be equally weighted, i.e., must contribute an equal magnitude of spin current under either polarity. We achieve this by *symmetry* – using identical input magnets and balancing the wire lengths to obtain identical spin currents at the end of each interconnect fed by an input magnet. This ensures that

- the spin current contributed by each input magnet to the majority logic function is identical.
- the load presented to the predecessor stage by any input magnet is identical.
- the gate is robust to systematic process variations, which may affect all magnets and interconnects identically.

Note that for the scheme in Fig. 6(b), the input magnets $M_i$ inject spin currents that accumulate at a junction point T, and the algebraic sum of the currents is then transmitted along an interconnect to the output magnet $M_o$. Since this last interconnect segment is common to all $M_i$ currents, it is sufficient to ensure equal spin current contributions to point T, thus guaranteeing an equal contribution from each $M_i$ to $M_o$.

**Constraint 2(b) – Output magnet spin currents:** For logic gates that require the use of fixed magnets, the spin injection efficiency at the output end of the interconnect driven by the single fixed magnet must be carefully controlled to realize the desired logic function.

Consider an *n*-input AND gate implemented as a majority gate with equally weighted inputs, as illustrated in Fig. 3(a) for n = 3, and let the integer *p* represent the number of fixed magnets. The worst-case scenarios that constrain *p* are:

- (a) Logic 0 at the output: When all but one input is at logic 1, then the lone input at logic 0, plus the p fixed magnets, must set the output to logic 0, i.e., p + 1 > n - 1, or p > n - 2.
- (b) Logic 1 at the output: When all input magnets are at logic 1, they must overwhelm the effect of p logic-0 fixed magnets, i.e., n > p.

In other words, for an *n*-input AND gate,

$$n > p > n - 2 \tag{3}$$

i.e., if *p* must be an integer, then p = n - 1.
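The integer argument can be confirmed by brute force. The sketch below enumerates candidate values of p and keeps those that satisfy both worst-case conditions (a) and (b); the function name and the search bound `p_max` are illustrative choices, not part of the paper.

```python
def valid_fixed_magnet_counts(n, p_max=None):
    """Integers p for which an n-input AND built from equal-weight votes works."""
    if p_max is None:
        p_max = 2 * n  # generous search bound; any p >= n already fails case (b)
    valid = []
    for p in range(0, p_max + 1):
        forces_zero = (p + 1) > (n - 1)  # case (a): one input at 0, n-1 at 1
        allows_one = n > p               # case (b): all n inputs at 1
        if forces_zero and allows_one:
            valid.append(p)
    return valid


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, valid_fixed_magnet_counts(n))
```

For each n, the only surviving integer is n - 1, in agreement with (3).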

In a concrete implementation of an ASL gate, the quantities n and p are not abstract integers, but correspond to the actual real-valued spin current contributed by the $M_i$ and $M_f$. Let $I_{s,i}$ be the contribution of input magnet $M_i$ to the spin current that reaches $M_o$. We maintain these values to be identical for each input, based on the symmetry argument in Constraint 2(a). Let $I_{s,f}$ be the spin current contributed by the single $M_f$. Using the same logic as above, if $I_{s,i}$ and $I_{s,f}$ are the spin current contributions entering $M_o$ from $M_i$ and $M_f$, respectively, then the excess spin current should exceed the critical switching current, $\kappa I_{s0}$, where $\kappa > 1$ is a multiplier that ensures sufficiently fast switching. For the logic 1 output, this means that

$$nI_{s,i} - I_{s,f} > \kappa I_{s0} \tag{4}$$

i.e., the collective spin current from the n input magnets should exceed that from the fixed magnet by a margin of $\kappa I_{s0}$. This ensures that $M_f$ is sized and placed such that $I_{s,f}$ does not overwhelm the n inputs. For the case of logic 1 output, we define a Boolean variable $c_1$ which is 1 if Inequation (4) is satisfied and 0 otherwise.

Similarly, for the logic 0 output,

$$I_{s,f} - (n-2)I_{s,i} > \kappa I_{s0} \tag{5}$$

Here, we define another Boolean variable  $c_0$  for the case of logic 0 output, which is set to 1 if Inequation (5) is satisfied and 0, otherwise. The value  $c_1 = 1$  ( $c_0 = 1$ ) denotes that  $I_{s,f}$  is lower (greater) than the required upper (lower) bound. We can see that a value  $c_0 = c_1 = 0$  is not possible. We utilize these Boolean variables later in Section 5 to derive the dimensions of  $M_f$  and the corresponding length of the interconnect,  $L_f$ .

Consolidating Inequations (4) and (5), we have

$$nI_{s,i} - \kappa I_{s0} > I_{s,f} > (n-2)I_{s,i} + \kappa I_{s0},$$

i.e.,

$$n-\delta > \frac{I_{s,f}}{I_{s,i}} > (n-2) + \delta \tag{6}$$

where

$$\delta = \kappa I_{s0} / I_{s,i} \tag{7}$$

We can observe that

- Comparing (3) and (6), $p = I_{s,f}/I_{s,i}$.
- It is essential to have $\delta < 1$; otherwise there is no solution to the inequalities in (6).

We note that the currents in the expression for p can be independently tuned by altering the relative magnet sizes and interconnect lengths. Therefore, the ratio p may be any real number *and is not constrained to be an integer*.
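The window for the fixed-magnet current can be evaluated numerically. The sketch below is a minimal illustration of Inequations (4) to (7); the function names, the argument `threshold` (standing for $\kappa I_{s0}$), and the numbers in the usage example are hypothetical and serve only to show the arithmetic.

```python
def delta(i_si, threshold):
    """delta = kappa * I_s0 / I_si, with threshold = kappa * I_s0."""
    return threshold / i_si


def c1(n, i_si, i_sf, threshold):
    """Logic-1 condition (4): n inputs at 1 overcome the fixed magnet."""
    return n * i_si - i_sf > threshold


def c0(n, i_si, i_sf, threshold):
    """Logic-0 condition (5): fixed magnet plus one input overcome n-1 inputs."""
    return i_sf - (n - 2) * i_si > threshold


def fixed_current_window(n, i_si, threshold):
    """Open interval (lo, hi) of admissible I_sf, or None if delta >= 1."""
    lo = (n - 2) * i_si + threshold
    hi = n * i_si - threshold
    return (lo, hi) if lo < hi else None


if __name__ == "__main__":
    # Hypothetical values in arbitrary current units: I_si = 10, kappa * I_s0 = 6.
    n, i_si, threshold = 3, 10.0, 6.0
    print(delta(i_si, threshold))
    print(fixed_current_window(n, i_si, threshold))
    print(c1(n, i_si, 20.0, threshold), c0(n, i_si, 20.0, threshold))
```

With these illustrative numbers the admissible ratio $I_{s,f}/I_{s,i}$ lies strictly between 1.6 and 2.4, so a non-integer ratio is acceptable. If the input current is reduced until $\delta \geq 1$, the window closes and the function returns `None`.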

The condition $\delta < 1$ leads to the requirement $I_{s,i} > \kappa I_{s0}$, which follows from the definition of $\delta$. Intuitively, the inequation states that each $M_i$ to $M_o$ path could be considered a two-magnet system, and the output would only switch if $I_{s,i}$ exceeds the switching current threshold of $M_o$ by a factor $\kappa$. Our experiments set $\kappa = 1.5$. Our experiments consider magnets with perpendicular magnetic anisotropy, whose critical switching current, $I_{s0,PMA}$ [Sun 2000], is given by

$$I_{s0,PMA} = \frac{2q\alpha M_s V H_K}{\hbar} \tag{8}$$

where q,  $\alpha$ ,  $M_s$ , V,  $H_K$ ,  $\hbar$  refer to the electron charge, damping constant, saturation magnetization, volume of the magnet, critical switching field and reduced Planck's constant, respectively.
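A minimal numerical sketch of Eq. (8) follows. Since the equation is written without $\mu_0$, the sketch assumes Gaussian (CGS) units for $M_s$, V, $H_K$ and $\hbar$, with q in coulombs so that the result is in amperes. This unit convention is an assumption. The damping constant and switching field are not listed in Table I, so the values used in the example (`alpha=0.01`, `h_k_oe=5000.0`) are purely hypothetical placeholders.

```python
Q_E = 1.6e-19            # electron charge in C (Table I)
HBAR_ERG_S = 1.0546e-27  # reduced Planck constant in erg*s
MS_EMU_CC = 780.0        # saturation magnetization in emu/cc (Table I)
NM3_TO_CM3 = 1e-21       # 1 nm^3 expressed in cm^3


def critical_current_pma(volume_nm3, alpha, h_k_oe, ms_emu_cc=MS_EMU_CC):
    """Evaluate Eq. (8) in amperes under the CGS assumption stated above."""
    volume_cm3 = volume_nm3 * NM3_TO_CM3
    energy_erg = ms_emu_cc * volume_cm3 * h_k_oe  # M_s * V * H_K
    return 2.0 * Q_E * alpha * energy_erg / HBAR_ERG_S


if __name__ == "__main__":
    # alpha and h_k_oe are hypothetical; the volumes are the 1X and 8X
    # magnets of Table II (length * width * thickness in nm^3).
    i_1x = critical_current_pma(30 * 10 * 3, alpha=0.01, h_k_oe=5000.0)
    i_8x = critical_current_pma(80 * 30 * 3, alpha=0.01, h_k_oe=5000.0)
    print(i_1x, i_8x, i_8x / i_1x)
```

Because Eq. (8) is linear in the magnet volume, the 8X magnet needs eight times the switching current of the 1X magnet for the same material parameters, independent of the placeholder values chosen.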

### 4. SPIN CIRCUIT MODELING

We briefly provide an overview of the physics-based spin circuit model [Srinivasan et al. 2013] used in this work for performance assessment. An ASL device consists of a ferromagnet (FM) that generates spin current from charge current, and a nonmagnetic (NM) interconnect that receives the spin current through a tunneling barrier and transmits it to the output magnet. These are modeled as lumped  $\pi$ -networks. The ground lead at the input side creates the asymmetry which ensures that the input affects the output and not vice-versa. Any ASL gate may be modeled using these basic components. For gates with multiple input magnets and interconnect segments, each is represented by a  $\pi$ -model.

The voltage [current] at each node [branch] in the system can be represented by a two-component vector containing the spin and charge components of the voltage [current] at the node [branch]. In other words, if we represent the voltage [current] at node *i* [branch *b*] as $\vec{V}_i$ [$\vec{I}_b$], and if we use the subscripts *c* and *s* to denote their charge and spin components, respectively, then

$$\vec{V}_i = \begin{bmatrix} V_{c,i} \\ V_{s,i} \end{bmatrix}, \qquad \vec{I}_b = \begin{bmatrix} I_{c,b} \\ I_{s,b} \end{bmatrix} \tag{9}$$

The cumulative circuit equation for a circuit with k nodes is:

$$[G_{ckt}]_{2k \times 2k} \mathbf{V} = \mathbf{I} \tag{10}$$

where $\mathbf{V} = [\vec{V}_1 \cdots \vec{V}_k]^T$ and $\mathbf{I}$ corresponds to the excitations. To populate $G_{ckt}$, we use the $\pi$-model between nodes *i* and *j*:

$$\begin{bmatrix} I_{c,ij} \\ I_{s,ij} \end{bmatrix} = [G^{se}]_{2 \times 2} \begin{bmatrix} V_{c,i} - V_{c,j} \\ V_{s,i} - V_{s,j} \end{bmatrix} + [G^{sh}]_{2 \times 2} \begin{bmatrix} 0 \\ V_{s,i} \end{bmatrix} \tag{11}$$

where $G^{se}$ and $G^{sh}$ are the series and shunt conductance matrices for the $\pi$-model and are related to physical dimensions:

$$G_{FM}^{se} = \frac{A_F}{\rho_F L_F} \begin{bmatrix} 1 & \beta \\ \beta & \beta^2 + \left(\frac{(1-\beta^2)L_F}{\lambda_{sfF}}\right) \operatorname{cosech}\left(\frac{L_F}{\lambda_{sfF}}\right) \end{bmatrix} \tag{12}$$

$$G_{FM}^{sh} = \frac{A_F}{\rho_F L_F} \begin{bmatrix} 0 & 0\\ 0 & \left(\frac{(1-\beta^2)L_F}{\lambda_{sfF}}\right) \tanh\left(\frac{L_F}{2\lambda_{sfF}}\right) \end{bmatrix} \tag{13}$$

$$G_{NM}^{se} = \frac{A_N}{\rho_N L_N} \begin{bmatrix} 1 & 0\\ 0 & \left(\frac{L_N}{\lambda_{sfN}}\right) \operatorname{cosech}\left(\frac{L_N}{\lambda_{sfN}}\right) \end{bmatrix} \tag{14}$$

$$G_{NM}^{sh} = \frac{A_N}{\rho_N L_N} \begin{bmatrix} 0 & 0\\ 0 & \left(\frac{L_N}{\lambda_{sfN}}\right) \tanh\left(\frac{L_N}{2\lambda_{sfN}}\right) \end{bmatrix} \tag{15}$$

where A, $\rho$, L, and $\lambda_{sf}$ represent the cross-sectional area, resistivity, length, and spin diffusion length, respectively. The subscript F [N] relates to the FM [NM], and $\beta$ is the FM spin polarization factor.

These matrices are used to build element stamps and populate the nodal analysis matrix  $G_{ckt}$ . For nodes held at  $V_{dd}$  or ground, the voltages are substituted in Equation (10). The system is solved to obtain all charge and spin currents and voltages.
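The stamping procedure can be illustrated on a deliberately small case. The sketch below builds the NM blocks of Equations (14) and (15), stamps two identical interconnect segments in series into $G_{ckt}$, injects a spin current at one end, and holds the far end at zero charge and spin voltage (an ideal spin sink, which is an assumption made here for simplicity rather than a model of $M_o$). The wire parameters and the helper names `nm_pi_model`, `stamp` and `solve` are hypothetical; FM elements would be stamped in the same way using Equations (12) and (13).

```python
import math

# Placeholder wire parameters (assumed, not taken from the paper).
AREA = 10e-9 * 10e-9   # cross-sectional area (m^2)
RHO = 2.0e-8           # resistivity (ohm m)
LAMBDA = 400e-9        # spin diffusion length (m)
SEG_LEN = 50e-9        # length of each interconnect segment (m)


def nm_pi_model(area, rho, length, lam):
    """Series and shunt 2x2 blocks of Equations (14) and (15)."""
    g0 = area / (rho * length)
    x = length / lam
    g_se = [[g0, 0.0], [0.0, g0 * x / math.sinh(x)]]   # cosech(x) = 1/sinh(x)
    g_sh = [[0.0, 0.0], [0.0, g0 * x * math.tanh(x / 2.0)]]
    return g_se, g_sh


def stamp(G, i, j, g_se, g_sh):
    """Add one pi-element between nodes i and j to the 2k x 2k matrix G."""
    for r in range(2):
        for c in range(2):
            G[2 * i + r][2 * i + c] += g_se[r][c] + g_sh[r][c]
            G[2 * j + r][2 * j + c] += g_se[r][c] + g_sh[r][c]
            G[2 * i + r][2 * j + c] -= g_se[r][c]
            G[2 * j + r][2 * i + c] -= g_se[r][c]


def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


# Three nodes in a chain: 0 -- 1 -- 2, with node 2 held at zero voltage.
k = 3
G = [[0.0] * (2 * k) for _ in range(2 * k)]
g_se, g_sh = nm_pi_model(AREA, RHO, SEG_LEN, LAMBDA)
stamp(G, 0, 1, g_se, g_sh)
stamp(G, 1, 2, g_se, g_sh)

# Substituting the known node-2 voltages leaves the unknowns of nodes 0 and 1.
G_red = [row[:4] for row in G[:4]]
I_IN = 1e-6                           # injected spin current at node 0 (A)
V = solve(G_red, [0.0, I_IN, 0.0, 0.0])

i_out = g_se[1][1] * V[3]             # spin current delivered into node 2
ratio = i_out / I_IN
print(f"fraction of spin current reaching the sink: {ratio:.4f}")
```

For this idealized wire, the delivered fraction agrees with the closed form $1/\cosh(2L_N/\lambda_{sfN})$ for a wire of total length $2L_N$, which is one way to sanity-check a stamping routine before applying it to a full gate.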

The spin current at the output magnet is then used to compute the switching delay,  $t_{sw}$ , for the standard cell as [Behin-Aein et al. 2011]:

$$t_{sw} = \frac{2qN_sf_1f_2}{I_s} \tag{16}$$

where $N_s$ is the number of Bohr magnetons in the FM, $I_s$ is the spin current at the output magnet, and $f_1$ and $f_2$ are constants. The switching energy, $E_{switching}$, is then calculated as

$$E_{switching} = t_{sw} I_{charge} V_{dd} \tag{17}$$

where  $I_{charge}$  refers to the total charge current.
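Equations (16) and (17) translate directly into two small helper functions. In the sketch below, the values of `N_S`, `I_S` and `I_CHARGE`, as well as the defaults $f_1 = f_2 = 1$, are placeholders used only to show the scaling; they are not the values used for characterization.

```python
Q = 1.602176634e-19  # electron charge (C)


def switching_delay(n_s, i_s, f1=1.0, f2=1.0):
    """Equation (16): t_sw = 2 q N_s f1 f2 / I_s."""
    return 2.0 * Q * n_s * f1 * f2 / i_s


def switching_energy(t_sw, i_charge, vdd):
    """Equation (17): E_switching = t_sw * I_charge * V_dd."""
    return t_sw * i_charge * vdd


N_S = 1.0e4        # Bohr magnetons in the output magnet (placeholder)
I_S = 10e-6        # spin current at the output magnet in A (placeholder)
I_CHARGE = 1e-3    # total charge current in A (placeholder)
VDD = 0.1          # operating voltage in V

t1 = switching_delay(N_S, I_S)
t2 = switching_delay(N_S, 2 * I_S)      # twice the spin current
e1 = switching_energy(t1, I_CHARGE, VDD)
print(f"t_sw = {t1:.3e} s, halves to {t2:.3e} s; E = {e1:.3e} J")
```

The inverse dependence of $t_{sw}$ on $I_s$ is what makes the spin current delivered to the output magnet the central quantity in the layout optimization that follows.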

## 5. LAYOUT

In this section, we illustrate the procedure to derive the layout of a gate that results in optimal area. We first examine the procedure for obtaining the layout for majority gates and then proceed to the case of an *n*-input AND gate, which requires careful sizing and placement of the fixed magnet. The problem of exploring all possible layout topologies to obtain one that yields the minimum possible area would involve an exhaustive search. While deriving the layout, we therefore fix the topology of the layout such that the spin currents from the input magnets are balanced and we need only to modify the fixed magnet and the interconnect parameters to obtain the optimal area for the chosen layout. This methodology is similar to cell generation strategies for standard cell layout in CMOS technologies, where simple versions of the layout problem are shown to be NP-complete; therefore, the layout topology is typically first fixed using a heuristic [Maziasz and Hayes 1987; 1992; Chakravarty et al. 1991] and then the layout parameters are tuned to derive an optimal layout.

#### ALGORITHM 1: Fixed Magnet Sizing Algorithm

**Input**: Initial layout of ANDn: $\mathbf{m}_i, \mathbf{m}_o, L_i, W_f, L_f, V_{dd}$

**Output**: Final $L_f$ and $W_f$

1. $c_1 = 0$; $c_0 = 0$;
2. **repeat**
3. Populate conductance matrix, $G_{ckt}$, for the layout structure.
4. Populate the vector of excitations, $\mathbf{I}$, for the case when all inputs are at logic 1.
5. Calculate $\mathbf{V} = G_{ckt}^{-1} \mathbf{I}$.
6. Evaluate $c_1$.
7. Populate $\mathbf{I}$ for the case when all inputs but one are at logic 1.
8. Calculate $\mathbf{V} = G_{ckt}^{-1} \mathbf{I}$.
9. Evaluate $c_0$.
10. **if** $c_1 == 1$ and $c_0 == 0$ **then**
11. Increase $I_{s,f}$ by increasing $W_f$ or decreasing $L_f$.
12. **end**
13. **else if** $c_1 == 0$ and $c_0 == 1$ **then**
14. Decrease $I_{s,f}$ by increasing $L_f$ or decreasing $W_f$.
15. **end**
16. **until** $c_1 = 1$ and $c_0 = 1$;

**Majority gate**: The layout of a majority gate for a chosen drive strength is obtained by choosing (a) the length of the interconnect segments, *L*, and (b) the orientation of $M_i$ and $M_o$, such that the standard cell area is the minimum possible and also adheres to geometrical constraints. The layout of one such MAJ3 gate with the input and output magnet sizes set to 8X was introduced in Fig. 6(b). As seen from Section 3.2, the identical input magnets and the interconnect lengths ensure that the spin current contribution by each input magnet is identical, thereby ensuring correct majority logic functioning.

**ANDn:** The layout of an *n*-input AND gate which needs the fixed magnet is non-trivial. As explained in Section 3,  $M_f$  should be sized and placed appropriately to ensure correct functioning of the AND gate using the majority paradigm. The procedure to arrive at the final layout is detailed below. We use the following notation:

- $m_i$ : Dimensions of  $M_i$
- $m_o$ : Dimensions of  $M_o$
- $W_f$ : Width of  $M_f$ 
  - $L_i$ : Length of interconnect segments from  $M_i$
- $L_f$ : Length of interconnect segment from  $M_f$
- $V_{dd}$ : Operating voltage

For a chosen drive strength of  $M_i$  and  $M_o$ , we set  $\mathbf{m}_i$  and  $\mathbf{m}_o$  to be a vector of three elements representing their length, width and the thickness. These vectors are populated from Table II. For  $M_f$ , we fix the length to be the minimum possible and vary only its width  $(W_f)$  and  $L_f$  to obtain the desired functionality. We explain our algorithm with the help of a three-input AND gate (AND3) with the input and output magnet drive strength set to 1X, corresponding to a magnet footprint of  $30 \text{nm} \times 10 \text{nm} \times 3 \text{nm}$ (Table II).

**Step 1:** First, we obtain a layout with only the input magnets laid out. For that, we choose  $L_i$  and the orientations of  $M_i$  and  $M_o$  such that the resulting layout is as compact as possible and also adheres to geometrical constraints. Initially, we assume a minimum value for  $L_f$  and  $W_f$ . We observe that this compact initial layout also ensures that the final layout with the final values of  $W_f$  and  $L_f$  determined, would also correspond to the minimum possible standard cell area for the chosen input and output magnet drive strength.

The initial layout of the AND3 gate is shown in Fig. 7(a). We set  $W_f$  to be that of a minimum sized magnet (10nm) and  $L_f$  to 10nm, in order to maintain a minimum separation distance between  $M_f$  and  $M_o$ . From Fig. 7(a), we see that (a) setting the dimensions of all  $M_i$  to be the same value and (b) constraining their respective  $L_i$  to be also the same value balances the spin currents from all  $M_i$  by design. We then only need to obtain  $L_f$  and  $W_f$  in order to balance the spin current from  $M_f$  as described in Section 3.2.

**Step 2:** We apply the *Fixed Magnet Sizing Algorithm* shown in Algorithm 1 to the initial layout from **Step 1** to obtain  $W_f$  and  $L_f$ . The inputs to the algorithm are the initial layout parameters ( $\mathbf{m}_i, \mathbf{m}_o, L_i, W_f, L_f, V_{dd}$ ). In Line 1 of the algorithm, we initialize the two Boolean variables  $c_1$  and  $c_0$ , defined earlier in Section 3.2, to be 0. Recall that  $c_0$  and  $c_1$  are set to 1 when the correct spin currents are delivered in the logic 0 and logic 1 case, respectively.

**Step 2a:** Next, in line 3, from the given layout structure, we populate the conductance matrix, $G_{ckt}$, using Equations (12)–(15) as described in Section 4. For the case of the AND3 gate, its circuit model is shown in Fig. 7(b).

**Step 2b:** In lines 4–6, we calculate $c_1$. In order to do so, we first set the vector of excitations, $\mathbf{I}$, such that all $M_i$s are at logic 1, i.e., a positive voltage $V_{dd}$ is applied to each one of them. The vector of node voltages and branch currents through the voltage sources, $\mathbf{V}$, is then obtained by solving the system of equations $G_{ckt}\mathbf{V} = \mathbf{I}$. The required branch currents $I_{s,i}$ and $I_{s,f}$ are calculated from the computed $\mathbf{V}$ using Equation (11). We then evaluate $c_1$ using Inequation (4).

Fig. 7. (a) Initial layout of an AND3 gate, used as input to the *Fixed Magnet Sizing Algorithm*, (b) circuit model of the AND3 gate, (c) final layout of an AND3 gate using this work and (d) final layout of an AND3 gate using the conventional implementation.

**Step 2c:** Similarly, $c_0$ is populated in lines 7–9 by setting one $M_i$ to logic 0 and the remaining $n-1$ to logic 1 in $\mathbf{I}$. As before, this is done by applying a positive $V_{dd}$ to the $n-1$ $M_i$ to be set to logic 1 and a negative $V_{dd}$ to the single $M_i$ to be set to logic 0. The resulting new system of equations $G_{ckt}\mathbf{V} = \mathbf{I}$, with the updated $\mathbf{I}$, is solved to obtain $\mathbf{V}$. Following the same procedure as in **Step 2b**, we evaluate $c_0$ using Inequation (5).

**Step 2d:** With $c_0$ and $c_1$ now available, lines 10–11 of the algorithm check their values to determine whether $I_{s,f}$ is lower than the required lower bound. This is the case if $c_1 = 1$ but $c_0 = 0$. We therefore need to increase $I_{s,f}$ by either decreasing $L_f$ or increasing $W_f$ for the next iteration. This argument follows from the relation of the spin current at $M_o$ to the $M_i$ and interconnect dimensions that we saw in Section 2.2. On the other hand, lines 13–14 evaluate the case when $c_1 = 0$ and $c_0 = 1$. In this case, $I_{s,f}$ is greater than the required upper bound, and we therefore need to decrease $I_{s,f}$ by either increasing $L_f$ or decreasing $W_f$ for the next iteration.

For the example AND3 gate, applying **Steps 2a–2c** once, we obtain $c_1 = 0$ and $c_0 = 1$. In this case, obtaining a logic 1 at the output is impossible, since the proximity of $M_f$ to $M_o$ always drives $M_o$ to logic 0 irrespective of the values of $M_i$. We therefore need to decrease $I_{s,f}$. To do so, we increase $L_f$ for the next iteration rather than decreasing $W_f$, because the dimensions of $M_f$ are already at their minimum possible values. Steps 2a–2d are repeated with the updated $W_f$ and $L_f$ until the constraints in Inequations (4) and (5) are satisfied, i.e., both $c_1 = 1$ and $c_0 = 1$. The algorithm returns the final $W_f$ and $L_f$ that satisfy both constraints. The final value of $L_f$ for the AND3 gate is obtained as 50nm, with the dimensions of $M_f$ still set to their minimum.

**Step 3:** With the knowledge of $W_f$ and $L_f$, the fixed magnet is laid out and oriented to obtain the final layout. In our work, we consider PMA magnets whose magnetization is predominantly due to uniaxial anisotropy as opposed to shape anisotropy. This allows us the freedom to orient the magnets in a manner that results in the most compact layout. The final layout of the AND3 gate with $W_f$ and $L_f$ set to the values obtained from the algorithm is shown in Fig. 7(c). For comparison, Fig. 7(d) shows the layout for the same gate using the conventional design. Our algorithm returns a layout that is compact and, as a result, is faster and consumes less energy than the conventional design.
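The control flow of Algorithm 1 can be sketched in a few lines of Python. Two simplifications are assumed here and are not part of the procedure described above: (i) the circuit solve of Steps 2a–2c is replaced by a hypothetical closed-form model, `toy_spin_current`, and (ii) the two checks are written as the window $n - 2 + \delta < p < n - \delta$ on $p = I_{s,f}/I_{s,i}$, which is one reading of Inequations (4) and (5) for a single fixed magnet opposing $n$ identical inputs. All numerical values are placeholders.

```python
import math

KAPPA = 1.5  # margin factor from Section 3


def toy_spin_current(width_nm, length_nm, lam_nm=30.0):
    """Hypothetical stand-in for the G_ckt solve: the delivered current grows
    with magnet width and decays with interconnect length."""
    return (width_nm / 10.0) / math.cosh(length_nm / lam_nm)


def size_fixed_magnet(n, l_i, w_f, l_f, i_s0, w_min=10.0, l_min=10.0,
                      step=10.0, max_iter=100):
    """Tune W_f and L_f (in nm) until both logic cases are satisfied."""
    i_si = toy_spin_current(w_min, l_i)       # all inputs are minimum width
    delta = KAPPA * i_s0 / i_si
    if delta >= 1.0:
        raise ValueError("delta >= 1: no feasible fixed-magnet current")
    for _ in range(max_iter):
        p = toy_spin_current(w_f, l_f) / i_si
        c1 = p < n - delta          # all inputs at 1: output must reach 1
        c0 = p > n - 2 + delta      # one input at 0: output must reach 0
        if c1 and c0:
            return w_f, l_f, p
        if c1 and not c0:           # I_s,f too small: strengthen M_f
            if l_f - step >= l_min:
                l_f -= step
            else:
                w_f += step
        else:                       # I_s,f too large: weaken M_f
            if w_f - step >= w_min:
                w_f -= step
            else:
                l_f += step
    raise RuntimeError("no solution within max_iter iterations")


# AND3 with M_f starting at minimum width, 10 nm away from M_o.
i_s0 = 0.5 * toy_spin_current(10.0, 100.0) / KAPPA   # chosen so delta = 0.5
w_f, l_f, p = size_fixed_magnet(n=3, l_i=100.0, w_f=10.0, l_f=10.0, i_s0=i_s0)
print(w_f, l_f, round(p, 2))
```

As in the AND3 example, the toy run starts with $M_f$ too strong and only lengthens $L_f$, because $W_f$ is already at its minimum. The final numbers produced by the toy model have no physical meaning; in practice each iteration would call the spin circuit solver of Section 4.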

### 6. RESULTS AND DISCUSSION

Standard cell library: We develop a range of standard cells with different input and output magnet strengths. We use the suffix PX\_QX to denote a gate where all input magnets are of size PX and the output magnet is of size QX. Both P and Q take values in {1, 2, 4, 8}. Considering every combination of P and Q, we therefore design 16 standard cells for each functionality.
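The 16 variants per functionality follow from enumerating all (P, Q) pairs. A minimal sketch of this enumeration is given below; the helper name `cell_variants` is hypothetical, and the naming pattern is assumed from cell names such as MAJ3\_1X\_8X used later in this section.

```python
from itertools import product

STRENGTHS = (1, 2, 4, 8)  # allowed drive strengths for both P and Q


def cell_variants(function):
    """Return the names of all P/Q variants of one logic function."""
    return [f"{function}_{p}X_{q}X" for p, q in product(STRENGTHS, repeat=2)]


and3_cells = cell_variants("AND3")
print(len(and3_cells), and3_cells[:4])
```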



Fig. 8. (a) Delay, (b) energy and (c) area of AND3 for a combination of PX ( $M_i$  drive strength) and QX ( $M_o$  drive strength).

We build the layout for each of these gates using the method described in Section 5. We then characterize them for the following metrics: switching delay, energy and area. We proceed to compare the 16 standard cells for each functionality against these metrics. In the case of AND3, the switching delay, energy and area comparisons are shown in Fig. 8(a), (b) and (c), respectively. For a chosen QX, we see an increase in switching delay with increasing PX. This result may seem surprising, since a larger $M_i$ should inject a relatively large spin current at $M_o$, leading to a smaller delay as seen from Equation (16). However, as P increases, the layout area increases in order to ensure that the geometrical constraints are met. This translates to longer interconnect segments that are highly lossy to spin, resulting in an overall reduction of the spin current supplied to $M_o$, and therefore leading to the *switching delay degradation*. The increase in switching delay and in the charge current with an increase in P contributes to an increase in switching energy according to Equation (17). Hence, for a chosen QX, choosing the minimum-sized input magnets (P = 1) yields the optimum delay, energy and area.

Although the comparison is shown here only for AND3, the same result holds for all the gates in the library. For this reason, we have four gates per functionality, with P = 1 and Q = 1, 2, 4, 8. The exact choice of Q depends on the logic path in the circuit: choosing a larger Q would increase the standard cell internal delay but reduce the delay of the inter-cell interconnect. It would also be capable of driving larger fanouts than smaller values of Q. Our standard cell library gives the designer the flexibility to choose a combination of standard cells that minimizes the total path delay.

The dominance of the unit-size input gates also allows us to re-examine the standard cell height, which we fixed to be 130nm based on an 8X input magnet size. We observe that a standard cell height of 80nm now accommodates the largest cell in the library, with 1X $M_i$ and 8X $M_o$. An example is the MAJ3\_1X\_8X gate shown in Fig. 9(a).

Figs. 9(b-e) show the layouts for several minimum-sized (1X\_1X) gates, developed using our approach: a five-input majority (MAJ5), a two-input AND gate (AND2), a four-input AND gate (AND4), and a two-input exclusive OR (XOR2). As mentioned earlier, the layouts of NAND and (N)OR are identical to AND but with appropriate polarities for fixed magnets. All layouts obey the 80nm height constraint. These standard cells along with the inverter and AND3 gate constitute our standard cell library. Note that unlike CMOS, where AOIs/OAIs can be built efficiently, ASL does not have a special low magnet count implementation for these functions, and they are not included in our library; however, unlike CMOS, majority gates are a natural fit to ASL and are included in the library. We also include the XOR gate in the library due to its widespread use in arithmetic and error-checking circuits.

We compare our work by applying our layout techniques to the conventional implementation that uses n-1 fixed magnets for an *n*-input (N)AND/(N)OR gate: since there is no prior work in this area to compare against, we build layouts for these structures using the methods proposed in this work. Following the same argument for the layout, standard cell layouts with conventional design can be built with a row height of 100nm.

The comparison between the switching delay, energy and area numbers of the layouts for our cells and the conventional structures is shown in Table III, Table IV and Table V, respectively, for all possible values of PX and QX.

Fig. 9. Layouts for (a) MAJ3\_1X\_8X (b) MAJ5\_1X\_1X (c) AND2\_1X\_1X (d) AND4\_1X\_1X (e) XOR2\_1X\_1X.

On average, our approach yields 4.8% faster AND2 and XOR2 devices, 22.80% faster AND3 devices, and 23.50% faster AND4 devices than the conventional structures. Consequently, AND2, AND3 and AND4 are on average 4.40%, 37.90% and 46.60% more energy efficient as a result of our optimization, and our layouts occupy 19.14% less area on average. The delay, energy and area for the inverter and the majority gates are shown in Table VI. Since the three-input majority gate (MAJ3) and five-input majority gate (MAJ5) use no fixed magnets, there is no area improvement over their conventional implementations, and their layouts are similar to AND2 and AND4, respectively. The energy dissipation and the switching delay of all the standard cells are calculated at an operating voltage of $V_{dd} = 100$mV. For different operating voltages, we could obtain different energy and delay points for the devices. For example, for the case of INV\_1X\_1X, choosing $V_{dd} = 10$mV results in a delay of 4.50ns with an energy dissipation of 100fJ. Similarly, $V_{dd} = 50$mV yields a delay of 0.91ns with an energy dissipation of 0.50pJ.
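The two INV\_1X\_1X operating points quoted above can be cross-checked against Equation (17) by solving it for the charge current. The short sketch below does this; the helper name `implied_charge_current` is hypothetical, and reading the result as a linear scaling of current with $V_{dd}$ is an interpretation rather than a statement made in the text.

```python
def implied_charge_current(energy_j, delay_s, vdd_v):
    """Invert Equation (17): I_charge = E_switching / (t_sw * V_dd)."""
    return energy_j / (delay_s * vdd_v)


# INV_1X_1X operating points quoted in the text.
i_10mv = implied_charge_current(100e-15, 4.50e-9, 10e-3)   # V_dd = 10 mV
i_50mv = implied_charge_current(0.50e-12, 0.91e-9, 50e-3)  # V_dd = 50 mV

current_ratio = i_50mv / i_10mv
delay_ratio = 4.50e-9 / 0.91e-9
print(f"I_charge: {i_10mv * 1e3:.2f} mA -> {i_50mv * 1e3:.2f} mA "
      f"(x{current_ratio:.2f}); delay shrinks by x{delay_ratio:.2f}")
```

Both ratios come out close to the fivefold increase in $V_{dd}$, which is what one would expect if the charge and spin currents scale roughly linearly with the applied voltage.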

The layout area is calculated by multiplying the standard cell row height (80nm) by the width of the layout. The width of the layout is decided by the length of the interconnect segments and the width of  $M_i$  and  $M_o$  as seen from Figs. 9(a-e). From Table II, we can see that the width of the magnet changes from 1X to 2X, and from 2X to 4X, but remains at 30nm from 4X to 8X. For this reason we do not see an increase in the layout area when the  $M_o$  drive strength changes from 4X to 8X. The only exception is the case of AND3 gate as shown in Fig. 7(c) where the layout area is decided by the interconnect segment lengths and the width of  $M_f$ , which is determined by the sizing algorithm shown in Section 5.



Fig. 10. EM analysis for (a) an inverter and (b) a MAJ5 gate.

While the improvements on AND3 and AND4 are expected since we replace multiple fixed magnets by a single magnet, it is interesting that we also achieve significant improvements for AND2. The AND2 gate functions as a MAJ3 gate whose third input is $M_f$ in addition to the two $M_i$. However, the two differ in that the spin current symmetries that need to be maintained constrain the three $M_i$ of the MAJ3 gate to be equally sized, whereas for an AND2 gate they constrain only the two $M_i$ to have the same size. Our optimization places the fixed magnet closer to $M_o$ to provide faster switching.

Impact of electromigration in scaled dimensions: The scaling of dimensions to obtain a compact layout leads to increased current density through the interconnects and could potentially lead to performance degradation due to electromigration (EM) [Su et al. 2015a]. Here, we study the impact of EM in our standard cell layouts. We begin the analysis with the simple case of an inverter and then proceed to examine the worst-case EM scenario, which corresponds to a MAJ5 gate. For each cell, every input magnet  $M_i$  is set to 1X; recall that it was shown at the beginning of this section that the optimal input magnet size for all standard cells is 1X. In our EM analysis, for each cell, we set the dimensions of  $M_o$  to be 8X to consider the longest interconnect length over all allowable sizes.

A simplified schematic of the layout topology of an ASL inverter is shown in Fig. 10(a). We denote the spin current density by j and the interconnect length between  $M_i$  and  $M_o$  by l. It is well known that the interconnect is immortal to EM if the product (*jl*) satisfies

$$jl < (jl)_c \tag{18}$$

where  $(jl)_c$  is the critical (jl) product, also called the Blech limit. We have considered Copper (Cu) as our interconnect material, and the value of  $(jl)_c$  for Cu is 4000–6000 A/cm for patterned nanoscale wires [Oates 2013]. In the case of INV\_1X\_8X, we obtain the value of j as  $2.22 \times 10^8$  A/cm<sup>2</sup> at the input end of the wire: note that this is the maximum value of j in the interconnect since the current degrades over the interconnect due to spin injection losses. For this layout, the value of l in the cell layout is 40nm, leading to a (jl) product of 890 A/cm, which is well below  $(jl)_c$ .
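The inverter check is a one-line unit conversion, sketched below with the values quoted above. The helper name `jl_product` is hypothetical, and the comparison uses the lower end of the quoted Blech range as a conservative threshold.

```python
NM_TO_CM = 1e-7  # 1 nm expressed in cm


def jl_product(j_a_per_cm2, length_nm):
    """Return the (jl) product in A/cm for j in A/cm^2 and l in nm."""
    return j_a_per_cm2 * length_nm * NM_TO_CM


BLECH_LIMIT_LOW = 4000.0  # A/cm, lower end of the range quoted for Cu

jl_inv = jl_product(2.22e8, 40.0)        # INV_1X_8X: j at the input end, l = 40 nm
em_immortal = jl_inv < BLECH_LIMIT_LOW
print(f"(jl) = {jl_inv:.0f} A/cm, immortal: {em_immortal}")
```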

For a more complex structure such as MAJ5, whose layout topology is illustrated in Fig. 10(b), EM analysis is more involved, owing to the different current densities in the different branches of the layout. From the figure, we see that three input magnets, annotated as $M_i$, lie to the left of $M_o$, and the other two are on the right. The spin currents from the $M_i$ on each side combine at an intermediate node ($T_{left}$ for the left, and $T_{right}$ for the right substructure). This combined current then flows through the interconnect towards $M_o$, thus resulting in different spin current densities in the different branches. The length of the interconnect from each $M_i$ to its respective node, $T_{left}$ or $T_{right}$, is equal and is denoted as $l_i$. Similarly, the length of the interconnect from $T_{left}$ or $T_{right}$ to $M_o$ is also equal, and is denoted as $l_o$. Due to the symmetric structure, the spin current density from each $M_i$ to its nearest intermediate node, $T_{left}$ or $T_{right}$, is equal and is denoted by $j_i$. The current density from $T_{left}$ ($T_{right}$) to $M_o$ is defined as $j_{ol}$ ($j_{or}$).

The worst-case scenario for EM occurs when all input magnets  $M_i$  are at the same logic value, resulting in the largest spin current in each wire. The spin current at  $M_o$  is the algebraic sum of the spin currents from all  $M_i$ , but is diminished from the value injected at  $M_i$  due to significant spin losses along the interconnect. We follow the methodology used in [Hau-Riege 2000] to calculate the impact of EM on the total structure. The effective (*jl*) product is obtained as

$$(jl)_{eff} = \max(j_i l_i + j_{ol} l_o,\; j_i l_i + j_{or} l_o) \tag{19}$$

For the layout of MAJ5\_1X\_8X, we obtain the following values:  $l_i = 20$ nm,  $l_o = 50$ nm,  $j_i = 1.93 \times 10^8$  A/cm<sup>2</sup>,  $j_{ol} = 3.26 \times 10^8$  A/cm<sup>2</sup>,  $j_{or} = 2.46 \times 10^8$  A/cm<sup>2</sup>. The value of  $(jl)_{eff}$  is thus obtained as 1811 A/cm which is below the Blech limit. As observed earlier,  $j_{ol}$ , which is obtained by combining the  $j_i$  from three input magnets, is actually  $< 3j_i$  due to the spin current losses in the interconnect segment. Similarly,  $j_{or} < 2j_i$ .
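Equation (19) can be evaluated directly from the quantities listed above. The sketch below does so for the MAJ5\_1X\_8X values as quoted; the helper name `jl_effective` is hypothetical. Since the inputs are the rounded figures from the text, the printed value is not expected to reproduce the reported 1811 A/cm digit for digit.

```python
NM_TO_CM = 1e-7  # 1 nm expressed in cm


def jl_effective(j_i, l_i_nm, j_ol, j_or, l_o_nm):
    """Equation (19): worst (jl) sum over the left and right substructures.

    Current densities are in A/cm^2, lengths in nm; results are in A/cm.
    """
    left = (j_i * l_i_nm + j_ol * l_o_nm) * NM_TO_CM
    right = (j_i * l_i_nm + j_or * l_o_nm) * NM_TO_CM
    return max(left, right), left, right


BLECH_LIMIT_LOW = 4000.0  # A/cm, lower end of the range quoted for Cu

jl_eff, jl_left, jl_right = jl_effective(
    j_i=1.93e8, l_i_nm=20.0, j_ol=3.26e8, j_or=2.46e8, l_o_nm=50.0)
print(f"left: {jl_left:.0f} A/cm, right: {jl_right:.0f} A/cm, "
      f"EM-safe: {jl_eff < BLECH_LIMIT_LOW}")
```

With these inputs the left substructure, which collects the current of three input magnets, sets the effective product, and the result stays below the lower end of the Blech range.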

For our layouts, the worst-case $(jl)_{\text{eff}}$ is achieved for the layouts of MAJ5\_1X\_8X and AND4\_1X\_8X (whose layout topology is similar to that of MAJ5; in both cases, the $(jl)_{\text{eff}}$ corresponds to the left substructure). Both are verified to be EM-safe.

Table III. Delay comparison between conventional design and our approach in ns (columns give the $M_o$ drive strength).

| Gate | Conventional 1X | Conventional 2X | Conventional 4X | Conventional 8X | Our approach 1X | Our approach 2X | Our approach 4X | Our approach 8X |
|----------------|------|------|------|------|------|------|------|------|
| (N)AND2/(N)OR2 | 0.65 | 1.22 | 2.68 | 5.87 | 0.49 | 1.02 | 2.45 | 5.94 |
| (N)AND3/(N)OR3 | 0.59 | 1.20 | 2.86 | 6.82 | 0.45 | 0.92 | 2.21 | 5.28 |
| (N)AND4/(N)OR4 | 0.56 | 1.03 | 2.27 | 5.22 | 0.52 | 0.92 | 1.90 | 4… |

Table IV. Energy comparison between conventional design and our approach in pJ (columns give the $M_o$ drive strength).

| Gate | Conventional 1X | Conventional 2X | Conventional 4X | Conventional 8X | Our approach 1X | Our approach 2X | Our approach 4X | Our approach 8X |
|----------------|-------|-------|-------|-------|------|-------|-------|-------|
| (N)AND2/(N)OR2 | 4.27 | 7.97 | 17.51 | 38.27 | 3.24 | 6.64 | 15.95 | 38.67 |
| (N)AND3/(N)OR3 |  |  |  |  |  |  |  |  |
| (N)AND4/(N)OR4 | 12.44 | 20.85 | 39.85 | 77.15 | 5.69 | 10.03 | 20.68 | 4… |

Table V. Area comparison between conventional design and our approach in nm<sup>2</sup> (columns give the $M_o$ drive strength).

| Gate | Conventional 1X | Conventional 2X | Conventional 4X | Conventional 8X | Our approach 1X | Our approach 2X | Our approach 4X | Our approach 8X |
|----------------|-------|-------|-------|-------|-------|-------|-------|-------|
| (N)AND2/(N)OR2 |  |  |  |  |  |  |  |  |
| (N)AND3/(N)OR3 | 12500 | 13500 | 14500 | 14500 | 10400 | 11000 | 11800 | 12000 |
| (N)AND4/(N)OR4 | 12500 | 13500 | 14500 | 14500 | 10000 | 10800 | 11600 | 11600 |

Table VI. Delay, energy, and area of inverter and majority gates in the library (columns give the $M_o$ drive strength).

| Gate | Delay (ns) 1X | Delay (ns) 2X | Delay (ns) 4X | Delay (ns) 8X | Energy (pJ) 1X | Energy (pJ) 2X | Energy (pJ) 4X | Energy (pJ) 8X | Area (nm<sup>2</sup>) 1X | Area (nm<sup>2</sup>) 2X | Area (nm<sup>2</sup>) 4X | Area (nm<sup>2</sup>) 8X |
|------|------|------|------|------|------|-------|-------|-------|-------|-------|-------|-------|
| INV  |  |  |  |  |  |  |  |  |  | 3200 | 4000 | 4000 |
| MAJ3 | 0.65 | 1.22 | 2.68 | 5.87 | 4.27 | 7.97 | 17.51 | 38.27 | 7500 | 8500 | 9500 | 9500 |
| MAJ5 | 0.59 | 1.20 | 2.86 | 6.82 | 6.48 | 13.08 | 31.15 | 74.17 | 12500 | 13500 | 14500 | 14500 |

Delay-optimized technology mapping: These standard cells are used with the logic synthesis tool ABC [Berkeley Logic Synthesis and Verification Group 2015] to obtain *delay-optimized* technology-mapped circuits for the ISCAS85 benchmarks. The delay, energy and area numbers for these circuits are plotted in Fig. 11(a), (b) and (c), respectively. On average, our approach results in circuits that are 12.90% faster, consume 26.16% less energy and are 33.56% more area efficient than the conventional approach. These improvements can largely be credited to the elimination of the additional interconnect segments for the multiple fixed magnets used in the conventional approach.

Fig. 11. Comparison of conventional approach and this work for ISCAS85 benchmarks with respect to (a) delay and (b) energy.

# 7. CONCLUSION

We have developed a procedure that optimizes the geometries and layouts of ASL standard cells, illustrating the design considerations and the optimizations that can be made in ASL standard cell design. Cells with multiple driving powers are built and are used in a technology mapper to optimize the ISCAS85 benchmarks. We generate compact layouts by studying physics and layout considerations to determine optimal cell spacing, as well as using an innovative method for building the fixed magnets in majority gates.

### REFERENCES

- Q. An, L. Su, J.-O. Klein, S. L. Beux, I. O'Connor, and W. Zhao. 2015. Full-adder Circuit Design Based on All-spin Logic Device. *IEEE/ACM International Symposium on Nanoscale Architectures* (July 2015), 163–168.
- C. Augustine, G. Panagopoulos, B. Behin-Aein, S. Srinivasan, A. Sarkar, and K. Roy. 2011. Low-Power Functionality Enhanced Computation Architecture Using Spin-Based Devices. *IEEE/ACM International Symposium on Nanoscale Architectures* (June 2011), 129–136.
- B. Behin-Aein, D. Datta, S. Salahuddin, and S. Datta. 2010. Proposal For an All-Spin Logic Device With Built-In Memory. Nature Nanotechnology 5 (April 2010), 266–270. Issue 4.
- B. Behin-Aein, A. Sarkar, S. Srinivasan, and S. Datta. 2011. Switching Energy-Delay of All-Spin Logic Devices. Applied Physics Letters 98 (2011), 1–3. Issue 12.

- Berkeley Logic Synthesis and Verification Group. 2015. ABC: A System for Sequential Synthesis and Verification, Release 61225. (2015). Retrieved Oct 10, 2015 from http://www.eecs.berkeley.edu/~alanmi/abc/.
- V. Calayir, D. E. Nikonov, S. Manipatruni, and I. A. Young. 2014. Static and Clocked Spintronic Circuit Design and Simulation With Performance Analysis Relative to CMOS. *IEEE Transactions on Circuits and Systems I* 61 (Feb 2014), 393–406. Issue 2.
- S. Chakravarty, X. He, and S. S. Ravi. 1991. Minimum Area Layout of Series-Parallel Transistor Networks is NP-Hard. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems* 10 (July 1991), 943–949. Issue 7.
- M. J. Donahue and D. G. Porter. 1999. OOMMF User's Guide, Version 1.0, Release 1.2 alpha 5. Interagency Report NISTIR 6376, National Institute of Standards and Technology, Gaithersburg, MD. (Sept 1999). http://math.nist.gov/oommf.
- S. P. Hau-Riege. 2000. New Methodologies for Interconnect Reliability Assessments of Integrated Circuits. Ph.D. Dissertation. Massachusetts Institute of Technology.
- M. T. Johnson, P. J. H. Bloemen, F. J. A. Den Broeder, and J. J. De Vries. 1996. Magnetic Anisotropy in Metallic Multilayers. *Reports on Progress in Physics* 59 (July 1996), 1409–1458. Issue 11.
- W. Kang, Y. Zhang, Z. Wang, J. O. Klein, C. Chappert, G. Wang, Y. Zhang, and W. Zhao. 2015. Spintronics: Emerging Ultra-Low-Power Circuits and Systems Beyond MOS Technology. ACM Journal on Emerging Technologies in Computing 12 (Aug 2015). Issue 2.
- J. Kim, A. Paul, P. A. Crowell, S. J. Koester, S. S. Sapatnekar, J.-P. Wang, and C. H. Kim. 2015. Spin-Based Computing: Device Concepts, Current Status, and a Case Study on a High-Performance Microprocessor. Proceedings of the IEEE 103 (Jan 2015), 106–130. Issue 1.
- K. Kong, Y. Shang, and R. Lu. 2010. An Optimized Majority Logic Synthesis Methodology for Quantum-Dot Cellular Automata. *IEEE Transactions on Nanotechnology* 9 (March 2010), 170–183. Issue 2.
- R. L. Maziasz and J. P. Hayes. 1987. Layout Optimization of CMOS Functional Cells. Proceedings of the ACM/IEEE Design Automation Conference (June 1987), 544–551.
- R. L. Maziasz and J. P. Hayes. 1992. Layout Minimization of CMOS Cells. Kluwer Academic Publishers, Boston, Massachusetts.
- D. E. Nikonov and I. A. Young. 2013. Overview of Beyond-CMOS Devices and a Uniform Methodology for Their Benchmarking. *Proceedings of the IEEE* 101 (Dec 2013), 2498–2533. Issue 12.
- A. S. Oates. 2013. The Electromigration Short-Length Effect and Its Impact on Circuit Reliability. IEEE International Interconnect Technology Conference (June 2013), 1–3.
- Z. Pajouhi, S. Venkataramani, K. Yogendra, A. Raghunathan, and K. Roy. 2015. Exploring Spin-Transfer-Torque Devices for Logic Applications. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems* 34 (Sept 2015), 1441–1454. Issue 9.
- M. Sharad, K. Yogendra, K.-W. Kwon, and K. Roy. 2013. Design of Ultra High Density and Low Power Computational Blocks Using Nano-magnets. Proceedings of the IEEE International Symposium on Quality Electronic Design (March 2013), 223–230.
- J. C. Slonczewski. 1996. Current-Driven Excitation of Magnetic Multilayers. *Journal of Magnetism and Magnetic Materials* 159 (June 1996), L1–L7. Issue 1-2.
- S. Srinivasan, V. Diep, and B. Behin-Aein. 2013. Modeling Multi-Magnet Networks Interacting Via Spin Currents. (2013). Retrieved Oct 10, 2014 from http://arxiv.org/abs/1304.0742.
- S. Srinivasan, A. Sarkar, B. Behin-Aein, and S. Datta. 2011. All-Spin Logic Device With Inbuilt Nonreciprocity. *IEEE Transactions on Magnetics* 47 (Oct 2011), 4026–4032. Issue 10.
- L. Su, Y. Zhang, J.-O. Klein, Y. Zhang, A. Bournel, A. Fert, and W. Zhao. 2015a. Current-Limiting Challenges for All-Spin Logic Devices. *Scientific Reports* 5, 14905 (Oct 2015).
- L. Su, Y. Zhang, D. Querlioz, Y. Zhang, J.-O. Klein, P. Dollfus, and A. Bournel. 2015b. Proposal for a Graphene-Based All-Spin Logic Gate. *Applied Physics Letters* 106 (Feb 2015). Issue 7.
- J. Z. Sun. 2000. Spin-Current Interaction With a Monodomain Magnetic Body: A Model Study. *Physical Review B* 62 (July 2000), 570–578. Issue 1.