# Enabling Efficient System Design Using Vertical Nanowire Transistor Current Mode Logic

Joonseop Sim, Mohsen Imani, Yeseong Kim and Tajana Rosing

UC San Diego, La Jolla, CA 92093, USA

{j7sim, moimani, yek048, tajana}@ucsd.edu

Abstract: The Vertical Nanowire-FET (VNFET) is a promising candidate to succeed in the industry mainstream due to its superior suppression of short-channel effects and its area efficiency. However, CMOS is not an appropriate way to design VNFET logic gates, because the CMOS process is incompatible with VNFET, which creates a technical challenge for mass production. In this work, we propose a novel VNFET-based logic design, called VnanoCML (Vertical Nanowire Transistor-based Current Mode Logic), which addresses the process issue while significantly improving the power and performance of diverse logic designs. Unlike CMOS-based logic, our design exploits current mode logic to overcome the fabrication issue. Furthermore, we reduce the drain-to-source resistance of VnanoCML, which yields higher performance without compromising the subthreshold swing. To show the impact of the proposed VnanoCML, we present key logic designs (SRAM, full adder and multiplier) and also evaluate the application-level effectiveness of digital designs for image processing and mathematical computation. Our proposed design improves the fundamental circuit characteristics, including output swing, delay time and power consumption, compared to conventional planar MOSFET (PFET)-based circuits. Consequently, our architecture-level results show that VnanoCML can enhance performance and power by $16.4\times$ and $1.15\times$, respectively. Furthermore, we show that VnanoCML improves the energy-delay product by $38.5\times$ on average compared to PFET-based designs.

## I. INTRODUCTION

Conventional planar transistors, called PFET, face challenges in further scaling due to (i) short-channel effects on transistors and (ii) physical limitations of design rules, such as the complexity of metal routing and the shrinking distance from the metal contact to the gate [1]. To scale down the transistor, the fabrication process must address the inherently shortened distance between a gate and a contact (source/drain). In addition, the degradation of subthreshold swing (SS) must also be addressed, since it increases leakage current and latency.

To overcome the scaling issue, three-dimensional (3D) gate structures such as FinFET [2], [3] and Nanowire-FET [4] have been proposed. Nanowire FET (NWFET) has many excellent characteristics for scaling, such as low power, high density and steeper SS [5]. NWFETs can be classified into two groups, lateral and vertical, according to their physical structure, as depicted in Fig. 1a. Vertical NWFET (VNFET) has higher area efficiency and better device performance than lateral NWFET, especially at the sub-7nm technology node [6]. Thus, VNFET is considered a promising candidate to succeed in the industry mainstream.

978-1-5386-2880-5/17/\$31.00 © 2017 IEEE

A conventional way to create logic is CMOS-based design. However, since CMOS logic requires NMOS and PMOS on a single die, it leads to process incompatibility issues for VNFET, e.g., different dopants for NMOS and PMOS and gate overlap mismatch [7]. In this paper, we propose a novel VNFET-based logic design, called VnanoCML, that mitigates the incompatibility issue by exploiting current mode logic (CML) [8]. Since CML uses only NMOS transistors for logic implementation, unlike the conventional CMOS design, the proposed VnanoCML scheme greatly reduces the fabrication overhead. In addition, utilizing the high density of VNFET, we dramatically decrease the resistance of VNFET so that the proposed design can operate with a low supply voltage.
The proposed design improves the logic performance and also saves static power consumption, which has been considered an intrinsic limitation of CML. To show the effectiveness of our proposed design across architecture layers, we present two key logic designs, SRAM and Arithmetic Logic Units (ALUs), and evaluate additional ASIC designs which exploit the VnanoCML logic. In our experiments, we show that the proposed VnanoCML significantly improves performance and power consumption at both the circuit and application levels.

The main contributions of this paper are listed as follows:

- To the best of our knowledge, this is the first design which addresses the process compatibility of VNFET devices by employing CML, which uses only NMOS transistors.
- Our VnanoCML design resolves the power penalty of conventional CML circuits by using multiple nanowires per VNFET, which reduces the supply voltage needed for a given operation current and makes VnanoCML more efficient.
- Our experimental results show that, for the same operating condition, VnanoCML can provide 16.4× speedup and 1.15× lower power consumption compared to PFET-based logic. In addition, we show that the proposed design is also a good candidate for approximate computing.

## II. RELATED WORK

Innovation in transistor structure has led to performance improvements. The FinFET structure, which adopts three-dimensional gates, exhibits better static noise margin at lower supply voltage [9], [10], and has been successfully commercialized in industry. As the next generation of technology, NWFET, also known as Gate-All-Around (GAA) FET, is the most promising candidate for the sub-10nm scale due to its excellent electrostatic properties. Device-level characterization of NWFET has been widely investigated in [11], [12], demonstrating better suppression of short-channel effects than FinFET and PFET.
For example, [12] compares NWFET to FinFET and shows that NWFET has improved SS and reduced Drain-Induced Barrier Lowering. Prior research has studied the characteristics of VNFET as a future transistor for sub-7nm nodes, since VNFET has significantly higher density than lateral NWFET [6].

Despite the superior characteristics of VNFET, it has not been used for circuit/system design because it is not compatible with mass production. A nanowire fabrication method suitable for mass production is the top-down etching process, as opposed to the bottom-up growth process, since the top-down method can yield uniform dimensions, better alignment between layers and shorter process time [13]. However, to produce CMOS logic based on VNFET, the most challenging issue is junction formation, due to the harsh implantation process and the high variation during subsequent dopant activation [7]. Since CMOS logic requires both NMOS and PMOS, each doping step which creates a junction must be processed separately. Inevitably, during gate formation, this incurs a gate overlap mismatch to the channel regions between the N/PMOS transistors. Worse still, the mismatch becomes more severe because the N and P dopants have different diffusion length variations.

To address this issue, we propose a novel design which allows process compatibility of VNFET devices. Instead of using CMOS-based design, we exploit current mode logic, which requires only NMOS-type transistors. Since our design utilizes the high density of VNFET, the proposed *VnanoCML* significantly improves circuit performance and also mitigates the power issue which has been considered the main limitation of CML.

## III. PROPOSED VnanoCML

### A. Current mode logic

In our design of *VnanoCML*, we utilize current mode logic (CML) to address the fabrication issue of CMOS-based design. Current mode logic is a technology to construct integrated circuits.
Unlike CMOS circuits, CML requires only NMOS transistors to build logic. This fact significantly mitigates the process incompatibility which arises in CMOS-based design. As discussed in Section II, using complementary logic creates two major fabrication issues: gate overlap mismatch and dopant variation. However, when producing one type of transistor, i.e., NMOS, only a single doping step is required for junction formation, and the subsequent gate process does not incur gate mismatch variations, since the same dopant can be used for all transistors. The performance of CML is in general better than that of complementary logic, since the output swing voltage is lower, thus providing faster switching speeds [8].

Fig. 1. Nanowire-FET structures and current mode logic schematic

To better describe these advantages, Fig. 1b shows a simple CML schematic. A CML gate consists of three key parts: (1) a load resistance ($R_{load}$), (2) a constant current source ($I_c$), and (3) pull down logic designed using NMOS transistors which handle the inputs ($In$ and $\overline{In}$). CML operates on the current differential between the paired branches, which corresponds to the output voltages, i.e., $Out$ and $\overline{Out}$. For example, when current flows through branch 1, the electric potential of $Out$ is $V_{dd} - \Delta V$, where $\Delta V = I_c \times R_{load}$, while the other branch keeps $\overline{Out}$ at $V_{dd}$. Since the output voltage swing of a CML gate, i.e., $\Delta V$, is less than $V_{dd}$, the dynamic power consumption $P = C \Delta V^2 f$ is lower and the switching speed is consequently faster than in CMOS logic, which produces the full swing range of $V_{dd}$.

One known issue of CML is its relatively high static power consumption, since a constant current, $I_c$, keeps flowing through at least one branch during operation, consuming $P_{static} = V_{dd} \times I_c$. A number of strategies have been proposed to overcome this issue.
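
The swing and power relations above can be checked numerically. The following is a minimal sketch, not the simulation flow used in this work: the class name `CmlGate`, the helper `cmos_dynamic_power`, and every component value (supply voltage, tail current, load resistance, load capacitance, frequency) are hypothetical and chosen only to illustrate the formulas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CmlGate:
    """First-order CML gate model (illustrative only, all values assumed)."""
    vdd: float     # supply voltage V_dd in volts
    i_c: float     # constant tail current I_c in amperes
    r_load: float  # load resistance R_load in ohms
    c_load: float  # switched load capacitance C in farads

    def __post_init__(self):
        # The swing cannot exceed the supply in this simple model.
        if self.i_c * self.r_load > self.vdd:
            raise ValueError("I_c * R_load exceeds V_dd")

    def swing(self) -> float:
        """Output swing: Delta V = I_c * R_load."""
        return self.i_c * self.r_load

    def out_low(self) -> float:
        """Potential of the conducting branch: V_dd - Delta V."""
        return self.vdd - self.swing()

    def dynamic_power(self, freq_hz: float) -> float:
        """Dynamic power: P = C * Delta V^2 * f."""
        return self.c_load * self.swing() ** 2 * freq_hz

    def static_power(self) -> float:
        """Static power: P_static = V_dd * I_c."""
        return self.vdd * self.i_c


def cmos_dynamic_power(c_load: float, vdd: float, freq_hz: float) -> float:
    """Full-swing reference: P = C * V_dd^2 * f."""
    return c_load * vdd ** 2 * freq_hz


if __name__ == "__main__":
    # Hypothetical operating point, not taken from the experiments.
    gate = CmlGate(vdd=0.5, i_c=1e-6, r_load=300e3, c_load=1e-15)
    freq = 100e6
    print(f"swing          = {gate.swing():.3f} V")
    print(f"low output     = {gate.out_low():.3f} V")
    print(f"CML dynamic    = {gate.dynamic_power(freq):.3e} W")
    print(f"CMOS dynamic   = {cmos_dynamic_power(1e-15, 0.5, freq):.3e} W")
    print(f"CML static     = {gate.static_power():.3e} W")
```

With these assumed values the swing is about 0.3 V on a 0.5 V supply, so the dynamic power is roughly $(0.3/0.5)^2 = 0.36$ of the full-swing value, while the static term $V_{dd} \times I_c$ is present regardless of switching activity.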
Among these strategies, [14] presented a circuit design, called near-threshold circuits, which operates in a low-voltage region. However, this approach creates two further problems for PFET-based CML. First, as the supply voltage is lowered, the voltage difference between logic 1 and 0 becomes smaller. Since the output swing range of CML is already narrow, this small voltage difference is difficult to distinguish in sensing circuits. Second, since the NMOS transistors of the pull down logic are connected in series, the potential at the output node, i.e., $V_{dd} - \Delta V$, is divided across the transistors. Hence, the voltage applied to each transistor is insufficient to operate it in the device saturation region.

### B. CML integration with VNFET

The two issues discussed above are mitigated in our proposed VnanoCML due to VNFET characteristics. Fig. 2a compares the SS of PFET and VNFET at a drain-to-source voltage of 0.5V. $V_{gs}$ is the voltage applied to the gate of a transistor, and $I_{ds}$ is the current from drain to source. When using one nanowire (NW) for a VNFET transistor, the SS of the VNFET is steeper than that of PFET. The high on-off current ratio resulting from the better SS gives CML higher differential capability. Moreover, VNFET can run at a relatively small voltage due to its low resistance. Fig. 2b shows the $I_{ds}$-$V_{ds}$ characteristics for different drain-to-source voltages, $V_{ds}$, when $V_{gs}$ is 0.5V. Compared to the PFET, we can obtain enough $I_{ds}$ to operate in CML, since the resistance of VNFET is smaller.

In our *VnanoCML* design, we further utilize the characteristics of VNFET so that the logic is more compatible with CML. The main advantage of VNFET is its high density. Thus, in the same area as a conventional PFET, we can implement multiple NWs (MNW) to provide smaller resistance, i.e., higher operational current at a given voltage.
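
As a rough worked relation, assume that the $N$ nanowires of an MNW behave as identical resistors in parallel at a fixed bias. This is an idealization introduced here for illustration, and the symbols $N$, $R_{ds}^{\mathrm{NW}}$ and $R_{ds}^{\mathrm{MNW}}$ are not part of the original notation. Under that assumption:

$$R_{ds}^{\mathrm{MNW}} = \frac{R_{ds}^{\mathrm{NW}}}{N}, \qquad I_{ds}^{\mathrm{MNW}} = \frac{V_{ds}}{R_{ds}^{\mathrm{MNW}}} = N \, \frac{V_{ds}}{R_{ds}^{\mathrm{NW}}}$$

In this idealized picture, the same drain-to-source voltage drives $N$ times the current of a single nanowire, or equivalently the same current is reached at $1/N$ of the voltage.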
The lower resistance provides various advantages, e.g., better output swing, reduced delay, and less power consumption. For example, Fig. 3 shows the comparison of PFET and the MNW-based VNFET design. In our experimental setup, which uses the GPDK45 PFET model, 25 NWs can be implemented in the area taken by a single PFET.

Fig. 2. Transfer curve analysis of PFET and VNFET

In the integration of CML logic, we create a serial connection in the pull down logic. Fig. 4 illustrates an example structure which integrates two transistors, in top and vertical views. An MNW is connected to silicon (Si) and a metal, which form the source and drain respectively<sup>1</sup>. The MNW is surrounded by a gate in the channel region. As shown in Fig. 2b, an MNW with 25 nanowires can reduce the drain-to-source resistance ($R_{ds}$) to 1/25 of its single-nanowire value. In addition, increasing the number of nanowires does not change the SS characteristic. The significant increase in operation current makes VnanoCML more efficient due to two main advantages: i) higher output swing in CML, and ii) higher speed, following the general characteristic of integrated circuits, $T \propto 1/I_c$, where $T$ is the cycle period.

Fig. 3. Layout comparison between PFET and VNFET

Fig. 4. VNFET structure used in VnanoCML

### C. VnanoCML SRAM and ALU

**VnanoCML SRAM.** A single CMOS SRAM cell consists of two inverters and two pass transistors [17]. In our VnanoCML SRAM design, we replace all six transistors of the CMOS SRAM with VNFET transistors. Fig. 5a shows the design of a VnanoCML inverter. As discussed in Section III-B, since the proposed VnanoCML has steep SS and high operation current due to the wire multiplication, the SRAM which uses the *VnanoCML* inverters has high differential ability, i.e., high static noise margin. In Section IV-B, we evaluate the static noise margin of the proposed SRAM design in detail.

**VnanoCML ALU.** Fig. 5b and 5c show the design of a VnanoCML one-bit full adder (FA). The one-bit FA has two components, sum and carry-out, and Fig. 5 shows the VnanoCML design of each part. In the pull down logic, three stacked transistors and the current source $I_c$ are connected in series. For both the sum and carry-out sides, $300\,\mathrm{k}\Omega$ resistors are loaded into the pull up network (denoted R) to balance the resistance of the pull down logic.

Based on the one-bit FA, we designed an n-bit FA and an n-bit multiplier. For the n-bit FA, we connect the one-bit FAs in series, as shown in Fig. 5d. Fig. 5e illustrates the n-bit multiplier design, which uses a shift logic gate beside an n-bit FA. Given two n-bit operands, the shift gate takes each bit of the first operand and produces either the shifted n bits of the second operand or all 0s. Then, the n-bit FA accumulates the output of the shift logic to compute the final output. We verify the detailed operation of the designed VnanoCML ALUs, in comparison with PFET-based CML, in Section IV-B.

### D. Application design using VnanoCML

Since the *VnanoCML* circuits are compatible with CMOS-based logic in terms of logic functionality, *VnanoCML* can be easily integrated with various system components such as memory and processors. In addition, as discussed in Section III-A, the use of CML mitigates the fabrication issues, and thus *VnanoCML* can be a practical and viable solution for general architecture designs.

To investigate how the *VnanoCML* circuits perform at the architecture level, we design *VnanoCML*-based ASICs. For many ASIC designs, which mostly compute on and assimilate a stream of data, adders and multipliers are the main building blocks. In addition, more complex arithmetic computations, e.g., square root, can also be approximated using these blocks. Thus, we replaced the logic gates of the ASICs with *VnanoCML* ALUs.

An important aspect that we have also considered when designing circuits is process variation, from which most of today's technologies suffer.
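
Because the ASICs are assembled from the n-bit FA and multiplier of Section III-C, it can help to see those two blocks at the behavioral level. The sketch below is a bit-level software model only, not a circuit description: the function names are hypothetical, operands are LSB-first bit lists, and the 2n-bit accumulator width is an assumption made here so that the product never overflows.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def to_bits(value: int, width: int) -> list[int]:
    """LSB-first bit list of `value`, truncated to `width` bits."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    """Inverse of to_bits for an LSB-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_add(x_bits: list[int], y_bits: list[int], cin: int = 0):
    """n-bit adder built by chaining one-bit full adders in series."""
    if len(x_bits) != len(y_bits):
        raise ValueError("operands must have the same width")
    out, carry = [], cin
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry


def shift_and_add_multiply(x: int, y: int, n: int) -> int:
    """Multiply two n-bit operands with a shift stage and an adder.

    For each bit of x, the shift stage emits either y shifted by the bit
    position or all zeros; the adder accumulates these partial products.
    """
    if not (0 <= x < (1 << n) and 0 <= y < (1 << n)):
        raise ValueError("operands must fit in n bits")
    width = 2 * n  # assumed accumulator width
    acc = [0] * width
    for i in range(n):
        x_bit = (x >> i) & 1
        partial = to_bits((y << i) if x_bit else 0, width)
        acc, _ = ripple_add(acc, partial)
    return from_bits(acc)


if __name__ == "__main__":
    bits, carry = ripple_add(to_bits(9, 4), to_bits(7, 4))
    print("9 + 7 (4-bit):", from_bits(bits), "carry", carry)
    print("13 * 11:", shift_and_add_multiply(13, 11, 4))
```

In this model the carry chain of `ripple_add` plays the role of the critical path, which is why the carry-out circuit is the part whose delay matters most in a hardware adder of this structure.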
At small feature sizes, process variation degrades design stability and increases failure rates. To avoid the impact of process variation on computation accuracy, designers consider the variation bounds to guarantee correct functionality even in worst-case scenarios. To take the process variation explicitly into account, we use Monte Carlo simulation with a 10% Gaussian variation ($3\sigma = 10\%$) on the transistor gate length and diameter. In circuit and logic structures, the output signals are sampled at a specific moment defined by the clock frequency. Thus, given the process variation, we set the clock frequency, $f_{ref}$, so that it guarantees correct functionality even for the circuit which has the longest output delay.

Note that increasing the clock frequency above $f_{ref}$ may result in incorrect signal sampling for part of the circuits, thus degrading the accuracy of application outputs. However, for error-tolerant applications, such accuracy degradation may be acceptable and compensated by the speedup. For example, in multimedia and vision applications, the useful accuracy is limited by the human ability to perceive and respond. In addition, some applications are stochastic in nature, e.g., machine learning algorithms [18], [19]. To explore the feasibility of application approximation, in Section IV-C we also evaluate the ASIC designs with a relaxed design constraint.

Fig. 5. VnanoCML ALU design

<sup>1</sup>The Schottky contact issue can be eliminated either by a silicide process or by selecting a metal whose work function is similar to that of silicon (≃4.05eV) [15], [16].

## IV. EXPERIMENTAL RESULTS

### A. Experimental Setup

We used the BSIMCMG model [20] for VNFET and the GPDK45 model [21] for the conventional PFET to estimate power and performance. We compute the performance and energy consumption of the proposed design from circuit-level simulations with the Cadence Virtuoso and Spectre simulators.
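
The variation-aware clock selection of Section III-D can be sketched in a few lines. This is only an illustration under stated assumptions and does not reproduce the Spectre-based flow: the delay sensitivity inside `sample_delays_ns` is a placeholder rather than a device model, the nominal delay and sample count are arbitrary, and the percentile rule is one reading of the error-rate definition used later in Section IV-C.

```python
import math
import random


def sample_delays_ns(n_samples, nominal_delay_ns, three_sigma=0.10, seed=0):
    """Draw circuit delays under Gaussian variation of gate length and diameter."""
    rng = random.Random(seed)
    sigma = three_sigma / 3.0  # 3*sigma = 10% means sigma is about 3.3%
    delays = []
    for _ in range(n_samples):
        length_scale = rng.gauss(1.0, sigma)
        diameter_scale = rng.gauss(1.0, sigma)
        # HYPOTHETICAL sensitivity: longer gate is slower, wider wire is faster.
        # A real flow would obtain each delay from circuit simulation.
        delays.append(nominal_delay_ns * length_scale / diameter_scale)
    return delays


def clock_period_ns(delays_ns, error_rate_percent=0.0):
    """Clock period covering the (100 - alpha) percentile of the delays.

    With alpha = 0 this is the longest delay, i.e. the period of f_ref.
    """
    if not 0.0 <= error_rate_percent < 100.0:
        raise ValueError("error rate must be in [0, 100)")
    ordered = sorted(delays_ns)
    fraction = (100.0 - error_rate_percent) / 100.0
    rank = min(len(ordered), max(1, math.ceil(fraction * len(ordered))))
    return ordered[rank - 1]


if __name__ == "__main__":
    delays = sample_delays_ns(10_000, nominal_delay_ns=1.0)
    t_ref = clock_period_ns(delays)            # worst-case period
    t_relaxed = clock_period_ns(delays, 10.0)  # tolerate about 10% late paths
    print(f"f_ref     = {1.0 / t_ref:.3f} GHz")
    print(f"f_relaxed = {1.0 / t_relaxed:.3f} GHz")
```

Delays are expressed in nanoseconds, so the reciprocal of a period is a frequency in GHz. Raising the tolerated error rate shortens the period, which is the speed-for-accuracy trade that the approximate designs exploit.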
For both technologies, we use a gate length of 45nm. For fair comparison, the gate width and diameter of VNFET are set to 120nm and 40nm respectively. In this configuration, a VnanoCML transistor can have at most 25 NWs, as discussed in Section III-B. One minor issue in VnanoCML fabrication is the formation of multiple NWs, since the edge line of an NW array may not be well formed due to dry etch damage caused by the reactive ion etching process. However, several solutions, such as inserting sacrificial wires and optical proximity correction [22], can greatly reduce the damage. Moreover, even in the severe case where the last rows of the NW array are damaged, the resistance degrades by only around 15%, still providing sufficiently lower resistance than the PFET-based design.

We evaluate the efficiency and functionality of the proposed *VnanoCML* circuits compared to PFET-based CML logic. To better show the practical value of the *VnanoCML* design, we also experiment with four ASIC designs running different applications: *Sobel*, *Robert*, *Blackscholes* and *FFT*. For the image processing applications (*Sobel* and *Robert*), the input data have been randomly chosen from the Caltech 101 Library [23]. For the other applications (*Blackscholes* and *FFT*), the input data are streams of randomly generated data. Each ASIC design has been implemented using the *VnanoCML* circuits in Verilog RTL. We extract the performance, switching activity and accuracy of each application during post-synthesis simulations with *ModelSim*.

### B. Circuit-level efficiency of VnanoCML

As discussed in Section III-A, one technical challenge in using CML is that it has a lower output swing than CMOS logic, making voltage sensing difficult. To understand the differential ability of the proposed VnanoCML for the output voltage swing at the circuit level, we first evaluate the static noise margin (SNM) of the VnanoCML SRAM described in Section III-C. Fig. 6a shows that the VnanoCML SRAM has higher SNM than the circuits using PFET over the whole evaluated range of supply voltage. To explain the SNM difference in detail, Fig. 6b and 6c show the transfer curves of the PFET-based CML SRAM and the VnanoCML SRAM, respectively. The results show that the PFET-based SRAM circuit has no SNM at $V_{dd}$ = 0.5V. In contrast, the VnanoCML SRAM shows sufficient SNM for the same $V_{dd}$, and the SNM is still acceptable even at $V_{dd}$ = 0.4V.

Fig. 6. SNM comparison of CML-based SRAM

We also verify the differential ability of the *VnanoCML* full adder. Fig. 7 presents the transmission waveforms of the carry-out circuit. For all simulations we use the input waveforms illustrated in Fig. 7a. To quantify the sensing ability of the different design approaches, we define the *Output Swing* (denoted W in Fig. 7b) as the distance between the lowest high-level signal and the highest low-level signal. The larger the output swing, the better the differential ability. As shown in Fig. 7b, the PFET-based circuit shows an output swing of 4.63mV at $V_{dd}$ = 0.4V. The full adder design which uses one NW shows a slightly better output swing of 16.25mV for the same $V_{dd}$. In our *VnanoCML* design, which uses 25 NWs, the output swing is 344mV for the same $V_{dd}$, guaranteeing sufficient sensing resolution.

Fig. 7. Waveform of transmission in CML Full Adder

Fig. 8 shows the comparison of PFET-based CML and VnanoCML. Fig. 8a summarizes the results for the output swing. The results show that, to achieve the same output swing of 100mV, the required supply voltages $V_{dd}$ are 0.58V, 0.47V, and 0.36V for the PFET-based, one NW-based, and VnanoCML circuits, respectively. In Fig. 8b, we compare the constant current, $I_c$, for PFET-based CML and VnanoCML. The proposed adder has a higher $I_c$ value than the PFET-based adder due to its lower drain-to-source resistance. Note that, in CML circuits, the performance increases as the current grows.
Thus, this result implies that the VnanoCML design exhibits better performance than PFET-based CML. This is observed in Fig. 8c, which presents the delay from an input signal to an output signal of the carry-out circuit, i.e., the critical path delay of the full adder. The results show that VnanoCML achieves lower delay than the PFET-based circuit due to the lower device resistance, which is a key factor in signal delay. The VnanoCML adder achieves 45.6× lower delay time on average over the tested $V_{dd}$ range. The performance improvement enables better power efficiency, as Fig. 8d illustrates. For example, to drive the same current level, $I_c = 4\,\mu\mathrm{A}$, i.e., the same performance, VnanoCML presents 1.16× better power efficiency than the PFET case. Table I summarizes the performance comparison of VnanoCML with different technologies, including CMOS and FinFET. Our result shows that VnanoCML outperforms all other technologies in terms of performance.

Fig. 8. Comparison of VnanoCML to PFET-based CML for full adder design

TABLE I. PERFORMANCE COMPARISON OF VnanoCML WITH DIFFERENT DEVICE TECHNOLOGIES (DELAY IN ns)

| $V_{dd}$ (V) | CMOS_PFET | CML_PFET | CML_FinFET | VnanoCML |
|--------------|-----------|----------|------------|----------|
| 0.5 | 494.4 | 207.0 | 209.3 | 18.0 |
| 0.6 | 125.9 | 198.1 | 149.3 | 4.9 |
| 0.7 | 44.8 | 162.6 | 60.7 | 2.2 |
| 0.8 | 22.9 | 83.7 | 35.4 | 1.4 |
| 0.9 | 15.2 | 49.9 | 25.0 | 1.0 |
| 1 | 11.5 | 34.7 | 19.8 | 0.9 |

### C. VNFET Efficiency in Application

In this section, we evaluate the impact of VnanoCML at the architecture level using four ASIC designs. We compare to the PFET-based design in two scenarios: i) when both technologies consume the same power, and ii) when both provide the same performance. Fig. 9a first shows the comparison for the same power consumption. We adjust $V_{dd}$ to produce the same power consumption.
The results show that *VnanoCML*-based ASICs achieve significantly higher performance than the designs which use PFET-based CML. This advantage is due to the lower resistance of VNFET, which provides higher $I_c$ current at lower supply voltage. For example, the proposed VnanoCML ASICs achieve on average 16.4× performance speedup compared to the PFET-based design at the same power consumption. This significant performance improvement stems from the low resistance characteristic of our design, which creates high operation current and consequently high speed.

Fig. 9b compares the power efficiency of the two designs when the performance is held at the same level by adjusting $V_{dd}$. The result shows that VnanoCML can also achieve higher power efficiency than the PFET-based design. For example, our design provides a 1.15× improvement in average power consumption over the four applications. Since the proposed design provides better SS and lower resistance than PFET, the device can work at a lower supply voltage while providing the same current.

Fig. 9. Normalized energy consumption and execution time of the PFET-based design and VnanoCML: (a) same power, (b) same performance.

As discussed in Section III-D, although ASICs operate precisely under $f_{ref}$, which considers the worst-case circuit delay, we may further improve the design efficiency for error-tolerant applications by relaxing the clock frequency constraint. Fig. 10 compares the energy-delay product (EDP) of the four ASICs at a supply voltage of 0.6V for different error rates, adjusted by increasing the clock frequency above $f_{ref}$. Since the process variation simulation creates a distribution function of the circuit delay, we define the error rate $\alpha\%$ by the upper $(100 - \alpha)$ percentile of the distribution function. The clock frequency is chosen from the delay at that percentile.

Fig. 10. Impact of error rates on EDP improvement

Our evaluation shows that, using the precise ASICs which set the clock frequency to $f_{ref}$, the EDP of the VnanoCML-based design dramatically outperforms the PFET-based design, by $38.5\times$ on average for the four ASICs. In addition, if we allow application approximation at $\alpha = 10\%$, the EDP improvement is $48.3\times$. In fact, our experiment shows that the approximated results would still be acceptable. Table II summarizes the quality loss of each application for different approximation levels. For the image processing applications, the quality loss is defined by the Peak Signal-to-Noise Ratio (PSNR), and for the other applications, Average Relative Error (ARE) is used as the quality loss metric. It is known that 30dB of PSNR and 10% of ARE are acceptable quality loss [24]. The results show that all four ASIC designs provide acceptable accuracy when $\alpha = 6\%$. For this acceptable accuracy, our design achieves 1.6× EDP improvement compared to the precise VnanoCML ASICs.

TABLE II. QUALITY LOSS OF APPLICATIONS FOR DIFFERENT LEVELS OF APPROXIMATION

| Error Rate ($\alpha$) | 10% | 8% | 6% | 4% | 2% | 1% |
|-----------------------|-----|----|----|----|----|----|
| Sobel | 17dB | 23dB | 32dB | 40dB | 54dB | 59dB |
| Robert | 25dB | 34dB | 38dB | 44dB | 51dB | 62dB |
| BlackScholes | 13.1% | 10.6% | 8.3% | 5.2% | 2.1% | 0.9% |
| FFT | 10.5% | 7.7% | 5.3% | 3.3% | 2.6% | 1.0% |

## V. CONCLUSION

We have presented a novel design which allows process compatibility of VNFET devices by utilizing current mode logic.
The proposed design also addresses the existing power issues of CML circuits by using the high density of VNFET. The experimental results show that, compared to the conventional design, the proposed logic achieves 16.4× speedup and $38.5\times$ EDP improvement for the four ASICs. Furthermore, our design is advantageous for approximate computation in error-tolerant applications. Compared to PFET-based approximation, our design achieves 48.3× EDP improvement while guaranteeing acceptable quality loss.

## VI. ACKNOWLEDGMENT

This work was supported by NSF grants 1730158 and 1527034.

## REFERENCES

- [1] V. Moroz et al., "Modeling the impact of stress on silicon processes and devices," Materials Science in Semiconductor Processing, 2003.
- [2] B. Yu et al., "Finfet scaling to 10 nm gate length," in Electron Devices Meeting, 2002. IEDM'02. International, IEEE, 2002.
- [3] M. Imani et al., "Hierarchical design of robust and low data dependent finfet based sram array," in Nanoscale Architectures (NANOARCH), 2015 IEEE/ACM International Symposium on, pp. 63-68, IEEE, 2015.
- [4] S. Bangsaruntip et al., "High performance and highly uniform gate-all-around silicon nanowire mosfets with wire size dependent scaling," in Electron Devices Meeting, 2009 IEEE International, IEEE, 2009.
- [5] C. Pan et al., "Technology/system codesign and benchmarking for lateral and vertical gaa nanowire fets at 5-nm technology node," IEEE Transactions on Electron Devices, 2015.
- [6] U. K. Das et al., "Limitations on lateral nanowire scaling beyond 7-nm node," IEEE Electron Device Letters, 2017.
- [7] B.-H. Lee et al., "A vertically integrated junctionless nanowire transistor," Nano letters, 2016.
- [8] M. Yamashina et al., "An mos current mode logic (mcml) circuit for low-power sub-ghz processors," IEICE Transactions on Electronics, 1992.
- [9] T. Park et al., "Static noise margin of the full dg-cmos sram cell using bulk finfets (omega mosfets)," in Electron Devices Meeting, 2003. IEDM'03 Technical Digest. IEEE International, IEEE, 2003.
- [10] G. Pasandi et al., "A new low-power 10t sram cell with improved read snm," IJE, vol. 102, no. 10, pp. 1621-1633, 2015.
- [11] B. Yu et al., "A unified analytic drain-current model for multiple-gate mosfets," IEEE Transactions on Electron Devices, 2008.
- [12] S. Kim et al., "Mugfet," 2008.
- [13] K.-S. Im et al., "Fabrication of normally-off gan nanowire gate-all-around fet with top-down approach," Applied Physics Letters, 2016.
- [14] A. o. Shapiro, "Mos current mode logic near threshold circuits," Journal of Low Power Electronics and Applications, 2014.
- [15] C. Chuang et al., "Fabrication and properties of well-ordered arrays of single-crystalline nisi2 nanowires and epitaxial nisi2/si heterostructures," Nano Research, 2014.
- [16] J. Kedzierski et al., "Complementary silicide source/drain thin-body mosfets for the 20 nm gate length regime," in Electron Devices Meeting, 2000. IEDM'00. Technical Digest. International, IEEE, 2000.
- [17] M. Imani et al., "Low power data-aware stt-ram based hybrid cache architecture," in Quality Electronic Design (ISQED), 2016 17th International Symposium on, pp. 88-94, IEEE, 2016.
- [18] M. Imani et al., "Acam: Approximate computing based on adaptive associative memory with online learning," in International Symposium on Low Power Electronics and Design, 2016.
- [19] M. Imani et al., "Exploring hyperdimensional associative memory," in High Performance Computer Architecture (HPCA), 2017 IEEE International Symposium on, pp. 445-456, IEEE, 2017.
- [20] "BSIMCMG." http://bsim.berkeley.edu/models/bsimcmg/.
- [21] "GPDK45." https://support.cadence.com/apex/ArticleAttachmentPortal?id=a1Od00000051TqEAI&pageName=ArticleContent.
- [22] J.-R. Gao, X. Xu, B. Yu, and D. Z. Pan, "Mosaic: Mask optimizing solution with process window aware inverse correction," in Proceedings of the 51st Annual Design Automation Conference, pp. 1-6, ACM, 2014.
- [23] "Caltech 101." http://www.vision.caltech.edu/ImageDatasets/Caltech101/.
- [24] M. Imani et al., "Resistive configurable associative memory for approximate computing," in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016, pp. 1327-1332, IEEE, 2016.