# Demonstration of intrinsic STDP learning capability in all-2D multi-state $MoS_2$ memory and its application in modelling neuromorphic speech recognition

Tathagata Paul,<sup>1\*§</sup> Akshaya A. Mukundan,<sup>2§</sup> Krishna Kanhaiya

<sup>1</sup>Department of Physics, Indian Institute of Science, Bangalore 560012, India.

<sup>2</sup>Department of Electronic Systems Engineering, Indian Institute of Science, Bangalore 560012, India.

<sup>3</sup>Visva Bharati University, Santiniketan, West Bengal 731235, India.

<sup>4</sup>Centre for Nanoscience and Engineering, Indian Institute of Science, Bangalore 560012, India.

The human brain can be characterized by its large number of adaptive synapses, connecting billions of neurons capable of both learning and perceiving the environment. Neuromorphic computing, based on brain-inspired principles, is a promising technology for building low-power, distributed, fault-tolerant intelligent systems, mainly for perception tasks. Here, we demonstrate the intrinsic capability of a floating gate (FG) MoS<sub>2</sub> device (MoS<sub>2</sub> FG-FET) to model the spike time dependent plasticity (STDP) learning rule, based on the transient response of the MoS<sub>2</sub> channel to spikes applied to the source and gate leads. We implemented the STDP learning protocol in a neuromorphic speech recognition system (NSRS), inspired by the human auditory pathway, for various auditory recognition tasks. Our proposed NSRS consists of a cochlea model, an unsupervised feature learning stage and a simple linear classifier. The unsupervised learning stage uses biologically plausible STDP learning in a novel two-dimensional MoS<sub>2</sub> FG-FET memory, which circumvents the requirement of any other learning circuitry.

Keywords: Neuromorphic computing, electronic cochlea, brain-inspired learning, emerging devices, beyond CMOS, MoS<sub>2</sub> memristors

## I. INTRODUCTION

The von Neumann architecture, introduced by John von Neumann in 1945, is the most widely used architecture in modern computing devices<sup>[1]</sup>. A key feature of this framework is the utilization of dedicated hardware components for computation and storage. Improvements in fabrication capabilities have resulted in an increase in the number of transistors per microchip, augmenting both the speed of computation and the storage space available. However, this architectural framework, with its separation between the computation and storage units, is inefficient at solving problems that require simultaneous processing of large amounts of data. These mostly include ill-defined constructs such as image, speech or pattern recognition, and necessitate a learning capability similar to that of a biological brain. Meeting the computational challenges posed by problems of an ill-defined nature necessitates an architecture capable of in-memory computing, analogous to the synapses in a brain<sup>[2]</sup>. Silicon-based synapses with learning capabilities like spike time dependent plasticity (STDP) (the variation of synaptic conductance with the time difference ($\Delta t$) between pre- and post-synaptic spikes) have been demonstrated<sup>[3–5]</sup>. However, integration complexities due to short-channel effects and high switching power in devices based on silicon nanotechnology make it fundamentally difficult to attain the synaptic density (~ $10^{10}$ synapses per cm<sup>2</sup>, as in the human brain) and energy efficiency necessary for neuromorphic computing.

Investigating an alternative hardware platform based on two-dimensional (2D) materials has certain advantages over silicon<sup>[6]</sup>. The atomic confinement leads to strong gate coupling, suppresses short-channel effects and reduces device footprints<sup>[7–9]</sup>, which is ideal for large neuromorphic systems. Furthermore, advancement of fabrication techniques allows for the vertical and lateral stacking of different 2D materials into heterostructures with clean interfaces. This introduces flexibility in device design, resulting in unique device architectures for electronics<sup>[10–14]</sup>, optoelectronics<sup>[15–18]</sup> and memory applications<sup>[12,19–22]</sup> with functionalities often surpassing those of silicon-based solutions. MoS<sub>2</sub> is of particular interest in this context because of its large intrinsic band gap<sup>[23]</sup>, high ON/OFF ratio<sup>[12,21]</sup>, near-ideal subthreshold swing<sup>[21,24]</sup> and high carrier mobility<sup>[25]</sup>. These properties make MoS<sub>2</sub> an ideal candidate for low-power in-memory computing applications for edge devices<sup>[26–28]</sup>.

Several architectures for 2D memory have already been explored. These include intercalation of ions using liquid gates in MoO<sub>3</sub><sup>[29]</sup> and WSe<sub>2</sub><sup>[30]</sup>, filament formation in MoS<sub>2</sub><sup>[31]</sup> and defect-induced resistive switching in h-BN<sup>[32,33]</sup>. However, these methods either use ionic liquids, which are environment sensitive and degrade with time, or depend on the defect densities in the nanosheets, which are difficult to control. There have also been reports of 2D materials with STDP capabilities using copper filament formation in MOCVD-grown MoS<sub>2</sub> flakes<sup>[31]</sup>. Although the vertical geometry reduces device footprint, the presence of copper filaments in both the low (LRS) and high (HRS) resistance states leads to a low ON/OFF ratio and reduces the range of accessible conductance states. In this work, we employ an all-2D architecture with a floating gate MoS<sub>2</sub> device made from a multilayer van der Waals heterostructure, where the operation of the device is entirely based on tunneling of charge across the layers. While 2D material-based FG memory has been demonstrated before, its usage was restricted to that of passive memristors describing analog memory behaviour<sup>[19,21,22,26,34–36]</sup>. Notably, Kim *et al.*<sup>[37]</sup> demonstrated STDP in a FG carbon nanotube transistor. The authors achieve this by utilizing a 3T-Synapse, with the FG transistor acting as an analog memory. The current work demonstrates the inherent STDP learning capabilities in a single solid-state MoS<sub>2</sub> FG transistor without any additional circuit overheads. The introduction of inherent learning capabilities like STDP reduces hardware complexity by avoiding additional learning circuitry and is an important step towards developing emergent device platforms for neuromorphic applications.

Tiwari,<sup>3</sup> Arindam Ghosh,<sup>1,4</sup> and Chetan Singh Thakur<sup>2\*</sup>

<sup>§</sup>Equal contribution

<sup>\*</sup>e-mail: tathagata@iisc.ac.in, csthakur@iisc.ac.in

FIG. 1: Neuromorphic speech recognition system (NSRS). a, Block diagram of the proposed NSRS compared with its biological counterpart. The cochlear processing unit filters various frequency components from the raw audio samples and performs nonlinear transduction, to mimic basilar membrane (BM) and inner hair cell (IHC) functions, respectively. Output spectrogram of digits 0-9 is shown. Poisson neurons model the spiral ganglion cell (SGC) and features are extracted through the STDP layer to the output neurons. The postsynaptic potentials at the output neurons are used to perform the speech recognition task on TIDIGITS isolated digits. b, Typical shape of a bio-realistic spike used in the STDP layer. c, Schematic of the MoS<sub>2</sub> FG-FET used for STDP learning. Details related to the MoS<sub>2</sub> FG-FET and *bio-realistic* spike are presented in the main text.

We have considered the neuromorphic speech recognition system (NSRS), shown in Fig. 1a, to illustrate the utility of the intrinsic STDP response in MoS<sub>2</sub> FG devices. The NSRS is inspired by the human auditory pathway, which represents a neuro-physical action where the cochlea transforms the incoming acoustic signal to a time-frequency map, also called a spectrogram (a schematic of the process is in Fig. 1a). The cochlea is a fluid-filled, spiral structure that transduces the mechanical vibrations received into waves, which vibrate the basilar membrane (BM). The movement of the BM varies at different locations within the cochlea, since the stiffness of the membrane gradually decreases from its basal end to the apex. Inner hair cells (IHCs) are the transducers that help to convert the sound-generated motion of the cochlea to neurotransmitter release at synapses, which excites the primary auditory neurons. After the cochlea, the second major transformation occurs at the primary auditory cortex, where more complex processing takes place, such as high-level feature extraction followed by acoustic tasks such as speaker recognition, sound segregation, denoising and voice activity detection. This high-level feature extraction stage involves the learning of synaptic connections between the neurons in the cortical areas. Synaptic plasticity is the ability to strengthen or weaken synapses, based on the order of occurrence of the pre- and post-synaptic neuronal spikes. The synapses are increased in strength if presynaptic spikes repeatedly occur before postsynaptic spikes within a range of a few milliseconds; otherwise they are decreased in strength for the opposite temporal order<sup>[38]</sup>.

The NSRS depicted in Fig. 1a can be broadly divided into three blocks. The first stage, or the cochlea, converts the sound waves into neural spikes that act as inputs for the feature learning stage. The second stage comprises a neural network. An important aspect of this stage is the inherent plasticity of the synaptic transistors, which is used to model the STDP learning rule. We demonstrate here that an individual MoS<sub>2</sub> FG-FET is capable of exhibiting such STDP learning capability, which allows for unsupervised feature learning within the MoS<sub>2</sub> FG-FET without the necessity of additional learning circuitry. In the final stage, the classification performance of the system is checked for a target application using a linear classifier. The key advantage of realizing STDP capability on an all-2D MoS<sub>2</sub> FG-FET is that it is based entirely on the gate-controlled charge transfer across different layers of a van der Waals heterostructure, and is thus highly tunable both in response time and magnitude. This is reflected in the maximum conductance change ($\Delta G$%) observable in STDP behaviour, which is $\geq 100\%$ for the MoS<sub>2</sub> FG device (Fig. 3e and Fig. 6a). Larger changes in conductance allow for distinct, separated states and improve the temporal sensitivity of the STDP response<sup>[39]</sup>. The number of states attainable in a multi-state memory device is governed by two factors: the range of accessible conductance values and the temporal stability of each state. The former provides a choice of conductance values for possible memory levels, while the latter defines how closely spaced these levels can be. Apart from quantum confined states such as those observed in the quantum Hall effect<sup>[40]</sup>, memory levels are not perfectly stable and are prone to conductance decay, which occurs, for example, in FG memory devices due to dielectric relaxation, leakage of carriers through defects in the dielectric, thermal excitation etc.<sup>[41]</sup>. A larger operation range in conductance facilitates a larger choice of states, which corresponds to a larger choice of time difference $\Delta t$ between the pre- and post-synaptic spikes for the STDP response. This makes it possible to have distinct states which are separated by small differences in $\Delta t$ and hence a better temporal sensitivity. The methodology used for determining the number of states in the current work is presented in Section II C.

The manuscript is organized as follows. In Section II A, we describe the electrical characterization and STDP response in  $MoS_2$  FG-FET. Section II B outlines the performance of the NSRS trained using the inherent STDP of the  $MoS_2$  synapse and Section II C investigates the effects of device-to-device mismatch on the performance of the NSRS. We conclude with a brief discussion of the main observations and future possibilities.



FIG. 2: Electrical characterization of  $MoS_2$  FG-FET. a, Optical micrograph of a typical  $MoS_2$  FG-FET with individual layers outlined. b, and c, show the transfer and output characteristics, respectively, for  $MoS_2$  FG-FET. Solid lines denote forward sweep directions (from -ve to +ve values) and dashed lines reverse sweep directions (from +ve to -ve values) in both graphs.

## II. EXPERIMENTAL RESULTS

#### A. STDP in the MoS<sub>2</sub> FG-FET

Fig. 2a shows the optical micrograph of a typical MoS<sub>2</sub> synaptic device with dashed lines outlining the component atomic layers. The device geometry consists of a top MoS<sub>2</sub> layer, the channel, and a bottom few layer graphene (FLG) FG, which are separated by a h-BN tunnel barrier. The complete stack is placed on a Si<sup>++</sup>/SiO<sub>2</sub> (285 nm) substrate acting as the back-gate (Fig. 1c). We fabricated the device using a dry stacking technique to assure clean interfaces for better performance. Details related to the fabrication technique are provided in the Methods Section (Section IV). We demonstrate the transfer and output characteristics of typical MoS<sub>2</sub> FG-FET devices in Fig. 2b and c, respectively. The transfer characteristics (Fig. 2b) are obtained by sweeping the back gate voltage ($V_{bg}$) while maintaining a constant source-drain bias ($V_{sd}$) of 0.05 V. The output characteristics (Fig. 2c) are measured by varying $V_{sd}$ at a constant $V_{bg} = 0$ V. All MoS<sub>2</sub> FG-FET devices studied in this work demonstrate a large hysteresis in both the transfer (Fig. 2b) and output characteristics (Fig. 2c) with distinct low resistance (LR) erase and high resistance (HR) program states. The hysteresis is known to originate from field assisted tunneling and subsequent trapping of charges in the FG due to the applied gate or drain bias<sup>[21]</sup>. A temporal control over the charge tunneling leads to analog memory behaviour as outlined in Ref. [21] and Supplementary Section III. The resulting conductance states are a pre-requisite for demonstrating learning mechanisms like STDP. The devices also demonstrate a near-ideal subthreshold swing of $\sim 85$ mV/decade for over three decades of current and large ON/OFF ratios of ~ $5 \times 10^5$ (Supplementary Section VII).

Our previous work<sup>[21]</sup> demonstrated synaptic properties like pulsed potentiation and depression, and paired pulse facilitation (PPF), using the current device geometry. However, the demonstration of STDP used a combination of a multiplexer and the FG device, resulting in additional circuit overhead as well as increased complexity. The current work rectifies these issues by demonstrating an intrinsic STDP learning rule in the MoS<sub>2</sub> FG-FET using the method of overlapping pulses<sup>[39,42]</sup>. As previously reported<sup>[42]</sup>, the shape of the spike plays a very important role in determining the nature of the STDP response. The MoS<sub>2</sub> FG-FETs demonstrate an STDP response for different pulse shapes (Supplementary Section IV); however, bio-realistic pulses with a functional form similar to action potentials in the nervous system<sup>[38]</sup> (Fig. 1b) result in STDP responses that are comparable to biological systems and are the pulse shape of choice for this work.

FIG. 3: Spike time dependent plasticity (STDP) in MoS<sub>2</sub> FG-FET. a, Circuit configuration for STDP measurements. Gate and source are considered as the pre- and post-synaptic terminals, respectively. The arrows show the direction of the tunnel current ($I_{tunnel}$) for positive and negative $\Delta t$. Top and middle horizontal panels of **b**, and **c**, depict the pre- and post-synaptic spikes for a -ve and +ve $\Delta t$, respectively. The bottom panel shows the resulting overlap potential ($V_{pre} - V_{post}$). Vertical panels (I-IV) at the bottom of the figures demonstrate the band configuration for four different regimes of the overlap potential. Magnitude and direction of the tunneling current $I_{tunnel}$ are indicated by the size and direction of the black arrow at the base of each panel, respectively. **d**, Temporal response of the MoS<sub>2</sub> FG-FET to a pair of pre- and post-synaptic spikes. $I_{initial}$ and $I_{final}$ represent the channel current before and after the application of the spikes, respectively, for a $V_{read} = 0.05$ V. **e**, Graph showing the variation of $\Delta G$% as a function of $\Delta t$. The experimental data is indicated by open circles while the dashed line is an exponential fit following Eq. 2.

Fig. 1b shows a typical spike which can be modelled in the following fashion. The spike onset is marked by the time-period  $t_{ail}^+$  (= 10 µs) when an exponential rise to the positive peak amplitude of  $A_{mp}^+$  from the rest potential (0 V) commences with a small time constant ( $\zeta^+ = 3 \mu s$ ). This is followed by a sharp decrease in the voltage to the negative peak amplitude of  $-A_{mp}^-$  at which time a slow exponential decay with large time constant ( $\zeta^- = 330 \ \mu s$ ) takes the spike voltage to its resting value over a time period  $t_{ail}^-$  (= 1 ms). The functional dependence of  $V_{spike}$  is given by<sup>[42]</sup>

$$V_{spike}(t) = \begin{cases} A_{mp}^{+} \dfrac{e^{t/\zeta^{+}} - e^{-t_{ail}^{+}/\zeta^{+}}}{1 - e^{-t_{ail}^{+}/\zeta^{+}}}, & \text{if } -t_{ail}^{+} < t < 0 \\ -A_{mp}^{-} \dfrac{e^{-t/\zeta^{-}} - e^{-t_{ail}^{-}/\zeta^{-}}}{1 - e^{-t_{ail}^{-}/\zeta^{-}}}, & \text{if } 0 < t < t_{ail}^{-} \end{cases} \tag{1}$$
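As a rough illustration of Eq. 1, the spike shape can be evaluated with a few lines of Python. This is a minimal sketch, not the code used for the measurements: the function name `spike_voltage` and its argument names are hypothetical, the default time constants are the values quoted in the text, and returning 0 V outside the two branches is an assumption based on the stated rest potential.

```python
import math

def spike_voltage(t, a_pos, a_neg, t_rise=10e-6, t_decay=1e-3,
                  zeta_pos=3e-6, zeta_neg=330e-6):
    """Evaluate the spike of Eq. 1 at time t (seconds); amplitudes in volts."""
    if -t_rise < t < 0:
        # Fast exponential rise from rest to the positive peak +a_pos.
        edge = math.exp(-t_rise / zeta_pos)
        return a_pos * (math.exp(t / zeta_pos) - edge) / (1.0 - edge)
    if 0 < t < t_decay:
        # Slow exponential recovery from the negative peak -a_neg to rest.
        edge = math.exp(-t_decay / zeta_neg)
        return -a_neg * (math.exp(-t / zeta_neg) - edge) / (1.0 - edge)
    # Assumption: the terminal sits at the 0 V rest potential elsewhere.
    return 0.0
```

Evaluating the function just before and just after $t = 0$ reproduces the jump from $+A_{mp}^{+}$ to $-A_{mp}^{-}$ described above.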

To measure the STDP response, spikes are applied to the pre- ($V_{pre}$) and post- ($V_{post}$) synaptic terminals (gate and source, respectively) of the MoS<sub>2</sub> FG-FET (Fig. 3a) with a pre-defined time difference ($\Delta t = t_{post} - t_{pre}$) as depicted in Fig. 3b, c. A small read voltage ($V_{read} = 0.05$ V) is also applied to measure the changes in channel current, which reflect the change in conductance (details in the Methods Section (Section IV)). Fig. 3d shows the time series response of a typical MoS<sub>2</sub> FG-FET to pre- and post-synaptic spike pairs with different $\Delta t$ values. $I_{initial}$ and $I_{final}$ represent the channel current before and after the application of the pre- and post-synaptic spikes. The measurements are performed by applying pre- and post-synaptic spikes with $A_{mp}^+/A_{mp}^-$ values of 6.5 V/7 V and 7 V/6.5 V, respectively. The channel conductance increases (decreases) for positive (negative) $\Delta t$ values, with larger changes for smaller time differences, mimicking the observations in biological synapses following the STDP protocol<sup>[38]</sup>. The STDP behaviour is also demonstrated by plotting the percentage change in device conductance $\Delta G\% = 100 \times (I_{final} - I_{initial})/I_{initial}$ (computed from the time series data shown in Fig. 3d) as a function of $\Delta t$, which is depicted in Fig. 3e for a starting conductance of 6 $\mu$S. We observe an exponential reduction in $|\Delta G\%|$ with increasing $|\Delta t|$, which is modelled as follows

$$\Delta G\% \propto \begin{cases} e^{-\Delta t/\tau_{+}}, & \text{if } \Delta t \geq 0 \\ e^{\Delta t/\tau_{-}}, & \text{if } \Delta t \leq 0 \end{cases} \tag{2}$$

where  $\tau_+$  and  $\tau_-$  denote the timescales for which the synaptic device remains sensitive to the pre- and post-synaptic spiking.
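The exponential window of Eq. 2 and the definition of $\Delta G\%$ can be sketched as follows. Eq. 2 only fixes the proportionality, so the prefactors `g_plus` and `g_minus` (the maximum potentiation and depression in percent) are hypothetical fit parameters introduced for this sketch, and assigning $\Delta t = 0$ to the potentiation branch is likewise an assumption.

```python
import math

def delta_g_percent(i_initial, i_final):
    """Percentage conductance change from the read currents at fixed V_read."""
    return 100.0 * (i_final - i_initial) / i_initial

def stdp_window(delta_t, g_plus, g_minus, tau_plus, tau_minus):
    """Exponential STDP window of Eq. 2 with hypothetical prefactors.

    delta_t = t_post - t_pre in seconds; returns Delta G in percent.
    """
    if delta_t >= 0:
        # Potentiation branch: decays with time constant tau_plus.
        return g_plus * math.exp(-delta_t / tau_plus)
    # Depression branch: negative change, decays with tau_minus.
    return -g_minus * math.exp(delta_t / tau_minus)
```

For example, with `tau_plus = tau_minus = 100e-6`, a spike pair separated by one time constant yields a change of about 37% of the corresponding prefactor.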

The microscopic origin of the plasticity of the device (Fig. 3e) can be explained from the time dependent overlap potential ($V_{pre} - V_{post}$) of the pre- and post-synaptic spikes (Fig. 3b and c). We observe that the tunneling of charges and consequent changes in synaptic weights are only possible when this voltage is above a threshold voltage. Negative values of $\Delta t$ (Fig. 3b) result in an increase of the overlap potential above $V_{th\_pos}$, leading to a decrease in channel conductance, while for positive $\Delta t$ values (Fig. 3c) the opposite scenario occurs, resulting in a decreased channel resistance. The positive ($V_{th\_pos}$) and negative ($V_{th\_neg}$) threshold voltages are indicated by dashed lines in the bottom panel of Fig. 3b and c.

The overlap potential can be broadly divided into four regions. Fig. 3b and c demonstrate this for a negative and positive $\Delta t$, respectively. Since both scenarios can be explained by similar arguments, we limit ourselves to the discussion of negative $\Delta t$ values (Fig. 3b). Region I shows the band alignment before the arrival of the pre- or post-synaptic spike, with both pre- and post-synaptic potentials at their resting values. The MoS<sub>2</sub> channel has a finite conductance, indicating some effective positive charge on the FG. However, the potential across the h-BN tunnel barrier is insufficient for the tunneling of charges, leading to a negligible tunnel current ($I_{tunnel}$). This is indicated by the small arrow at the base of the panel. The size and direction of the arrow represent the magnitude and direction of $I_{tunnel}$, respectively. Region II denotes the arrival of the post-synaptic spike with the pre-synaptic spike at its rest value. The positive voltage on the source terminal reduces the potential of the MoS<sub>2</sub> channel relative to the FG, but since $|V_{pre} - V_{post}| < |V_{th\_neg}|$, there is negligible $I_{tunnel}$ and consequently no long term change in channel conductance. Region III represents the arrival of the pre-synaptic spike while the post-synaptic potential is returning to its resting value. Now, $V_{pre} - V_{post} > V_{th\_pos}$, leading to a considerable $I_{tunnel}$ and an accumulation of negative charges on the FG. The enhanced negative charge on the FG leads to a decrease in channel conductance (negative $\Delta G\%$) for negative $\Delta t$ (Panel IV). We define $V_{th\_pos}$ ($V_{th\_neg}$) as the maximum positive (negative) peak voltage of the overlap potential for the largest negative (positive) time difference $\Delta t$ at which a $|\Delta G\%| > 3\%$ is observed. For the device under test (DUT), we find $V_{th\_neg} \approx -8$ V and $V_{th\_pos} \approx 7.6$ V.
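The threshold argument above can be checked numerically by scanning the overlap potential for a given $\Delta t$. The sketch below is illustrative only: `spike` re-implements Eq. 1 with the quoted time constants (0 V outside the two branches is an assumption), `overlap_extrema` is a hypothetical helper, the pre-synaptic spike is placed at $t = 0$ and the post-synaptic spike at $t = \Delta t$, and the default amplitudes are the 6.5 V/7 V and 7 V/6.5 V values quoted for the measurements.

```python
import math

def spike(t, a_pos, a_neg, t_rise=10e-6, t_decay=1e-3,
          zeta_pos=3e-6, zeta_neg=330e-6):
    """Spike of Eq. 1; assumed to rest at 0 V outside the two branches."""
    if -t_rise < t < 0:
        edge = math.exp(-t_rise / zeta_pos)
        return a_pos * (math.exp(t / zeta_pos) - edge) / (1.0 - edge)
    if 0 < t < t_decay:
        edge = math.exp(-t_decay / zeta_neg)
        return -a_neg * (math.exp(-t / zeta_neg) - edge) / (1.0 - edge)
    return 0.0

def overlap_extrema(delta_t, pre=(6.5, 7.0), post=(7.0, 6.5), step=0.1e-6):
    """Return (min, max) of V_pre - V_post sampled on a uniform time grid.

    The pre spike is centred at t = 0 and the post spike at t = delta_t.
    """
    t_start = min(0.0, delta_t) - 20e-6   # start before the earlier spike
    t_stop = max(0.0, delta_t) + 1.1e-3   # stop after the later tail ends
    n_steps = int(round((t_stop - t_start) / step))
    lowest = highest = 0.0                # overlap is 0 V at rest
    for k in range(n_steps + 1):
        t = t_start + k * step
        v = spike(t, *pre) - spike(t - delta_t, *post)
        lowest = min(lowest, v)
        highest = max(highest, v)
    return lowest, highest
```

With these parameters, a short negative $\Delta t$ pushes the maximum of the overlap potential above the reported $V_{th\_pos} \approx 7.6$ V, while widely separated spikes stay between the two thresholds, in line with the picture of Fig. 3b and c.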

For a quantitative analysis, we hypothesize<sup>[21]</sup> that the experimentally observed STDP behavior of the MoS<sub>2</sub> FG-FET occurs due to an electric field ($E_{tunnel}$) driven tunneling of charges from the channel to the FG. In the context of our device geometry, $E_{tunnel}$ is the electric field developed across the *h*-BN layer due to the voltage applied at the pre-synaptic (gate) and/or post-synaptic (source) terminal. This electric field results in a tunneling current ($I_{tunnel}$) which follows the Fowler-Nordheim (FN) mechanism

$$I_{tunnel}(V) = \frac{A_{ch}q^3 m V_{tunnel}^2}{8\pi h \phi_b d^2 m^*} \exp\left[\frac{-8\pi \sqrt{2m^*} \phi_b^{\frac{3}{2}} d}{3hq V_{tunnel}}\right] \quad (3)$$

TABLE I: Range of tunnel barrier height for electrons and holes in MoS<sub>2</sub> FG-FET

| Electron affinity of *h*-BN, $\chi^{h-BN}$ (eV) | Band gap of *h*-BN, $\lambda^{h-BN}$ (eV) | Work function of MoS<sub>2</sub>, $\phi^{MoS}$ (eV) | Electron barrier height, $\phi_b^{electron} = \phi^{MoS} - \chi^{h-BN}$ (eV) | Hole barrier height, $\phi_b^{hole} = \chi^{h-BN} + \lambda^{h-BN} - \phi^{MoS}$ (eV) |
|---|---|---|---|---|
| 1.1 - 2.3 | 5.2 - 5.9 | 4.6 - 4.9 | 2.6 - 3.5 | 1.7 - 3.3 |

where $A_{ch}$ and $d$ are the channel area and the thickness of the *h*-BN tunnel barrier, respectively, $V_{tunnel} = E_{tunnel} \times d$ is the tunnelling bias across the *h*-BN tunnel barrier and $m^* = 0.26m$ is the effective mass of charge carriers in h-BN<sup>[43]</sup>. The barrier height for tunnelling, $\phi_b$, is the energy barrier at the MoS<sub>2</sub>/*h*-BN junction and is calculated from the electron affinity of *h*-BN and the work function of MoS<sub>2</sub> (Fig. 4a). From previous reports, the electron affinity and band gap of h-BN range between 1.1-2.3 eV<sup>[19,44]</sup> and 5.2-5.9 eV<sup>[19,44]</sup>, respectively, while the work function of MoS<sub>2</sub> is 4.6-4.9 eV<sup>[19]</sup>. The corresponding range of $\phi_b$ values in the current device geometry is presented in Table I.
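Eq. 3 can be evaluated directly once all quantities are expressed in SI units. The following is a minimal sketch under stated assumptions: the function name and argument names are hypothetical, the barrier height is passed in eV and converted to joules, and the treatment of negative bias (same magnitude, opposite sign) is an assumption of this sketch rather than something stated in the text.

```python
import math

Q = 1.602176634e-19      # elementary charge (C)
H = 6.62607015e-34       # Planck constant (J s)
M_E = 9.1093837015e-31   # free electron mass (kg)

def fn_tunnel_current(v_tunnel, area, thickness, phi_b_ev, m_ratio=0.26):
    """Fowler-Nordheim current of Eq. 3 in amperes.

    v_tunnel: bias across the h-BN barrier (V); area: channel area (m^2);
    thickness: h-BN thickness (m); phi_b_ev: barrier height (eV);
    m_ratio: effective mass in units of the free electron mass.
    """
    if v_tunnel == 0.0:
        return 0.0
    v = abs(v_tunnel)
    phi = phi_b_ev * Q              # barrier height in joules
    m_eff = m_ratio * M_E
    prefactor = (area * Q**3 * M_E * v**2
                 / (8.0 * math.pi * H * phi * thickness**2 * m_eff))
    exponent = (-8.0 * math.pi * math.sqrt(2.0 * m_eff) * phi**1.5 * thickness
                / (3.0 * H * Q * v))
    # Assumption: the current reverses sign with the bias.
    return math.copysign(prefactor * math.exp(exponent), v_tunnel)
```

The strong exponential dependence on $\phi_b^{3/2} d / V_{tunnel}$ is what produces the effective voltage thresholds discussed earlier: below a certain bias the current is negligible, and it grows rapidly above it.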

The STDP response is numerically calculated by finding the  $\Delta G\%$  for every  $\Delta t$ . To perform this, the overlap potential is discretized using a time interval of 0.1  $\mu$ s. The tunnel bias  $V_{tunnel}$ , at each instant of time  $t_i$ , is given by the overlap potential modified by the potential of the FG ( $V_{FG}$ ) at that instant of time.

$$V_{tunnel}(t_i) = V_{pre}(t_i) - V_{post}(t_i) + V_{FG}(t_i) \tag{4}$$

where $V_{tunnel}(t_i)$, $V_{pre}(t_i)$, $V_{post}(t_i)$ and $V_{FG}(t_i)$ are the tunnel, pre-synaptic, post-synaptic and FG bias, respectively, at the instant $t_i$. Here, we have assumed that the total applied bias ($V_{pre}$) is applied at the FG terminal. This is valid due to the capacitance engineering introduced by an extension of the FG<sup>[21]</sup>. For $i = 1$, i.e. the first instant, $V_{FG}(t_1)$ is the gate bias above threshold needed to attain the initial conductance state of the device. This is demonstrated for an initial current of 300 nA and drain bias of 0.05 V in Fig. 4b.

In response to the applied  $V_{tunnel}(t_i)$ , a tunnel current  $I_{tunnel}(t_i)$  (Eq. 3) flows across the *h*-BN barrier leading to the storage of positive (negative) charges on the FG, thereby increasing (decreasing) its potential by

$$\Delta V_{FG}(t_i) = \frac{Q_{tunnel}(t_i)}{C_{self}} \tag{5}$$

where,  $Q_{tunnel}(t_i) = I_{tunnel}(t_i) \times 0.1 \ \mu s$ , is the charge stored on the FG due to a single pulse and  $C_{self} = 8\epsilon_0 A_{FG}$ , is the self-capacitance of the FG with  $\epsilon_0$  the permittivity of free space and  $A_{FG}$  ( $\approx$  45000  $\mu m^2$ ) the area of the FG.

The FG potential is updated for the next instant  $(t_{i+1})$  following

$$V_{FG}(t_{i+1}) = V_{FG}(t_i) + \Delta V_{FG}(t_i) \tag{6}$$

and the same is used to obtain the tunneling current for the next time interval using Eq. 3. Repeating this process for all  $t_i$  values gives us the total FG voltage change  $(\Delta V_{FG\_total})$  for a pair of pre- and post-synaptic spikes separated by  $\Delta t$ .

$$\Delta V_{FG\_total} = \sum_{i=1}^{N} \Delta V_{FG}(t_i) \tag{7}$$

$\Delta V_{FG\_total}$ represents the total back-gate voltage change due to the overlapping of the pre- and post-synaptic spikes. The resulting change in the channel conductance is estimated from the transfer characteristics (Fig. 4b) and the corresponding $\Delta G\%$ is computed.
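The iteration of Eqs. 4 to 7 amounts to a simple explicit time-stepping loop. The sketch below shows only the bookkeeping: `integrate_floating_gate` is a hypothetical name, `tunnel_current` is any callable mapping the tunnel bias to the current charging the FG (for instance an implementation of Eq. 3), and the sign convention of that current is left to the callable, which is an assumption of this sketch.

```python
def integrate_floating_gate(v_pre, v_post, v_fg_initial, dt, c_self,
                            tunnel_current):
    """Step the floating-gate potential through sampled spike waveforms.

    v_pre, v_post: sequences of terminal voltages sampled every dt seconds.
    c_self: self-capacitance of the FG in farads.
    tunnel_current: callable returning the signed FG charging current (A).
    Returns (final V_FG, total change Delta V_FG_total).
    """
    v_fg = v_fg_initial
    total_change = 0.0
    for vp, vq in zip(v_pre, v_post):
        v_tunnel = vp - vq + v_fg                        # Eq. 4
        delta_v = tunnel_current(v_tunnel) * dt / c_self  # Eq. 5
        v_fg += delta_v                                   # Eq. 6
        total_change += delta_v                           # Eq. 7
    return v_fg, total_change
```

Because $V_{FG}$ enters Eq. 4, each step feeds back into the next one: charge already stored on the FG changes the tunnel bias and therefore the subsequent charging current. The returned total change is what would then be mapped to $\Delta G\%$ through the transfer characteristics.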

The simulated plot, dashed line in Fig. 4c, is obtained by repeating the process for different $\Delta t$ values and shows good agreement with the experimental data for hole and electron barriers of 2.66 eV and 3.04 eV, respectively. These values match closely with the reported values for the MoS<sub>2</sub>/h-BN interface presented in Table I, making us believe that electric field dependent FN tunneling is at the heart of the observed STDP behaviour in the MoS<sub>2</sub> FG-FET. A numerical analysis of the pulsed potentiation and depression behavior in the MoS<sub>2</sub> FG-FET devices and the energy dissipated during training and reading can be found in Supplementary Sections III and V, respectively.

Notably, the STDP timescales ($\tau_+$, $\tau_-$) of ~100 $\mu$s obtained in Fig. 3e are smaller than the millisecond timescales observed in biological systems<sup>[38]</sup>. However, these timescales are tunable by changing the spike parameters. Fig. 4d demonstrates this for four different pre- and post-synaptic spike combinations. $\tau_+$, $\tau_-$ and the maximum $\Delta G\%$ increase with increasing spike amplitudes $A_{mp}^+$ and $A_{mp}^-$. For a particular $\Delta t$, the magnitude of the overlap potential ($V_{pre} - V_{post}$), and hence $\Delta G\%$, is determined by the magnitude of the pre- and post-synaptic spikes. Higher spike heights result in a larger overlap potential and tunnel current (Eq. 3). The increased storage of charges on the FG leads to greater changes in conductance lasting for a wider range of $\Delta t$ values. We obtained a time-scale tunability over a factor of five by changing the pulsing amplitude, indicating a key advantage of pure electrostatic operation. This is an important feature and opens up the possibility of engineering tailor-made STDP responses for specific applications.



FIG. 4: Modelling spike time dependent plasticity (STDP) in MoS<sub>2</sub> FG-FET. a, Schematic depiction of the tunnel barrier height and band alignment at the MoS<sub>2</sub>/h-BN interface. b, Graphical determination of $V_{FG}$ from the transfer characteristics. c, Experimentally observed (symbols) and numerically simulated (red dashed line) STDP characteristics of the MoS<sub>2</sub> FG-FET. Inset shows the full STDP response while the main panel zooms into time differences near $\Delta t = 0$ $\mu$s. d, From top to bottom panel: graphs demonstrating the STDP response for increasing spike heights. We find an increase in decay timescales for both positive ($\tau_+$) and negative ($\tau_-$) $\Delta t$ with increasing spike height.

#### B. Cochlear recognition with STDP in $MoS_2$ FG-FETs

Our earlier work demonstrated the real-time hardware implementation<sup>[45]</sup> of a biologically plausible cochlea model known as the Cascade of Asymmetric Resonators with Fast-Acting Compression (CAR-FAC) model<sup>[45,46]</sup>. The system proposed here only uses the CAR part of the model, which emulates the behavior of the BM, the IHC and the spiral ganglion cells (SGCs). The SGC is simply modelled as a Poisson process, which converts the IHC output into spike trains. The output of the IHC is the spectrogram, which is down-sampled in time to reduce computational complexity (LHS of Fig. 5a). A detailed description of this block is provided in Supplementary Section I. The spiked output of the SGCs is connected to the next layer of spiking neurons via adaptive synaptic connections, which learn acoustic features from the input using the STDP rule. The synapses in this unsupervised feature learning stage are implemented using a MoS<sub>2</sub> FG-FET acting as a multi-state memory with inherent STDP learning capability.

FIG. 5: Speech recognition from IHC output using  $MoS_2$  FG-FET. a, (LHS) The down-sampled inner hair cell (IHC) output of dimension 54×108 for digits 0-9 in the TIDIGITS dataset. (RHS) Spectrogram of the 54×108 synapses of each output neuron after training, showing weight patterns that are similar to the original spectrograms (LHS). b, Confusion matrix and c, t-SNE visualization of the speech recognition task performed on the TIDIGITS dataset.

The update rule (described in detail in Supplementary Section [II]) for all synapses connecting the pre- and post-synaptic neurons in the feature learning stage of the NSRS is summarized in Eq. 8. For time intervals that are short ( $< 0.5 \times \tau_{+/-}$ ) or long ( $> 4.5 \times \tau_{+/-}$ ) compared to  $\tau_+$  and  $\tau_-$, synapses are not altered, since they do not contribute to the post-synaptic firing. Otherwise, the synaptic weights, w, are updated by the learning rate,  $\eta$, times the conductance change,  $\Delta G\%$, corresponding to  $\Delta t$. Since the update rule is independent of the class of the spoken digit, the synapses of the output neurons in the neural network learn the auditory features using the states of the  $MoS_2$  memory in an unsupervised manner.

$$w_{new} = w_{old} + \eta \Delta G\%, \quad \text{if } 0.5 \times \tau_{+/-} \le |\Delta t| \le 4.5 \times \tau_{+/-} \tag{8}$$
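As an illustration of Eq. 8, the sketch below applies the windowed update to a single weight. The exponential form of `delta_g_percent`, its amplitudes, the time constants, the learning rate and the sign convention (positive  $\Delta t$  giving a positive conductance change) are all assumptions chosen for the example; they are not values measured in this work.

```python
import math

def delta_g_percent(dt, a_plus=10.0, a_minus=-10.0,
                    tau_plus=50e-6, tau_minus=50e-6):
    """Hypothetical exponential STDP curve: conductance change (%) vs dt (s)."""
    if dt >= 0:
        return a_plus * math.exp(-dt / tau_plus)
    return a_minus * math.exp(dt / tau_minus)

def update_weight(w_old, dt, eta=0.01, tau_plus=50e-6, tau_minus=50e-6):
    """Eq. 8: update only when 0.5*tau <= |dt| <= 4.5*tau, else keep w_old."""
    tau = tau_plus if dt >= 0 else tau_minus
    if 0.5 * tau <= abs(dt) <= 4.5 * tau:
        dg = delta_g_percent(dt, tau_plus=tau_plus, tau_minus=tau_minus)
        return w_old + eta * dg
    return w_old

w_inside = update_weight(0.5, 50e-6)     # inside the window: weight changes
w_too_short = update_weight(0.5, 10e-6)  # |dt| < 0.5*tau: unchanged
w_too_long = update_weight(0.5, 300e-6)  # |dt| > 4.5*tau: unchanged
```

With these assumed parameters, only the first call modifies the weight; the other two fall outside the window and return the weight untouched.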

The proposed neuromorphic hearing system is validated using isolated digits of the TIDIGITS database. The dataset consists of 6520 audio recordings of isolated digits 0 to 9, sampled at 20 kHz, split equally into a training set and a test set. Based on the onset and offset of the spoken digit recording, all audios are fit into a 1 second time frame. The N-channel CAR model used to mimic cochlear behavior takes the 1 second audio samples as input and, based on the constant Q-filter parameters (Supplementary Section [I]), outputs the time-frequency mapping, i.e., the spectrogram corresponding to the audio input. Here, we use a 54-channel CAR model to extract frequency components within the 100 Hz - 4 kHz range, as this range captures most of the speech information.

Fig. 5a (LHS) shows the CAR output spectrogram (54 channels  $\times$  20,000 samples) obtained for digits 0-9, down-sampled to a dimension of  $54 \times 108$  to reduce the computational load on the neural network. Poisson neurons are used to model the response of the spiral ganglion neurons within the cochlea to the down-sampled spectrogram. Each sample from the Poisson neurons' distribution is fed to the STDP neural network. The synapses in the STDP architecture learn connections from the Poisson neurons to the output integrate-and-fire (IF) neurons in an unsupervised manner. We use 16 IF neurons at the output to capture various audio patterns within the down-sampled spectrograms of all digits. During the training phase of the neural network, the firing threshold of all IF neurons,  $V_{threshold}$, is set to 1.5% of the energy in the input down-sampled spectrogram. This compensates for the variation of energy between different digits, thereby ensuring a uniform firing rate for all digits.
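The down-sampling and threshold-setting steps above can be sketched as follows. The sketch assumes that down-sampling is done by averaging over roughly equal time bins and that "energy" means the sum of all values in the down-sampled spectrogram; neither choice is specified in the text, and the function names are hypothetical.

```python
def downsample_time(spec, n_bins):
    """Average each channel's samples into n_bins roughly equal time bins."""
    out = []
    for row in spec:
        n = len(row)
        edges = [round(i * n / n_bins) for i in range(n_bins + 1)]
        out.append([sum(row[a:b]) / (b - a)
                    for a, b in zip(edges, edges[1:])])
    return out

def firing_threshold(spec, fraction=0.015):
    """V_threshold as a fraction (1.5% in the text) of the total energy.

    Energy is assumed here to be the plain sum of spectrogram values.
    """
    return fraction * sum(v for row in spec for v in row)

# Toy input: 3 channels x 20,000 samples with constant values 1, 2 and 3
spec = [[float(c + 1)] * 20000 for c in range(3)]
small = downsample_time(spec, 108)      # 3 x 108
v_threshold = firing_threshold(small)   # 0.015 * 108 * (1 + 2 + 3)
```

Because the threshold scales with the energy of each input, a loud and a quiet recording of the same digit would drive the IF neurons to fire at comparable rates.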

We adopt a winner-take-all strategy to update the synaptic connections when IF neurons fire (details in Supplementary Section [II]). When more than one IF neuron fires simultaneously, the neuron with the highest membrane potential (the winner neuron) is activated and undergoes an update based on the STDP characteristics to learn the input pattern. Since the input to the feature learning stage is the Poisson neuron response to the two-dimensional spectrogram, the learned synapses are also two-dimensional and of the same size as the down-sampled spectrogram  $(54 \times 108)$ . Once activated, the neuron enters its refractory period, within which its membrane potential is reset to its initial state. In this way, all the neurons compete with each other for activation. The IF neurons are further connected to each other with inhibitory connections. Thus, when an IF neuron fires, the membrane potentials of the remaining IF neurons are inhibited. Moreover, when none of the IF neurons fires, the membrane potential of all the neurons is decreased, so as to enhance competitive learning among the neurons.
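One time step of the winner-take-all competition described above might look like the sketch below. The function `wta_step`, the fixed `inhibition` and `leak` amounts, and the reset-to-zero behaviour are illustrative assumptions; the exact dynamics are given in Supplementary Section [II] and are not reproduced here.

```python
def wta_step(potentials, inputs, weights, threshold,
             inhibition=0.1, leak=0.05):
    """One time step for a layer of IF neurons with winner-take-all.

    potentials: membrane potentials, updated in place
    inputs:     binary pre-synaptic spikes for this time step
    weights:    weights[j][i] connects input i to neuron j
    Returns the index of the winning neuron, or None if no neuron fired.
    """
    for j, w in enumerate(weights):
        potentials[j] += sum(wi * x for wi, x in zip(w, inputs))
    fired = [j for j, v in enumerate(potentials) if v >= threshold]
    if not fired:
        # No neuron fired: lower every potential to sharpen competition
        for j in range(len(potentials)):
            potentials[j] -= leak
        return None
    winner = max(fired, key=lambda j: potentials[j])
    for j in range(len(potentials)):
        if j == winner:
            potentials[j] = 0.0           # reset during refractory period
        else:
            potentials[j] -= inhibition   # lateral inhibition
    return winner

potentials = [0.0, 0.0, 0.0]
weights = [[1.0, 0.0], [0.6, 0.6], [0.0, 0.2]]
first = wta_step(potentials, [1, 1], weights, threshold=1.0)
after_first = list(potentials)
second = wta_step(potentials, [0, 0], weights, threshold=1.0)
```

In the first step neurons 0 and 1 both cross the threshold, and neuron 1 wins because its potential is higher; only the winner would then receive the STDP update.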

The update value of the synaptic connections ( $\Delta G\%$  in Eq. 8) is influenced by the time difference between the Poisson neuron firing (pre-synaptic spike) and the IF neuron firing (post-synaptic spike). A time-continuous STDP function, as in Eq. 2, is defined to best fit the STDP characteristics of the MoS<sub>2</sub> memory device (as shown in Fig. 3e). The STDP function is then quantized using the defined states of the memory device, and the quantized values are used to update the synapses for the digit recognition task under consideration. The STDP update is limited to  $0.5 \times \tau_{+/-} \leq |\Delta t| \leq 4.5 \times \tau_{+/-}$ , as explained using Eq. 8.
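The quantization step can be illustrated with a small helper that snaps a continuous  $\Delta G\%$  value to the nearest available device state. The sketch assumes evenly spaced states across a symmetric range, which is a simplification made here for illustration; the function name, the range of ±10% and the choice of 21 states are hypothetical.

```python
def quantize(value, lo, hi, n_states):
    """Snap value to the nearest of n_states evenly spaced levels in [lo, hi]."""
    step = (hi - lo) / (n_states - 1)
    clipped = min(max(value, lo), hi)
    idx = round((clipped - lo) / step)
    return lo + idx * step

# 21 assumed states between -10% and +10% give a 1% step
q_mid = quantize(3.4, -10.0, 10.0, 21)    # nearest level is 3.0
q_clip = quantize(15.0, -10.0, 10.0, 21)  # clipped to the top level, 10.0
```

A device with more states gives a smaller step, so the quantized curve follows the continuous STDP function more closely.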

Fig. 5a (RHS) shows the synapses obtained after training the STDP architecture with the down-sampled spectrograms. The various patterns within the synapses show that, with STDP updates, each synapse learns different speech formants hidden in different frequency ranges of the spectrogram. For example, the high frequency information in digits 2, 3 and 8 around frequency channel 20 is captured dominantly by neurons 2, 7, 10 and 12 around the same channel, whereas the energy of digits 4, 5 and 9 around frequency channels 30-40 is captured by neurons 2, 4, 6 and 14. The high frequency information of digit 1 is captured by neurons 8 and 13 only. Hence, we can conclude that the synapses learn patterns from the spectrograms, some of which may be common among different spoken digits and some of which differentiate them from one another. The accumulated post-synaptic potential (PSP) of all 16 IF neurons, which indicates the correlation between the input spectrogram and the trained synapses of each IF neuron, can be used as an effective measure to classify the spoken digits.
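The accumulated PSP used as a feature can be sketched as a weighted sum of input spikes over all synapses of a neuron. This is a simplified assumption (no leak, reset or inhibition during readout), and the function names are hypothetical; it is meant only to show how one input yields one value per output neuron.

```python
def accumulated_psp(spikes, weights):
    """Sum of weight * spike over all synapses of one output neuron."""
    return sum(w * s
               for w_row, s_row in zip(weights, spikes)
               for w, s in zip(w_row, s_row))

def feature_vector(spikes, all_weights):
    """One accumulated-PSP value per output neuron (16 in the text)."""
    return [accumulated_psp(spikes, w) for w in all_weights]

# Toy example: a 2 x 2 spike map and two output neurons
spikes = [[1, 0],
          [0, 1]]
all_weights = [
    [[0.5, 0.1], [0.2, 0.3]],   # neuron 0 matches the input pattern
    [[0.0, 1.0], [1.0, 0.0]],   # neuron 1 matches the opposite pattern
]
features = feature_vector(spikes, all_weights)
```

Neuron 0, whose weights overlap with the active inputs, accumulates a larger PSP than neuron 1, so the resulting vector encodes which learned patterns are present in the input.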

The accumulated PSP of all 16 IF neurons, for all 3260 spoken digits in the training set of the TIDIGITS database, was used to train the linear SVM classifier in the final stage of the NSRS. The trained linear SVM provides 88.92% accuracy on the remaining 3260 spoken digits in the test set of the TIDIGITS database. Fig. 5b shows the confusion matrix for the speech recognition task. The true label on the x-axis indicates the actual class/digit to which the audio belongs, and the predicted label on the y-axis indicates the class/digit that the classifier predicted for the given audio. Higher values on the diagonal of the confusion matrix therefore indicate that a higher number of spoken digits were predicted correctly. Fig. 5c shows a t-SNE visualization, a form of clustering representation, of the 3260 test audios. Each dot in the visualization represents an audio, defined by the accumulated PSP of the 16 IF neurons obtained after the feature learning stage for that audio. Audios of different spoken digits are represented using different colors. Hence, clusters of the same color in the plot show how close the accumulated PSP patterns of the output neurons are for audios belonging to the same class/digit.

FIG. 6: Device-to-device variation in STDP response. a, Plots of STDP response for six MoS<sub>2</sub> FG-FETs. Black lines are exponential fits to the experimental data for measuring the decay timescale, while error bars indicate the spread of each state in the  $\Delta G\%$  axis. b, Comparison of the number of states in different devices.

#### C. Device-to-device variation and its effect on neuromorphic speech recognition

We performed STDP measurements on six separate devices, and the results depicting device-to-device variation are presented in Fig. 6. The STDP response shows a variation in both  $\Delta G\%$  and the time constants ( $\tau_+$  and  $\tau_-$ ) (Fig. 6a). Device-to-device variability can arise from numerous sources; however, in the case of the  $MoS_2$  FG-FET, it is mostly due to variability in the mobility and contact resistance of the  $MoS_2$  channel in different devices. A combination of these factors leads to a device-to-device variation in the maximum attainable conductance, and hence in  $\Delta G\%$  as well. The variability in time constant is a direct consequence of the variation in  $\Delta G\%$, since devices with a larger overall change in conductance generally demonstrate higher time constants (Fig. 6a (Devices D01 and D03) and Fig. 4d). Another possible source of mismatch is the variability in barrier injection height at the  $MoS_2/h$-BN interface, which arises from defects and/or interfacial hydrocarbon formation (more commonly referred to as bubbles)<sup>[47]</sup> during the stacking process for fabricating the heterostructure. Removal of bubbles from heterostructure interfaces is still an ongoing effort in the two-dimensional device community, and an effort was made in the current work to reduce the number of such disorders while fabricating the heterostructures. However, their effects cannot be ruled out altogether. Along with the STDP response, the number of quantized states obtained per device also shows slight variation (Fig. 6b). This number, which ranges between 19 and 24 for the devices measured, is obtained from the average spread of each state along the  $\Delta G\%$  axis (error bars in Fig. 6a). The spread/error in each state is determined statistically by computing the standard deviation from multiple (> 3) independent measurements at the same  $\Delta t$ . Our analysis shows a temporal sensitivity of  $\approx 10-15 \ \mu s$  in our synaptic devices. The number of quantized states is an important parameter, as it affects the quantization of the STDP function used for the cochlear recognition algorithm. A larger number of states is favourable and improves the capability of the NSRS to classify different auditory inputs, leading to better performance.

The performance of the NSRS was checked against device-to-device variation. Device mismatch was incorporated through a Gaussian distribution covering the range of timescales ( $\tau_+$  and  $\tau_-$ ) of the six measured devices, which was used to generate the  $54 \times 108$  STDP characteristics employed in updating the synapses. The speech recognition system is insensitive to device-to-device variation, as indicated by the high rate of correct classification in the confusion matrix (Supplementary Section [VI]). We report an accuracy of 89.11%, which is similar to the accuracy obtained without device mismatch. The robustness of our speech recognition system to device mismatch is an important real-world test, as device-to-device variations are bound to exist in any fabrication process.
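The mismatch model described above can be sketched by drawing one time constant per synapse from a Gaussian fitted to the measured values. The six time constants below are placeholders rather than measured data, and clipping the samples to the measured range is an assumption made here to keep every synapse within the span of the six devices.

```python
import random
from statistics import mean, stdev

def sample_time_constants(tau_measured, rows=54, cols=108, seed=0):
    """Draw one time constant per synapse from a Gaussian over measured values."""
    rng = random.Random(seed)
    mu, sigma = mean(tau_measured), stdev(tau_measured)
    lo, hi = min(tau_measured), max(tau_measured)
    # Clip to the measured range (an assumption, not stated in the text)
    return [[min(max(rng.gauss(mu, sigma), lo), hi) for _ in range(cols)]
            for _ in range(rows)]

# Placeholder time constants (seconds) standing in for six devices
tau_devices = [40e-6, 45e-6, 50e-6, 55e-6, 60e-6, 65e-6]
tau_map = sample_time_constants(tau_devices)  # 54 x 108 grid
```

Each entry of `tau_map` would then parameterise the STDP curve of the corresponding synapse, so that no two synapses share exactly the same learning characteristic.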

### III. DISCUSSION

In this paper, we have demonstrated a  $MoS_2$  memory device, with which a neuromorphic hearing system is developed and validated using the TIDIGITS audio database. The main novelty lies in the demonstration and use of the inherent plasticity of the  $2D-MoS_2$  device to model the STDP rule as a synaptic memory, in a single device, without requiring any other learning circuitry. This is a significant advantage over other emerging memories, which act only as storage elements. It significantly reduces the computational complexity and memory requirement during training as compared to conventional classification networks, which are based on the back-propagation algorithm. The unsupervised feature learning stage learns a low-dimensional feature representation (in this case 16) from the input spectrogram, which makes our classifier very simple, unlike traditional end-to-end supervised neural network architectures.

The proposed system has three major building blocks: a cochlea block, an unsupervised feature extraction block and a linear classifier block. We have shown a novel neuromorphic architecture for machine hearing tasks applied to spoken digit recognition, but the system can be configured to learn other auditory cognitive tasks such as speech recognition, speaker identification and so on. We envision the complete system being integrated from the three separate blocks in order to demonstrate real-time applications. We already have an in-house cochlea model built on FPGA<sup>[45]</sup>, which can be interfaced with the 2D-MoS<sub>2</sub> based memory array to implement the STDP learning capability, followed by a linear classifier system<sup>[48–54]</sup>. Future work will be to integrate these three separate systems on a single substrate.

### IV. METHODS

To fabricate the  $MoS_2$  FETs, we mechanically exfoliate  $MoS_2$, *h*-BN and graphene on Si<sup>++</sup>/SiO<sub>2</sub> (285 nm) substrates using the Scotch Tape method<sup>[55]</sup>. Exfoliated flakes are located under an optical microscope. The thickness of the *h*-BN flakes is determined by atomic force microscopy (AFM). The layer number of  $MoS_2$  and the crystallinity of graphene are verified using Raman spectroscopy<sup>[21]</sup>. Selected flakes are aligned, stacked, and transferred onto a pre-patterned substrate using a home-built micro-mechanical transfer setup. The polymer used for transfer is a commercially available nail top coat solution (Lakme Color Crush), which shows good adhesion to two-dimensional materials at temperatures between 60 and 120°C and is easily dissolved in acetone<sup>[56,57]</sup>. Electrical contacts to the  $MoS_2$  channel and the extension of the FG are defined using electron beam lithography (EBL). For EBL, a double layer of commercially available electron beam resist, PMMA (poly methyl methacrylate) (495PMMA A/950PMMA A MicroChem), with a total thickness of ~ 250 nm is spin-coated on the device, followed by exposure to the electron beam. Subsequently, the patterns are developed in 1:3 MIBK:IPA, metallization is performed via thermal deposition of Cr (5 nm)/Au (50 nm) in high vacuum (~  $10^{-6}$  mbar) conditions, and excess metal is removed via lift-off in acetone. Electrical measurements are performed in vacuum at room temperature in a probe station (Lakeshore CPX-VF). For the transfer and output characteristics, two source meter units (SMUs) of the semiconductor parameter analyser (Keithley 4200A-SCS) were used to apply  $V_{bg}$  and  $V_{sd}$, while the current was measured using the SMU supplying  $V_{sd}$. For STDP measurements, the pre- and post-synaptic spikes are applied using two separate function generators (SRS DS 345) operated in the arbitrary function mode (ARB), fng\_pre and fng\_post, respectively. The pre- and post-spikes with defined  $\Delta t$  are loaded onto fng\_pre and fng\_post, respectively. To perform the measurement, the trigger out of fng\_pre is connected to the trigger in of fng\_post. fng\_pre is triggered using GPIB, which in turn triggers fng\_post, and two spikes with a pre-determined time separation are applied. The resulting conductance change is obtained by measuring the current due to a small read voltage  $V_{read}$ = 0.05 V applied at the drain terminal using the Keithley 4200A-SCS.

### V. ACKNOWLEDGEMENT

We acknowledge the Department of Science and Technology (DST) for a funded project. The authors would also like to thank the National Nanofabrication Facility (NNFC), CENSE, IISC and the Micro and Nano Characterization Facility (MNCF), CENSE, IISC for the fabrication and characterization facilities provided. This work is funded by the Ministry of Human Resource Development (MHRD), the Science and Engineering Research Board (SERB), India (ECR/2017/002517) and an IMPRINT Grant (IMP/2018/000550) from the Department of Science and Technology, India.

### VI. AUTHOR CONTRIBUTIONS

TP, KKT and AG planned the experiments on  $MoS_2$  FG-FET devices. TP and KKT performed the experiments. AAM performed the cochlear recognition using NSRS. All authors contributed equally towards writing the manuscript. AG and CST conceived and supervised the project.

### VII. COMPETING INTERESTS

The authors declare no competing interests.

- <sup>1</sup> Von Neumann, J. First Draft of a Report on the EDVAC. *IEEE Ann. Hist. Comput.* **1993**, *15*, 27–75.
- <sup>2</sup> Thakur, C. S. T.; Molin, J.; Cauwenberghs, G.; Indiveri, G.; Kumar, K.; Qiao, N.; Schemmel, J.; Wang, R. M.; Chicca, E.; Olson Hasler, J., et al. Large-scale neuromorphic spiking array processors: A quest to mimic the brain. *Front. Neurosci.* **2018**, *12*, 891.
- <sup>3</sup> Lai, Q.; Zhang, L.; Li, Z.; Stickle, W. F.; Williams, R. S.; Chen, Y. Ionic/electronic hybrid materials integrated in a synaptic transistor with signal processing and learning functions. *Adv. Mater.* **2010**, *22*, 2448–2453.
- <sup>4</sup> Ishiwara, H. Proposal of adaptive-learning neuron circuits with ferroelectric analog-memory weights. *Jpn. J. Appl. Phys.* **1993**, *32*, 442.
- <sup>5</sup> Jo, S. H.; Chang, T.; Ebong, I.; Bhadviya, B. B.; Mazumder, P.; Lu, W. Nanoscale memristor device as synapse in neuromorphic systems. *Nano Lett.* **2010**, *10*, 1297–1301.
- <sup>6</sup> Liu, C.; Chen, H.; Wang, S.; Liu, Q.; Jiang, Y.-G.; Zhang, D. W.; Liu, M.; Zhou, P. Two-dimensional materials for next-generation computing technologies. *Nat. Nanotechnol.* **2020**, *15*, 545–557.
- <sup>7</sup> Young, K. K. Short-channel effect in fully depleted SOI MOSFETs. *IEEE Trans. Electron Devices* **1989**, *36*, 399–402.
- <sup>8</sup> Desai, S. B.; Madhvapathy, S. R.; Sachid, A. B.; Llinas, J. P.; Wang, Q.; Ahn, G. H.; Pitner, G.; Kim, M. J.; Bokor, J.; Hu, C.; Wong, H.-S. P.; Javey, A. MoS<sub>2</sub> transistors with 1-nanometer gate lengths. *Science* **2016**, *354*, 99–102.
- <sup>9</sup> Jayachandran, D.; Oberoi, A.; Sebastian, A.; Choudhury, T. H.; Shankar, B.; Redwing, J. M.; Das, S. A low-power biomimetic collision detector based on an in-memory molybdenum disulfide photodetector. *Nat. Electron.* **2020**, 1–10.
- <sup>10</sup> Liu, Y.; Weiss, N. O.; Duan, X.; Cheng, H.-C.; Huang, Y.; Duan, X. Van der Waals heterostructures and devices. *Nat. Rev. Mater.* **2016**, *1*, 1–17.
- <sup>11</sup> Wang, L.; Meric, I.; Huang, P.; Gao, Q.; Gao, Y.; Tran, H.; Taniguchi, T.; Watanabe, K.; Campos, L.; Muller, D.; Guo, J.; Kim, P.; Hone, J.; Shepard, K. L.; Dean, C. R. One-dimensional electrical contact to a two-dimensional material. *Science* **2013**, *342*, 614–617.
- <sup>12</sup> Radisavljevic, B.; Radenovic, A.; Brivio, J.; Giacometti, V.; Kis, A. Single-layer MoS<sub>2</sub> transistors. *Nat. Nanotechnol.* **2011**, *6*, 147–150.
- <sup>13</sup> Bandurin, D.; Torre, I.; Kumar, R. K.; Shalom, M. B.; Tomadin, A.; Principi, A.; Auton, G.; Khestanova, E.; Novoselov, K.; Grigorieva, I.; Ponomarenko, L. A.; Geim, A. K.; Polini, M. Negative local resistance caused by viscous electron backflow in graphene. *Science* **2016**, *351*, 1055–1058.
- <sup>14</sup> Cao, Y.; Fatemi, V.; Fang, S.; Watanabe, K.; Taniguchi, T.; Kaxiras, E.; Jarillo-Herrero, P. Unconventional superconductivity in magic-angle graphene superlattices. *Nature* **2018**, *556*, 43–50.
- <sup>15</sup> Roy, K.; Padmanabhan, M.; Goswami, S.; Sai, T.; Ramalingam, G.; Raghavan, S.; Ghosh, A. Graphene - MoS<sub>2</sub> hybrid structures for multifunctional photoresponsive memory devices. *Nat. Nanotechnol.* **2013**, *8*, 826–830.
- <sup>16</sup> Lee, J. Y.; Shin, J.-H.; Lee, G.-H.; Lee, C.-H. Two-dimensional semiconductor optoelectronics based on van der Waals heterostructures. *Nanomaterials* **2016**, *6*, 193.
- <sup>17</sup> Massicotte, M.; Schmidt, P.; Vialla, F.; Schädler, K. G.; Reserbat-Plantey, A.; Watanabe, K.; Taniguchi, T.; Tielrooij, K.-J.; Koppens, F. H. Picosecond photoresponse in van der Waals heterostructures. *Nat. Nanotechnol.* **2016**, *11*, 42–46.
- <sup>18</sup> Zhou, X.; Hu, X.; Yu, J.; Liu, S.; Shu, Z.; Zhang, Q.; Li, H.; Ma, Y.; Xu, H.; Zhai, T. 2D Layered Material-Based van der Waals Heterostructures for Optoelectronics. *Adv. Funct. Mater.* **2018**, *28*, 1706587.
- <sup>19</sup> Sup Choi, M.; Lee, G.-H.; Yu, Y.-J.; Lee, D.-Y.; Hwan Lee, S.; Kim, P.; Hone, J.; Jong Yoo, W. Controlled charge trapping by molybdenum disulphide and graphene in ultrathin heterostructured memory devices. *Nat. Commun.* **2013**, *4*, 1624.
- <sup>20</sup> Woo, M. H.; Jang, B. C.; Choi, J.; Lee, K. J.; Shin, G. H.; Seong, H.; Im, S. G.; Choi, S.-Y. Low-Power Nonvolatile Charge Storage Memory Based on MoS<sub>2</sub> and an Ultrathin Polymer Tunneling Dielectric. *Adv. Funct. Mater.* **2017**, *27*, 1703545.
- <sup>21</sup> Paul, T.; Ahmed, T.; Tiwari, K. K.; Thakur, C. S.; Ghosh, A. A high-performance MoS<sub>2</sub> synaptic device with floating gate engineering for neuromorphic computing. 2D Mater. **2019**, *6*, 045008.
- <sup>22</sup> Ahmed, T.; Islam, S.; Paul, T.; Hariharan, N.; Elizabeth, S.; Ghosh, A. A generic method to control hysteresis and memory effect in Van der Waals hybrids. *Mater. Res. Express* **2020**, *7*, 014004.
- <sup>23</sup> Wang, Q. H.; Kalantar-Zadeh, K.; Kis, A.; Coleman, J. N.; Strano, M. S. Electronics and optoelectronics of two-dimensional transition metal dichalcogenides. *Nat. Nanotechnol.* **2012**, *7*, 699–712.
- <sup>24</sup> Sarkar, D.; Xie, X.; Liu, W.; Cao, W.; Kang, J.; Gong, Y.; Kraemer, S.; Ajayan, P. M.; Banerjee, K. A subthermionic tunnel field-effect transistor with an atomically thin channel. *Nature* **2015**, *526*, 91–95.
- <sup>25</sup> Yu, Z.; Ong, Z.-Y.; Li, S.; Xu, J.-B.; Zhang, G.; Zhang, Y.-W.; Shi, Y.; Wang, X. Analyzing the Carrier Mobility in Transition-Metal Dichalcogenide MoS<sub>2</sub> Field-Effect Transistors. Adv. Funct. Mater. **2017**, *27*, 1604093.
- <sup>26</sup> Gupta, S.; Kumar, P.; Paul, T.; van Schaik, A.; Ghosh, A.; Thakur, C. S. Low Power, CMOS-MoS<sub>2</sub> Memtransistor based Neuromorphic Hybrid Architecture for Wake-Up Systems. *Sci. Rep.* **2019**, *9*, 1–9.
- <sup>27</sup> Park, H.-L.; Lee, Y.; Kim, N.; Seo, D.-G.; Go, G.-T.; Lee, T.-W. Flexible neuromorphic electronics for computing, soft robotics, and neuroprosthetics. *Adv. Mater.* **2020**, *32*, 1903558.
- <sup>28</sup> Wan, H.; Cao, Y.; Lo, L.-W.; Zhao, J.; Sepulveda, N.; Wang, C. Flexible carbon nanotube synaptic transistor for neurological electronic skin applications. ACS Nano **2020**, 14, 10402–10412.
- <sup>29</sup> Yang, C.-S.; Shang, D.-S.; Liu, N.; Fuller, E. J.; Agrawal, S.; Talin, A. A.; Li, Y.-Q.; Shen, B.-G.; Sun, Y. All-Solid-State Synaptic Transistor with Ultralow Conductance for Neuromorphic Computing. *Adv. Funct. Mater.* **2018**, 1804170.
- <sup>30</sup> Zhu, J.; Yang, Y.; Jia, R.; Liang, Z.; Zhu, W.; Rehman, Z. U.; Bao, L.; Zhang, X.; Cai, Y.; Song, L.; Huang, R. Ion Gated Synaptic Transistors Based on 2D van der Waals Crystals with Tunable Diffusive Dynamics. *Adv. Mater.* **2018**, *30*, 1800195.
- <sup>31</sup> Xu, R.; Jang, H.; Lee, M.-H.; Amanov, D.; Cho, Y.; Kim, H.; Park, S.; Shin, H.-j.; Ham, D. Vertical MoS<sub>2</sub> double-layer memristor with electrochemical metallization as an atomic-scale synapse with switching thresholds approaching 100 mV. *Nano Lett.* **2019**, *19*, 2411–2417.
- <sup>32</sup> Wu, X.; Ge, R.; Chen, P.-A.; Chou, H.; Zhang, Z.; Zhang, Y.; Banerjee, S.; Chiang, M.-H.; Lee, J. C.; Akinwande, D. Thinnest Nonvolatile Memory Based on Monolayer *h*-BN. Adv. Mater. **2019**, 1806790.
- <sup>33</sup> Pan, C. et al. Coexistence of Grain-Boundaries-Assisted Bipolar and Threshold Resistive Switching in Multilayer Hexagonal Boron Nitride. *Adv. Funct. Mater.* **2017**, *27*, 1604811.
- <sup>34</sup> Bertolazzi, S.; Krasnozhon, D.; Kis, A. Nonvolatile Memory Cells Based on MoS<sub>2</sub>/Graphene Heterostructures. ACS Nano **2013**, 7, 3246–3252.
- <sup>35</sup> Tran, M. D.; Kim, H.; Kim, J. S.; Doan, M. H.; Chau, T. K.; Vu, Q. A.; Kim, J.-H.; Lee, Y. H. Two-Terminal Multibit Optical Memory via van der Waals Heterostructure. Adv. Mater. **2019**, *31*, 1807075.
- <sup>36</sup> Bhattacharjee, S.; Wigchering, R.; Manning, H. G.; Boland, J. J.; Hurley, P. K. Emulating synaptic response in n-and p-channel MoS<sub>2</sub> transistors by utilizing charge trapping dynamics. *Sci. Rep.* **2020**, *10*, 1–8.
- <sup>37</sup> Kim, S.; Choi, B.; Lim, M.; Yoon, J.; Lee, J.; Kim, H.-D.; Choi, S.-J. Pattern recognition using carbon nanotube synaptic transistors with an adjustable weight update protocol. ACS Nano 2017, 11, 2814–2822.
- <sup>38</sup> Bi, G.-q.; Poo, M.-m. Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J. Neurosci. 1998, 18, 10464–10472.
- <sup>39</sup> Kim, S.; Yoon, J.; Kim, H.-D.; Choi, S.-J. Carbon nanotube synaptic transistor network for pattern recognition. *ACS Appl. Mater. Interfaces* **2015**, *7*, 25479–25486.
- <sup>40</sup> Cage, M. E.; Klitzing, K.; Chang, A.; Duncan, F.; Haldane, M.; Laughlin, R.; Pruisken, A.; Thouless, D. *The quantum Hall effect*; Springer Science & Business Media, 2012.
- <sup>41</sup> Kahng, D.; Sze, S. M. A floating gate and its application to memory devices. *Bell Syst. Tech. J.* **1967**, *46*, 1288–1295.
- <sup>42</sup> Serrano-Gotarredona, T.; Masquelier, T.; Prodromakis, T.; Indiveri, G.; Linares-Barranco, B. STDP and STDP variations with memristors for spiking neuromorphic learning systems. *Front. Neurosci.* **2013**, *7*, 2.
- <sup>43</sup> Lee, G.-H.; Yu, Y.-J.; Lee, C.; Dean, C.; Shepard, K. L.; Kim, P.; Hone, J. Electron tunneling through atomically flat and ultrathin hexagonal boron nitride. *Appl. Phys. Lett.* **2011**, *99*, 243114.
- <sup>44</sup> Fiori, G.; Betti, A.; Bruzzone, S.; Iannaccone, G. Lateral graphene-hBCN heterostructures as a platform for fully two-dimensional transistors. *ACS Nano* **2012**, *6*, 2642–2648.
- <sup>45</sup> Xu, Y.; Thakur, C. S.; Singh, R. K.; Hamilton, T. J.; Wang, R. M.; van Schaik, A. A FPGA implementation of the CAR-FAC cochlear model. *Front. Neurosci.* **2018**, *12*, 198.
- <sup>46</sup> Lyon, R. F. Human and machine hearing; Cambridge University Press, 2017.
- <sup>47</sup> Mayorov, A. S.; Gorbachev, R. V.; Morozov, S. V.; Britnell, L.; Jalil, R.; Ponomarenko, L. A.; Blake, P.; Novoselov, K. S.; Watanabe, K.; Taniguchi, T.; Geim, A. K. Micrometer-scale ballistic transport in encapsulated graphene at room temperature. *Nano Lett.* **2011**, *11*, 2396–2399.
- <sup>48</sup> Thakur, C. S.; Wang, R.; Hamilton, T. J.; Etienne-Cummings, R.; Tapson, J.; van Schaik, A. An analogue neuromorphic Co-processor that utilizes device mismatch for learning applications. *IEEE Trans. Circuits Syst. I* **2018**, *65*, 1174–1184.
- <sup>49</sup> Thakur, C. S.; Wang, R.; Hamilton, T. J.; Tapson, J.; van Schaik, A. A low power trainable neuromorphic integrated circuit that is tolerant to device mismatch. *IEEE Trans. Circuits Syst. I, Reg. Papers* **2016**, *63*, 211–221.
- <sup>50</sup> Molin, J. L.; Eisape, A.; Thakur, C. S.; Varghese, V.; Brandli, C.; Etienne-Cummings, R. Low-power, low-mismatch, highly-dense array of VLSI Mihalas-Niebur neurons. 2017 IEEE International Symposium on Circuits and Systems (ISCAS). 2017; pp 1–4.
- <sup>51</sup> Wang, R.; Cohen, G.; Thakur, C. S.; Tapson, J.; van Schaik, A. An SRAM-based implementation of a convolutional neural network. 2016 IEEE Biomedical Circuits and Systems Conference (BioCAS). 2016; pp 560–563.
- <sup>52</sup> Abdollahi, M.; Liu, S.-C. Speaker-independent isolated digit recognition using an AER silicon cochlea. 2011 IEEE Biomedical Circuits and Systems Conference (BioCAS). 2011; pp 269–272.
- <sup>53</sup> Genov, R.; Cauwenberghs, G. Kerneltron: Support vector "machine" in silicon. International Workshop on Support Vector Machines. 2002; pp 120–134.
- <sup>54</sup> Papadonikolakis, M.; Bouganis, C.-S. A novel FPGA-based SVM classifier. 2010 International Conference on Field-Programmable Technology. 2010; pp 283–286.
- <sup>55</sup> Novoselov, K. S.; Geim, A. K.; Morozov, S. V.; Jiang, D.; Zhang, Y.; Dubonos, S. V.; Grigorieva, I. V.; Firsov, A. A. Electric Field Effect in Atomically Thin Carbon Films. *Science* **2004**, *306*, 666–669.
- <sup>56</sup> Roy, K. Optoelectronic Properties of Graphene-Based Van Der Waals Hybrids; Springer Nature, 2020.
- <sup>57</sup> Aamir, M. A.; Ahmed, T.; Hsieh, K.; Islam, S.; Karnatak, P.; Kashid, R.; Mahapatra, P. S.; Mishra, J.; Paul, T.; Pradhan, A.; Roy, K.; Sahoo, A.; Ghosh, A. 2D van der Waals Hybrid: Structures, Properties and Devices; World Scientific, 2017; p 169.