Big Data Processing Acceleration

Many machine learning and cognitive applications process large data sets collected in the Internet of Things domain. Running data- and memory-intensive workloads on traditional cores results in high energy consumption and slow processing, mainly because of the large amount of data movement between memory and processing units. Processing in-memory (PIM) is a promising solution to this data movement issue. State-of-the-art PIM architectures support vector-matrix multiplication in the analog domain. However, this PIM technology relies on analog-to-digital and digital-to-analog converters (ADCs/DACs), which take up the majority of the chip area (98% in deep learning accelerators) and do not scale as fast as digital technology. I designed a digital-based PIM platform capable of accelerating fundamental big data algorithms in real time with orders of magnitude higher energy efficiency. The proposed platform implements entire big data applications directly in memory blocks, without using extra processing units.
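To make the role of the converters concrete, here is an idealized, hypothetical model of analog vector-matrix multiplication. It is a sketch rather than a model of any specific accelerator: noise, wire resistance, and device nonlinearity are ignored. Inputs are applied as row voltages, each cell's conductance stores a weight, and each column current is the dot product of the two, which an ADC must then digitize. For binary inputs and weights on an n-row crossbar, the ADC needs at least ceil(log2(n + 1)) bits to distinguish every possible result, so converter precision has to grow with the array size. The function name `analog_crossbar_mvm` and its parameters are illustrative only.

```python
import numpy as np


def analog_crossbar_mvm(v, G, adc_bits, full_scale):
    """Idealized analog crossbar read-out (hypothetical model).

    v: row input voltages (the DAC outputs), shape (rows,)
    G: cell conductances, shape (rows, cols)
    Column currents follow Ohm's and Kirchhoff's laws, I_j = sum_i v_i * G_ij;
    an adc_bits-bit ADC then maps [0, full_scale] onto integer codes.
    Noise, wire resistance, and device nonlinearity are ignored.
    """
    currents = v @ G
    levels = 2**adc_bits - 1
    codes = np.round(np.clip(currents, 0, full_scale) / full_scale * levels)
    return codes.astype(int)


rng = np.random.default_rng(1)
rows, cols = 15, 8
v = rng.integers(0, 2, rows).astype(float)          # binary inputs
G = rng.integers(0, 2, (rows, cols)).astype(float)  # binary weights
exact = (v @ G).astype(int)                         # true dot products, 0..15

# 4 bits give 16 codes, enough to resolve every possible count 0..15 exactly.
print(np.array_equal(analog_crossbar_mvm(v, G, 4, rows), exact))  # True
# 2 bits give only 4 codes, so distinct counts can collapse onto the same code.
print(analog_crossbar_mvm(v, G, 2, rows))
```

A digital PIM design sidesteps this trade-off because results never leave the digital domain.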

The fundamental novelty of this project is the realization of a PIM-enhanced platform that includes:
(1) Novel hardware designs for efficient PIM acceleration of big data processing applications. In the hardware layer, the architecture has two main components: PIM-enabled processors and PIM-based accelerators. PIM-enabled processors support the essential operations (bitwise operations, addition, and multiplication) inside memory blocks, such as SRAM and DRAM, throughout the memory hierarchy. PIM-based accelerators are application-specific and fully exploit PIM by processing the entire application in memory, without any processing cores. In this platform, each memory block exploits the internal switching characteristics of its memory devices to support highly parallel bitwise operations, which are then extended to row-parallel arithmetic operations.
(2) An integrated software infrastructure that seamlessly orchestrates the hardware structures. The software infrastructure provides abstracted interfaces to the PIM-enabled memory and accelerators. It is designed as a library, i.e., a collection of automated data processing procedures that software developers are already familiar with.

Our approach opened a new direction in PIM technology and offers significant advantages that make PIM more practical: (i) it provides highly parallel, dense computation by eliminating ADC/DAC blocks and operating on digital data; (ii) it addresses the internal data movement issue by enabling in-place computation where the big data is stored; (iii) it natively supports floating-point precision, which is essential for many scientific applications, including deep learning training; and (iv) it is compatible with any bipolar memory technology, including the commercially available Intel 3D XPoint.
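The step from bitwise operations to row-parallel arithmetic can be illustrated with a toy, bit-level Python model. As a hypothetical simplification, it assumes the memory block's only compute primitive is a NOR between two columns, evaluated on all rows at once (NOR is, for example, a common primitive in digital resistive PIM, but the exact primitive depends on the device), and it ignores the initialization of output cells that a real device requires. Building a ripple-carry adder from the classic 9-NOR full adder gives a latency of 9n + 2 NOR cycles for n-bit operands, independent of how many rows are added in parallel. The names `PIMBlock` and `add` are illustrative only; a software library of the kind described above would hide such NOR sequences behind a single call.

```python
import numpy as np


class PIMBlock:
    """Toy bit-level model of one digital PIM memory block (hypothetical).

    cells[r, c] is the bit stored in row r, column c. A data word occupies
    consecutive columns of one row (LSB first), so every row holds an
    independent operand. The only compute primitive is NOR between two
    columns; like a crossbar operation, it is applied to all rows at once.
    """

    def __init__(self, rows: int, cols: int):
        self.cells = np.zeros((rows, cols), dtype=bool)
        self.nor_ops = 0  # in-memory NOR cycles issued (each covers every row)

    def nor(self, a: int, b: int, out: int) -> None:
        # One NOR cycle: column `out` <- NOR(column a, column b) in all rows.
        self.cells[:, out] = ~(self.cells[:, a] | self.cells[:, b])
        self.nor_ops += 1

    def write(self, col0: int, width: int, values) -> None:
        # Store one unsigned integer per row in columns col0 .. col0+width-1.
        values = np.asarray(values, dtype=np.int64)
        for i in range(width):
            self.cells[:, col0 + i] = ((values >> i) & 1).astype(bool)

    def read(self, col0: int, width: int) -> np.ndarray:
        out = np.zeros(self.cells.shape[0], dtype=np.int64)
        for i in range(width):
            out |= self.cells[:, col0 + i].astype(np.int64) << i
        return out


def add(block: PIMBlock, a0: int, b0: int, s0: int, n: int, scratch) -> None:
    """Row-parallel ripple-carry addition S = A + B using only NOR cycles."""
    n1, n2, n3, x, m1, m2, m3, c = scratch
    block.cells[:, c] = False                # carry-in = 0 in every row
    for i in range(n):
        a, b = a0 + i, b0 + i
        block.nor(a, b, n1)                  # n1 = ~a & ~b
        block.nor(a, n1, n2)                 # n2 = ~a & b
        block.nor(b, n1, n3)                 # n3 = a & ~b
        block.nor(n2, n3, x)                 # x  = XNOR(a, b)
        block.nor(x, c, m1)                  # m1 = (a ^ b) & ~c
        block.nor(x, m1, m2)                 # m2 = ~x & c
        block.nor(c, m1, m3)                 # m3 = x & ~c
        block.nor(m2, m3, s0 + i)            # sum bit = XNOR(x, c) = a ^ b ^ c
        block.nor(n1, m1, c)                 # carry-out = majority(a, b, c)
    block.nor(c, c, n1)                      # final carry -> MSB of S via two NOTs
    block.nor(n1, n1, s0 + n)


# Usage: add 1024 pairs of 8-bit numbers, one pair per row.
rng = np.random.default_rng(0)
rows, n = 1024, 8
a = rng.integers(0, 2**n, rows)
b = rng.integers(0, 2**n, rows)
blk = PIMBlock(rows, 3 * n + 9)              # A, B, S (n+1 bits), 8 scratch columns
blk.write(0, n, a)
blk.write(n, n, b)
add(blk, 0, n, 2 * n, n, scratch=range(3 * n + 1, 3 * n + 9))
print(np.array_equal(blk.read(2 * n, n + 1), a + b), blk.nor_ops)  # True 74
```

Doubling `rows` leaves `nor_ops` unchanged, which is the source of the parallelism: the cost of each NOR cycle is paid once for the whole block rather than once per operand.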

To demonstrate its effectiveness, we used our digital-based PIM architecture to accelerate a wide range of big data processing applications, including machine learning, query processing, graph processing, and bioinformatics. For example, I proposed FloatPIM, a novel high-precision PIM architecture that significantly accelerates convolutional neural networks (CNNs) in both the training and inference phases. FloatPIM breaks the CNN computation into data transfer and computing modes and directly supports floating-point precision, which enables highly parallel, high-precision CNN training. We also consider the thermal and reliability requirements of the proposed hardware blocks [19]. This enables the intelligent use of efficient PIM-based computation in general real-world applications.
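The following sketch illustrates, in a simplified and hypothetical form, why floating-point support can be built on top of integer arithmetic: a float32 multiplication decomposes into an XOR of the signs, an addition of the exponents, an integer multiplication of the mantissas, and a normalizing shift. It is not FloatPIM's actual datapath. It handles only normal numbers and truncates instead of rounding to nearest even. In a digital PIM block, each of these integer steps would map onto row-parallel bitwise and arithmetic operations applied to many operands at once. The helper names `fields`, `pack`, and `fp32_mul` are illustrative.

```python
import struct


def fields(x: float):
    """Split a float32 into (sign, biased exponent, 23-bit fraction)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def pack(sign: int, exp: int, frac: int) -> float:
    """Reassemble float32 fields into a Python float."""
    bits = (sign << 31) | (exp << 23) | frac
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fp32_mul(x: float, y: float) -> float:
    """float32 multiply built only from integer steps (simplified sketch).

    Assumptions: normal inputs and results only (no zero, subnormal, inf,
    NaN) and truncation instead of IEEE round-to-nearest-even.
    """
    sx, ex, fx = fields(x)
    sy, ey, fy = fields(y)
    if ex in (0, 255) or ey in (0, 255):
        raise ValueError("only normal float32 inputs are modeled")
    sign = sx ^ sy                    # sign: a single XOR
    mx = fx | (1 << 23)               # restore the hidden leading 1
    my = fy | (1 << 23)
    prod = mx * my                    # 24x24-bit integer multiply -> up to 48 bits
    exp = ex + ey - 127               # add biased exponents, remove one bias
    if prod >> 47:                    # mantissa product in [2, 4): normalize
        prod >>= 1
        exp += 1
    if not 0 < exp < 255:
        raise ValueError("overflow/underflow is not modeled")
    frac = (prod >> 23) & 0x7FFFFF    # keep 23 fraction bits (truncate)
    return pack(sign, exp, frac)


print(fp32_mul(1.5, 2.5), fp32_mul(-3.0, 0.5))  # 3.75 -1.5
```

The mantissa multiplication dominates the cost, which is why efficient in-memory integer multiplication is a prerequisite for floating-point training in memory.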

Publications:
[ISCA'19] M. Imani, S. Gupta, Y. Kim, T. Rosing, “FloatPIM: In-Memory Acceleration of Deep Neural Network Training with High Precision,” IEEE International Symposium on Computer Architecture (ISCA), 2019 (acceptance rate 16.9%).
[ISLPED'19] S. Gupta, M. Imani, B. Khaleghi, V. Kumar, T. Rosing, “RAPID: A ReRAM Processing in Memory Architecture for DNA Sequence Alignment,” IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), 2019.
[DATE'18] M. Imani, S. Gupta, T. Rosing, “GenPIM: Generalized Processing In-Memory to Accelerate Data Intensive Applications,” IEEE/ACM Design Automation and Test in Europe Conference (DATE), 2018.
[ICRC'17] M. Imani, Y. Kim, T. Rosing, “NNgine: Ultra-Efficient Nearest Neighbor Accelerator Based on In-Memory Computing,” IEEE International Conference on Rebooting Computing (ICRC), 2017.
[ICCAD'17] Y. Kim, M. Imani, T. Rosing, “ORCHARD: Visual Object Recognition Accelerator Based on Approximate In-Memory Processing,” IEEE International Conference on Computer Aided Design (ICCAD), 2017.
[ISLPED'17] M. Imani, S. Gupta, A. Arredondo, T. Rosing, “Efficient Query Processing in Crossbar Memory,” IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), 2017.
[DAC'17] M. Imani, S. Gupta, T. Rosing, “Ultra-Efficient Processing In-Memory for Data Intensive Applications,” IEEE/ACM Design Automation Conference (DAC), 2017.
[HPCA'17] M. Imani, A. Rahimi, D. Kong, T. Rosing, J. M. Rabaey, “Exploring Hyperdimensional Associative Memory,” IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2017.
[DATE'17] M. Imani, D. Peroni, Y. Kim, A. Rahimi, T. Rosing, “Efficient Neural Network Acceleration on GPGPU using Content Addressable Memory,” IEEE/ACM Design Automation and Test in Europe Conference (DATE), 2017.
[DATE'17] M. Samragh, M. Imani, F. Koushanfar, T. Rosing, “LookNN: Neural Network with No Multiplication,” IEEE/ACM Design Automation and Test in Europe Conference (DATE), 2017.
[ISQED'17] M. Imani, T. Rosing, “CAP: Configurable Resistive Associative Processor for Near-Data Computing,” IEEE International Symposium on Quality Electronic Design (ISQED), 2017.
[ASP-DAC'17] M. Imani, Y. Kim, T. Rosing, “MPIM: Multi-Purpose In-Memory Processing using Configurable Resistive Memory,” IEEE Asia and South Pacific Design Automation Conference (ASP-DAC), 2017.