64-bit 4-wide Out-of-Order Processor core

- RV64IMAFDCSU Instruction set
- 9-stage pipeline (Fetch, Predecode, Checking, Decode, Rename, Dispatch, Issue, Execute, and Retire)
- Decodes and issues up to 4 instructions per cycle, with out-of-order execution and in-order commit using a reorder buffer (ROB)
- Six execution units with split functionality
- Branch prediction using a branch target buffer (BTB) and a 2-bit direction predictor
- Register renaming to eliminate false (WAR and WAW) dependencies
- 16 KiB 4-Way set-associative VIPT I-Cache & PIPT D-Cache
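
As an illustration of the 2-bit direction predictor named in the list above, here is a minimal behavioural sketch of a table of 2-bit saturating counters. It is not the core's RTL: the class name, the table size, the indexing by low PC bits, and the weakly-not-taken reset value are all assumptions made for this example, and the BTB (target lookup) is left out.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters (hypothetical model, not the core's RTL).

    Counter values: 0 = strongly not-taken, 1 = weakly not-taken,
                    2 = weakly taken,       3 = strongly taken.
    """

    def __init__(self, entries=1024):
        # Table size is an assumption; it must be a power of two for masking.
        assert entries > 0 and entries & (entries - 1) == 0
        self.mask = entries - 1
        self.counters = [1] * entries  # assumed reset state: weakly not-taken

    def _index(self, pc):
        # With the C extension, instructions are 2-byte aligned, so bit 0 is dropped.
        return (pc >> 1) & self.mask

    def predict(self, pc):
        """Return True if the branch at `pc` is predicted taken."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Move the counter one step toward the resolved outcome, saturating at 0 and 3."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Because the counter has to move two steps to cross from "strongly taken" to a not-taken prediction, a single loop-exit misprediction does not flip the prediction for the next pass through the loop.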

A 32-bit, 5-stage Pipelined Scalar RISC-V Processor

- Supports RV32G (RV32IMAFD) instructions
- Integrated Floating-Point Unit
- Split I and D (Instruction and Data) caches
- Virtual memory with data and instruction TLBs (DTLB and ITLB)
- Branch Prediction Unit
- System Counters
- Interrupt controller with four levels of pre-emptive priority
- Main Memory with Error Control Coding
- Wishbone B.3 Bus Protocol
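
The "four levels of pre-emptive priority" in the list above can be sketched as a simple arbitration rule: a pending interrupt is taken only if its level is strictly higher than the level currently being serviced. This is an illustrative model only. The function name, the level numbering (3 as most urgent), and the tie-break (first pending request wins) are assumptions, not details taken from the design.

```python
NUM_LEVELS = 4  # from the feature list; the numbering (3 = most urgent) is assumed


def next_to_service(pending, current_level=None):
    """Pick the interrupt to take now, or None if nothing should pre-empt.

    pending:       iterable of (irq_id, level) pairs, level in 0..NUM_LEVELS-1
    current_level: level of the handler currently running, or None if idle
    """
    best = None
    for irq_id, level in pending:
        if not 0 <= level < NUM_LEVELS:
            raise ValueError(f"invalid priority level: {level}")
        # Strict '>' keeps the earliest request among equal levels (assumed tie-break).
        if best is None or level > best[1]:
            best = (irq_id, level)

    if best is None:
        return None
    # Pre-emption rule: only a strictly higher level interrupts a running handler.
    if current_level is not None and best[1] <= current_level:
        return None
    return best[0]
```

For example, with a level-2 handler running, a pending level-3 request is returned immediately, while a level-1 or level-2 request waits until the handler finishes.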

A 32-bit, 5-stage Dual-pipeline Superscalar RISC-V Processor

- 32-bit, 5-stage dual-pipeline superscalar processor based on the RISC-V ISA
- Integrated Floating-Point Unit
- I and D (Instruction and Data) caches
- Virtual Memory Support
- Dynamic Branch Prediction Unit
- Interrupt controller
- Main Memory with Error Control Coding

A Softcore RISC-V Vector Processor for Edge AI

RISC-V OoO Vector Processor for ML Inference

High-Level Synthesis of Geant4 Particle Transport Application for FPGA

Photon Transport System
- Host: Intel Xeon Silver CPU @ 2.1 GHz
- FPGA: Xilinx Alveo U250
- Kernel frequency: 300 MHz
- Processes Implemented: Compton Scattering, Transportation
- Features: Parallel particle simulation, flexible host code for multiple compute units (CUs), burst reads/writes to DDR

Runtime Programmable Coprocessor for DCNN

- Runtime Programmable
- Computational throughput of more than 140 G operations/s
- Max image size: 256 × 256; filter size: 16 × 16; convolution stride: 1
- Maximum of 32 layers per network and 256 filters per layer
- Rectified linear unit (ReLU) as the nonlinearity, followed by 2 × 2 max-pooling with stride 2
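
To make the layer parameters above concrete, the following sketch works out the feature-map size after one convolution and pooling stage and shows the ReLU and 2 × 2, stride-2 max-pooling steps on a tiny example. It is a software reference under stated assumptions, not the coprocessor's datapath: the helper names are hypothetical, the convolution is assumed to use no padding, and a leftover odd row or column is assumed to be dropped by the pooling step.

```python
def conv_output_size(n, k, stride=1):
    """Output width/height of an unpadded ('valid') convolution (padding is assumed absent)."""
    return (n - k) // stride + 1


def pool_output_size(n, window=2, stride=2):
    """Output width/height of max-pooling; a trailing odd row/column is dropped (assumed)."""
    return (n - window) // stride + 1


def relu(x):
    """Rectified linear unit: negative values are clamped to zero."""
    return x if x > 0 else 0


def relu_then_max_pool_2x2(fmap):
    """Apply ReLU element-wise, then 2 x 2 max-pooling with stride 2, to a 2-D list."""
    act = [[relu(v) for v in row] for row in fmap]
    rows, cols = len(act) // 2, len(act[0]) // 2
    return [
        [
            max(act[2 * r][2 * c], act[2 * r][2 * c + 1],
                act[2 * r + 1][2 * c], act[2 * r + 1][2 * c + 1])
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Largest configuration from the list: 256 x 256 image, 16 x 16 filter, stride 1.
conv_side = conv_output_size(256, 16, stride=1)   # 241
pool_side = pool_output_size(conv_side)           # 120

small = [[-1, 2, 0, -3],
         [4, -5, 1, 1],
         [0, 0, -2, -2],
         [-1, 3, -4, -6]]
pooled = relu_then_max_pool_2x2(small)            # [[4, 1], [3, 0]]
```

Under these assumptions, a maximum-size 256 × 256 input convolved with a 16 × 16 filter gives a 241 × 241 map, which pooling reduces to 120 × 120.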