Artificial Intelligence Systems & Architecture


Our primary goal is to maximize the performance, scalability, and efficiency of cutting-edge workloads such as large language models (e.g., GPT, LLaMA) and recommendation systems (e.g., DLRM). We are also exploring the Near-Memory Processing paradigm to reduce data-movement bottlenecks and increase system throughput, and we are optimizing the execution of Machine Learning (ML) inference across GPU and CPU resources to improve speed and responsiveness. Our ultimate objective is to design AI systems that combine robust computational power with maximum efficiency and performance across a wide range of applications and scenarios.
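As a minimal illustration of the GPU/CPU inference placement mentioned above, the sketch below runs a model on a GPU when one is available and falls back to the CPU otherwise. It is a generic PyTorch example under assumed names: TinyClassifier, its layer sizes, and the batch shape are hypothetical placeholders, not a description of our actual systems.

    import torch
    import torch.nn as nn

    class TinyClassifier(nn.Module):
        """Hypothetical stand-in for a production inference model."""
        def __init__(self, in_features=64, num_classes=10):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_features, 128),
                nn.ReLU(),
                nn.Linear(128, num_classes),
            )

        def forward(self, x):
            return self.net(x)

    # Place the model on the GPU when available; otherwise run on the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = TinyClassifier().to(device).eval()

    # Serve a batch of (synthetic) requests without tracking gradients.
    batch = torch.randn(32, 64, device=device)
    with torch.no_grad():
        logits = model(batch)
    print(logits.argmax(dim=1))

In practice, the placement decision would also weigh batch size, model size, and current device load rather than GPU availability alone.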

Publications

  1. (Journal) Deep Partitioned Training From Near-Storage Computing to DNN Accelerators