The objective of this course is to establish a connection between the latest topics in digital systems and the field of artificial intelligence. In this course, we will first examine various systems and parallel processing approaches from a fully electronic perspective. Then, the architecture of the latest-generation NVIDIA GPUs, such as the Blackwell 2024 architecture, will be studied.
In the second stage, large language models (LLMs) and the most recent advancements in this field will be explored. In the third stage, the use of large language models and GPUs for processing different types of data—including image data, signal data, and genetic data—will be examined.
The target audience of this course consists of three different groups:
The first group includes individuals who want to become familiar with parallel processing systems, particularly the latest-generation NVIDIA GPUs.
The second group consists of individuals who wish to gain specialized knowledge of large language models and their most recent developments.
The third group includes individuals who intend to work on processing various types of data, including medical imaging data (MRI, CT scan, PET scan, radiology, mammography), medical signals (ECG, EEG, EMG, and nerve conduction signals), and genetic data (genomics, epigenomics, and transcriptomics), as well as other medical data such as proteomics, metabolomics, and gut microbiota data.
Of course, the scope of data is not limited to medical data and includes many other types of data. However, to facilitate better understanding of image data and to maintain focus on a specific domain, greater emphasis will be placed on medical data.
The course syllabus is as follows:
Part 1) Parallel processing and the architecture of NVIDIA's latest-generation GPUs (33%)
Parallel processing systems
Introduction to the architecture of multi-core systems
Introduction to multi-threaded programming, related programming models and languages
Introduction to the concepts of vector processing, SIMD, SSE, and AVX, and how to use them
Implementation of algorithms in multi-threaded and vectorized form using multi-core programming frameworks (OpenMP)
Introduction to common methods of thread synchronization, locks, barriers
Introduction to the architecture of graphics processors, memory hierarchy in GPU
Introduction to the history of NVIDIA's GPU architectures, including Fahrenheit, Kelvin, Turing, Hopper, Blackwell, and Blackwell Ultra
Introduction to the blocks of the GB202 graphics chip
Introduction to the GPC, Memory Controller, Cache, AMP & GigaThread Engine, NVENC/NVDEC, Optical Flow Engine, and PCI Express 5.0 Host Interface units
Review of Mixed FP32/INT32 module
Familiarity with the architecture and capabilities of Ray Tracing Cores, CUDA Cores, and Tensor Cores
Familiarity with Warp Schedulers & Dispatch Units
Familiarity with Texture Units, which perform texture fetches and filtering
Familiarity with Load/Store (LD/ST) Units, which handle global/local memory reads and writes
Familiarity with the Register File, the private storage of each thread
Familiarity with the fast Shared Memory / L1 Cache, accessible by all threads in an SM, and rapid implementation of algorithms using these memories
Introduction to the AI Management Processor (AI kernels, Multi-GPU Scaling, tensor parallelism, training massive models (LLMs), data parallelism) and how to use it in implementing LLMs
Introduction to CUDA architecture and GPU Driver
Introduction to the User-Space Driver of GPU (Resource & State Management (Logical View), Build Command Buffers, Interface with Kernel Driver, API Front-End, State Manager, Compiler Stack, Resource Manager, Command Buffer Builder, Caching & Pipeline Database)
Introduction to the Kernel-Space Driver of GPU (Context Manager, Memory Manager, Command Processor Interface, Scheduler / Dispatcher, Synchronization Manager, Interrupt Handler, Power & Thermal Control, Virtualization Layer)
Introduction to GPU parallel programming and the CUDA programming language
Providing examples of implementing common applications on GPU
Programming with CUDA
Modification of CUDA kernels for a specific purpose such as FFT with an arbitrary number of points
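The multi-threading and synchronization topics above (private per-thread work, barriers, locks, parallel reduction) can be sketched in a few lines. The example below uses Python's standard threading module as a stand-in for the OpenMP/CUDA implementations the course covers; the two-phase structure (local partial sums, then a locked merge after a barrier) is the same pattern an OpenMP reduction or a CUDA block-level reduction follows.

```python
import threading

def parallel_sum(data, num_threads=4):
    total = [0]                               # shared accumulator
    lock = threading.Lock()                   # protects the accumulator
    barrier = threading.Barrier(num_threads)  # phase synchronization point
    chunk = (len(data) + num_threads - 1) // num_threads

    def worker(tid):
        # Phase 1: each thread computes a private partial sum (no sharing).
        local = sum(data[tid * chunk:(tid + 1) * chunk])
        barrier.wait()  # all partial sums are ready before anyone merges
        # Phase 2: merge under the lock to avoid a lost-update race.
        with lock:
            total[0] += local

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

For example, `parallel_sum(list(range(1000)))` returns 499500. Without the lock, two threads could read the same value of `total[0]` and one update would be lost, which is exactly the race condition the synchronization unit of the course addresses.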
Part 2) LLM networks (34%)
Review of LLM networks and the latest developments in this field
Introduction to the basics of Large Language Modeling
Introduction to Tokenization and its hidden effects
Introduction to the basic models of GPT, BERT, T5
Introduction to Reasoning in LLMs
Introduction to the types of Hallucination and uncertainty
Introduction to the types of RAG methods and connection to external knowledge (such as Naïve RAG, Advanced RAG, Modular RAG)
Benchmark evaluation and the evaluation crisis
Review of the architectures of the latest generations of Llama, GPT, DeepSeek, and other emerging global models
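As a concrete illustration of the Naïve RAG pattern listed above: retrieve the stored passage most similar to the query and prepend it to the prompt sent to the LLM. The sketch below uses a whitespace tokenizer, term-count cosine similarity, and an invented prompt format purely for illustration; real systems use learned embeddings and a vector index.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer (real LLMs use subword tokenization).
    return text.lower().split()

def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents):
    # Naïve RAG retrieval step: pick the single most similar passage.
    q = Counter(tokenize(query))
    return max(documents, key=lambda d: cosine(q, Counter(tokenize(d))))

def build_prompt(query, documents):
    # Ground the LLM's answer in retrieved external knowledge.
    context = retrieve(query, documents)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"
```

Advanced and Modular RAG refine exactly these steps: query rewriting before `retrieve`, re-ranking of several candidates, and swappable retrieval modules instead of the single `max` above.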
Part 3) Applications of LLMs and GPUs in processing various data (including medical data) (33%)
Implementation of LLM methods using GPU on various types of data
How to use the RAG concept to connect time-varying genetic databases to LLM models
How to use AI Kernel and AI Processor units in data processing
Implementation of some compute-heavy algorithms on NVIDIA's latest-generation GPUs using the fast Shared Memory / L1 Cache
Familiarity with medical image formats such as MRI, CT scan, PET scan, radiology, and mammography, and the role of each of these data types in understanding medical issues
Familiarity with medical signal data formats such as ECG, EEG, EMG, and nerve conduction signals, and the role of each of these data types in understanding medical issues
Familiarity with basic genetic concepts such as DNA, RNA, Protein and Gut Microbiota
Familiarity with levels of medical data analysis including genomics, epigenomics, transcriptomics, proteomics, metabolomics, phenomics
How to fine-tune LLMs for analyzing various multi-omics data
How to design Encoder and Decoder layers of LLM models for image, signal, and genetic data
Familiarity with genetic databases in medicine
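The encoder-design topic above usually starts from one idea: a 2-D image (for example, one MRI slice) is cut into fixed-size patches, and each flattened patch becomes one "token" the transformer can attend over. The sketch below shows only that patchify step; the nested-list image format and patch size are illustrative assumptions, and a real encoder would follow this with a learned linear projection and positional embeddings.

```python
def patchify(image, patch=2):
    """Split an H x W image (nested lists) into flattened patch vectors."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    tokens = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            # Flatten one patch x patch tile, row by row, into a token vector.
            tokens.append([image[i + di][j + dj]
                           for di in range(patch) for dj in range(patch)])
    return tokens
```

A 4x4 image with `patch=2` yields 4 tokens of length 4; the same framing carries over to signals (1-D windows of an ECG trace) and genetic sequences (k-mer chunks), which is why one LLM backbone can serve all three data types in Part 3.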
Teaching Assistants
Books
Slides
Supplementary Files