The objective of this course is to establish a connection between the latest topics in digital systems and the field of artificial intelligence. In this course, we will first examine various systems and parallel processing approaches from a fully electronic perspective. Then, the architecture of the latest-generation NVIDIA GPUs, such as the Blackwell 2024 architecture, will be studied.
In the second stage, large language models (LLMs) and the most recent advancements in this field will be explored. In the third stage, the use of large language models and GPUs for processing different types of data—including image data, signal data, and genetic data—will be examined.
The target audience of this course consists of three different groups:
The first group includes individuals who want to become familiar with parallel processing systems, particularly the latest-generation NVIDIA GPUs.
The second group consists of individuals who wish to gain specialized knowledge of large language models and their most recent developments.
The third group includes individuals who intend to work on processing various types of data, including medical imaging data (MRI, CT scan, PET scan, radiology, mammography), medical signals (ECG, EEG, EMG, and nerve conduction signals), and genetic data (genomics, epigenomics, and transcriptomics), as well as other medical data such as proteomics, metabolomics, and gut microbiota data.
Of course, the scope of data is not limited to medical data and includes many other types of data. However, to facilitate better understanding of image data and to maintain focus on a specific domain, greater emphasis will be placed on medical data.
The course syllabus is as follows:
Part 1) Parallel processing and the architecture of NVIDIA's latest-generation GPUs (33%)
Parallel processing systems
Introduction to the architecture of multi-core systems
Introduction to multi-threaded programming, related programming models and languages
Introduction to the concepts of vector processing, SIMD, SSE, and AVX, and how to use them
Implementation of algorithms in multi-threaded and vectorized form using multi-core programming frameworks (OpenMP)
Introduction to common methods of thread synchronization, locks, barriers
Introduction to the architecture of graphics processors, memory hierarchy in GPU
Introduction to the history of NVIDIA's GPU architectures, including Fahrenheit, Kelvin, Turing, Hopper, Blackwell, and Blackwell Ultra
Introduction to the blocks of the GB202 graphics chip
Introduction to the GPC, Memory Controller, Cache, AMP & GigaThread Engine, NVENC/NVDEC, Optical Flow Engine, and PCI Express 5.0 Host Interface units
Review of Mixed FP32/INT32 module
Familiarity with the architecture and capabilities of Ray Tracing Cores, CUDA Cores, and Tensor Cores
Familiarity with Warp Schedulers & Dispatch Units
Familiarity with Texture Units, which perform texture fetches and filtering
Familiarity with Load/Store (LD/ST) Units, which handle global/local memory reads and writes
Familiarity with the Register File, the private storage of each thread
Familiarity with the fast Shared Memory / L1 Cache, accessible by all threads in an SM, and rapid implementation of algorithms using these memories
Introduction to the AI Management Processor (AI kernels, Multi-GPU Scaling, tensor parallelism, training massive models (LLMs), data parallelism) and how to use it in implementing LLMs
Introduction to CUDA architecture and GPU Driver
Introduction to the User-Space Driver of GPU (Resource & State Management (Logical View), Build Command Buffers, Interface with Kernel Driver, API Front-End, State Manager, Compiler Stack, Resource Manager, Command Buffer Builder, Caching & Pipeline Database)
Introduction to the Kernel-Space Driver of GPU (Context Manager, Memory Manager, Command Processor Interface, Scheduler / Dispatcher, Synchronization Manager, Interrupt Handler, Power & Thermal Control, Virtualization Layer)
Introduction to GPU parallel programming and the CUDA programming language
Providing examples of implementing common applications on GPU
Programming with CUDA
Modification of CUDA kernels for a specific purpose such as FFT with an arbitrary number of points
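The multi-threading and synchronization topics above (private per-thread work, barriers, locks, parallel reduction) can be sketched in a few lines. The example below uses Python's standard threading module as a stand-in for the OpenMP/CUDA implementations the course covers; the two-phase structure (local partial sums, then a locked merge after a barrier) is the same pattern an OpenMP reduction or a CUDA block-level reduction follows.

```python
import threading

def parallel_sum(data, num_threads=4):
    total = [0]                               # shared accumulator
    lock = threading.Lock()                   # protects the accumulator
    barrier = threading.Barrier(num_threads)  # phase synchronization point
    chunk = (len(data) + num_threads - 1) // num_threads

    def worker(tid):
        # Phase 1: each thread computes a private partial sum (no sharing).
        local = sum(data[tid * chunk:(tid + 1) * chunk])
        barrier.wait()  # all partial sums are ready before anyone merges
        # Phase 2: merge under the lock to avoid a lost-update race.
        with lock:
            total[0] += local

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

For example, `parallel_sum(list(range(1000)))` returns 499500. Without the lock, two threads could read the same value of `total[0]` and one update would be lost, which is exactly the race condition the synchronization unit of the course addresses.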
Part 2) LLM networks (34%)
Review of LLM networks and the latest developments in this field
Introduction to the basics of Large Language Modeling
Introduction to Tokenization and its hidden effects
Introduction to the basic models of GPT, BERT, T5
Introduction to Reasoning in LLMs
Introduction to the types of Hallucination and uncertainty
Introduction to the types of RAG methods and connection to external knowledge (such as Naïve RAG, Advanced RAG, Modular RAG)
Benchmark evaluation and the evaluation crisis
Review of the architectures of the latest generations of Llama, GPT, DeepSeek, and other emerging global models
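As a concrete illustration of the Naïve RAG pattern listed above: retrieve the stored passage most similar to the query and prepend it to the prompt sent to the LLM. The sketch below uses a whitespace tokenizer, term-count cosine similarity, and an invented prompt format purely for illustration; real systems use learned embeddings and a vector index.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer (real LLMs use subword tokenization).
    return text.lower().split()

def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents):
    # Naïve RAG retrieval step: pick the single most similar passage.
    q = Counter(tokenize(query))
    return max(documents, key=lambda d: cosine(q, Counter(tokenize(d))))

def build_prompt(query, documents):
    # Ground the LLM's answer in retrieved external knowledge.
    context = retrieve(query, documents)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"
```

Advanced and Modular RAG refine exactly these steps: query rewriting before `retrieve`, re-ranking of several candidates, and swappable retrieval modules instead of the single `max` above.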
Part 3) Applications of LLMs and GPUs in processing various data (including medical data) (33%)
Implementation of LLM methods using GPU on various types of data
How to use the RAG concept to connect time-varying genetic databases to LLM models
How to use AI Kernel and AI Processor units in data processing
Implementation of some compute-heavy algorithms on NVIDIA's latest-generation GPUs using the fast Shared Memory / L1 Cache
Familiarity with medical image formats such as MRI, CT scan, PET scan, radiology, and mammography, and the role of each of these data types in understanding medical issues
Familiarity with medical signal data formats such as ECG, EEG, EMG, and nerve conduction signals, and the role of each of these data types in understanding medical issues
Familiarity with basic genetic concepts such as DNA, RNA, Protein and Gut Microbiota
Familiarity with levels of medical data analysis including genomics, epigenomics, transcriptomics, proteomics, metabolomics, phenomics
How to fine-tune LLMs for analyzing various multi-omics data
How to design Encoder and Decoder layers of LLM models for image, signal, and genetic data
Familiarity with genetic databases in medicine
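The encoder-design topic above usually starts from one idea: a 2-D image (for example, one MRI slice) is cut into fixed-size patches, and each flattened patch becomes one "token" the transformer can attend over. The sketch below shows only that patchify step; the nested-list image format and patch size are illustrative assumptions, and a real encoder would follow this with a learned linear projection and positional embeddings.

```python
def patchify(image, patch=2):
    """Split an H x W image (nested lists) into flattened patch vectors."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    tokens = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            # Flatten one patch x patch tile, row by row, into a token vector.
            tokens.append([image[i + di][j + dj]
                           for di in range(patch) for dj in range(patch)])
    return tokens
```

A 4x4 image with `patch=2` yields 4 tokens of length 4; the same framing carries over to signals (1-D windows of an ECG trace) and genetic sequences (k-mer chunks), which is why one LLM backbone can serve all three data types in Part 3.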
Teaching Assistants
Books
Slides
Supplementary Files