GPU Tales

Posted 2025-06-10 | Updated 2025-06-10 | 5 minutes read (About 720 words)

🧠 When Lazy Kernels Hang - A Quirky Tale of CUDA, Streams, and Warmups

Summary:
Ever had your CUDA kernels mysteriously hang, even though everything looked fine? You’re not alone. This post walks through a deceptively simple code snippet that deadlocks — and explains how lazy loading, asynchronous streams, and cold GPUs all conspire to make benchmarking and debugging… interesting. We’ll break down what happens, why it matters, and how to keep your GPU pipelines warm and humming.

Posted 2025-06-10 | Updated 2025-06-10 | a few seconds read (About 40 words)

Upcoming - in no order

Benchmarking setup
Concurrency
Matmul
- nsight compute
nsight sanitizer
Tensor cores
- mma
- wmma
- wgmma + TMA
torch.compile
MoE : FlashDMoE
Reduction
Prefix Scan
FlashAttention + FlashMLA
Model Parallelism or Distributed training/inference
- FSDP
- Expert Parallelism
- Context Parallelism
- Sequence Parallelism
- Pipeline Parallelism
- 4D Parallelism

Posted 2025-06-10 | Updated 2025-06-10 | 4 minutes read (About 659 words)

🔍 Know thy GPU - A Fun Dive into CUDA Device Introspection

Ever wondered what your GPU is made of? I don’t mean physically (though that would make a great teardown video) — I mean capability-wise. If you’re working with CUDA, it’s crucial to know whether your GPU supports managed memory, tensor cores, or concurrent kernel execution. And hey, maybe you’re just trying to settle a bet about whose card is faster. 🏎️

In this post, we’ll go on a quick and entertaining tour through a powerful C++ tool that queries all your CUDA-capable GPUs and tells you everything from warp size to peak memory bandwidth. Buckle up!
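As a taste of the arithmetic behind a "peak memory bandwidth" figure, here is a minimal sketch. It assumes the number is derived the usual way, from the memory clock and the memory bus width with double-data-rate memory (two transfers per clock); the post's actual tool may compute it differently. The function name `peak_bandwidth_gbps` is hypothetical. The inputs use the units in which the CUDA runtime reports `cudaDevAttrMemoryClockRate` (kHz) and `cudaDevAttrGlobalMemoryBusWidth` (bits), but the sketch takes them as plain numbers so it compiles without a GPU or the CUDA toolkit.

```cpp
#include <cassert>
#include <cstdint>

// Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).
//   memory_clock_khz : memory clock in kHz
//   bus_width_bits   : memory bus width in bits
// Assumption: double-data-rate memory, i.e. 2 transfers per clock cycle.
inline double peak_bandwidth_gbps(std::int64_t memory_clock_khz,
                                  std::int64_t bus_width_bits) {
    assert(memory_clock_khz > 0 && bus_width_bits > 0);
    const double clock_hz = static_cast<double>(memory_clock_khz) * 1e3;
    const double bytes_per_transfer = static_cast<double>(bus_width_bits) / 8.0;
    return 2.0 * clock_hz * bytes_per_transfer / 1e9;
}
```

For example, a card with an 877 MHz memory clock and a 4096-bit bus gives `peak_bandwidth_gbps(877000, 4096)`, roughly 898 GB/s. Real kernels land below this ceiling, so treat it as an upper bound rather than a benchmark result.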

Posted 2025-06-07 | Updated 2025-06-07 | a minute read (About 123 words)

Hello World

Welcome to Hexo! This is your very first post. Check documentation for more info. If you get any problems when using Hexo, you can find the answer in troubleshooting or you can ask me on GitHub.

Quick Start

Create a new post

$ hexo new "My New Post"

More info: Writing

Run server

$ hexo server

More info: Server

Generate static files

$ hexo generate

More info: Generating

Deploy to remote sites

$ hexo deploy

More info: Deployment
