blog.sh
$ ls -la ~/posts
Found 2 posts
NEW, 2026, Featured
RDMA Transfer for LLM Inference
Open-source implementation of instant transfer between training and inference for trillion-parameter models, using RDMA P2P. On H100s with InfiniBand, it transfers a 512-GPU 1T Kimi FP8 model in 7 seconds and a 744B BF16 GLM5 model in 8.5 seconds, roughly 7x faster than previous open-source solutions.
RDMA, LLM, Distributed Systems, SGLang, Miles
Llama 4 - Multimodal AI
Contributing to Meta's Llama 4 multimodal post-training.
Meta AI, Multimodal, LLM