blog.sh

$ ls -la ~/posts

Found 2 posts

NEW · 2026 · Featured

RDMA Transfer for LLM Inference

An open-source implementation of near-instant, trillion-parameter model transfer from training to inference using RDMA point-to-point (P2P) communication. On H100s with InfiniBand, it transfers the 1T-parameter Kimi model (FP8) across 512 GPUs in 7 seconds and the 744B-parameter GLM5 model (BF16) in 8.5 seconds, roughly 7x faster than previous open-source solutions.

RDMA · LLM · Distributed Systems · SGLang · Miles

Llama 4 - Multimodal AI

Contributions to multimodal post-training for Meta's Llama 4.

Meta AI · Multimodal · LLM