Surya Subramanian
Hey y’all — my name is Surya! I’m interested in the intersection of machine learning and systems for efficient training and inference.
Previously, I was on NVIDIA’s cuBLAS team writing GEMM kernels for Blackwell GPUs. I worked on emulating higher-precision matmuls with low-precision tensor cores for higher throughput.
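Roughly, the idea is to split each higher-precision operand into a few low-precision pieces, take the cheap low-precision products, and accumulate them at full precision. Here's a toy NumPy sketch of bf16-style splitting — purely illustrative, not the actual cuBLAS implementation (the `to_bf16` truncation helper is my own stand-in for real bfloat16 hardware):

```python
import numpy as np

def to_bf16(x):
    """Emulate bfloat16 by keeping only the top 16 bits of each float32."""
    u = x.astype(np.float32).view(np.uint32)
    return (u & np.uint32(0xFFFF0000)).view(np.float32)

def split(x):
    """Split a float32 array into a bf16 'hi' part plus a bf16 remainder."""
    hi = to_bf16(x)
    lo = to_bf16(x.astype(np.float32) - hi)
    return hi, lo

def emulated_matmul(a, b):
    """Approximate a float32 GEMM with three low-precision products."""
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    # Drop the tiny lo @ lo term; on hardware, each product would run on
    # low-precision tensor cores with float32 accumulation.
    return a_hi @ b_hi + a_hi @ b_lo + a_lo @ b_hi

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64), dtype=np.float32)
b = rng.standard_normal((64, 64), dtype=np.float32)

exact = a @ b
naive = to_bf16(a) @ to_bf16(b)   # single low-precision product
emu = emulated_matmul(a, b)       # split into three products

err = lambda x: np.abs(x - exact).max()
print(err(naive), err(emu))  # the split version is far closer to float32
```

Each operand's rounding error shrinks from roughly one bf16 ulp to one bf16 ulp squared, which is why three cheap products can approach full-precision accuracy.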
Before that, I was on the PyTorch team at Meta, where I worked on PyTorch Symmetric Memory. I added inter-node support via the NVSHMEM backend and wrote fused distributed Triton kernels for tensor parallelism (e.g. GEMM + all-reduce, GEMM + reduce-scatter). We open-sourced some of this work in Kraken, a Triton library of symmetric memory kernels. Prior to that, I worked on ML infrastructure at Pinterest.
I studied computer science at Georgia Tech. There, I did research in efficient multimodal inference and KV cache compression at SAIL (our Systems for AI Lab), and earlier worked on inference optimizations for MoE models via expert reduction.
Interests
I really enjoy the full ML acceleration stack, from large-scale distributed training to low-latency inference and kernel optimization.
Outside of systems, I love weightlifting and finding good coffee spots.