Surya Subramanian

Hey y'all, my name is Surya! I study computer science @ Georgia Tech. My interests lie at the intersection of machine learning and systems for efficient model training and inference.

Previously, I was on NVIDIA's cuBLAS team writing fast matmul CUDA kernels for Blackwell via emulation on low-precision tensor cores.

Before that, I was on the PyTorch team at Meta, where I worked on Triton and PyTorch Distributed. I worked on PyTorch Symmetric Memory to overlap communication and compute for distributed Triton kernels in tensor and expert parallelism. We open-sourced some of our work in Kraken, a Triton library of symmetric memory kernels. Prior to that, I worked on ML infrastructure at Pinterest.

Interests

I really enjoy the full ML acceleration stack, from large-scale distributed training to low-latency inference and kernel optimization.

Outside of systems, I love weightlifting and finding good coffee spots.