Surya Subramanian
Hey y'all, my name is Surya! I study computer science @ Georgia Tech. My interests lie at the intersection of machine learning and systems, with a focus on efficient model training and inference.
Previously, I was on NVIDIA's cuBLAS team writing fast matmul CUDA kernels for Blackwell via emulation on low-precision tensor cores.
Before that, I was on the PyTorch team at Meta, where I worked on Triton and PyTorch Distributed. My focus there was PyTorch Symmetric Memory, which overlaps communication and compute for distributed Triton kernels in tensor and expert parallelism. We open-sourced some of our work in Kraken, a Triton library of symmetric memory kernels. Prior to that, I worked on ML infrastructure at Pinterest.
Interests
I really enjoy working across the full ML acceleration stack, from large-scale distributed training to low-latency inference and kernel optimization.
Outside of systems, I love weightlifting and finding good coffee spots.