Surya Subramanian
Hey y’all — my name is Surya! I’m interested in the intersection of machine learning and systems for efficient training and inference.
Previously, I was on NVIDIA’s cuBLAS team writing GEMM kernels for Blackwell GPUs. I worked on emulating higher-precision matmuls with low-precision tensor cores for higher throughput.
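Roughly, the idea is to split each higher-precision operand into a few low-precision pieces, take the cheap low-precision products, and accumulate them at full precision. Here's a toy NumPy sketch of bf16-style splitting — purely illustrative, not the actual cuBLAS implementation (the `to_bf16` truncation helper is my own stand-in for real bfloat16 hardware):

```python
import numpy as np

def to_bf16(x):
    """Emulate bfloat16 by keeping only the top 16 bits of each float32."""
    u = x.astype(np.float32).view(np.uint32)
    return (u & np.uint32(0xFFFF0000)).view(np.float32)

def split(x):
    """Split a float32 array into a bf16 'hi' part plus a bf16 remainder."""
    hi = to_bf16(x)
    lo = to_bf16(x.astype(np.float32) - hi)
    return hi, lo

def emulated_matmul(a, b):
    """Approximate a float32 GEMM with three low-precision products."""
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    # Drop the tiny lo @ lo term; on hardware, each product would run on
    # low-precision tensor cores with float32 accumulation.
    return a_hi @ b_hi + a_hi @ b_lo + a_lo @ b_hi

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64), dtype=np.float32)
b = rng.standard_normal((64, 64), dtype=np.float32)

exact = a @ b
naive = to_bf16(a) @ to_bf16(b)   # single low-precision product
emu = emulated_matmul(a, b)       # split into three products

err = lambda x: np.abs(x - exact).max()
print(err(naive), err(emu))  # the split version is far closer to float32
```

Each operand's rounding error shrinks from roughly one bf16 ulp to one bf16 ulp squared, which is why three cheap products can approach full-precision accuracy.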
Before that, I was on the PyTorch team at Meta, where I worked on PyTorch Symmetric Memory. I added inter-node support via the NVSHMEM backend and wrote fused distributed Triton kernels for tensor parallelism (e.g. GEMM + all-reduce, GEMM + reduce-scatter). We open-sourced some of this work in Kraken, a Triton library of symmetric memory kernels. Prior to that, I worked on ML infrastructure at Pinterest.
I studied computer science at Georgia Tech. There, I did research in efficient multimodal inference and KV cache compression at SAIL (our Systems for AI Lab), and earlier worked on inference optimizations for MoE models via expert reduction.
Interests
I really enjoy the full ML acceleration stack, from large-scale distributed training to low-latency inference and kernel optimization.
Outside of systems, I love weightlifting and finding good coffee spots.