Surya Subramanian

About

I am an engineer and researcher interested in the intersection of machine learning, systems, and performance.

Currently, I'm working on Triton and PyTorch at Meta AI, specifically on Triton distributed and PyTorch symmetric memory. This fall, I’ll be joining NVIDIA’s cuBLAS team to work on GPU kernel performance optimization for matrix multiplication on next-generation NVIDIA GPUs. Previously, I was a SWE intern at Pinterest working on distributed systems and ML infrastructure for ads ranking and conversion user match models.

I'm also a computer science student at Georgia Tech researching efficient inference for mixture-of-experts models.

My Interests

I'm interested in the full ML acceleration stack for foundation models: everything from large-scale distributed training and model parallelism to inference, GPU kernel performance, and AI compilers.

Lately, I've been especially focused on the intersection of model scaling and low-level ML systems. On the side, I also explore more algorithmic directions, such as efficient model architectures, faster decoding methods, and compression techniques.

Outside of work, I enjoy lifting weights, spending time with friends, trying out different restaurants and cafés, and watching football or basketball.

Contact

Feel free to reach out at ssubrama32@gatech.edu!