I am an engineer and researcher interested in the intersection of machine learning, systems, and performance.
I'm currently on NVIDIA's cuBLAS team, working on fast matmul kernels for the latest GPU architectures. This past summer, I worked on Triton and PyTorch at Meta AI, specifically on Triton distributed and PyTorch symmetric memory for communication/compute overlap in distributed kernels. Previously, I was a SWE intern at Pinterest, working on distributed systems and ML infrastructure for ads ranking and conversion user-match models.
I'm also a computer science student at Georgia Tech researching efficient inference for mixture-of-experts models.
I'm really interested in the full ML acceleration stack for foundation models — everything from large-scale distributed training and model parallelism to low-latency inference and GPU kernel performance.
Lately, I've been especially focused on the intersection of model scaling and low-level ML systems. On the side, I also explore more algorithmic directions — things like efficient model architectures, faster decoding methods, and compression techniques.
Outside of work, I enjoy lifting weights, spending time with friends, trying out different restaurants and cafés, and watching football or basketball.
Feel free to reach out at ssubrama32@gatech.edu!