I am an engineer and researcher interested in the intersection of machine learning, systems, and performance.
I'm currently on NVIDIA's cuBLAS team, working on fast matmul kernels for the latest GPU architectures. This past summer, I worked on Triton and PyTorch at Meta AI, specifically on Triton distributed and PyTorch symmetric memory for communication/compute overlap in distributed kernels. Previously, I was a SWE intern at Pinterest, working on distributed systems and ML infrastructure for ads ranking and conversion user-match models.
I'm also a computer science student at Georgia Tech researching efficient inference for mixture-of-experts models.
I'm really interested in the full ML acceleration stack for foundation models — everything from large-scale distributed training and model parallelism to low-latency inference and GPU kernel performance.
Lately, I've been especially focused on the intersection of model scaling and low-level ML systems. On the side, I also explore more algorithmic directions — things like efficient model architectures, faster decoding methods, and compression techniques.
Outside of work, I enjoy lifting weights, spending time with friends, trying out different restaurants and cafés, and watching football or basketball.
Feel free to reach out at ssubrama32@gatech.edu!