Surya Subramanian
I'm interested in the intersection of machine learning and systems. I spend a lot of time thinking about large-scale distributed training, low-latency inference, and kernel optimization.
I'm currently working on scaling frontier models at Tesla AI.
In the past, I worked on:
- Fast kernels for the cuBLAS library on Blackwell at NVIDIA.
- PyTorch Symmetric Memory and distributed Triton kernels for efficient tensor and expert parallelism at Meta.
- ML infrastructure for performant conversion user match models at Pinterest.
I studied computer science at Georgia Tech, where I did research on efficient MoE inference and KV cache compression.
Outside of systems, I like lifting and finding good coffee spots.
Feel free to reach out over email.