Hi, I am a research scientist at Mosaic AI Research, Databricks.

My research spans pre-training and post-training of LLMs, with a focus on optimizing data quality, distribution, and curricula. I currently build synthetic data pipelines that scale inference compute to produce diverse generations, and I develop strategies to verify and filter those generations into high-quality training data. Through rigorous evaluation of model behavior and how it is shaped by the properties of its training data, I aim to create reliable, consistent, and trustworthy AI systems.

Previously, I did my Ph.D. in Applied Physics at Stanford University, where I was advised by Surya Ganguli, and spent some time as a research intern at FAIR, Meta AI. During this time, I worked on the science of deep learning through the lens of data, loss landscapes, and neural tangent kernels.

You can find my publications on Google Scholar.

I have also volunteered for SF New Deal, where I helped research and draft their economic impact report. Whenever I get the chance, I like to go social dancing, mostly West Coast Swing.