Fan Pu Zeng

Hello! I am Fan Pu, and I help large language models have a good time at Jane Street on the AI Assistants team.

Specifically, I work on fine-tuning LLMs to be proficient at writing OCaml, crafting evals for a variety of (mostly coding-related) tasks that we care about, and leading the development of several in-house RAG use cases, including for trading.

(2025-01-01) Our team is hiring! If this work sounds interesting to you and you have strong software engineering skills and a machine learning background, please shoot me an email!

I graduated with a B.S. (2022) and an M.S. (2023) in Computer Science from Carnegie Mellon University.

At CMU, I was actively involved in Autolab, the open-source auto-grading platform for programming assignments, from 2018 to 2023. I served as the Masters Student Liaison for the Singapore Students Association. I also used to play Capture-The-Flag (CTF) competitions with PPP. I previously interned at Jane Street, Meta, Asana, and Saleswhale (acquired 2022). I was a TA for 10-708 Probabilistic Graphical Models in the Spring 2023 semester.

My current academic interests lie in understanding reasoning in large language models and exploring the theoretical foundations of deep learning. Specifically, I am interested in the principles underlying generalization and the mechanisms that make optimization algorithms effective.

In my free time, I enjoy bouldering, K-pop dance, running, reading and learning new things, writing for my blog, and watching anime. I used to do sprint canoe competitively. When I have an extended break, I enjoy traveling, especially hiking and exploring the great outdoors. Most of the banner pictures on my blog posts were taken during these hikes. My favorite classroom at CMU is GHC 4303.

I grew up in my hometown of Singapore before moving to the US for college and work. I try to go back and visit once a year.

Feel free to reach out to me at fzeng[at]alumni[dot]cmu[dot]edu. I am happy to chat and provide advice.

Regrettably, I am unable to provide referrals for people I have not directly collaborated with, as I cannot write them a meaningful recommendation.

I have a Technician amateur radio license, with callsign KC3UFE.

This blog was started on 24 June 2018, although it has taken many forms since then. All banner pictures on the blog were taken by yours truly!

Talks

Slides I developed for talks on various LLM-related topics. You are free to share, adapt, and reuse these materials, provided that you give appropriate credit.

(2024-11-18) A Statistical Approach to Language Model Evaluations
(2024-10-08) Advanced Retrieval Augmented Generation Techniques
(2024-07-24) Superalignment, or how to train models smarter than us
(2024-05-03) Rotary Positional Embeddings (RoPE)
(2024-04-30) Parameter-Efficient Fine-Tuning
(2024-03-01) Understanding Transformers

Starred Blog Posts

Some of my more popular posts:

Technical Posts

General

CMU

CMU 15-712 Advanced Operating Systems and Distributed Systems Course Review
CMU 15-441/641 Computer Networks Course Review
My Sharing at the Hwa Chong Undergrad Alumni Forum, i.e. why study Computer Science at CMU

news

Jan 21, 2025	Completed a new post on what I think is an underappreciated topic: An Intuitive Introduction to Gaussian Processes
Jan 12, 2025	Wrote about the beautiful connection between how long a Markov chain (as used in MCMC methods in ML) takes to mix and the spectral gap of its transition matrix: Bounding Mixing Times of Markov Chains via the Spectral Gap
Nov 29, 2024	I will be at NeurIPS from 12/10 to 12/15. Let’s chat if you’re also there!
Aug 26, 2024	After a year of training and preparation, a night of bad sleep at Camp Muir, and tons of excitement and adrenaline, I summited Tahoma (Mt. Rainier) under clear skies and beautiful weather.
Aug 05, 2024	I wrote my 50th paper summary on this blog today, on Reconciling modern machine learning practice and the bias-variance trade-off, a little over a year after I published my first summary.

latest posts

Jan 21, 2025	An Intuitive Introduction to Gaussian Processes
Jan 12, 2025	Bounding Mixing Times of Markov Chains via the Spectral Gap
Aug 07, 2024	Notes on 'The Llama 3 Herd of Models'
Sep 02, 2023	Playing Sound Voltex at Home: Setting Up Unnamed SDVX Clone with the Yuancon SDVX Controller
Sep 01, 2023	Creating Trackback Requests for Static Sites
Jul 14, 2023	A Unified Framework for High-Dimensional Analysis of M-Estimators with Decomposable Regularizers: A Guided Walkthrough
Jun 16, 2023	The CMU Steam Tunnels and Wean 9