Fan Pu Zeng
Software Engineer at Jane Street. Currently located in New York, NY, USA.
Hello! I am Fan Pu, and I help large language models have a good time at Jane Street on the AI Assistants team.
Specifically, I work on fine-tuning LLMs to be proficient at writing OCaml, crafting evals for a variety of (mostly coding-related) tasks that we care about, and leading the development of several in-house RAG use cases, including for trading.
Our team is hiring - if this sounds interesting to you and you believe you are a good fit, please shoot me an email!
I recently graduated with an M.S. (2023) and a B.S. (2022) in Computer Science from Carnegie Mellon University.
At CMU, I was actively involved in the open-source programming assignment auto-grading platform Autolab from 2018 to 2023. I served as the Masters Student Liaison for the Singapore Students Association. I also used to compete in Capture-The-Flag (CTF) competitions with PPP. I previously interned at Jane Street, Meta, Asana, and Saleswhale (acquired 2022). I was a TA for 10-708 Probabilistic Graphical Models in the Spring 2023 semester.
My current academic interests lie in understanding reasoning in large language models and exploring the theoretical foundations of deep learning. Specifically, I am interested in the principles underlying generalization and the mechanisms contributing to the effectiveness of optimization algorithms.
In my free time, I enjoy bouldering, K-pop dance, running, reading and learning new things, writing for my blog, and watching anime. I used to do sprint canoe competitively. If I have an extended break, I enjoy traveling, especially hiking and exploring the great outdoors. Most of the banner pictures on my blog posts were taken during these hikes. My favorite classroom at CMU is GHC 4303.
I grew up in my hometown Singapore before moving to the US for college and work. I try to go back and visit once a year.
Feel free to reach out to me at fzeng[at]alumni[dot]cmu[dot]edu. I am happy to chat and provide advice.
Regrettably, I am unable to provide referrals for people I have not directly collaborated with, as I would not be able to write a meaningful recommendation.
I have a Technician amateur radio license, with callsign KC3UFE.
This blog was started on 24 June 2018, although it has taken many forms since then. All banner pictures on the blog were taken by yours truly!
Talks
Slides from talks I have given on LLM-related topics. You are free to share, adapt, and reuse these materials, provided that you give appropriate credit.
- (2024-11-18) A Statistical Approach to Language Model Evaluations
- (2024-10-08) Advanced Retrieval Augmented Generation Techniques
- (2024-07-24) Superalignment, or how to train models smarter than us
- (2024-05-03) Rotary Positional Embeddings (RoPE)
- (2024-04-30) Parameter-Efficient Fine-Tuning
- (2024-03-01) Understanding Transformers
Starred Blog Posts
Some of my more popular posts:
Technical Posts
- Score-Based Diffusion Models
- (Paper Summary) Zero-shot Image-to-Image Translation
- (Paper Summary) The Implicit Bias of Gradient Descent on Separable Data
- A Unified Framework for High-Dimensional Analysis of M-Estimators with Decomposable Regularizers: A Guided Walkthrough
- The Delightful Consequences of the Graph Minor Theorem
- Universal types, and your type checker doesn’t suck as much as you think
General
- The Art of LaTeX: Common Mistakes, and Advice for Typesetting Beautiful, Delightful Proofs
- Against Government Scholarships
- Notes On Founding A Startup To My Future Self
CMU
News
Date | Update |
---|---|
Nov 29, 2024 | I will be at NeurIPS from 12/10-12/15. Let’s chat if you’re also there! |
Aug 26, 2024 | After a year of training and preparation, a night of bad sleep on Camp Muir, and tons of excitement and adrenaline, I summited Tahoma (Mt. Rainier) in clear skies and beautiful weather. |
Aug 5, 2024 | I wrote my 50th paper summary on this blog today with Reconciling modern machine learning practice and the bias-variance trade-off, just a little over a year after I published my first summary. |
Nov 20, 2023 | I’m really excited to be joining the AI Assistants team at Jane Street to work on large language models! |
Oct 22, 2023 | Read a really interesting paper on image translation via diffusion models this weekend and wrote a more-detailed-than-usual summary of it: Zero-shot Image-to-Image Translation |
Sep 9, 2023 | Wrote a pretty interesting summary with high-level proof sketches for The Implicit Bias of Gradient Descent on Separable Data |
Sep 2, 2023 | Wrote a tutorial on setting up the Japanese arcade rhythm game Sound Voltex at home. |