ML Paper Summaries

Summaries and critiques of papers (mostly in machine learning) I’ve read in detail. These are not summaries in the traditional sense that carefully go over all the major concepts of a paper (due to time constraints); instead, they are rather concise and contain only the key points I find interesting, with the expectation that the reader already has some familiarity with the paper.

This serves both to catalog my own reading and academic progress, and to help others find interesting papers to check out.

The format is inspired by the paper summaries from a class I took.

$$ \newcommand{\bone}{\mathbf{1}} \newcommand{\bbeta}{\mathbf{\beta}} \newcommand{\bdelta}{\mathbf{\delta}} \newcommand{\bepsilon}{\mathbf{\epsilon}} \newcommand{\blambda}{\mathbf{\lambda}} \newcommand{\bomega}{\mathbf{\omega}} \newcommand{\bpi}{\mathbf{\pi}} \newcommand{\bphi}{\mathbf{\phi}} \newcommand{\bvphi}{\mathbf{\varphi}} \newcommand{\bpsi}{\mathbf{\psi}} \newcommand{\bsigma}{\mathbf{\sigma}} \newcommand{\btheta}{\mathbf{\theta}} \newcommand{\btau}{\mathbf{\tau}} \newcommand{\ba}{\mathbf{a}} \newcommand{\bb}{\mathbf{b}} \newcommand{\bc}{\mathbf{c}} \newcommand{\bd}{\mathbf{d}} \newcommand{\be}{\mathbf{e}} \newcommand{\boldf}{\mathbf{f}} \newcommand{\bg}{\mathbf{g}} \newcommand{\bh}{\mathbf{h}} \newcommand{\bi}{\mathbf{i}} \newcommand{\bj}{\mathbf{j}} \newcommand{\bk}{\mathbf{k}} \newcommand{\bell}{\mathbf{\ell}} \newcommand{\bm}{\mathbf{m}} \newcommand{\bn}{\mathbf{n}} \newcommand{\bo}{\mathbf{o}} \newcommand{\bp}{\mathbf{p}} \newcommand{\bq}{\mathbf{q}} \newcommand{\br}{\mathbf{r}} \newcommand{\bs}{\mathbf{s}} \newcommand{\bt}{\mathbf{t}} \newcommand{\bu}{\mathbf{u}} \newcommand{\bv}{\mathbf{v}} \newcommand{\bw}{\mathbf{w}} \newcommand{\bx}{\mathbf{x}} \newcommand{\by}{\mathbf{y}} \newcommand{\bz}{\mathbf{z}} \newcommand{\bA}{\mathbf{A}} \newcommand{\bB}{\mathbf{B}} \newcommand{\bC}{\mathbf{C}} \newcommand{\bD}{\mathbf{D}} \newcommand{\bE}{\mathbf{E}} \newcommand{\bF}{\mathbf{F}} \newcommand{\bG}{\mathbf{G}} \newcommand{\bH}{\mathbf{H}} \newcommand{\bI}{\mathbf{I}} \newcommand{\bJ}{\mathbf{J}} \newcommand{\bK}{\mathbf{K}} \newcommand{\bL}{\mathbf{L}} \newcommand{\bM}{\mathbf{M}} \newcommand{\bN}{\mathbf{N}} \newcommand{\bP}{\mathbf{P}} \newcommand{\bQ}{\mathbf{Q}} \newcommand{\bR}{\mathbf{R}} \newcommand{\bS}{\mathbf{S}} \newcommand{\bT}{\mathbf{T}} \newcommand{\bU}{\mathbf{U}} \newcommand{\bV}{\mathbf{V}} \newcommand{\bW}{\mathbf{W}} \newcommand{\bX}{\mathbf{X}} \newcommand{\bY}{\mathbf{Y}} \newcommand{\bZ}{\mathbf{Z}} \newcommand{\bsa}{\boldsymbol{a}} \newcommand{\bsb}{\boldsymbol{b}} \newcommand{\bsc}{\boldsymbol{c}} \newcommand{\bsd}{\boldsymbol{d}} \newcommand{\bse}{\boldsymbol{e}} \newcommand{\bsoldf}{\boldsymbol{f}} \newcommand{\bsg}{\boldsymbol{g}} \newcommand{\bsh}{\boldsymbol{h}} \newcommand{\bsi}{\boldsymbol{i}} \newcommand{\bsj}{\boldsymbol{j}} \newcommand{\bsk}{\boldsymbol{k}} \newcommand{\bsell}{\boldsymbol{\ell}} \newcommand{\bsm}{\boldsymbol{m}} \newcommand{\bsn}{\boldsymbol{n}} \newcommand{\bso}{\boldsymbol{o}} \newcommand{\bsp}{\boldsymbol{p}} \newcommand{\bsq}{\boldsymbol{q}} \newcommand{\bsr}{\boldsymbol{r}} \newcommand{\bss}{\boldsymbol{s}} \newcommand{\bst}{\boldsymbol{t}} \newcommand{\bsu}{\boldsymbol{u}} \newcommand{\bsv}{\boldsymbol{v}} \newcommand{\bsw}{\boldsymbol{w}} \newcommand{\bsx}{\boldsymbol{x}} \newcommand{\bsy}{\boldsymbol{y}} \newcommand{\bsz}{\boldsymbol{z}} \newcommand{\bsA}{\boldsymbol{A}} \newcommand{\bsB}{\boldsymbol{B}} \newcommand{\bsC}{\boldsymbol{C}} \newcommand{\bsD}{\boldsymbol{D}} \newcommand{\bsE}{\boldsymbol{E}} \newcommand{\bsF}{\boldsymbol{F}} \newcommand{\bsG}{\boldsymbol{G}} \newcommand{\bsH}{\boldsymbol{H}} \newcommand{\bsI}{\boldsymbol{I}} \newcommand{\bsJ}{\boldsymbol{J}} \newcommand{\bsK}{\boldsymbol{K}} \newcommand{\bsL}{\boldsymbol{L}} \newcommand{\bsM}{\boldsymbol{M}} \newcommand{\bsN}{\boldsymbol{N}} \newcommand{\bsP}{\boldsymbol{P}} \newcommand{\bsQ}{\boldsymbol{Q}} \newcommand{\bsR}{\boldsymbol{R}} \newcommand{\bsS}{\boldsymbol{S}} \newcommand{\bsT}{\boldsymbol{T}} \newcommand{\bsU}{\boldsymbol{U}} 
\newcommand{\bsV}{\boldsymbol{V}} \newcommand{\bsW}{\boldsymbol{W}} \newcommand{\bsX}{\boldsymbol{X}} \newcommand{\bsY}{\boldsymbol{Y}} \newcommand{\bsZ}{\boldsymbol{Z}} \newcommand{\calA}{\mathcal{A}} \newcommand{\calB}{\mathcal{B}} \newcommand{\calC}{\mathcal{C}} \newcommand{\calD}{\mathcal{D}} \newcommand{\calE}{\mathcal{E}} \newcommand{\calF}{\mathcal{F}} \newcommand{\calG}{\mathcal{G}} \newcommand{\calH}{\mathcal{H}} \newcommand{\calI}{\mathcal{I}} \newcommand{\calJ}{\mathcal{J}} \newcommand{\calK}{\mathcal{K}} \newcommand{\calL}{\mathcal{L}} \newcommand{\calM}{\mathcal{M}} \newcommand{\calN}{\mathcal{N}} \newcommand{\calO}{\mathcal{O}} \newcommand{\calP}{\mathcal{P}} \newcommand{\calQ}{\mathcal{Q}} \newcommand{\calR}{\mathcal{R}} \newcommand{\calS}{\mathcal{S}} \newcommand{\calT}{\mathcal{T}} \newcommand{\calU}{\mathcal{U}} \newcommand{\calV}{\mathcal{V}} \newcommand{\calW}{\mathcal{W}} \newcommand{\calX}{\mathcal{X}} \newcommand{\calY}{\mathcal{Y}} \newcommand{\calZ}{\mathcal{Z}} \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\F}{\mathbb{F}} \newcommand{\Q}{\mathbb{Q}} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} \newcommand{\nnz}[1]{\mbox{nnz}(#1)} \newcommand{\dotprod}[2]{\langle #1, #2 \rangle} \newcommand{\ignore}[1]{} \let\Pr\relax \DeclareMathOperator*{\Pr}{\mathbf{Pr}} \newcommand{\E}{\mathbb{E}} \DeclareMathOperator*{\Ex}{\mathbf{E}} \DeclareMathOperator*{\Var}{\mathbf{Var}} \DeclareMathOperator*{\Cov}{\mathbf{Cov}} \DeclareMathOperator*{\stddev}{\mathbf{stddev}} \DeclareMathOperator*{\avg}{avg} \DeclareMathOperator{\poly}{poly} \DeclareMathOperator{\polylog}{polylog} \DeclareMathOperator{\size}{size} \DeclareMathOperator{\sgn}{sgn} \DeclareMathOperator{\dist}{dist} \DeclareMathOperator{\vol}{vol} \DeclareMathOperator{\spn}{span} \DeclareMathOperator{\supp}{supp} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator{\Tr}{Tr} \DeclareMathOperator{\codim}{codim} \DeclareMathOperator{\diag}{diag} \newcommand{\PTIME}{\mathsf{P}} \newcommand{\LOGSPACE}{\mathsf{L}} \newcommand{\ZPP}{\mathsf{ZPP}} \newcommand{\RP}{\mathsf{RP}} \newcommand{\BPP}{\mathsf{BPP}} \newcommand{\P}{\mathsf{P}} \newcommand{\NP}{\mathsf{NP}} \newcommand{\TC}{\mathsf{TC}} \newcommand{\AC}{\mathsf{AC}} \newcommand{\SC}{\mathsf{SC}} \newcommand{\SZK}{\mathsf{SZK}} \newcommand{\AM}{\mathsf{AM}} \newcommand{\IP}{\mathsf{IP}} \newcommand{\PSPACE}{\mathsf{PSPACE}} \newcommand{\EXP}{\mathsf{EXP}} \newcommand{\MIP}{\mathsf{MIP}} \newcommand{\NEXP}{\mathsf{NEXP}} \newcommand{\BQP}{\mathsf{BQP}} \newcommand{\distP}{\mathsf{dist\textbf{P}}} \newcommand{\distNP}{\mathsf{dist\textbf{NP}}} \newcommand{\eps}{\epsilon} \newcommand{\lam}{\lambda} \newcommand{\dleta}{\delta} \newcommand{\simga}{\sigma} \newcommand{\vphi}{\varphi} \newcommand{\la}{\langle} \newcommand{\ra}{\rangle} \newcommand{\wt}[1]{\widetilde{#1}} \newcommand{\wh}[1]{\widehat{#1}} \newcommand{\ol}[1]{\overline{#1}} \newcommand{\ul}[1]{\underline{#1}} \newcommand{\ot}{\otimes} \newcommand{\zo}{\{0,1\}} \newcommand{\co}{:} %\newcommand{\co}{\colon} \newcommand{\bdry}{\partial} \newcommand{\grad}{\nabla} \newcommand{\transp}{^\intercal} \newcommand{\inv}{^{-1}} \newcommand{\symmdiff}{\triangle} \newcommand{\symdiff}{\symmdiff} \newcommand{\half}{\tfrac{1}{2}} \newcommand{\mathbbm}{\Bbb} \newcommand{\bbone}{\mathbbm 1} \newcommand{\Id}{\bbone} \newcommand{\SAT}{\mathsf{SAT}} \newcommand{\bcalG}{\boldsymbol{\calG}} \newcommand{\calbG}{\bcalG} 
\newcommand{\bcalX}{\boldsymbol{\calX}} \newcommand{\calbX}{\bcalX} \newcommand{\bcalY}{\boldsymbol{\calY}} \newcommand{\calbY}{\bcalY} \newcommand{\bcalZ}{\boldsymbol{\calZ}} \newcommand{\calbZ}{\bcalZ} $$

  1. (Jun 8, 2025) Diffusion-LM Improves Controllable Text Generation
  2. (May 10, 2025) ReduNet: A White-box Deep Network from the Principle of Maximizing Rate Reduction
  3. (May 5, 2025) DiLoCo: Distributed Low-Communication Training of Language Models
  4. (Nov 4, 2024) RAGAS: Automated Evaluation of Retrieval Augmented Generation
  5. (Oct 12, 2024) Training Language Models to Self-Correct via Reinforcement Learning
  6. (Oct 12, 2024) Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
  7. (Oct 7, 2024) Unsupervised Dense Information Retrieval with Contrastive Learning
  8. (Oct 5, 2024) Generate rather than Retrieve: Large Language Models are Strong Context Generators
  9. (Oct 3, 2024) Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
  10. (Oct 3, 2024) Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy
  11. (Oct 3, 2024) ASQA: Factoid Questions Meet Long-Form Answers
  12. (Oct 2, 2024) Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking
  13. (Sep 29, 2024) Lost in the Middle: How Language Models Use Long Contexts
  14. (Sep 29, 2024) InPars: Data Augmentation for Information Retrieval using Large Language Models
  15. (Sep 27, 2024) Precise Zero-Shot Dense Retrieval without Relevance Labels (HyDE)
  16. (Sep 23, 2024) Dense X Retrieval: What Retrieval Granularity Should We Use?
  17. (Sep 22, 2024) Query Rewriting for Retrieval-Augmented Large Language Models
  18. (Sep 22, 2024) Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory
  19. (Aug 5, 2024) Reconciling modern machine learning practice and the bias-variance trade-off
  20. (Aug 5, 2024) Deep Double Descent: Where Bigger Models and More Data Hurt
  21. (Jul 28, 2024) Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization
  22. (Jul 27, 2024) Editing a classifier by rewriting its prediction rules
  23. (Jul 25, 2024) Distilling the Knowledge in a Neural Network
  24. (Jul 23, 2024) Constitutional AI: Harmlessness from AI Feedback
  25. (Jul 21, 2024) Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
  26. (Jul 14, 2024) Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs
  27. (Jul 13, 2024) Large Language Models as Optimizers
  28. (Apr 26, 2024) Prefix-Tuning: Optimizing Continuous Prompts for Generation
  29. (Mar 31, 2024) LoRA: Low-Rank Adaptation of Large Language Models
  30. (Mar 27, 2024) LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  31. (Mar 23, 2024) Training Compute-Optimal Large Language Models
  32. (Mar 23, 2024) Scaling Laws for Neural Language Models
  33. (Mar 15, 2024) DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
  34. (Feb 22, 2024) ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
  35. (Feb 19, 2024) Matryoshka Representation Learning
  36. (Dec 12, 2023) Large Language Models for Software Engineering: Survey and Open Problems
  37. (Nov 3, 2023) High-Resolution Image Synthesis with Latent Diffusion Models
  38. (Oct 31, 2023) Universal and Transferable Adversarial Attacks on Aligned Language Models
  39. (Oct 22, 2023) Zero-shot Image-to-Image Translation
  40. (Oct 17, 2023) InstructPix2Pix: Learning to Follow Image Editing Instructions
  41. (Oct 15, 2023) An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
  42. (Oct 4, 2023) On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
  43. (Sep 26, 2023) Repository-Level Prompt Generation for Large Language Models of Code
  44. (Sep 24, 2023) Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
  45. (Sep 23, 2023) Calibrate Before Use: Improving Few-Shot Performance of Language Models
  46. (Sep 10, 2023) Understanding Deep Learning Requires Rethinking Generalization
  47. (Sep 9, 2023) The Implicit Bias of Gradient Descent on Separable Data
  48. (Sep 7, 2023) Gradient Descent Provably Optimizes Over-parameterized Neural Networks
  49. (Sep 4, 2023) Loss Landscapes and Optimization in Over-Parameterized Non-Linear Systems and Neural Networks
  50. (Aug 29, 2023) Extracting Training Data from Large Language Models
  51. (Aug 27, 2023) A Watermark for Large Language Models
  52. (Aug 25, 2023) Efficiently Modeling Long Sequences with Structured State Spaces
  53. (Aug 22, 2023) Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  54. (Aug 22, 2023) Accurate Detection of Wake Word Start and End Using a CNN
  55. (Aug 19, 2023) Transformers in Speech Processing: A Survey
  56. (Aug 11, 2023) MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
  57. (Aug 11, 2023) Improving Language Understanding by Generative Pre-Training (GPT)
  58. (Aug 11, 2023) Generative Agents: Interactive Simulacra of Human Behavior
  59. (Aug 10, 2023) Simple synthetic data reduces sycophancy in large language models
  60. (Aug 10, 2023) Language Models are Unsupervised Multitask Learners (GPT-2)
  61. (Aug 10, 2023) Dense Passage Retrieval for Open-Domain Question Answering
  62. (Aug 9, 2023) BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
  63. (Aug 6, 2023) Evaluating Large Language Models Trained on Code (Codex)
  64. (Aug 5, 2023) Training language models to follow instructions with human feedback (InstructGPT)
  65. (Aug 3, 2023) Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  66. (Aug 3, 2023) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  67. (Aug 2, 2023) Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
  68. (Aug 2, 2023) Deep contextualized word representations (ELMo)