Instructors Danqi Chen (danqic AT cs.princeton.edu) and Sanjeev Arora (arora AT cs.princeton.edu)
Teaching assistants Adithya Bhaskar (adithyab AT princeton.edu) and Tyler Zhu (tylerzhu AT princeton.edu)
Lectures Monday/Wednesday 10:30-11:50am
Location CS Building 105
Office hours Danqi's office hour: Tuesday 10-11am, COS 412 (by appointment)
Sanjeev's office hour: Wednesday 4-5pm, COS 407
Adithya's office hour: Thursday 3-4pm, Friend 010B
Tyler's office hour: Monday 4-5pm, Friend 010C
Feedback form https://forms.gle/vUD1RieC1YcBSugw7

We will use a Slack team for most communications this semester. You will be added to the Slack team after the first week. If you join the class late, just email us, and we'll add you. Once you're on Slack, we prefer Slack messages over emails for all logistical questions. We also encourage students to use Slack for discussions related to lecture content and projects.

Large language models (LLMs) have revolutionized natural language processing by enabling machines to generate, understand, and interact with human language in more sophisticated ways than ever before. Beyond technical advancements, LLMs are shaping societal interactions with technology, from enhancing accessibility for underserved communities to transforming education, healthcare, and creative industries. This course aims to provide a rigorous survey of current LLM research, including model architecture, data preparation, pre-training, post-training, alignment, and model deployment. The course focuses on conceptual understanding and research rather than engineering, and it is expected to be highly interactive. Students are expected to read cutting-edge research papers regularly, participate in class discussion, and also complete a major project (in groups of 2-3) at the end, for which computational resources will be arranged.

Prerequisites: COS484 or equivalent background (i.e., familiarity with fundamentals of deep learning/machine learning, Transformers, PyTorch). Open to all graduate students. Undergraduates need instructors' permission.

Course structure

  • Class participation (30%): In each class, we will cover 1-2 papers (see "required reading" in the schedule). You are required to read these papers in depth beforehand and to answer a pre-lecture question form (a Google Form is linked in the schedule), due at 11:59pm on the day before the lecture. Some questions are designed to test your understanding of the reading materials, while others are open-ended and prompt you to read the paper critically and write down your thoughts. This counts towards class participation: we will not grade correctness, but we expect you to do the work and submit reasonable answers.
  • Debate (15%): We will schedule 12 debate panels from Week 4 to Week 9, each consisting of 5 students and lasting 30 minutes (on those days, the lecture will be shortened to 50 minutes). Each panel will focus on one (or two) research papers related to the topics taught so far, and will follow this structure:
    • Each panel will be composed of 1 presenter, 2 critics, and 2 proponents.
    • The presenter will start with a short presentation (8 minutes) of the paper.
    • The 2 critics will then critique the paper, much as reviewers assess conference papers: highlighting limitations, weaknesses, and any claims that are not well supported by the experiments.
    • The 2 proponents will respond, explaining why they believe the problems raised do not exist or are not serious.
    • There will be multiple rounds of interaction. Critics are asked to send their major criticisms to the proponents at least 2 days before the lecture, so that the proponents have time to research and prepare their responses.
    • After the panel, the group will write and submit a 2-page summary of the debate.
  • Lecture scribing (10%): For each lecture, we will ask 3 students to scribe the lecture content, covering the technical content and research questions.
    • You can find the Overleaf scribe template here. Make a shared copy for all the scribes of a given lecture; it is up to you how to divide the work, as long as it is split equally. Send your completed Overleaf link and PDF to Adithya and Tyler on Slack by 11:59pm three days after the lecture: for Monday lectures, this is 11:59pm on Thursday; for Wednesday lectures, 11:59pm on Saturday.
    • Please do not add the four course staff members to the Overleaf project; instead, share the editable link with Adithya and Tyler.
    • The template now includes a contributions section; please fill it out when you submit, giving an overview of how the work was split among the scribes.
  • Final project (35% + 10% for presentations): At the end of the semester, everyone is required to complete a class project related to modern LLMs and submit a final paper. You should work in a team of 2 or 3. Everyone is required to submit a proposal to Gradescope by Sunday, Oct 13th, 11:59pm, and the final paper by Dean's Date (Dec 13th, 11:59pm). In-class project presentations will be scheduled in the last three lectures. The template for the final report is here; feel free to use it for the proposal as well, though any template is acceptable.

Schedule

Date Instructor Topic/required reading Recommended reading Reading response Panel discussion Scribes
Sep 4 (Wed) Sanjeev Introduction [slides] N/A
Sep 9 (Mon) Danqi Pretraining 1 [slides]
[link] N/A
Scribes:
  • Yinghui He
  • Haichen Dong
  • Brendan Y. Wang
Sep 11 (Wed) Danqi Pretraining 2 [slides]
[link] N/A
Scribes:
  • Jiaxin Xiao
  • Dillon Lue
  • Ziyu Xiong
Sep 16 (Mon) Sanjeev Scaling laws [slides]
[link] N/A
Scribes:
  • Wuwei Zhang
  • Simran Kaur
  • Keerthana Nallamotu
Sep 18 (Wed) Sanjeev Emergent behavior [slides]
[link] N/A
Scribes:
  • Erich Liang
  • Heyu Guo
  • Benedikt P. Stroebl
Sep 23 (Mon) Danqi Data curation [slides]
[link] Paper: Phi-1.5 "More data or better data?"
Presenter: Victor Chu
Critics:
  • Erich Liang
  • Tanvi Namjoshi
Proponents:
  • Simran Kaur
  • Tedi Zadouri
Scribes:
  • Sijia Liu
  • Iain D. Campbell
  • Elizabeth A. Mieczkowski
Sep 25 (Wed) Danqi Post-training: Instruction tuning [slides] [link] Paper: Schaeffer et al. 2023 "Are emergent abilities a mirage?"
Presenter: Mingqian Xue
Critics:
  • Lekang Yuan
  • Heyu Guo
Proponents:
  • Qishuo Yin
  • Lihan Zha
Scribes:
  • Jane E. Castleman
  • Kylie Zhang
  • Yingqing Guo
Sep 30 (Mon) Danqi Post-training: learning from preferences [slides] [link] Paper: Scaling Laws for Data Filtering
Presenter: Tamjeed Azad
Critics:
  • Elizabeth A. Mieczkowski
  • Nimra Nadeem
Proponents:
  • Iain D. Campbell
  • Zhicheng Zheng
Scribes:
  • Kincaid MacDonald
  • Amey P. Pasarkar
  • Nobline Yoo
Oct 2 (Wed) Sanjeev Alignment [slides] [link] Paper: LIMA: Less Is More for Alignment
Presenter: Mahsa Bastankhah
Critics:
  • Niusha Moshrefi
  • Zeyu Shen
Proponents:
  • Jiaxin Xiao
  • Wuwei Zhang
Scribes:
  • Nimra Nadeem
  • Stanley Wei
  • Cyrus Vachha
Oct 7 (Mon) Sanjeev Constitutional AI [slides] [link] Paper: Is DPO Superior to PPO for LLM Alignment?
Presenter: Boyi Wei
Critics:
  • Xingyu Zhu
  • Cyrus Vachha
Proponents:
  • Benedikt P. Stroebl
  • Kincaid MacDonald
Scribes:
  • Juhyun Park
  • Wentao Guo
  • Mahsa Bastankhah
Oct 9 (Wed) Sanjeev LLM Metacognition [slides] [link] Paper: Inverse Constitutional AI: Compressing Preferences into Principles
Presenter: Zixuan Wang
Critics:
  • Rafael Pastrana Jimenez
  • Dillon Lue
Proponents:
  • Sreemanti Dey
  • Jane E. Castleman
Scribes:
  • Wenzhe Li
  • Mingqian Xue
  • Rafael Pastrana Jimenez
Oct 21 (Mon) Tianyu Gao Long-context models [slides] [link] Paper: Language Models (Mostly) Know What They Know
Presenter: Arin J. Mukherjee
Critics:
  • Seth Karten
  • Veniamin Veselovskyy
Proponents:
  • Yuka Shu
  • Keerthana Nallamotu
Scribes:
  • Victor Chu
  • Yijun Yin
  • Lihan Zha
Oct 23 (Wed) Sanjeev Advanced topics in alignment The AI through debate blog post and interview. [link] Paper: The Impact of Positional Encoding on Length Generalization in Transformers
Presenter: Ambri Ma
Critics:
  • Colin Wang
  • Jiahao Qiu
Proponents:
  • Brendan Y. Wang
  • David B. Braun
Scribes:
  • Zeyu Shen
  • Tedi Zadouri
  • Lekang Yuan
Oct 28 (Mon) TBD Topic TBD Presenter: Jiayi Zhang
Critics:
  • Catherine Cheng
  • Juhyun Park
Proponents:
  • Wentao Guo
  • Sijia Liu
Scribes:
  • Niusha Moshrefi
  • Zhicheng Zheng
  • Zixuan Wang
Oct 30 (Wed) TBD Topic TBD Presenter: Constantin Schesch
Critics:
  • Yinghui He
  • Yijun Yin
Proponents:
  • Haichen Dong
  • Amey P. Pasarkar
Scribes:
  • Creston A. Brooks
  • Jiayi Zhang
  • Qishuo Yin
Nov 4 (Mon) TBD Topic TBD Presenter: Ziyu Xiong
Critics:
  • Nobline Yoo
  • Creston A. Brooks
Proponents:
  • Stanley Wei
  • Lucy He
Scribes:
  • David B. Braun
  • Boyi Wei
  • Arin J. Mukherjee
Nov 6 (Wed) Mengzhou Small models Presenter: Alexandre Kirchmeyer
Critics:
  • Wenzhe Li
  • Kylie Zhang
Proponents:
  • Yingqing Guo
  • Joie Y. Zhang
Scribes:
  • Sreemanti Dey
  • Xingyu Zhu
  • Colin Wang
Nov 11 (Mon) Guest Guest Lecture #1 N/A
Scribes:
  • Alexandre Kirchmeyer
  • Lucy He
  • Jiahao Qiu
Nov 13 (Wed) Guest Guest Lecture #2 N/A
Scribes:
  • Veniamin Veselovskyy
  • Tanvi Namjoshi
  • Ambri Ma
Nov 18 (Mon) Guest Guest Lecture #3 N/A
Scribes:
  • Tamjeed Azad
  • Seth Karten
  • Catherine Cheng
Nov 20 (Wed) Guest Guest Lecture #4 N/A
Scribes:
  • Constantin Schesch
  • Yuka Shu
  • Joie Y. Zhang
Nov 25 (Mon) Students Project presentations N/A
Dec 2 (Mon) Students Project presentations N/A
Dec 4 (Wed) Students Project presentations N/A