Instructors | Danqi Chen (danqic AT cs.princeton.edu) and Sanjeev Arora (arora AT cs.princeton.edu) |
Teaching assistants | Adithya Bhaskar (adithyab AT princeton.edu) and Tyler Zhu (tylerzhu AT princeton.edu) |
Lectures | Monday/Wednesday 10:30-11:50am |
Location | CS Building 105 |
Office hours | Danqi: Tuesday 10-11am, COS 412 (by appointment); Sanjeev: Wednesday 4-5pm, COS 407; Adithya: Thursday 3-4pm, Friend 010B; Tyler: Monday 4-5pm, Friend 010C |
Feedback form | https://forms.gle/vUD1RieC1YcBSugw7 |
We will use Slack for most communication this semester. You will be added to the Slack team after the first week; if you join the class late, just email us and we'll add you. Once you're on Slack, we prefer Slack messages over email for all logistical questions. We also encourage students to use Slack for discussions related to lecture content and projects.
Large language models (LLMs) have revolutionized natural language processing by enabling machines to generate, understand, and interact with human language in more sophisticated ways than ever before. Beyond technical advancements, LLMs are shaping societal interactions with technology, from enhancing accessibility for underserved communities to transforming education, healthcare, and creative industries. This course aims to provide a rigorous survey of current LLM research, including model architecture, data preparation, pre-training, post-training, alignment, and model deployment. The course focuses on conceptual understanding and research rather than engineering, and it is expected to be highly interactive. Students are expected to read cutting-edge research papers regularly, participate in class discussions, and complete a major final project (in groups of 2-3), for which computational resources will be arranged.
Prerequisites: COS 484 or equivalent background (i.e., familiarity with the fundamentals of deep learning and machine learning, Transformers, and PyTorch). Open to all graduate students; undergraduates need the instructors' permission.
Date | Instructor | Topic/required reading | Recommended reading | Reading response | Panel discussion | Scribes |
---|---|---|---|---|---|---|
Sep 4 (Wed) | Sanjeev | Introduction [slides] | | N/A | | |
Sep 9 (Mon) | Danqi | Pretraining 1 [slides] | | [link] | N/A | |
Sep 11 (Wed) | Danqi | Pretraining 2 [slides] | | [link] | N/A | |
Sep 16 (Mon) | Sanjeev | Scaling laws [slides] | | [link] | N/A | |
Sep 18 (Wed) | Sanjeev | Emergent behavior [slides] | | [link] | N/A | |
Sep 23 (Mon) | Danqi | Data curation [slides] | | [link] | Paper: Phi-1.5, "More data or better data?"; Presenter: Victor Chu; Critics: | |
Sep 25 (Wed) | Danqi | Post-training: Instruction tuning [slides] | | [link] | Paper: Schaeffer et al., 2023, "Are emergent abilities a mirage?"; Presenter: Mingqian Xue; Critics: | |
Sep 30 (Mon) | Danqi | Post-training: Learning from preferences [slides] | | [link] | Paper: Scaling Laws for Data Filtering; Presenter: Tamjeed Azad; Critics: | |
Oct 2 (Wed) | Sanjeev | Alignment [slides] | | [link] | Paper: LIMA: Less Is More for Alignment; Presenter: ; Critics: | |
Oct 7 (Mon) | Sanjeev | Constitutional AI [slides] | | [link] | Paper: Is DPO Superior to PPO for LLM Alignment?; Presenter: Boyi Wei; Critics: | |
Oct 9 (Wed) | Sanjeev | LLM Metacognition [slides] | | [link] | Paper: Inverse Constitutional AI: Compressing Preferences into Principles; Presenter: Zixuan Wang; Critics: | |
Oct 21 (Mon) | Tianyu Gao | Long-context models [slides] | | [link] | Paper: Language Models (Mostly) Know What They Know; Presenter: Arin J. Mukherjee; Critics: | |
Oct 23 (Wed) | Sanjeev | Advanced topics in alignment [slides] | The "AI safety via debate" blog post and interview | [link] | Paper: The Impact of Positional Encoding on Length Generalization in Transformers; Presenter: Ambri Ma; Critics: | |
Oct 28 (Mon) | | LLM Reasoning 1 [slides] | | [link] | Paper: Transcendence: Generative Models Can Outperform The Experts That Train Them; Presenter: Jiayi Zhang; Critics: | |
Oct 30 (Wed) | Danqi | LLM Reasoning 2 [slides] | | [link] | Paper: Stream of Search (SoS): Learning to Search in Language; Presenter: Constantin Schesch; Critics: | |
Nov 4 (Mon) | Mengzhou Xia | Small models [slides] | | [link] | Paper: Information-Theoretic Distillation for Reference-less Summarization; Presenter: Ziyu Xiong; Critics: | |
Nov 6 (Wed) | | | | [link] | Paper: To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning; Presenter: Alexandre Kirchmeyer; Critics: | |
Nov 11 (Mon) | Yu Su (OSU) | A Holistic and Critical Look at Language Agents [slides] | | N/A | | |
Nov 13 (Wed) | Danqi | Retrieval-augmented language models [slides] | | N/A | | |
Nov 18 (Mon) | Tri Dao | Hardware-aware Algorithms for Language Modeling | | N/A | | |
Nov 20 (Wed) | Saining Xie (NYU) | Language Models Need Better Visual Grounding for Meaning and Understanding [slides] | | N/A | | |
Nov 25 (Mon) | Students | Project presentations | | N/A | | |
Dec 2 (Mon) | Students | Project presentations | | N/A | | |
Dec 4 (Wed) | Students | Project presentations | | N/A | | |