
Commit 3cc087a

Update reading list
1 parent e1425ed commit 3cc087a

4 files changed (+78 -30 lines)


_data/papers.yaml

Lines changed: 42 additions & 0 deletions
@@ -22,6 +22,12 @@ wlbllm-osdi25:
   booktitle: "OSDI 25"
   url: "https://www.usenix.org/conference/osdi25/presentation/wang-zheng"

+hotspa-sosp24:
+  title: "Enabling Parallelism Hot Switching for Efficient Training of Large Language Models"
+  authors: "Hao Ge, Fangcheng Fu, Haoyang Li, Xuanyu Wang, Sheng Lin, Yujie Wang, Xiaonan Nie, Hailin Zhang, Xupeng Miao, and Bin Cui"
+  booktitle: "SOSP 24"
+  url: "https://dl.acm.org/doi/abs/10.1145/3694715.3695969"
+
 llama3-2024:
   title: "The Llama 3 Herd of Models"
   authors: "Aaron Grattafiori et al."
@@ -100,6 +106,12 @@ areal-arxiv25:
   booktitle: "Arxiv 2025"
   url: "https://arxiv.org/pdf/2505.24298"

+asyncrlhf-iclr25:
+  title: "Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models"
+  authors: "Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux, Arian Hosseini, Rishabh Agarwal, Aaron Courville"
+  booktitle: "ICLR 25"
+  url: "https://arxiv.org/abs/2410.18252"
+
 hybridflow-eurosys25:
   title: "HybridFlow: A Flexible and Efficient RLHF Framework"
   authors: "Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, and Chuan Wu"
@@ -112,12 +124,36 @@ pagedattention-sosp23:
   booktitle: "SOSP 23"
   url: "https://arxiv.org/pdf/2309.06180.pdf"

+orca-osdi22:
+  title: "Orca: A Distributed Serving System for Transformer-Based Generative Models"
+  authors: "Gyeong-In Yu, Joo Seong Jeong, Geon-Woo Kim, Soojeong Kim, and Byung-Gon Chun"
+  booktitle: "OSDI 22"
+  url: "https://www.usenix.org/conference/osdi22/presentation/yu"
+
 distserve-osdi24:
   title: "DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving"
   authors: "Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, and Hao Zhang"
   booktitle: OSDI 24
   url: https://www.usenix.org/conference/osdi24/presentation/zhong-yinmin

+loongserve-sosp24:
+  title: "LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism"
+  authors: "Bingyang Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, and Xin Jin"
+  booktitle: "SOSP 24"
+  url: "https://dl.acm.org/doi/pdf/10.1145/3694715.3695948"
+
+waferllm-osdi25:
+  title: "WaferLLM: Large Language Model Inference at Wafer Scale"
+  authors: "Congjie He, Yeqi Huang, Pei Mu, Ziming Miao, Jilong Xue, Lingxiao Ma, Fan Yang, Luo Mai"
+  booktitle: "OSDI 25"
+  url: "https://www.usenix.org/system/files/osdi25-he.pdf"
+
+aqua-asplos25:
+  title: "Aqua: Network-Accelerated Memory Offloading for LLMs in Scale-Up GPU Domains"
+  authors: "Abhishek Vijaya Kumar, Gianni Antichi, and Rachee Singh"
+  booktitle: "ASPLOS 25"
+  url: "https://dl.acm.org/doi/abs/10.1145/3676641.3715983"
+
 splitwise-isca24:
   title: "SplitWise: Efficient Generative LLM Inference Using Phase Splitting"
   authors: "Pratyush Patel, Esha Choukse, Chaojie Zhang, Aashaka Shah, Íñigo Goiri, Saeed Maleki, and Ricardo Bianchini"
@@ -166,6 +202,12 @@ tutel-mlsys23:
   booktitle: "MLSys 23"
   url: "https://proceedings.mlsys.org/paper_files/paper/2023/hash/5616d34cf8ff73942cfd5aa922842556-Abstract-mlsys2023.html"

+megablock-mlsys23:
+  title: "MegaBlocks: Efficient Sparse Training with Mixture-of-Experts"
+  authors: "Trevor Gale, Deepak Narayanan, Cliff Young, and Matei Zaharia"
+  booktitle: "MLSys 23"
+  url: "https://proceedings.mlsys.org/paper_files/paper/2023/hash/5a54f79333768effe7e8927bcccffe40-Abstract-mlsys2023.html"
+
 moelight-asplos25:
   title: "MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs"
   authors: "Shiyi Cao, Shu Liu, Tyler Griggs, Peter Schafhalter, Xiaoxuan Liu, Ying Sheng, Joseph E. Gonzalez, Matei Zaharia, and Ion Stoica"

_includes/paper_item.html

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 {% assign p = site.data.papers[include.key] %}
 <li markdown="span">
-{% if include.required %} <b>(Required)</b> {% endif %} <a href="{{ p.url }}">{{ p.title }}</a>
+{% if include.required %} {% if include.required != true %} <b>(Required - {{ include.required }})</b> {% else %} <b>(Required)</b> {% endif %} {% endif %} <a href="{{ p.url }}">{{ p.title }}</a>
 {% if p.authors != "" %} <br/> <em>{{ p.authors }}</em>. {% endif %} {% if p.booktitle != "" %} {{ p.booktitle }}. {% endif %}
 </li>
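For reference, the template change above lets the required parameter carry either a boolean or a short topic label. A minimal sketch of the two call forms, with the rendering inferred from the updated conditional (the paper title placeholder is illustrative, not the actual data):

    {% include paper_item.html key="alpa-osdi22" required=true %}
    renders as: (Required) <paper title>

    {% include paper_item.html key="alpa-osdi22" required="Auto Parallelism" %}
    renders as: (Required - Auto Parallelism) <paper title>

Omitting required (or passing false) leaves the entry unlabeled, since the outer {% if include.required %} test is falsy.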

_pages/about.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ See the [**Logistics**]({{ '/logistics/' | relative_url }}) tab for detailed inf
 
 ### What We Will Cover
 
-A tentative reading list lives on the [**Reading List**]({{ '/reading-list/' | relative_url }}) tab and draws heavily from OSDI/SOSP, ASPLOS, SIGCOMM, NSDI, MLSys and Nature papers.
+A tentative reading list lives on the [**Reading List**]({{ '/reading-list/' | relative_url }}) tab and draws heavily from top systems and ML venues, including OSDI/SOSP, ASPLOS, SIGCOMM, NSDI, MLSys, ICLR, and NeurIPS.
 
 #### Part 1 – LLMs as the Backbone of Modern AI
 * Parallel & elastic training (3D, MoE, fault-tolerance)

_pages/reading_list.md

Lines changed: 34 additions & 28 deletions
@@ -12,56 +12,62 @@ title: ""
 <b>(Required)</b> <a href="https://arxiv.org/pdf/2407.21783">The Llama 3 Herd of Models</a> (Sections 2, 3.3, and 4.1), <br/><em>Llama Team, AI @ Meta</em>
 </li>
 {% include paper_item.html key="megatron-sc21" required=true %}
-{% include paper_item.html key="wlbllm-osdi25" required=false %}
 </ul>
 
+
 #### Scaling LLM Pre-Training
 <ul>
-{% include paper_item.html key="alpa-osdi22" required=true %}
-{% include paper_item.html key="partir-asplos25" required=false %}
-{% include paper_item.html key="rdma-sigcomm24" required=true %}
-{% include paper_item.html key="cassini-nsdi24" required=false %}
-{% include paper_item.html key="traincheck-osdi25" required=true %}
-{% include paper_item.html key="superbench-atc24" required=false %}
-{% include paper_item.html key="oobleck-sosp23" required=true %}
-{% include paper_item.html key="tenplex-sosp24" required=false %}
+{% include paper_item.html key="wlbllm-osdi25" required="Context Parallelism" %}
+{% include paper_item.html key="hotspa-sosp24" %}
+{% include paper_item.html key="alpa-osdi22" required="Auto Parallelism" %}
+{% include paper_item.html key="partir-asplos25" %}
+{% include paper_item.html key="rdma-sigcomm24" required="Network" %}
+{% include paper_item.html key="cassini-nsdi24" %}
+{% include paper_item.html key="traincheck-osdi25" required="Silent Data Corruption" %}
+{% include paper_item.html key="superbench-atc24" %}
+{% include paper_item.html key="oobleck-sosp23" required="Fault-Tolerance" %}
+{% include paper_item.html key="tenplex-sosp24" %}
 </ul>
 
 #### LLM Post-Training for Alignment
 <ul>
-{% include paper_item.html key="rlhfuse-nsdi25" required=true %}
-{% include paper_item.html key="hybridflow-eurosys25" required=false %}
-{% include paper_item.html key="areal-arxiv25" required=true %}
+{% include paper_item.html key="hybridflow-eurosys25" required="Resource Efficiency" %}
+{% include paper_item.html key="rlhfuse-nsdi25" %}
+{% include paper_item.html key="areal-arxiv25" required="Async RL" %}
+{% include paper_item.html key="asyncrlhf-iclr25" %}
 </ul>
 
 #### Efficient LLM Serving
 <ul>
-{% include paper_item.html key="pagedattention-sosp23" required=true %}
-{% include paper_item.html key="nanoflow-osdi25" required=true %}
-{% include paper_item.html key="sarathiserve-osdi24" required=false %}
-{% include paper_item.html key="distserve-osdi24" required=true %}
-{% include paper_item.html key="llumnix-osdi24" required=true %}
+{% include paper_item.html key="pagedattention-sosp23" required="KV Cache Management" %}
+{% include paper_item.html key="orca-osdi22" %}
+{% include paper_item.html key="nanoflow-osdi25" required="Optimal Throughput" %}
+{% include paper_item.html key="sarathiserve-osdi24" %}
+{% include paper_item.html key="distserve-osdi24" required="Prefill/Decode Disaggregation" %}
+{% include paper_item.html key="loongserve-sosp24" %}
+{% include paper_item.html key="waferllm-osdi25" required="New Hardware" %}
+{% include paper_item.html key="aqua-asplos25" %}
 </ul>
 
 #### Mixture-of-Experts
 <ul>
-{% include paper_item.html key="switch-jmlr22" required=true %}
-{% include paper_item.html key="moe-iclr17" required=false %}
-{% include paper_item.html key="fsmoe-asplos25" required=true %}
-{% include paper_item.html key="tutel-mlsys23" required=false %}
-{% include paper_item.html key="moelight-asplos25" required=true %}
-{% include paper_item.html key="pregatedmoe-isca24" required=false %}
-{% include paper_item.html key="readme-neurips24" required=false %}
+{% include paper_item.html key="switch-jmlr22" required="MoE Motivation and Architecture" %}
+{% include paper_item.html key="moe-iclr17" %}
+{% include paper_item.html key="fsmoe-asplos25" required="Training" %}
+{% include paper_item.html key="megablock-mlsys23" %}
+{% include paper_item.html key="moelight-asplos25" required="Serving" %}
+{% include paper_item.html key="pregatedmoe-isca24" %}
+{% include paper_item.html key="readme-neurips24" %}
 </ul>
 
 ## Part 2 - GenAI: Beyond Simple Text Generation
 #### Multi-Modal Generation
 <ul>
 {% include paper_item.html key="illstablediff" required=true %}
-{% include paper_item.html key="approxcache-nsdi24" required=true %}
-{% include paper_item.html key="diffserve-mlsys24" required=false %}
-{% include paper_item.html key="cogvideox-iclr25" required=true %}
-{% include paper_item.html key="moviegen-arxiv24" required=false %}
+{% include paper_item.html key="approxcache-nsdi24" required="Diffusion Model Serving" %}
+{% include paper_item.html key="diffserve-mlsys24" %}
+{% include paper_item.html key="cogvideox-iclr25" required="Video Gen Model" %}
+{% include paper_item.html key="moviegen-arxiv24" %}
 </ul>
 
 #### Retrieval-Augmented Generation
