Extending Tensor Parallelism for IBM FMS: Sequence Parallelism by Sibi-Git · Pull Request #455 · foundation-model-stack/foundation-model-stack

Sibi-Git · 2025-07-17T23:47:41Z

This project extends the IBM Foundation Model Stack (FMS) to support both Tensor Parallelism (TP) and Sequence Parallelism (SP) in distributed model inference. While TP enables parameter sharding across GPUs, it does not partition the sequence dimension, which leads to memory inefficiency at long sequence lengths. We address this by integrating SP into normalization layers, optimizing layout transitions, and enabling support for non-divisible and short sequence lengths.
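To make the non-divisible and short sequence cases concrete, here is a minimal sketch of how a sequence could be padded so that it splits evenly across ranks. This is an illustration only, not the code in this PR: the helper names `pad_to_multiple` and `shard_ranges` are hypothetical, and it assumes the sequence dimension is split into equal contiguous chunks with padding at the end.

```python
from typing import List, Tuple


def pad_to_multiple(seq_len: int, world_size: int) -> int:
    """Return the smallest length >= seq_len that is divisible by world_size."""
    # Ceiling division, then scale back up to a multiple of world_size.
    return -(-seq_len // world_size) * world_size


def shard_ranges(seq_len: int, world_size: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, real_tokens) for each rank's slice of the padded sequence.

    Hypothetical helper: assumes equal contiguous chunks with padding
    appended at the end of the sequence.
    """
    padded = pad_to_multiple(seq_len, world_size)
    chunk = padded // world_size
    ranges = []
    for rank in range(world_size):
        start = rank * chunk
        end = start + chunk
        # Tokens in [start, end) that belong to the original, unpadded sequence.
        real_tokens = max(min(end, seq_len) - start, 0)
        ranges.append((start, end, real_tokens))
    return ranges
```

With 4 ranks, a length of 10 is padded to 12 and the last rank holds a single real token. A length of 2 is padded to 4, so two ranks hold only padding, which is the short-sequence case.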

Members:

Maria Surani (ms7019)
Ryan Ghosh (rg3681)
Sibi Marappan (sm5726)

aw471 and others added 30 commits December 19, 2024 14:06

Add TP DTensor support for llama and granite

9cd6e30

Merge PR foundation-model-stack#371: Modernize tensor parallelism with DTensor API for Llama and Granite

c051ff2

Add sequence parallelism env variable

2157688

Automate TP plan generation and add support for sequence parallelism

02b22e1

Add temporary test script

afed7ab

Merge pull request #2 from HPML-Team9/tp-layer-plan-update

c6b2c50

Automate tensor parallel plan generation, with sequence parallelism supported

Use regex pattern matching for generalized tp layer plan

976bf46

Add debugging

894e95b

Add debugging logs

676515e

Remove redundant logs

43add04

Fix seq parallelization and add debug logs

292aa89

Add padding to fix imbalances

fbe672c

Fix wandb logs

7aab5eb

Base testing scripts

0ce0d85

Introduce 3 cases of sequence lengths

0716ab9

Gate all sequence parallelism changes

eebd306

more test script work

e779f12

Default llama to 32 layers for cluster testing script

8528fd3

Remove unnecessary prepare module input calls

56ad962

Reintroduce PrepareModuleInput to sequence parallel strategy

b149527

Get reserved memory metrics for SP and TP runs

6637002

Fix job and log directory names in slurm job script

8d30b5b

Clean up distributed test script

ea129ce

Remove debugging logs

9bd43d4

Move test files to designated folder

35c18eb

Add comments to CPU testing script

10fb5ec

Add README specific section

9c68210

Add comments for SP strategy

4e21f32

Created Assets Folder to store README.md plots

ac2bf56

Add files via upload

f9f9a3b

Sibi-Git and others added 18 commits May 8, 2025 21:07

Delete assets/placeholder.txt

71f90ba

Add files via upload

637065b

Delete assets/execution_time_vs_seq_length_gpu_l40.png

bb86815

Delete assets/memory_usage_vs_seq_length_gpu_l40.png

3d5249c

Add files via upload

0944ed8

Add README specific section

2168d14

- Added latency and memory usage plots for L40 GPU and Xeon CPU tests.
- Included execution tables and brief analysis under Results and Benchmarks section in README.
- Added Wandb link

updated example usage

6246189

updated comments for clarity

f46dbca

added comments

938fc41

Fix path to testing script and increase CPU resources

4488cef

Allow distributed testing script to work without wandb credentials

3fbd706

Update plots to mirror increased computational resources

d3a1045

Fix README with increased computational resources

e535ea4

Update README to new format

bc3e312

Add files via upload

88a51e1

added cpu test scripts

updated cpu benchmark instructions

12e6322

Update test_tp_sp_cluster_cpu.py

8a35945

Update README.md

5834527

Sibi-Git wants to merge 48 commits into foundation-model-stack:main from HPML-Team9:main
