Rotary Position Embedding (RoPE) for ASR (code from Samsung Cambridge)#2799
Adel-Moumen merged 50 commits into speechbrain:develop from rope
Conversation
rope with the latest SB
…into rope up-to-date with develop
merge the develop into rope
@pplantinga @Adel-Moumen @mravanelli this is an important PR in the sense that we should, from now on, use RoPE for most models we develop. The reason is that it's definitely faster, and better. We may want to retrain some models with it... it's also a good distinction from ESPnet and NeMo, I believe. I can do a review, as I am not that familiar with the code. But we'll need an external one as well.
TParcollet
left a comment
Thanks @shucongzhang, just a few minor comments. Also, as discussed together, let's see how the discussion about Torch attention vs. homemade attention goes with the others.
| loss_reduction: 'batchmean' | ||
| sorting: random | ||
| num_workers: 4 | ||
| precision: fp32 # bf16, fp16 or fp32 |
Change to fp16 by default, please.
| @@ -0,0 +1,338 @@ | |||
| # ############################################################################ | |||
This guy looks VERY similar to our conformer_large.yaml, right? Why not change the main one to use RoPE? The model is just better, so I don't see any problem with that.
| raise ValueError( | ||
| "The chosen attention type for the Conformer is RelPosMHAXL. For this attention type, the positional embeddings are mandatory" | ||
| ) | ||
| elif self.attention_type == "RoPEMHAXL": |
Why the XL? Why not just RoPEMHA?
| value=memory, | ||
| attn_mask=memory_mask, | ||
| key_padding_mask=memory_key_padding_mask, | ||
| attn_mask=memory_mask, # none |
| pos_embs=pos_embs_src, | ||
| ) | ||
| # breakpoint() |
| branchformer_activation: Optional[nn.Module] = nn.GELU, | ||
| attention_type: Optional[str] = "regularMHA", | ||
| max_length: Optional[int] = 2500, | ||
| max_length: Optional[int] = 10000, |
This breaks backward compatibility. I 100% agree that 2500 is too little, but it will break loading of old checkpoints... can we keep it at 2500?
| positions_inv_freq = torch.outer(positions, inv_freq) | ||
| cosines = torch.cos(positions_inv_freq) | ||
| # (cos(m*theta_0), cos(m*theta_0), cos(m*theta_1), cos(m*theta_1) ,... ) for equation (34) |
| ) | ||
| sines = torch.sin(positions_inv_freq) | ||
| # (sin(m*theta_0), sin(m*theta_0), sin(m*theta_1), sin(m*theta_1) ,... ) for equation (34) |
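The duplicated-coefficient layout described in the comments in the diff — each angle's cosine and sine repeated for adjacent feature pairs, per equation (34) of the RoPE paper (https://arxiv.org/pdf/2104.09864) — can be sketched in NumPy as follows. `rope_tables` is an illustrative name, not the PR's actual API:

```python
import numpy as np

def rope_tables(seq_len, dim, base=10000.0):
    # theta_i = base^(-2i/dim), i = 0 .. dim/2 - 1, as in the RoPE paper
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    positions = np.arange(seq_len, dtype=np.float64)
    angles = np.outer(positions, inv_freq)           # (seq_len, dim/2)
    # Duplicate every column so adjacent feature pairs share one angle:
    # row m becomes (cos(m*theta_0), cos(m*theta_0), cos(m*theta_1), ...)
    cosines = np.repeat(np.cos(angles), 2, axis=-1)  # (seq_len, dim)
    sines = np.repeat(np.sin(angles), 2, axis=-1)    # (seq_len, dim)
    return cosines, sines
```

At position 0 all angles are zero, so the rotation built from these tables reduces to the identity.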
| key_padding_mask = key_padding_mask.view(bsz, 1, 1, klen).expand( | ||
| bsz, self.num_heads, klen, qlen | ||
| ) | ||
| torch.logical_not(key_padding_mask) |
Does this line do anything? Shouldn't it be stored back into a variable?
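The reviewer's point holds: `torch.logical_not` is out-of-place and returns a new tensor, so calling it without assigning the result is a no-op on the original mask (the in-place spelling would be `key_padding_mask.logical_not_()`). A minimal NumPy analogue of the bug and its fix:

```python
import numpy as np

mask = np.array([True, False, True])

np.logical_not(mask)                # return value discarded; mask is unchanged
assert mask.tolist() == [True, False, True]

mask = np.logical_not(mask)         # fix: store the result back
assert mask.tolist() == [False, True, False]
```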
I think there is a software/hardware curiosity here that could be of interest to @asumagic @pplantinga and @Adel-Moumen. I just added a unit test comparing our homemade attention to torch attention (see the unit test folder). You will quickly see that this test passes easily on CPU, but fails miserably on GPU. I am wondering if this is an actual cuDNN issue or something else?
I don't really know, can it just be expected precision loss? Or are the values significantly wrong?
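Whatever the kernel-level cause turns out to be, part of the gap is expected: floating-point addition is not associative, and GPU attention kernels accumulate sums in a different order than the CPU reference, so fp32 results can legitimately drift past tight `allclose` tolerances. A minimal NumPy demonstration of the non-associativity:

```python
import numpy as np

# At magnitude 1e8, the spacing between adjacent float32 values is 8,
# so adding 1.0 before cancelling the large term loses it entirely.
big = np.float32(1e8)
one = np.float32(1.0)

left_to_right = (big + one) - big   # the 1.0 is absorbed into big
cancel_first = (big - big) + one    # cancellation happens before the add

print(left_to_right, cancel_first)
```

The two expressions are algebraically identical but evaluate differently in fp32, which is exactly the kind of reordering a fused GPU kernel performs.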
Co-authored-by: Rogier van Dalen <r.vandalen@samsung.com>
@Adel-Moumen ready for the review. All the above comments need only a brief check, as we have worked internally (all three of us) to improve it. You can give your input now.
Co-authored-by: Rogier van Dalen <r.vandalen@samsung.com>
Make RoPE memoisation clearer (#4)
Fix comments of review
pplantinga
left a comment
Very minor final cleanup comments.
TParcollet
left a comment
I am happy with the code, but it would need a last check.
I just ran the recipe tests and they worked for this recipe.
What does this PR do?
This PR implements Rotary Position Embedding (RoPE) https://arxiv.org/pdf/2104.09864.
It improves the training speed on LibriSpeech by 13%, while giving slightly better ASR results.
RoPE is also useful for other datasets. We have tested it with LibriHeavy, CommonVoice, and VoxPopuli. The full results are shown in this short paper: https://arxiv.org/pdf/2501.06051. In this PR, we only submit the RoPE recipe for LibriSpeech, for simplicity. If this is merged, we would like to also submit recipes for the other datasets.
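One reason RoPE composes well with fast attention is that the rotation is applied to queries and keys only, so the scores need no extra positional tensor and the core matmul is unchanged. An illustrative NumPy sketch of that structure (not the PR's implementation; function names are hypothetical):

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary position embedding for x of shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    angles = np.outer(np.arange(seq_len), inv_freq)  # (seq_len, dim/2)
    cos = np.repeat(np.cos(angles), 2, axis=-1)      # duplicate per pair
    sin = np.repeat(np.sin(angles), 2, axis=-1)
    x_rot = np.empty_like(x)                         # pairwise (x0,x1)->(-x1,x0)
    x_rot[..., 0::2] = -x[..., 1::2]
    x_rot[..., 1::2] = x[..., 0::2]
    return x * cos + x_rot * sin

def attention_with_rope(q, k, v):
    # Rotate queries and keys; values and the score matmul are untouched,
    # so no learned positional parameters enter the attention itself.
    q, k = rope(q), rope(k)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because position 0 has all-zero angles, `rope(x)[0]` equals `x[0]`, and the dot product between a rotated query and key depends only on their relative offset, which is the property the paper exploits.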