read_audio fixes and docs cleanup by asumagic · Pull Request #1592 · speechbrain/speechbrain

asumagic · 2022-10-04T11:17:22Z

When using the dict variant of read_audio, omitting the "stop" key fails because of a breaking change introduced in torchaudio 0.8.0 (SpeechBrain currently requires a version newer than 0.9.0).

Although omitting "stop" was undocumented, the code appears to have been written to allow it.

This PR does several things:

The function was cleaned up and the call to torchaudio.load was fixed, while keeping backwards compatibility.
Sanity checks were added, which raise an exception when the start-stop range looks incorrect. Let me know if these are overkill.
The documentation for read_audio was improved significantly, covering edge cases and previously undocumented behavior, namely:
- Keys for the dict variant are documented properly, including which ones are optional.
- Behavior for multi-channel files is now explained properly.
- General cleanups and formatting improvements.
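As a rough illustration of the kind of sanity check described above, here is a minimal sketch. The helper name `check_sample_range`, the exact conditions, and the error messages are assumptions made for illustration; they are not taken from the merged code.

```python
def check_sample_range(start, stop=None):
    """Hypothetical helper: reject start/stop ranges that look wrong.

    `stop=None` stands for an omitted "stop" key (read to end of file).
    """
    if start < 0:
        raise ValueError(f"Invalid sample range (start < 0): start={start}")
    if stop is not None and stop < start:
        # Assumed failure mode: a negative or too-small "stop", e.g. a user
        # passing -1 hoping to read up to the last sample.
        raise ValueError(
            f"Invalid sample range (stop < start): {start}..{stop}. "
            'Omit "stop" to read through the end of the file.'
        )


check_sample_range(8000, 16000)  # valid range, no exception
check_sample_range(8000)         # "stop" omitted, no exception
```

Raising early with a hint is a design choice here: it turns a confusing low-level loading error into a message that points at the offending keys.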

Following this PR, all of these now work:

# has always worked
a = read_audio("/path/to/file.wav")

# has always worked
b = read_audio({
    "file": "/path/to/file.wav",
    "start": 8000,
    "stop": 16000
})

# was failing, now works
c = read_audio({
    "file": "/path/to/file.wav"
})

# was failing, now works
d = read_audio({
    "file": "/path/to/file.wav",
    "start": 8000
})

Tests were added accordingly.

Root cause

Previously, the num_frames parameter of torchaudio.load defaulted to 0, which meant "load from frame_offset through the end of the file".
This is no longer the case: num_frames now defaults to -1 with the same meaning, and explicitly passing 0 now fails.

Implementation details

To best match the previously intended behavior, when start == stop this PR leaves the num_frames parameter unspecified, so torchaudio.load falls back to its own default. I feel this is more intuitive than the previous code.
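The mapping from the dict variant to torchaudio.load arguments can be sketched roughly as follows. `build_load_kwargs` is a hypothetical helper, not a function from SpeechBrain, and treating a missing "stop" as equal to "start" is an assumption made to keep the sketch short. Only the `frame_offset` and `num_frames` keyword names come from torchaudio.load.

```python
def build_load_kwargs(spec):
    """Hypothetical helper: derive torchaudio.load keyword arguments
    from a read_audio-style dict with optional "start" and "stop" keys."""
    start = spec.get("start", 0)
    # Assumption: a missing "stop" is treated like start == stop.
    stop = spec.get("stop", start)

    kwargs = {"frame_offset": start}
    if start != stop:
        # Only pass num_frames when an explicit, non-empty range is requested.
        kwargs["num_frames"] = stop - start
    # Otherwise num_frames is left out, so the library default applies
    # (read through the end of the file) and 0 is never passed.
    return kwargs


print(build_load_kwargs({"file": "/path/to/file.wav", "start": 8000, "stop": 16000}))
# {'frame_offset': 8000, 'num_frames': 8000}
print(build_load_kwargs({"file": "/path/to/file.wav", "start": 8000}))
# {'frame_offset': 8000}
```

The result would then be passed on as `torchaudio.load(spec["file"], **build_load_kwargs(spec))`. Leaving `num_frames` out entirely, rather than hard-coding -1, keeps the sketch independent of whichever default the installed torchaudio version uses.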

mravanelli · 2022-11-02T16:05:41Z

@Adel-Moumen, could you please review it when you have time?

Adel-Moumen · 2022-11-02T16:13:26Z

> @Adel-Moumen, could you please review it when you have time?

Yes, sure. 🙂

Adel-Moumen

Hello,

Thanks for the PR. Everything looks nice. I tried your PR and everything works as expected.
Could you please add your name at the top of the file?

Thanks again!

Adel-Moumen

LGTM!

asumagic added 4 commits October 4, 2022 12:06

read_audio fixes and reworked docs

c93c518

Added more extensive tests to read_audio

dcd1328

Fix minor typo in read_audio doc

5c53722

Run black over test_data_io

d0f0808

mravanelli requested a review from Adel-Moumen November 2, 2022 16:05

Adel-Moumen self-assigned this Nov 2, 2022

Adel-Moumen requested changes Nov 21, 2022

Added name to author list in dataio

b95cb48

asumagic requested a review from Adel-Moumen November 21, 2022 13:56

Adel-Moumen approved these changes Nov 21, 2022

Adel-Moumen merged commit 0d50564 into speechbrain:develop Nov 21, 2022

Adel-Moumen merged 5 commits into speechbrain:develop from asumagic:fix-audio-load-stop-param
