Replace torchaudio I/O with soundfile-based audio_io wrapper #2989
Adel-Moumen merged 14 commits into develop from
Conversation
Co-authored-by: pplantinga <4369106+pplantinga@users.noreply.github.com>
pplantinga
left a comment
Looks quite good. There are a few changes we can make to better match the torchaudio signature, which I hope reduces the chance of errors due to wrong calls.
Also, we need to update the documentation to reflect the change in audio I/O library, primarily the docs/audioloading.rst file.
```python
import torch


class AudioInfo:
```
AudioInfo should be a dataclass
Converted AudioInfo to a dataclass in commit 5b1793f. The class now uses @dataclasses.dataclass decorator and field type annotations, while maintaining the computed properties (num_frames, num_channels, duration) for compatibility.
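A minimal sketch of what that conversion might look like (field and property names follow the PR summary; the actual merged code may differ):

```python
import dataclasses


@dataclasses.dataclass
class AudioInfo:
    """Sketch of the dataclass described above: typed fields plus the
    torchaudio-style computed properties kept for compatibility."""

    sample_rate: int
    frames: int
    channels: int
    subtype: str
    format: str

    @property
    def num_frames(self) -> int:
        # Alias for torchaudio's AudioMetaData.num_frames
        return self.frames

    @property
    def num_channels(self) -> int:
        # Alias for torchaudio's AudioMetaData.num_channels
        return self.channels

    @property
    def duration(self) -> float:
        # Length of the file in seconds
        return self.frames / self.sample_rate
```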
```python
# Determine if it's (channels, frames) or (frames, channels)
# Heuristic: if first dimension is smaller and <= 16, likely channels-first
if audio_np.shape[0] <= 16 and audio_np.shape[0] < audio_np.shape[1]:
    # Convert (channels, frames) to (frames, channels)
    audio_np = audio_np.T
```
Although I expect this inference to be correct nearly all the time, let's make this function match the torchaudio.save() signature by adding a channels_first=True argument
Added channels_first=True parameter to save() function in commit 5b1793f. The function now explicitly handles the input shape based on this parameter instead of using heuristics, matching the torchaudio.save() signature.
```python
if channels_first:
    # Need to add channel dimension: (frames,) -> (1, frames)
    audio = audio.unsqueeze(0)
```
The always_2d parameter should take precedence over channels_first, because if the user asks for only a single channel, we shouldn't be adding channels just so that they can go first.
Fixed in commit 5b1793f. The always_2d parameter now takes precedence over channels_first. When always_2d=False and the audio is mono, it returns 1D regardless of channels_first. When always_2d=True, it respects the channels_first flag for determining dimension order.
```python
)


def load(path, *, channels_first=True, dtype=torch.float32, always_2d=False,
```
The always_2d argument should be True by default to match the torchaudio.load behavior.
Changed always_2d default to True in commit 5b1793f to match torchaudio.load() behavior. Updated the docstring to reflect the new default value.
```python
raise RuntimeError(f"Failed to load audio from {path}: {e}") from e


def save(path, src, sample_rate, subtype="PCM_16"):
```
Let's not override the default subtype (which is None), as soundfile does smart subtyping based on file format.
Changed default subtype from "PCM_16" to None in commit 5b1793f. When None, soundfile automatically chooses an appropriate subtype based on the file format. Updated the save function to handle None by calling sf.write without the subtype parameter.
```
This module provides a minimal compatibility wrapper for audio I/O operations
using soundfile (pysoundfile) library, replacing torchaudio's load, save, and
info functions.
```
Can we add an example here at the top of the file, using the tmpdir fixture, that shows usage of all three functions?
Added comprehensive usage example at the top of the file in commit 5b1793f, showing all three functions (load, info, save) with proper doctest formatting using the tmpdir fixture.
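Such a docstring example might take roughly the following shape (a hypothetical sketch, stored as a string so it is self-contained here; it assumes the wrapper mirrors torchaudio's `(waveform, sample_rate)` return convention):

```python
# Hypothetical module-docstring example using the pytest tmpdir fixture,
# mirroring torchaudio-style usage of save/info/load.
MODULE_EXAMPLE = '''
Example
-------
>>> import torch
>>> from speechbrain.dataio import audio_io
>>> signal = torch.randn(1, 16000)          # (channels, frames)
>>> filepath = str(tmpdir / "example.wav")  # tmpdir: pytest fixture
>>> audio_io.save(filepath, signal, 16000)
>>> metadata = audio_io.info(filepath)
>>> metadata.sample_rate
16000
>>> loaded, sample_rate = audio_io.load(filepath)
>>> loaded.shape
torch.Size([1, 16000])
'''
```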
@copilot can you please address the comments in the review?
Updated docs/audioloading.rst in commit 5b1793f to document soundfile as the primary audio I/O backend for SpeechBrain. The documentation now includes:
pplantinga
left a comment
Okay, I've fixed the remaining issues.
@pplantinga should we have a quick meeting with @Adel-Moumen at some point this week to discuss this PR?
Replace torchaudio's audio I/O uses (load, save, info) with a soundfile-based compatibility wrapper
This PR successfully replaces all torchaudio.{load,save,info} calls (~228 occurrences) across the repository with a new soundfile-based compatibility layer, decoupling audio I/O from torchaudio's ML transforms while maintaining identical API behavior.
Implementation Complete ✅
Changes
New module: speechbrain/dataio/audio_io.py
- load(path, *, channels_first=True, dtype=torch.float32, always_2d=True, frame_offset=0, num_frames=-1)
- save(path, src, sample_rate, channels_first=True, subtype=None) with flexible shape handling
- info(path) returns an AudioInfo dataclass with sample_rate, frames, channels, subtype, format, duration

Core modules updated (17 files)
Recipe files updated (89 files)
Documentation
Dependencies
- Added soundfile>=0.12.1 to requirements.txt

Testing
- tests/unittests/test_audio_io.py with 12 test cases covering WAV/FLAC roundtrip, metadata retrieval, partial loading, channel ordering, and various shapes/dtypes

Example Usage
Preserved Functionality
Original prompt
This pull request was created as a result of the following prompt from Copilot chat.