
Replace torchaudio I/O with soundfile-based audio_io wrapper#2989

Merged
Adel-Moumen merged 14 commits into develop from copilot/add-soundfile-compatibility-wrapper
Nov 22, 2025

Conversation

Contributor

Copilot AI commented Oct 31, 2025

Replace torchaudio's audio I/O uses (load, save, info) with a soundfile-based compatibility wrapper

This PR successfully replaces all torchaudio.{load,save,info} calls (~228 occurrences) across the repository with a new soundfile-based compatibility layer, decoupling audio I/O from torchaudio's ML transforms while maintaining identical API behavior.

Implementation Complete ✅

  • Create speechbrain/dataio/audio_io.py with soundfile-based implementations
  • Add soundfile>=0.12.1 to requirements.txt
  • Replace all torchaudio.load() calls (~85 occurrences)
  • Replace all torchaudio.save() calls (~86 occurrences)
  • Replace all torchaudio.info() calls (~57 occurrences)
  • Update docstring examples in inference modules
  • Add tests/unittests/test_audio_io.py
  • Address PR review feedback:
    • Convert AudioInfo to dataclass
    • Add channels_first parameter to save()
    • Fix always_2d precedence over channels_first in load()
    • Change always_2d default to True (matching torchaudio)
    • Change subtype default to None (smart subtyping)
    • Add usage example at top of audio_io.py
    • Update docs/audioloading.rst to reflect soundfile backend

Changes

New module: speechbrain/dataio/audio_io.py

  • load(path, *, channels_first=True, dtype=torch.float32, always_2d=True, frame_offset=0, num_frames=-1)
  • save(path, src, sample_rate, channels_first=True, subtype=None) with flexible shape handling
  • info(path) returns AudioInfo dataclass with sample_rate, frames, channels, subtype, format, duration
  • Full compatibility with torchaudio signatures including partial file loading

Core modules updated (17 files)

  • speechbrain/inference/* - All inference modules
  • speechbrain/dataio/{dataio,legacy,preprocess}.py
  • speechbrain/augment/preparation.py
  • speechbrain/integrations/k2_fsa/align.py

Recipe files updated (89 files)

  • All major datasets: LibriMix, VoxCeleb, CommonVoice, LJSpeech, Voicebank, ESC50, etc.
  • Data preparation and training scripts

Documentation

  • Updated docs/audioloading.rst to document soundfile as primary backend
  • Added usage examples and troubleshooting information

Dependencies

  • Added soundfile>=0.12.1 to requirements.txt
  • Kept torchaudio dependency (still used for transforms.Resample and functional)

Testing

  • Added tests/unittests/test_audio_io.py with 12 test cases covering WAV/FLAC roundtrip, metadata retrieval, partial loading, channel ordering, various shapes/dtypes
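The roundtrip pattern those tests follow can be sketched as below. This sketch uses the stdlib wave module purely so it runs without soundfile installed; the actual tests call audio_io.save and audio_io.load instead.

```python
import math
import struct
import tempfile
import wave

sample_rate, n_frames = 16000, 1600
# Generate a 440 Hz sine wave as 16-bit PCM samples
samples = [int(32767 * math.sin(2 * math.pi * 440 * i / sample_rate))
           for i in range(n_frames)]

with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    path = tmp.name

# Save: write mono 16-bit PCM to the temporary file
with wave.open(path, "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(sample_rate)
    f.writeframes(struct.pack(f"<{n_frames}h", *samples))

# Load back and check metadata, mirroring the info() assertions
with wave.open(path, "rb") as f:
    assert f.getframerate() == sample_rate
    assert f.getnframes() == n_frames
    decoded = struct.unpack(f"<{n_frames}h", f.readframes(n_frames))

# Roundtrip equality is exact for integer PCM
assert list(decoded) == samples
```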

Example Usage

from speechbrain.dataio import audio_io

# Drop-in replacement for torchaudio.load/save/info
audio, sr = audio_io.load("example.wav", channels_first=True)
audio_io.save("output.wav", audio, sr)
info = audio_io.info("example.wav")  # info.sample_rate, info.frames, info.duration

# Supports partial loading (like torchaudio)
audio, sr = audio_io.load("large.wav", frame_offset=1000, num_frames=5000)

Preserved Functionality

  • torchaudio.transforms.Resample (unchanged - still used throughout)
  • torchaudio.functional (unchanged)
  • All existing APIs and behavior maintained

Original prompt

Replace torchaudio's audio I/O uses (load, save, info) across the repository with a lightweight soundfile-based compatibility wrapper at speechbrain/dataio/audio_io.py.

Scope and goals:

  • Add a new module at speechbrain/dataio/audio_io.py that implements a minimal compatibility layer for torchaudio I/O using the soundfile (pysoundfile) library and torch:
    • load(path, *, channels_first=True, dtype=torch.float32, always_2d=False) -> (tensor, sample_rate)
    • info(path) -> object with sample_rate, frames, channels, subtype, format, duration property
    • save(path, src, sample_rate, subtype="PCM_16") -> writes audio to disk. The signature should be simple and similar to torchaudio.save(path, src, sample_rate).
    • list_audio_backends() -> ["soundfile"] (optional but helpful)
  • Use soundfile for reading/writing. Do not include ffmpeg fallbacks or an AudioEffector in this PR.
  • Do NOT change or remove torchaudio dependency. Keep torchaudio.transforms.Resample and any other non-I/O torchaudio uses intact.
  • Replace all call sites of torchaudio.load(), torchaudio.save(), and torchaudio.info() in the repository with calls to the new audio_io (e.g., from speechbrain.dataio import audio_io as audio_io; audio_io.load(...)). Keep the call semantics compatible: audio_io.save(path, src, sample_rate) should accept the same tensor shapes commonly used in the repo and handle batched/mono/stereo shapes gracefully.
  • Update code examples (docstrings) that call torchaudio.load/save/info to call audio_io equivalents, to avoid misleading docs.
  • Add soundfile to the repository's top-level requirements (requirements.txt) so CI installs it.
  • Add unit tests: minimal tests to check load/save roundtrip for WAV and FLAC, and info() returns expected fields. Place tests under tests/unittests/test_audio_io.py.

Files to be added/changed (non-exhaustive; the coding agent should find and replace all occurrences programmatically):

  • ADD: speechbrain/dataio/audio_io.py (new file)
  • MODIFY: speechbrain/inference/enhancement.py (replace torchaudio.save(...) with audio_io.save(...))
  • MODIFY: recipes/CVSS/S2ST/extract_code.py (replace torchaudio.info(...) with audio_io.info(...))
  • MODIFY: speechbrain/inference/classifiers.py (docstring examples: torchaudio.load -> audio_io.load)
  • MODIFY: speechbrain/inference/speaker.py (docstring examples: torchaudio.load -> audio_io.load)
  • MODIFY: docs/audioloading.rst to mention soundfile as supported I/O backend for SpeechBrain I/O (optional but helpful)
  • MODIFY: requirements.txt (add soundfile)
  • ADD: tests/unittests/test_audio_io.py with basic roundtrip tests

Implementation details / compatibility notes:

  • audio_io.load should use soundfile.read(..., dtype='float32', always_2d=True). Convert numpy arrays to torch tensors and return either channels-first (channels, frames) or (frames, channels) depending on channels_first flag. If mono, return (1, frames) when channels_first True to be consistent with torchaudio.
  • audio_io.save should accept torch.Tensor or numpy arrays; detect shape and convert to (frames, channels) when writing with soundfile.write.
  • audio_io.info should use soundfile.info and expose a simple dataclass-like object with sample_rate, frames, channels, subtype, format, and duration property.
  • Make the implementation robust with clear error messages when files are missing or shapes are unsupported.

Tests:

  • test_audio_io_roundtrip_wav: generate a small sine wave, save with audio_io.save to a temporary file, load back with audio_io.load and assert sample rate and close numeric equality within tolerance.
  • test_audio_io_info: write a temporary file and assert audio_io.info returns expected sample_rate, frames and channels.

Please create a feature branch, apply the changes, run tests (if possible in CI), and open a PR with a descriptive title and checklist of changes. Include a clear PR description documenting that this is a first pass focusing only on torchaudio.load/save/info substitution and that AudioEffector and other advanced torchaudio I/O features were intentionally omitted.

Do not modify torchaudio.transforms.Resample usages. Keep torchaudio dependency in requirements.

This pull request was created as a result of the prompt from Copilot chat reproduced above under "Original prompt".


Copilot AI and others added 5 commits October 31, 2025 01:46
Copilot AI changed the title [WIP] Add soundfile compatibility layer for torchaudio I/O Replace torchaudio I/O with soundfile-based audio_io wrapper Oct 31, 2025
Copilot AI requested a review from pplantinga October 31, 2025 02:04
Collaborator

@pplantinga left a comment


Looks quite good; there are a few changes we can make to better match the torchaudio signature, which I hope reduces the chance of errors due to wrong calls.

Also, we need to update the documentation to reflect the change in audio I/O library, primarily the docs/audioloading.rst file.

import torch


class AudioInfo:
Collaborator


AudioInfo should be a dataclass

Contributor Author


Converted AudioInfo to a dataclass in commit 5b1793f. The class now uses @dataclasses.dataclass decorator and field type annotations, while maintaining the computed properties (num_frames, num_channels, duration) for compatibility.
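Based on the field and property names mentioned in this thread and the PR description, the converted class might look roughly like the following. This is a hedged sketch, not the merged code, and the constructor argument order shown is an assumption.

```python
from dataclasses import dataclass

@dataclass
class AudioInfo:
    """Metadata container mirroring the fields exposed by soundfile.info()."""
    sample_rate: int
    frames: int
    channels: int
    subtype: str
    format: str

    # Computed properties kept for torchaudio-style compatibility
    @property
    def num_frames(self):
        return self.frames

    @property
    def num_channels(self):
        return self.channels

    @property
    def duration(self):
        # Duration in seconds
        return self.frames / self.sample_rate

info = AudioInfo(16000, 32000, 1, "PCM_16", "WAV")
print(info.duration)  # 2.0
```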

Comment thread speechbrain/dataio/audio_io.py Outdated
Comment on lines +169 to +173
# Determine if it's (channels, frames) or (frames, channels)
# Heuristic: if first dimension is smaller and <= 16, likely channels-first
if audio_np.shape[0] <= 16 and audio_np.shape[0] < audio_np.shape[1]:
    # Convert (channels, frames) to (frames, channels)
    audio_np = audio_np.T
Collaborator


Although I expect this inference to be correct nearly all the time, let's make this function match the torchaudio.save() signature by adding a channels_first=True argument

Contributor Author


Added channels_first=True parameter to save() function in commit 5b1793f. The function now explicitly handles the input shape based on this parameter instead of using heuristics, matching the torchaudio.save() signature.

Comment thread speechbrain/dataio/audio_io.py Outdated
Comment on lines +113 to +115
if channels_first:
    # Need to add channel dimension: (frames,) -> (1, frames)
    audio = audio.unsqueeze(0)
Collaborator


The always_2d parameter should take precedence over channels_first, because if the user asks for only a single channel, we shouldn't be adding channels just so that they can go first.

Contributor Author


Fixed in commit 5b1793f. The always_2d parameter now takes precedence over channels_first. When always_2d=False and the audio is mono, it returns 1D regardless of channels_first. When always_2d=True, it respects the channels_first flag for determining dimension order.

Comment thread speechbrain/dataio/audio_io.py Outdated
)


def load(path, *, channels_first=True, dtype=torch.float32, always_2d=False,
Collaborator


The always_2d argument should be True by default to match the torchaudio.load behavior.

Contributor Author


Changed always_2d default to True in commit 5b1793f to match torchaudio.load() behavior. Updated the docstring to reflect the new default value.

Comment thread speechbrain/dataio/audio_io.py Outdated
raise RuntimeError(f"Failed to load audio from {path}: {e}") from e


def save(path, src, sample_rate, subtype="PCM_16"):
Collaborator


Let's not override the default subtype (which is None), as soundfile does smart subtyping based on file format.

Contributor Author


Changed default subtype from "PCM_16" to None in commit 5b1793f. When None, soundfile automatically chooses an appropriate subtype based on the file format. Updated the save function to handle None by calling sf.write without the subtype parameter.

This module provides a minimal compatibility wrapper for audio I/O operations
using soundfile (pysoundfile) library, replacing torchaudio's load, save, and
info functions.

Collaborator


Can we add an example here at the top of the file using tmpdir fixture that shows all three functions usage?

Contributor Author


Added comprehensive usage example at the top of the file in commit 5b1793f, showing all three functions (load, info, save) with proper doctest formatting using the tmpdir fixture.

@pplantinga
Collaborator

@copilot can you please address the comments in the review?

Contributor Author

Copilot AI commented Oct 31, 2025

Looks quite good, there's a few changes we can make to better match the torchaudio signature, which I hope reduces the chance of errors due to wrong calls.

Also, we need to update the documentation...

Updated docs/audioloading.rst in commit 5b1793f to document soundfile as the primary audio I/O backend for SpeechBrain. The documentation now includes:

  • Introduction explaining the migration from torchaudio to soundfile
  • Installation and troubleshooting steps for soundfile
  • SpeechBrain Audio I/O API usage examples
  • Supported formats and conversion guidance for unsupported formats
  • Legacy notes about torchaudio backends

Copilot AI requested a review from pplantinga October 31, 2025 18:18
@pplantinga pplantinga marked this pull request as ready for review October 31, 2025 18:23
Collaborator

@pplantinga left a comment


Okay, I've fixed the remaining issues.

@pplantinga pplantinga added this to the v1.1.0 milestone Nov 1, 2025
@TParcollet
Collaborator

@pplantinga should we have a quick meeting with @Adel-Moumen at some point this week to discuss this PR?

@Adel-Moumen Adel-Moumen merged commit 2427785 into develop Nov 22, 2025
5 checks passed
@Adel-Moumen Adel-Moumen deleted the copilot/add-soundfile-compatibility-wrapper branch November 22, 2025 12:56