Replace torchaudio I/O with soundfile-based audio_io wrapper #2989
Adel-Moumen merged 14 commits into develop from
Conversation
Co-authored-by: pplantinga <4369106+pplantinga@users.noreply.github.com>
pplantinga
left a comment
Looks quite good. There are a few changes we can make to better match the torchaudio signature, which I hope reduces the chance of errors due to wrong calls.
Also, we need to update the documentation to reflect the change in audio I/O library, primarily the docs/audioloading.rst file.
```python
import torch


class AudioInfo:
```
AudioInfo should be a dataclass
Converted AudioInfo to a dataclass in commit 5b1793f. The class now uses @dataclasses.dataclass decorator and field type annotations, while maintaining the computed properties (num_frames, num_channels, duration) for compatibility.
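A minimal sketch of what that conversion might look like (field and property names follow the PR summary; the actual merged code may differ):

```python
import dataclasses


@dataclasses.dataclass
class AudioInfo:
    """Sketch of the dataclass described above: typed fields plus the
    torchaudio-style computed properties kept for compatibility."""

    sample_rate: int
    frames: int
    channels: int
    subtype: str
    format: str

    @property
    def num_frames(self) -> int:
        # Alias for torchaudio's AudioMetaData.num_frames
        return self.frames

    @property
    def num_channels(self) -> int:
        # Alias for torchaudio's AudioMetaData.num_channels
        return self.channels

    @property
    def duration(self) -> float:
        # Length of the file in seconds
        return self.frames / self.sample_rate
```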
```python
# Determine if it's (channels, frames) or (frames, channels)
# Heuristic: if first dimension is smaller and <= 16, likely channels-first
if audio_np.shape[0] <= 16 and audio_np.shape[0] < audio_np.shape[1]:
    # Convert (channels, frames) to (frames, channels)
    audio_np = audio_np.T
```
Although I expect this inference to be correct nearly all the time, let's make this function match the torchaudio.save() signature by adding a channels_first=True argument
Added channels_first=True parameter to save() function in commit 5b1793f. The function now explicitly handles the input shape based on this parameter instead of using heuristics, matching the torchaudio.save() signature.
```python
if channels_first:
    # Need to add channel dimension: (frames,) -> (1, frames)
    audio = audio.unsqueeze(0)
```
The always_2d parameter should take precedence over channels_first, because if the user asks for only a single channel, we shouldn't be adding channels just so that they can go first.
Fixed in commit 5b1793f. The always_2d parameter now takes precedence over channels_first. When always_2d=False and the audio is mono, it returns 1D regardless of channels_first. When always_2d=True, it respects the channels_first flag for determining dimension order.
```python
)


def load(path, *, channels_first=True, dtype=torch.float32, always_2d=False,
```
The always_2d argument should be True by default to match the torchaudio.load behavior.
Changed always_2d default to True in commit 5b1793f to match torchaudio.load() behavior. Updated the docstring to reflect the new default value.
```python
raise RuntimeError(f"Failed to load audio from {path}: {e}") from e


def save(path, src, sample_rate, subtype="PCM_16"):
```
Let's not override the default subtype (which is None), as soundfile does smart subtyping based on file format.
Changed default subtype from "PCM_16" to None in commit 5b1793f. When None, soundfile automatically chooses an appropriate subtype based on the file format. Updated the save function to handle None by calling sf.write without the subtype parameter.
```
This module provides a minimal compatibility wrapper for audio I/O operations
using soundfile (pysoundfile) library, replacing torchaudio's load, save, and
info functions.
```
Can we add an example here at the top of the file, using the tmpdir fixture, that shows usage of all three functions?
Added comprehensive usage example at the top of the file in commit 5b1793f, showing all three functions (load, info, save) with proper doctest formatting using the tmpdir fixture.
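Such a docstring example might take roughly the following shape (a hypothetical sketch, stored as a string so it is self-contained here; it assumes the wrapper mirrors torchaudio's `(waveform, sample_rate)` return convention):

```python
# Hypothetical module-docstring example using the pytest tmpdir fixture,
# mirroring torchaudio-style usage of save/info/load.
MODULE_EXAMPLE = '''
Example
-------
>>> import torch
>>> from speechbrain.dataio import audio_io
>>> signal = torch.randn(1, 16000)          # (channels, frames)
>>> filepath = str(tmpdir / "example.wav")  # tmpdir: pytest fixture
>>> audio_io.save(filepath, signal, 16000)
>>> metadata = audio_io.info(filepath)
>>> metadata.sample_rate
16000
>>> loaded, sample_rate = audio_io.load(filepath)
>>> loaded.shape
torch.Size([1, 16000])
'''
```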
@copilot can you please address the comments in the review?
Updated docs/audioloading.rst in commit 5b1793f to document soundfile as the primary audio I/O backend for SpeechBrain. The documentation now includes:
pplantinga
left a comment
Okay, I've fixed the remaining issues.
@pplantinga should we have a quick meeting with @Adel-Moumen at some point this week to discuss this PR?
Replace torchaudio's audio I/O uses (load, save, info) with a soundfile-based compatibility wrapper
This PR successfully replaces all torchaudio.{load,save,info} calls (~228 occurrences) across the repository with a new soundfile-based compatibility layer, decoupling audio I/O from torchaudio's ML transforms while maintaining identical API behavior.
Implementation Complete ✅
Changes
New module: speechbrain/dataio/audio_io.py
- load(path, *, channels_first=True, dtype=torch.float32, always_2d=True, frame_offset=0, num_frames=-1)
- save(path, src, sample_rate, channels_first=True, subtype=None) with flexible shape handling
- info(path) returns an AudioInfo dataclass with sample_rate, frames, channels, subtype, format, duration

Core modules updated (17 files)
Recipe files updated (89 files)
Documentation
Dependencies
- Added soundfile>=0.12.1 to requirements.txt

Testing
- tests/unittests/test_audio_io.py with 12 test cases covering WAV/FLAC roundtrip, metadata retrieval, partial loading, channel ordering, and various shapes/dtypes

Example Usage
Preserved Functionality
Original prompt
This pull request was created as a result of the following prompt from Copilot chat.