The SpeechBrain Toolkit

SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch.

The goal is to create a single, flexible, and user-friendly toolkit that can be used to easily develop state-of-the-art speech technologies, including systems for speech recognition, speaker recognition, speech enhancement, multi-microphone signal processing and many others.

SpeechBrain is currently under development.

Basics

In the following sections, the basic functionalities of SpeechBrain are described.

License

SpeechBrain is licensed under the Apache License v2.0 (i.e., the same as the popular Kaldi toolkit).

Development requirements

pip install -r requirements.txt
pip install --editable .

Test Installation

Please run the following commands to make sure your installation is working:

pytest tests
pytest --doctest-modules speechbrain

Folder Structure

The current version of SpeechBrain has the following folder/file organization:

speechbrain: The core library
recipes: Experiment scripts and configurations
samples: Some toy data for debugging and testing
tools: Additional runnable utility scripts
tests: Unit tests and integration tests

How to run an experiment

In SpeechBrain, experiments can be run from anywhere, but the results/ directory will be created relative to the directory you are in. The most common pattern for running experiments is as follows:

> cd recipes/<dataset>/<task>/
> python experiment.py params.yaml

At the top of the experiment.py file, the function sb.core.create_experiment_directory() is called to create an output directory (by default: <cwd>/results/). Both detailed logs and experiment output are saved there. Furthermore, less detailed logs are output to stdout. The experiment script and configuration (including possible command-line overrides) are also copied to the output directory.
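The behaviour described above can be pictured with a small stand-alone sketch. This is not SpeechBrain's implementation of sb.core.create_experiment_directory(): the function name make_experiment_directory and its parameters are hypothetical, and the sketch only illustrates that a relative output directory is resolved against the current working directory and receives a copy of the configuration.

```python
import shutil
from pathlib import Path


def make_experiment_directory(output_dir="results", config_file=None):
    """Hypothetical stand-in: create an output directory relative to the cwd.

    Not the SpeechBrain API; it only mirrors the behaviour described above.
    """
    # A relative path is resolved against the directory you run from.
    out = Path.cwd() / output_dir
    out.mkdir(parents=True, exist_ok=True)
    # Keep a copy of the configuration next to the results.
    if config_file is not None:
        shutil.copy(config_file, out / Path(config_file).name)
    return out
```

Running the same script from two different directories would therefore produce two separate results/ folders, which is why the usual pattern is to cd into the recipe directory first.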

Also have a look at the YAML files in recipe directories. The YAML files specify the hyperparameters of the recipes. The syntax is explained in speechbrain.utils.data_utils in the docstring of load_extended_yaml.

A quick look at the extended YAML features, using an example:

seed: !PLACEHOLDER
output_dir: !ref results/vgg_blstm/<seed>
save_dir: !ref <output_dir>/save
data_folder: !PLACEHOLDER # e.g. /path/to/TIMIT

model: !new:speechbrain.lobes.models.CRDNN.CRDNN
    output_size: 40 # 39 phonemes + 1 blank symbol
    cnn_blocks: 2
    dnn_blocks: 2

!new:speechbrain.lobes.models.CRDNN.CRDNN creates a CRDNN instance from the module speechbrain.lobes.models.CRDNN.
The indented keywords (output_size etc.) after it are passed as keyword arguments.
!ref <output_dir>/save evaluates the part in angle brackets, referencing the YAML itself.
!PLACEHOLDER simply errors out when loaded; every user should replace it, either by editing the YAML or with an override (passed to load_extended_yaml).

For more details on yaml and our extensions, see speechbrain/YAML.md
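To make the !ref substitution concrete, here is a minimal sketch that resolves angle-bracket references in a plain Python dict. It is not the load_extended_yaml implementation: the helper name resolve_refs is hypothetical, it handles only references to other top-level keys, and the seed value is an arbitrary stand-in for what a user would supply in place of !PLACEHOLDER.

```python
import re


def resolve_refs(params):
    """Hypothetical sketch of `!ref`-style substitution on a flat dict.

    Not the SpeechBrain loader: it only handles `<key>` references to
    other top-level entries.
    """
    pattern = re.compile(r"<(\w+)>")

    def resolve(value, seen=()):
        # Non-string values (e.g. the integer seed) contain no references.
        if not isinstance(value, str):
            return value

        def substitute(match):
            key = match.group(1)
            if key in seen:
                raise ValueError(f"circular reference to {key!r}")
            # Referenced entries may themselves contain references.
            return str(resolve(params[key], seen + (key,)))

        return pattern.sub(substitute, value)

    return {key: resolve(value, (key,)) for key, value in params.items()}


params = {
    "seed": 1234,  # arbitrary value replacing the placeholder
    "output_dir": "results/vgg_blstm/<seed>",
    "save_dir": "<output_dir>/save",
}
resolved = resolve_refs(params)
print(resolved["save_dir"])  # results/vgg_blstm/1234/save
```

The nested lookup is the point of the example: save_dir refers to output_dir, which in turn refers to seed, so changing the seed changes every path derived from it.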

Tensor format

All the tensors within SpeechBrain are formatted using the following convention:

tensor=(batch, time_steps, channels[optional])

The batch is always the first element, and time_steps is always the second one. The rest of the dimensions are as many channels as you need.

Why do we need all tensors to have the same format? A shared format is crucial: all the classes and functions that process data must be designed around it. SpeechBrain builds pipelines of modules, and if each module used a different tensor format, exchanging data between processing units would be painful. Many formats are possible. For SpeechBrain we selected this one because it is commonly used with recurrent layers, which are common in speech applications.

The format is very flexible and allows users to read different types of data. For single-channel raw waveform signals, the tensor will be tensor=(batch, time_steps), while for multi-channel raw waveforms it will be tensor=(batch, time_steps, n_channel). Beyond waveforms, this format is used for any tensor in the computation pipeline. For instance, fbank features are formatted in this way:

(batch, time_step, n_filters)

The Short-Time Fourier Transform (STFT) tensor, instead, will be:

(batch, time_step, n_fft, 2)

where the "2" is because the STFT is based on complex numbers with a real and an imaginary part. We can also read multi-channel STFT data, which will be formatted in this way:

(batch, time_step, n_fft, 2, n_audio_channels)
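The convention can be checked mechanically. The sketch below is framework-free so that it runs without PyTorch installed; the helper name channel_dims and the concrete sizes (8 signals, 16000 samples, 100 frames, 40 filters, 201 frequency bins) are hypothetical. It takes a shape tuple, verifies that batch comes first and time second, and returns whatever channel dimensions follow.

```python
def channel_dims(shape, batch, time_steps):
    """Return the trailing channel dimensions of a (batch, time_steps, ...) shape."""
    if len(shape) < 2:
        raise ValueError("expected at least (batch, time_steps)")
    if shape[0] != batch or shape[1] != time_steps:
        raise ValueError(f"{shape} is not laid out as (batch, time_steps, ...)")
    return tuple(shape[2:])


# Hypothetical sizes: 8 signals, 16000 samples or 100 frames each.
print(channel_dims((8, 16000), 8, 16000))      # () for a single-channel waveform
print(channel_dims((8, 16000, 2), 8, 16000))   # (2,) for a two-microphone waveform
print(channel_dims((8, 100, 40), 8, 100))      # (40,) for 40 fbank filters
print(channel_dims((8, 100, 201, 2), 8, 100))  # (201, 2) for real and imaginary STFT parts
```

The same function should also accept the shape of a real tensor converted with tuple(tensor.shape), since only indexing and slicing are used.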

Developer Guidelines

The goal is to write a set of homogeneous libraries that process audio and speech in several different ways and that all comply with the guidelines described in the following sub-sections.

GitHub

Our development strategy is as follows:

Clone the main speechbrain repository (no fork necessary). SSH example: git clone git@github.com:speechbrain/speechbrain
Create a branch for the specific feature you are developing. git checkout -b your-branch-name
Make + commit changes. Do not commit to master.
Push the branch to GitHub. git push --set-upstream origin your-branch-name
Navigate to GitHub, and create a pull request from your branch to master.
A reviewer will be assigned to your PR to (hopefully quickly) review.
When the reviewer is satisfied that the code improves repository quality, they can merge.
The reviewer should delete the source branch in the origin. You can do this in your local copy too, after first making sure master is up-to-date (so git doesn't complain that your branch changes aren't in master): git checkout master && git pull && git branch -d your-branch-name

Note that CI tests will be run when you create a PR. If you want to be sure that your code will not fail these tests, we have set up pre-commit hooks that you can install:

> pip install pre-commit
> pre-commit install
> pre-commit install --hook-type pre-push --config .pre-push-config.yaml

These will automatically check the code when you commit and when you push.

Python

Version

SpeechBrain targets Python >= 3.7.

Formatting

To settle code formatting, SpeechBrain adopts the black code formatter. Before submitting pull requests, please run the black formatter on your code.

In addition, we use flake8 to test code style. Black as a tool does not enforce everything that flake8 tests.

You can run the formatter with: black <file-or-directory>. Similarly the flake8 tests can be run with flake8 <file-or-directory>.

Adding dependencies

In general, we strive to have as few dependencies as possible. However, we will debate dependencies on a case-by-case basis. We value easy installability via pip.

In case the dependency is only needed for a specific recipe or a specific niche module, we suggest the extra tools pattern: don't add the dependency to the general requirements, but check whether it is installed when the dependent code is run, and instruct the user to install it if it is missing.
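A minimal sketch of the extra tools pattern follows. The helper name import_optional and the wording of the message are hypothetical; the idea is simply to defer the import until the dependent code runs and to fail with installation instructions rather than a bare ModuleNotFoundError.

```python
import importlib
import importlib.util


def import_optional(module_name, install_hint):
    """Import an optional dependency, or explain how to install it.

    `module_name` is expected to be a top-level package name.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"This feature needs the optional package '{module_name}'. "
            f"Install it with: {install_hint}"
        )
    return importlib.import_module(module_name)
```

Calling the helper inside the function that needs the package, rather than at module import time, keeps the rest of the toolkit importable for users who never touch that recipe.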

Testing

We use pytest for unit tests. Run them with: pytest tests

Additionally we have runnable doctests, though primarily these serve as examples of the documented code. Run doctests with pytest --doctest-modules <file-or-directory>

Documentation

In SpeechBrain, we plan to provide documentation at different levels:

Docstrings: For each class/function in the repository, there should be a header that properly describes its functionality, inputs, and outputs. It is also crucial to provide an example that shows how it can be used as a stand-alone function. We use Numpy-style docstrings. Consistent docstring style enables automatic API documentation. Also note the automatic doctests (see the Testing section).
Comments: We encourage developers to write self-documenting code, and use comments only where the implementation is surprising (to a Python-literate audience) and where the implemented algorithm needs clarification.
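As an illustration of the docstring guideline, here is a hypothetical function (rescale is not part of SpeechBrain) documented with the standard Numpy section headings and a runnable example. The project's own docstrings may use slightly different section names, so treat this as a sketch of the idea rather than a template.

```python
def rescale(samples, peak=1.0):
    """Scale a signal so that its largest absolute value equals ``peak``.

    Parameters
    ----------
    samples : list of float
        Signal samples; at least one must be non-zero.
    peak : float
        Target maximum absolute amplitude.

    Returns
    -------
    list of float
        The rescaled samples.

    Examples
    --------
    >>> rescale([0.5, -2.0, 1.0])
    [0.25, -1.0, 0.5]
    """
    largest = max(abs(s) for s in samples)
    if largest == 0:
        raise ValueError("cannot rescale an all-zero signal")
    return [s * peak / largest for s in samples]
```

Because the example in the docstring is a doctest, it doubles as a small regression test when doctests are collected.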

In addition we have plans for:

Website documentation: On the SpeechBrain website, we will publish detailed documentation, including both written tutorials and descriptions of all the functionalities of the toolkit.
The SpeechBrain book: Similarly to HTK (an old HMM-based speech toolkit developed by Cambridge), we plan to have a book that summarizes the functionalities of SpeechBrain. The book will be mainly based on the website documentation, and summarizing everything in a book also makes it simpler to cite us.
Video tutorials: For each important topic (e.g., speech recognition, speaker recognition, speech enhancement) we plan to have some video tutorials.

Development tools

flake8

A bit like pycodestyle: it makes sure the code style follows the guidelines.
Compatible with black; in fact, the current flake8 config is taken directly from black.
Code compliance can be tested simply with: flake8 <file-or-directory>
You can bypass flake8 for a line with # noqa: <QA-CODE>, e.g. # noqa: E731 to allow lambda assignment.
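For instance, the E731 case mentioned above looks like this in practice. The variable name is hypothetical; the comment syntax is the one flake8 recognises.

```python
# flake8 would normally report E731 (lambda assigned to a name) on this line;
# the trailing comment silences that single check for this line only.
square = lambda x: x * x  # noqa: E731

print(square(4))  # 16
```

Naming the code after noqa keeps every other check active on that line, which is usually preferable to a bare # noqa.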

pre-commit

Python tool which takes a configuration file (.pre-commit-config.yaml) and installs the git commit hooks specified in it.
Git commit hooks are local so all who want to use them need to install them separately. This is done by: pre-commit install
The tool can also install pre-push hooks. This is done separately with: pre-commit install --hook-type pre-push --config .pre-push-config.yaml

the git pre-commit hooks

Automatically run black
Automatically fix trailing whitespace, end of file, sort requirements.txt
Check that no large (>512kb) files are added by accident
Automatically run flake8
NOTE: If the hooks fix something (e.g. trailing whitespace or reformat with black), these changes are not automatically added and committed. You’ll have to add the fixed files again, and run the commit again. This acts as a safeguard: don’t blindly accept changes from git hooks.
NOTE2: The hooks are only run on the files you git added to the commit. This is in contrast to the CI pipeline, which always tests everything.

the git pre-push hooks

Black and flake8 as checks on the whole repo
Unit-tests and doctests run on the whole repo
These hooks can only be run in the full environment, so if you install these, you’ll need to e.g. activate virtualenv before pushing.

pytest doctests

This is not an additional dependency; doctests are simply run with pytest now. Use: pytest --doctest-modules <file-or-directory>
Thus you may use some pytest features in docstring examples. Most notably: tmpdir = getfixture('tmpdir') which makes a temp dir and gives you a path to it, without needing a with tempfile.TemporaryDirectory() as tmpdir:
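A short sketch of a docstring that relies on getfixture is shown below. The function save_lines is hypothetical. Note that the example in the docstring only works when the doctest is collected by pytest, because getfixture is injected by pytest and is not defined under the plain doctest module.

```python
def save_lines(lines, path):
    """Write each string in `lines` to `path`, one per line.

    Returns the number of lines written.

    Examples
    --------
    >>> tmpdir = getfixture("tmpdir")  # injected by pytest during doctests
    >>> path = str(tmpdir.join("lines.txt"))
    >>> save_lines(["a", "b"], path)
    2
    """
    with open(path, "w") as f:
        for line in lines:
            f.write(line + "\n")
    return len(lines)
```

The temporary directory is created and cleaned up by pytest, so the example leaves no files behind in the repository.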

Continuous integration

What is CI

CI (continuous integration) is a loose term for a tight merge schedule.
It is typically assisted by automated testing and code review tools and practices.

CI / CD Pipelines

GitHub Actions (similar features are also available as third-party solutions) is a feature that automatically runs practically anything in reaction to git events.
The CI pipeline is triggered by pull requests.
Runs in an Ubuntu environment provided by GitHub.
GitHub offers a limited amount of CI pipeline minutes for free.
CD would stand for continuous deployment, though we’re not doing that yet

Our test suite

Code linters are run. This means black and flake8. These are run on everything in speechbrain (the library directory), everything in recipes and everything in tests.
Note that black will only error out if it would change a file here, but won’t reformat anything at this stage. You’ll have to run black on your code and push a new commit. The black commit hook helps avoid these errors.
All unit-tests and doctests are run. You can check that these pass by running them yourself before pushing, with pytest tests and pytest --doctest-modules speechbrain
Integration tests (minimal examples). The minimal examples serve both to illustrate basic tasks and experiment running and as integration tests for the toolkit. For this purpose, any file which is prefixed with example_ gets collected by pytest, and we add a short test_ function at the end of the minimal examples.
Currently, these are not run: docstring format tests (this should be added once the docstring conversion is done).
If all tests pass, the whole pipeline takes a couple of minutes.
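The minimal-example pattern described in the list above could look roughly like the sketch below. The file name, the toy optimisation problem, and the threshold are all hypothetical, and the sketch assumes pytest is configured to collect example_*.py files (for instance through the python_files option in pytest.ini).

```python
# Hypothetical file name: example_gradient_descent.py


def main(steps=50, lr=0.1):
    """Minimize (w - 3)^2 with plain gradient descent and return the final loss."""
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # derivative of (w - 3)^2 with respect to w
        w -= lr * grad
    return (w - 3.0) ** 2


def test_loss():
    # Collected by pytest; fails if the toy optimisation stops converging.
    assert main() < 1e-3


if __name__ == "__main__":
    print(f"final loss: {main():.2e}")
```

The file stays readable as a tutorial when run directly, while the short test_ function at the end lets the same code act as an integration test.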

Pull Request review guide

This is not a comprehensive code review guide, but some rough guidelines to unify the general review practices across this project.

Firstly, let the review take some time. Try to read every line that was added, if possible. Read the surrounding context of the code if needed to understand the changes introduced. Ask for clarifications if you don't understand something. If the pull request changes are hard to understand, maybe that's a sign that the code is not clear enough yet. However, don't nitpick every detail.

Secondly, focus on the major things first, and only then move on to smaller things. Level of importance:

Immediate deal breakers (code does wrong thing, or feature shouldn't be added etc.)
Things to fix before merging (Add more documentation, reduce complexity, etc.)
More subjective things which could be changed if the author also agrees with you.

Thirdly, approve the pull request only once you believe the changes "improve overall code health" as attested to here. However, this also means the pull request does not have to be perfect. Some features are best implemented incrementally over many pull requests, and you should be more concerned with making sure that the changes introduced lend themselves to painless further improvements.

Fourthly, use the tools that GitHub has: comment on specific code lines, suggest edits, and once everyone involved has agreed that the PR is ready to merge, merge the request and delete the feature branch.

Fifthly, though code review is a place for professional constructive criticism, a nice strategy to show (and validate) that you understand what the PR is really doing, is to provide some affirmative comments on its strengths.

Zen of SpeechBrain

SpeechBrain can be used for research, academic, commercial, and non-commercial purposes. Ideally, the code should have the following features:

Simple: The code must be easy to understand, even by students or by users who are not professional programmers or speech researchers. Try to design your code such that it can be easily read. Given alternatives with the same level of performance, code the simplest one (the most explicit and straightforward manner is preferred).
Readable: SpeechBrain mostly adopts the code style conventions in PEP8. The code written by users must comply with them. We test code style with flake8.
Efficient: The code should be as efficient as possible. When possible, users should maximize the use of PyTorch native operations. Remember that it is generally very convenient to process multiple signals in parallel rather than one by one (e.g., try to use batch_size > 1 when possible). Test the code carefully with your favorite profiler (e.g., torch.utils.bottleneck https://pytorch.org/docs/stable/bottleneck.html ) to make sure there are no bottlenecks in your code. Since we are not working in C++ directly, performance can be an issue. Despite that, our goal is to make SpeechBrain as fast as possible.
Modular: Write your code such that it is very modular and fits well with the other functionalities of the toolkit. The idea is to develop a bunch of models that can be naturally interconnected with each other to implement complex modules.
Well documented: Given the goals of SpeechBrain, writing rich, good documentation is a crucial step. Many existing toolkits are not well documented, and we have to succeed in that to make the difference. This aspect is described in the Documentation section.
