docs/README.md: 2 additions & 0 deletions
@@ -36,6 +36,8 @@ The `docs/tutorials` directory exclusively contains tutorials in Jupyter Noteboo
36
36
- It's OK if the user has to run the notebook to get some of the heavier outputs.
37
37
- Preferably use Jupyter Notebook for final editing of your notebook.
38
38
- Jupyter Notebook tends to have somewhat sane `.ipynb` output. This keeps Git diffs from becoming excessively large.
39
+
-**Images can be put in the `docs/tutorials/assets` directory,** rather than embedded as base64. You can then refer to them in Markdown like ``. These will work correctly when imported on Colab.
This is not a comprehensive code review guide, but some rough guidelines to unify the general review practices across this project.
4
+
5
+
Firstly, let the review take some time. Try to read every line that was added,
6
+
if possible. Try also to run some tests. Read the surrounding context of the code if needed to understand
7
+
the changes introduced. Possibly ask for clarifications if you don't understand.
8
+
If the pull request changes are hard to understand, maybe that's a sign that
9
+
the code is not clear enough yet. However, don't nitpick every detail.
10
+
11
+
Secondly, focus on the major things first, and only then move on to smaller
12
+
things. Level of importance:
13
+
- Immediate deal breakers (code does the wrong thing, or feature shouldn't be added etc.)
14
+
- Things to fix before merging (Add more documentation, reduce complexity, etc.)
15
+
- More subjective things could be changed if the author also agrees with you.
16
+
17
+
Thirdly, approve the pull request only once you believe the changes "improve overall code health" as described [here](https://google.github.io/eng-practices/review/reviewer/standard.html).
18
+
However, this also means the pull request does not have to be perfect. Some features are best implemented incrementally over many pull requests, and you should be more concerned with making sure that the changes introduced lend themselves to painless further improvements.
19
+
20
+
Fourthly, use the tools that GitHub has: comment on specific code lines, suggest edits, and once everyone involved has agreed that the PR is ready to merge, merge the request and delete the feature branch.
21
+
22
+
Fifthly, the code review is a place for professional constructive criticism.
23
+
A nice strategy to show (and validate) that you understand what the PR is really
24
+
doing is to provide some affirmative comments on its strengths.
The goal is to write a set of libraries that process audio and speech in several ways. It is crucial to write a set of homogeneous libraries that are all compliant with the guidelines described in the following sub-sections.
4
-
5
3
## Zen of SpeechBrain
6
-
SpeechBrain could be used for *research*, *academic*, *commercial*, *non-commercial* purposes. Ideally, the code should have the following features:
7
-
8
-
-**Simple:** the code must be easy to understand even by students or by users that are not professional programmers or speech researchers. Try to design your code such that it can be easily read. Given alternatives with the same level of performance, code the simplest one. (the most explicit and straightforward manner is preferred)
4
+
SpeechBrain is used for *research*, *academic*, *commercial*, and *non-commercial* purposes, so the code should be:
9
5
10
-
-**Readable:**SpeechBrain mostly adopts the code style conventions in PEP8. The code written by the users must be compliant with that. We test code style with `flake8`
6
+
- **Simple:** Straightforward and easy to understand even by students, academics and non-professional programmers. Complex code, when it _must_ exist, should be especially well explained.
11
7
12
-
-**Efficient**: The code should be as efficient as possible. When possible, users should maximize the use of pytorch native operations. Remember that in generally very convenient to process in parallel multiple signals rather than processing them one by one (e.g try to use *batch_size > 1* when possible). Test the code carefully with your favorite profiler (e.g, torch.utils.bottleneck https://pytorch.org/docs/stable/bottleneck.html ) to make sure there are no bottlenecks in your code. Since we are not working in *c++* directly, the speed can be an issue. Despite that, our goal is to make SpeechBrain as fast as possible.
8
+
- **Readable:** Avoid abstract naming. Link to resources and references to help understand complex topics or implementations. Code style and formatting are automatically enforced.
13
9
14
-
-**Modular:** Write your code such that it is very modular and fits well with the other functionalities of the toolkit. The idea is to develop a bunch of models that can be naturally interconnected with each other.
10
+
- **Efficient**: Not _everything_ must be fast, but for what _should_ be, [profile and optimize it](https://speechbrain.readthedocs.io/en/develop/tutorials/advanced/profiling-and-benchmark.html). Operate on batches. Prefer tensor operations over Python-heavy constructs. Avoid CPU/GPU syncs.
15
11
16
-
-**Well documented:** Given the goals of SpeechBrain, writing rich and good documentation is a crucial step.
12
+
- **Modular:** It should be easy to use any of the functionality from the toolkit. Break up functions/classes when it helps. Group functionality logically. Avoid unnecessary coupling.
17
13
18
-
## How to get your code in SpeechBrain
14
+
- **Well documented:** Docs should be complete, easy to navigate and easy to discover. Consider [writing a tutorial](https://github.com/speechbrain/speechbrain/tree/develop/docs#tutorial-integration).
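
As a rough illustration of the **Efficient** point, the sketch below contrasts a per-signal Python loop with a single batched tensor operation. It assumes PyTorch is installed; the function names `rms_normalize_loop` and `rms_normalize_batched` are hypothetical and not part of SpeechBrain.

```python
import torch


def rms_normalize_loop(signals):
    """Normalize each signal to unit RMS, one signal at a time (slow path)."""
    out = []
    for sig in signals:
        rms = sig.pow(2).mean().sqrt()
        out.append(sig / (rms + 1e-8))
    return torch.stack(out)


def rms_normalize_batched(signals):
    """Same computation expressed as one operation over the whole batch."""
    # Reduce over the time axis only, keeping the batch axis for broadcasting.
    rms = signals.pow(2).mean(dim=-1, keepdim=True).sqrt()
    return signals / (rms + 1e-8)


torch.manual_seed(0)
batch = torch.randn(4, 16000)  # 4 hypothetical one-second signals at 16 kHz
normalized = rms_normalize_batched(batch)

# Keep intermediate results as tensors. A single .item() at the end, rather
# than one per signal, avoids repeated CPU/GPU synchronization on a GPU.
mean_peak = normalized.abs().amax(dim=-1).mean().item()
```

Both functions should return the same values up to floating-point tolerance; the batched version simply leaves the looping to the tensor library.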
19
15
20
-
Practically, development goes as follows:
16
+
## Creating Pull Requests on GitHub
21
17
22
18
0. We use git and GitHub.
23
19
1. Fork the speechbrain repository (https://github.com/speechbrain/speechbrain)
@@ -48,41 +44,45 @@ See the section on pre-commit.
48
44
49
45
These will automatically check the code when you commit and when you push.
50
46
51
-
## Python
52
-
### Version
53
-
SpeechBrain targets Python >= 3.7.
47
+
## Important code guidelines
48
+
49
+
We target a specific range of supported Python versions, which are tested via CI.
50
+
51
+
### Formatting & linting
52
+
53
+
Use `pre-commit run -a` to run formatting and linting, using tools like `black`
54
+
and `flake8` under the hood (see [`.pre-commit-config.yaml`](../.pre-commit-config.yaml)).
55
+
Some passes automatically fix your code, and some may require your intervention.
54
56
55
-
### Formatting
56
-
To settle code formatting, SpeechBrain adopts the [black](https://black.readthedocs.io/en/stable/) code formatter. Before submitting pull requests, please run the black formatter on your code.
57
+
These checks are run and enforced on the CI.
57
58
58
-
In addition, we use [flake8](https://flake8.pycqa.org/en/latest/) to test code
59
-
style. Black as a tool does not enforce everything that flake8 tests.
59
+
### Running tests
60
+
61
+
We use [pytest](https://docs.pytest.org/en/latest/contents.html). Run unit tests
62
+
with `pytest tests`.
63
+
64
+
Additionally, we have runnable doctests, though primarily these serve as
65
+
examples of the documented code. Run doctests with
66
+
`pytest --doctest-modules <file-or-directory>`
60
67
61
-
You can run the formatter with: `black <file-or-directory>`. Similarly the
62
-
flake8 tests can be run with `flake8 <file-or-directory>`.
68
+
These checks are run and enforced on the CI.
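
For illustration, a unit test picked up by `pytest tests` could look roughly like the sketch below. The helper `pad_to_length` and the test names are hypothetical, invented here only to have something to test; the helper's docstring example is the kind of snippet that `pytest --doctest-modules` would execute.

```python
def pad_to_length(samples, length, pad_value=0.0):
    """Right-pad a list of samples to ``length`` items (hypothetical helper).

    Example
    -------
    >>> pad_to_length([1.0, 2.0], 4)
    [1.0, 2.0, 0.0, 0.0]
    """
    if len(samples) > length:
        raise ValueError("samples is longer than the requested length")
    return list(samples) + [pad_value] * (length - len(samples))


# pytest collects functions named test_* from files named test_*.py.
def test_pad_to_length_pads_on_the_right():
    assert pad_to_length([1.0, 2.0], 4) == [1.0, 2.0, 0.0, 0.0]


def test_pad_to_length_rejects_long_input():
    try:
        pad_to_length([1.0, 2.0, 3.0], 2)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Plain `assert` statements are enough here; pytest reports the compared values when an assertion fails.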
63
69
64
70
### Adding dependencies
71
+
65
72
In general, we strive to have as few dependencies as possible. However, we will
66
73
debate dependencies on a case-by-case basis. We value easy installability via
67
74
pip.
68
75
69
76
In case the dependency is only needed for a specific recipe or specific niche
70
77
module, we suggest the extra tools pattern: don't add the dependency to general
71
-
requirements, but add it in the extra-requirement.txt file of the specific recipe.
requirements, but add it in the `extra-requirements.txt` file of that specific
79
+
recipe.
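
One way a recipe might handle such an optional dependency is to import it lazily and point the user at the recipe's requirements file when it is missing. This is only a sketch of the idea: the helper `require_optional` is hypothetical and not an existing SpeechBrain function.

```python
import importlib


def require_optional(module_name, requirements_file="extra-requirements.txt"):
    """Import an optional module, or fail with an actionable message.

    Hypothetical helper: the name and signature are assumptions made for
    this example.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is needed for this recipe but is not installed. "
            f"Try: pip install -r {requirements_file}"
        ) from err


# A standard-library module stands in for a real optional dependency here.
json_module = require_optional("json")
```

Importing inside the recipe, rather than at the top level of the core package, keeps the general install free of the extra dependency.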
77
80
78
-
Additionally, we have runnable doctests, though primarily these serve as
79
-
examples of the documented code. Run doctests with
80
-
`pytest --doctest-modules <file-or-directory>`
81
+
## Important documentation guidelines
81
82
82
-
## Documentation
83
83
In SpeechBrain, we plan to provide documentation at different levels:
84
84
85
-
-**Docstrings**: For each class/function in the repository, there should be a header that properly describes its functionality, inputs, and outputs. It is also crucial to provide an example that shows how it can be used as a stand-alone function. We use [Numpy-style](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html) docstrings. Consistent docstring style enables automatic API documentation. Also note the automatic doctests (see [here](#testing).
85
+
- **Docstrings**: For each class/function in the repository, there should be a header that properly describes its functionality, inputs, and outputs. It is also crucial to provide an example that shows how it can be used as a stand-alone function. We use [Numpy-style](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html) docstrings. Consistent docstring style enables automatic API documentation. Also note the automatic doctests (see [here](#testing)).
86
86
87
87
- **Comments**: We encourage developers to write self-documenting code, and use
88
88
proper comments where the implementation is surprising (to a Python-literate audience)
@@ -92,126 +92,13 @@ and where the implemented algorithm needs clarification.
92
92
93
93
- **Tutorials**: Tutorials are a good way to familiarize yourself with SpeechBrain through interactive code and explanations.
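
As a sketch of the docstring level described above, a Numpy-style docstring with a runnable example might look like the following. The function `frames_to_seconds` and its default values are hypothetical, chosen only to show the layout.

```python
def frames_to_seconds(num_frames, hop_length=160, sample_rate=16000):
    """Convert a frame count to a duration in seconds.

    Parameters
    ----------
    num_frames : int
        Number of feature frames.
    hop_length : int
        Number of samples between consecutive frames.
    sample_rate : int
        Sampling rate of the audio, in Hz.

    Returns
    -------
    float
        Duration covered by the frames, in seconds.

    Example
    -------
    >>> frames_to_seconds(100, hop_length=160, sample_rate=16000)
    1.0
    """
    return num_frames * hop_length / sample_rate
```

The `Example` section doubles as a doctest, so the documented usage is checked whenever doctests are run.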
94
94
95
-
## Development tools
96
-
97
-
### flake8
98
-
- A bit like pycodestyle: make sure the codestyle is according to guidelines.
99
-
- Compatible with black, in fact, current flake8 config directly taken from black
100
-
- Code compliance can be tested simply with: `flake8 <file-or-directory>`
101
-
- You can bypass flake8 for a line with `# noqa: <QA-CODE> E.G. # noqa: E731 to allow lambda assignment`
102
-
103
-
### pre-commit
104
-
- Python tool which takes a configuration file (.pre-commit-config.yaml) and installs the git commit hooks specified in it.
105
-
- Git commit hooks are local so all who want to use them need to install them separately. This is done by: `pre-commit install`
106
-
- The tool can also install pre-push hooks. This is done separately with: `pre-commit install --hook-type pre-push --config .pre-push-config.yaml`
107
-
108
-
### the git pre-commit hooks
109
-
- Automatically run black
110
-
- Automatically fix trailing whitespace, end of file, sort requirements.txt
111
-
- Check that no large (>512kb) files are added by accident
112
-
- Automatically run flake8
113
-
- Automatically run cspell
114
-
- NOTE: If the hooks fix something (e.g. trailing whitespace or reformat with black), these changes are not automatically added and committed. You’ll have to add the fixed files again and run the commit again. I guess this is a safeguard: don’t blindly accept changes from git hooks.
115
-
- NOTE2: The hooks are only run on the files you git added to the commit. This is in contrast to the CI pipeline, which always tests everything.
116
-
- NOTE3: If a word is flagged as a spelling error but it should be kept, you can add the word to `.dict-speechbrain.txt`
117
-
118
-
### the git pre-push hooks
119
-
- Black and flake8 as checks on the whole repo
120
-
- Unit-tests and doctests run on the whole repo
121
-
- These hooks can only be run in the full environment, so if you install these, you’ll need to e.g. activate virtualenv before pushing.
122
-
123
-
### pytest doctests
124
-
- This is not an additional dependency, but just that doctests are now run with pytest. Use: `pytest --doctest-modules <file-or-directory>`
125
-
- Thus you may use some pytest features in docstring examples. Most notably IMO: `tmpdir = getfixture('tmpdir')` which makes a temp dir and gives you a path to it, without needing a `with tempfile.TemporaryDirectory() as tmpdir:`
126
-
127
-
## Continuous integration
128
-
129
-
### What is CI?
130
-
- loose term for a tight merge schedule
131
-
- typically assisted by automated testing and code review tools + practices
132
-
133
-
### CI / CD Pipelines
134
-
- GitHub Actions (and also available as a third-party solution) feature, which automatically runs basically anything in reaction to git events.
135
-
- The CI pipeline is triggered by pull requests.
136
-
- Runs in a Ubuntu environment provided by GitHub
137
-
- GitHub offers a limited amount of CI pipeline minutes for free.
138
-
- CD stands for continuous deployment, check out the "Releasing a new version" section.
139
-
140
-
### Our test suite
141
-
- Code linters are run. This means black and flake8. These are run on everything in speechbrain (the library directory), everything in recipes and everything in tests.
142
-
- Note that black will only error out if it would change a file here, but won’t reformat anything at this stage. You’ll have to run black on your code and push a new commit. The black commit hook helps avoid these errors.
143
-
- All unit-tests and doctests are run. You can check that these pass by running them yourself before pushing, with `pytest tests` and `pytest --doctest-modules speechbrain`
144
-
- Integration tests (minimal examples). The minimal examples serve both to
145
-
illustrate basic tasks and experiment running, but also as integration tests
146
-
for the toolkit. For this purpose, any file which is prefixed with
147
-
`example_` gets collected by pytest, and we add a short `test_` function at
148
-
the end of the minimal examples.
149
-
- Currently, these are not run: docstring format tests (this should be added once the docstring conversion is done).
150
-
- If all tests pass, the whole pipeline takes a couple of minutes.
151
-
152
-
## Pull Request review guide
153
-
154
-
This is not a comprehensive code review guide, but some rough guidelines to unify the general review practices across this project.
155
-
156
-
Firstly, let the review take some time. Try to read every line that was added,
157
-
if possible. Try also to run some tests. Read the surrounding context of the code if needed to understand
158
-
the changes introduced. Possibly ask for clarifications if you don't understand.
159
-
If the pull request changes are hard to understand, maybe that's a sign that
160
-
the code is not clear enough yet. However, don't nitpick every detail.
161
-
162
-
Secondly, focus on the major things first, and only then move on to smaller,
163
-
things. Level of importance:
164
-
- Immediate deal breakers (code does the wrong thing, or feature shouldn't be added etc.)
165
-
- Things to fix before merging (Add more documentation, reduce complexity, etc.)
166
-
- More subjective things could be changed if the author also agrees with you.
167
-
168
-
Thirdly, approve the pull request only once you believe the changes "improve overall code health" as attested to [here](https://google.github.io/eng-practices/review/reviewer/standard.html).
169
-
However, this also means the pull request does not have to be perfect. Some features are best implemented incrementally over many pull requests, and you should be more concerned with making sure that the changes introduced lend themselves to painless further improvements.
170
-
171
-
Fourthly, use the tools that GitHub has: comment on specific code lines, suggest edits, and once everyone involved has agreed that the PR is ready to merge, merge the request and delete the feature branch.
172
-
173
-
Fifthly, the code review is a place for professional constructive criticism,
174
-
a nice strategy to show (and validate) that you understand what the PR is really
175
-
doing is to provide some affirmative comments on its strengths.
176
-
177
-
## Releasing a new version
178
-
179
-
Here are a few guidelines for when and how to release a new version.
180
-
To begin with, as hinted in the "Continuous Integration" section, we would like to follow a
181
-
pretty tight release schedule, known as "Continuous Deployment". For us, this means a new
182
-
version should be released roughly once a week.
183
-
184
-
As for how to name the released version, we try to follow semantic versioning for this. More details
185
-
can be found at [semver.org](http://semver.org). As it applies to SpeechBrain, some examples
186
-
of what this would likely mean:
187
-
* Changes to the Brain class or other core elements often warrant a major version bump (e.g. 1.5.3 -> 2.0.0)
188
-
* Added classes or features warrant a minor version bump. Most weekly updates should fall into this.
189
-
* Patch version bumps should happen only for bug fixes.
190
-
191
-
When releasing a new version, there are a few user-initiated action that need to occur.
192
-
1. On the `develop` branch, update `speechbrain/version.txt` to say the new version:
193
-
X.Y.Z
194
-
2. Merge the `develop` branch into the `main` branch:
195
-
git checkout main
196
-
git merge develop
197
-
3. Push the `main` branch to github:
198
-
git push
199
-
4. Tag the `main` branch with the new version:
200
-
git tag vX.Y.Z
201
-
5. Push the new tag to github:
202
-
git push --tags
203
-
204
-
This kicks off an automatic action that creates a draft release with release notes.
205
-
Review the notes to make sure they make sense and remove commits that aren't important.
206
-
You can then publish the release to make it public.
207
-
Publishing a new release kicks off a series of automatic tools, listed below:
208
-
209
-
* The `main` branch is checked out and used for building a python package.
210
-
* The built package is uploaded to PyPI and the release is published there.
211
-
* Read the Docs uses Webhooks to get notified when a new version is published.
212
-
Read the Docs then builds the documentation and publishes the new version.
213
-
214
-
Maintainers of relevant accounts:
215
-
* Mirco Ravanelli maintains the GitHub and PyPI accounts
216
-
* Titouan Parcollet maintains the website at [speechbrain.github.io](speechbrain.github.io)
217
-
as well as accounts at Read the Docs and Discourse
95
+
96
+
## Additional reading
97
+
98
+
- [Development tools](devtools.md)
99
+
- [What testing coverage approaches are needed?](coverage.md)