Skip to content

Commit 8636054

Browse files
authored
Update PR Template and Contributors guide for nodes (meta-pytorch#1355)
1 parent 137ac45 commit 8636054

File tree

2 files changed

+77
-45
lines changed

2 files changed

+77
-45
lines changed

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,8 @@
11
Please read through our [contribution guide](https://github.com/pytorch/data/blob/main/CONTRIBUTING.md) prior to
22
creating your pull request.
33

4-
- Note that there is a section on requirements related to adding a new DataPipe.
4+
- If you are adding a new node, ensure you read that section in the contribution guide, as it includes requirements for
5+
functionality and testing.
56

67
Fixes #{issue number}
78

CONTRIBUTING.md

Lines changed: 75 additions & 44 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,16 @@ description is clear and has sufficient instructions to be able to reproduce the
3535
For question related to the usage of this library, please post a question on the
3636
[PyTorch forum, under the "data" category](https://discuss.pytorch.org/c/data/37).
3737

38+
## Pull Requests
39+
40+
We actively welcome your pull requests.
41+
42+
1. Fork the repo and create your branch from `main`.
43+
2. If you've added code that should be tested, add tests.
44+
3. If you've changed APIs, update the documentation and examples.
45+
4. Ensure the test suite passes.
46+
5. If you haven't already, complete the Contributor License Agreement ("CLA").
47+
3848
## Development installation
3949

4050
### Install PyTorch Nightly
@@ -46,66 +56,87 @@ conda install pytorch -c pytorch-nightly
4656
# pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
4757
```
4858

49-
### Install TorchData
59+
### Install TorchData and Test Requirements
5060

5161
```bash
5262
git clone https://github.com/pytorch/data.git
5363
cd data
5464
pip install -e .
55-
pip install flake8 typing mypy pytest expecttest
65+
pip install -r test/requirements.txt
5666
```
5767

58-
## Pull Requests
59-
60-
We actively welcome your pull requests.
61-
62-
1. Fork the repo and create your branch from `main`.
63-
2. If you've added code that should be tested, add tests.
64-
3. If you've changed APIs, update the documentation and examples.
65-
4. Ensure the test suite passes.
66-
5. If you haven't already, complete the Contributor License Agreement ("CLA").
67-
6868
### Code style
6969

7070
`torchdata` enforces a fairly strict code format through [`pre-commit`](https://pre-commit.com). You can install it with
7171

72-
```shell
73-
pip install pre-commit
72+
```bash
73+
conda install -c conda-forge pre-commit
74+
# or pip install pre-commit
75+
cd data
76+
with-proxy conda install pre-commit
77+
pre-commit install --install-hooks
7478
```
7579

76-
or
80+
### Running mypy and unit-tests locally
7781

78-
```shell
79-
conda install -c conda-forge pre-commit
82+
Currently we don't run mypy as part of pre-commit hooks
83+
84+
```bash
85+
mypy --config-file=mypy.ini
86+
```
87+
88+
```bash
89+
pytest --durations=0 --no-header -v test/nodes/
8090
```
8191

82-
To check and in most cases fix the code format, stage all your changes (`git add`) and run `pre-commit run`. To perform
83-
the checks automatically before every `git commit`, you can install them with `pre-commit install`.
84-
85-
### Adding a new DataPipe
86-
87-
When adding a new DataPipe, there are few things that need to be done to ensure it is working and documented properly.
88-
89-
1. Naming - please follow the naming convention as
90-
[described here](https://pytorch.org/data/main/dp_tutorial.html#naming).
91-
2. Testing - please add unit tests to ensure that the DataPipe is functioning properly. Here are the
92-
[test requirements](https://github.com/pytorch/data/issues/106) that we have.
93-
- One test that is commonly missed is the serialization test. Please add the new DataPipe to
94-
[`test_serialization.py`](https://github.com/pytorch/data/blob/main/test/test_serialization.py).
95-
- If your test requires interacting with files in the file system (e.g. opening a `csv` or `tar` file, we prefer
96-
those files to be generated during the test (see `test_local_io.py`). If the file is on a remote server, see
97-
`test_remote_io.py`.
98-
3. Documentation - ensure that the DataPipe has docstring, usage example, and that it is added to the right category of
99-
the right RST file to be rendered.
100-
- If your DataPipe has a functional form (i.e. `@functional_datapipe(...)`), include at the
101-
[end of the first sentence](https://github.com/pytorch/data/blob/main/torchdata/datapipes/iter/util/combining.py#L25)
102-
of your docstring. This will make sure it correctly shows up in the
103-
[summary table](https://pytorch.org/data/main/torchdata.datapipes.iter.html#archive-datapipes) of our
104-
documentation.
105-
4. Import - import the DataPipe in the correct `__init__.py` file.
106-
5. Interface - if the DataPipe has a functional form, make sure that is generated properly by `gen_pyi.py` into the
107-
relevant interface file.
108-
- You can re-generate the pyi files by re-running `pip install -e .`, then you can examine the new outputs.
92+
### Adding a new Node
93+
94+
When adding a new Node, there are few things that need to be done to ensure it is working and documented properly.
95+
96+
The following simplifying assumptions are made of node implementations:
97+
98+
- state is managed solely by the BaseNode, not through any iterators returned from them.
99+
- state_dict() returns the state of the most recently requested iterator.
100+
- load_state_dict() will set the state for the next iterator.
101+
102+
1. Functionality - Nodes must subclass BaseNode and implement the required methods.
103+
104+
- `.iterator(self, initial_state: Optional[Dict[str, Any]])` - return a new iterator/generator that is properly
105+
initialized with the optional initial_state
106+
- `.get_state(self) -> Dict[str, Any]` - return a dictionary representing the state of the most recently returned
107+
iterator, or if not yet requested, the initial state.
108+
- ensure you're calling `state_dict()/load_state_dict()` on ancestor BaseNodes. Here is a simple example of a pretty
109+
useless node:
110+
111+
```python
112+
class MyNode(BaseNode[T]):
113+
def __init__(self, parent: BaseNode[T]):
114+
self.parent = parent
115+
self.idx = 0 # not very useful state
116+
117+
def iterator(self, initial_state: Optional[Dict[str, Any]]) -> Iterator[T]
118+
if initial_state is not None:
119+
self.parent.load_state_dict(initial_state["parent"])
120+
self.idx = initial_state["idx"]
121+
122+
for item in self.parent:
123+
self.idx += 1
124+
yield item
125+
126+
def get_state(self) -> Dict[str, Any]:
127+
return {
128+
"parent": self.parent.state_dict(), # note we call state_dict() and not get_state() here
129+
"idx": self.idx,
130+
}
131+
```
132+
133+
2. Typing - Include type-hints for all public functions and methods
134+
3. Testing - please add unit tests to ensure that the Node is functioning properly.
135+
- In addition to testing basic functionatity, state management must also be tested.
136+
- For basic state testing, you may use `test.nodes.utils.run_test_save_load_state`. See `test/nodes/test_batch.py`
137+
for an example.
138+
4. Documentation - ensure that the Node has a docstring, and a usage example.
139+
5. Import - import the Node in the correct `__init__.py` file.
109140

110141
## Contributor License Agreement ("CLA")
111142

0 commit comments

Comments
 (0)