@@ -35,6 +35,16 @@ description is clear and has sufficient instructions to be able to reproduce the
3535For question related to the usage of this library, please post a question on the
3636[ PyTorch forum, under the "data" category] ( https://discuss.pytorch.org/c/data/37 ) .
3737
38+ ## Pull Requests
39+
40+ We actively welcome your pull requests.
41+
42+ 1 . Fork the repo and create your branch from ` main ` .
43+ 2 . If you've added code that should be tested, add tests.
44+ 3 . If you've changed APIs, update the documentation and examples.
45+ 4 . Ensure the test suite passes.
46+ 5 . If you haven't already, complete the Contributor License Agreement ("CLA").
47+
3848## Development installation
3949
4050### Install PyTorch Nightly
@@ -46,66 +56,87 @@ conda install pytorch -c pytorch-nightly
4656# pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
4757```
4858
49- ### Install TorchData
59+ ### Install TorchData and Test Requirements
5060
5161``` bash
5262git clone https://github.com/pytorch/data.git
5363cd data
5464pip install -e .
55- pip install flake8 typing mypy pytest expecttest
65+ pip install -r test/requirements.txt
5666```
5767
58- ## Pull Requests
59-
60- We actively welcome your pull requests.
61-
62- 1 . Fork the repo and create your branch from ` main ` .
63- 2 . If you've added code that should be tested, add tests.
64- 3 . If you've changed APIs, update the documentation and examples.
65- 4 . Ensure the test suite passes.
66- 5 . If you haven't already, complete the Contributor License Agreement ("CLA").
67-
6868### Code style
6969
7070` torchdata ` enforces a fairly strict code format through [ ` pre-commit ` ] ( https://pre-commit.com ) . You can install it with
7171
72- ``` shell
73- pip install pre-commit
72+ ``` bash
73+ conda install -c conda-forge pre-commit
74+ # or pip install pre-commit
75+ cd data
76+ with-proxy conda install pre-commit
77+ pre-commit install --install-hooks
7478```
7579
76- or
80+ ### Running mypy and unit-tests locally
7781
78- ``` shell
79- conda install -c conda-forge pre-commit
82+ Currently we don't run mypy as part of pre-commit hooks
83+
84+ ``` bash
85+ mypy --config-file=mypy.ini
86+ ```
87+
88+ ``` bash
89+ pytest --durations=0 --no-header -v test/nodes/
8090```
8191
82- To check and in most cases fix the code format, stage all your changes (` git add ` ) and run ` pre-commit run ` . To perform
83- the checks automatically before every ` git commit ` , you can install them with ` pre-commit install ` .
84-
85- ### Adding a new DataPipe
86-
87- When adding a new DataPipe, there are few things that need to be done to ensure it is working and documented properly.
88-
89- 1 . Naming - please follow the naming convention as
90- [ described here] ( https://pytorch.org/data/main/dp_tutorial.html#naming ) .
91- 2 . Testing - please add unit tests to ensure that the DataPipe is functioning properly. Here are the
92- [ test requirements] ( https://github.com/pytorch/data/issues/106 ) that we have.
93- - One test that is commonly missed is the serialization test. Please add the new DataPipe to
94- [ ` test_serialization.py ` ] ( https://github.com/pytorch/data/blob/main/test/test_serialization.py ) .
95- - If your test requires interacting with files in the file system (e.g. opening a ` csv ` or ` tar ` file, we prefer
96- those files to be generated during the test (see ` test_local_io.py ` ). If the file is on a remote server, see
97- ` test_remote_io.py ` .
98- 3 . Documentation - ensure that the DataPipe has docstring, usage example, and that it is added to the right category of
99- the right RST file to be rendered.
100- - If your DataPipe has a functional form (i.e. ` @functional_datapipe(...) ` ), include at the
101- [ end of the first sentence] ( https://github.com/pytorch/data/blob/main/torchdata/datapipes/iter/util/combining.py#L25 )
102- of your docstring. This will make sure it correctly shows up in the
103- [ summary table] ( https://pytorch.org/data/main/torchdata.datapipes.iter.html#archive-datapipes ) of our
104- documentation.
105- 4 . Import - import the DataPipe in the correct ` __init__.py ` file.
106- 5 . Interface - if the DataPipe has a functional form, make sure that is generated properly by ` gen_pyi.py ` into the
107- relevant interface file.
108- - You can re-generate the pyi files by re-running ` pip install -e . ` , then you can examine the new outputs.
92+ ### Adding a new Node
93+
94+ When adding a new Node, there are few things that need to be done to ensure it is working and documented properly.
95+
96+ The following simplifying assumptions are made of node implementations:
97+
98+ - state is managed solely by the BaseNode, not through any iterators returned from them.
99+ - state_dict() returns the state of the most recently requested iterator.
100+ - load_state_dict() will set the state for the next iterator.
101+
102+ 1 . Functionality - Nodes must subclass BaseNode and implement the required methods.
103+
104+ - ` .iterator(self, initial_state: Optional[Dict[str, Any]]) ` - return a new iterator/generator that is properly
105+ initialized with the optional initial_state
106+ - ` .get_state(self) -> Dict[str, Any] ` - return a dictionary representing the state of the most recently returned
107+ iterator, or if not yet requested, the initial state.
108+ - ensure you're calling ` state_dict()/load_state_dict() ` on ancestor BaseNodes. Here is a simple example of a pretty
109+ useless node:
110+
111+ ``` python
112+ class MyNode (BaseNode[T]):
113+ def __init__ (self , parent : BaseNode[T]):
114+ self .parent = parent
115+ self .idx = 0 # not very useful state
116+
117+ def iterator (self , initial_state : Optional[Dict[str , Any]]) -> Iterator[T]
118+ if initial_state is not None :
119+ self .parent.load_state_dict(initial_state[" parent" ])
120+ self .idx = initial_state[" idx" ]
121+
122+ for item in self .parent:
123+ self .idx += 1
124+ yield item
125+
126+ def get_state (self ) -> Dict[str , Any]:
127+ return {
128+ " parent" : self .parent.state_dict(), # note we call state_dict() and not get_state() here
129+ " idx" : self .idx,
130+ }
131+ ```
132+
133+ 2 . Typing - Include type-hints for all public functions and methods
134+ 3 . Testing - please add unit tests to ensure that the Node is functioning properly.
135+ - In addition to testing basic functionatity, state management must also be tested.
136+ - For basic state testing, you may use ` test.nodes.utils.run_test_save_load_state ` . See ` test/nodes/test_batch.py `
137+ for an example.
138+ 4 . Documentation - ensure that the Node has a docstring, and a usage example.
139+ 5 . Import - import the Node in the correct ` __init__.py ` file.
109140
110141## Contributor License Agreement ("CLA")
111142
0 commit comments