This repository was archived by the owner on Feb 2, 2024. It is now read-only.

Commit 3d6453e

Docs for pandas.read_csv() (#726)
* Add Input/output section
* Add docstring for read_csv with unsupported parameters
* Add explanation for inferencing and examples
* Add explanation of resulting DataFrame
* Fix style
1 parent 5c9aed9 commit 3d6453e

4 files changed: +99 -0 lines changed
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+.. _api_ref.pandas.io:
+.. include:: ./../ext_links.txt
+
+Pandas Input/output
+===================
+.. currentmodule:: pandas
+
+These are the basic `Pandas*`_ input/output functions.
+
+Flat file
+---------
+
+.. sdc_toctree
+read_csv

docs/source/apireference.rst

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ API Reference
 .. toctree::
    :maxdepth: 2

+   Input/output <./_api_ref/api_ref.pandas.io.rst>
    Series: Columnar Data Structure <./_api_ref/api_ref.pandas.series.rst>
    Dataframe: Tabular Data Structure <./_api_ref/api_ref.pandas.dataframe.rst>
    Window <./_api_ref/api_ref.pandas.window.rst>

docs/source/buildscripts/apiref_generator.py

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@


 APIREF_TEMPLATE_FNAMES = [
+    './_templates/_api_ref.pandas.io_templ.rst',
     './_templates/_api_ref.pandas.series_templ.rst',
     './_templates/_api_ref.pandas.dataframe_templ.rst',
     './_templates/_api_ref.pandas.window_templ.rst',

sdc/datatypes/hpat_pandas_functions.py

Lines changed: 83 additions & 0 deletions
@@ -97,3 +97,86 @@ def sdc_pandas_read_csv(fname, sep=',', delimiter=None, skiprows=0):
     csv_reader_py = _gen_csv_reader_py_pyarrow_py_func(func_text, func_name)

     return csv_reader_py
+
+
+sdc_pandas_read_csv.__doc__ = r"""
+    Intel Scalable Dataframe Compiler User Guide
+    ********************************************
+
+    Pandas API: pandas.read_csv
+
+    Limitations
+    -----------
+    - Parameters \
+        ``header``, \
+        ``index_col``, \
+        ``squeeze``, \
+        ``prefix``, \
+        ``mangle_dupe_cols``, \
+        ``engine``, \
+        ``converters``, \
+        ``true_values``, \
+        ``false_values``, \
+        ``skipinitialspace``, \
+        ``skipfooter``, \
+        ``nrows``, \
+        ``na_values``, \
+        ``keep_default_na``, \
+        ``na_filter``, \
+        ``verbose``, \
+        ``skip_blank_lines``, \
+        ``parse_dates``, \
+        ``infer_datetime_format``, \
+        ``keep_date_col``, \
+        ``date_parser``, \
+        ``dayfirst``, \
+        ``cache_dates``, \
+        ``iterator``, \
+        ``chunksize``, \
+        ``compression``, \
+        ``thousands``, \
+        ``decimal``, \
+        ``lineterminator``, \
+        ``quotechar``, \
+        ``quoting``, \
+        ``doublequote``, \
+        ``escapechar``, \
+        ``comment``, \
+        ``encoding``, \
+        ``dialect``, \
+        ``error_bad_lines``, \
+        ``warn_bad_lines``, \
+        ``delim_whitespace``, \
+        ``low_memory``, \
+        ``memory_map`` and \
+        ``float_precision`` \
+        are currently unsupported by Intel Scalable Dataframe Compiler.
+    - The resulting DataFrame type can be inferred either from a constant file name or from parameters. \
+        For inference from file, ``filepath_or_buffer`` must be a constant. \
+        For inference from parameters, ``filepath_or_buffer`` may be a variable, provided ``dtype`` is a constant. \
+        If both ``filepath_or_buffer`` and ``dtype`` are constants, inference from parameters is the default.
+    - For inference from parameters, ``names`` or ``usecols`` should be provided in addition to ``dtype``.
+    - For inference from file, ``sep``, ``delimiter`` and ``skiprows`` should be constants or omitted.
+    - ``names`` and ``usecols`` should be constants or omitted for both types of inference.
+    - ``usecols`` given as a list of ints is unsupported by Intel Scalable Dataframe Compiler.
+
+    Examples
+    --------
+    Inference from file. The file name is a constant. \
+    The resulting DataFrame depends on the CSV file content at the moment of compilation.
+
+    >>> pd.read_csv('data.csv') # doctest: +SKIP
+
+    Inference from file. The file name, ``names``, ``usecols``, ``delimiter`` and ``skiprows`` are constants. \
+    The resulting DataFrame contains one column ``A`` \
+    whose type depends on the CSV file content at the moment of compilation.
+
+    >>> pd.read_csv('data.csv', names=['A','B'], usecols=['A'], delimiter=';', skiprows=2) # doctest: +SKIP
+
+    Inference from parameters. The file name, ``delimiter`` and ``skiprows`` are variables. \
+    ``names``, ``usecols`` and ``dtype`` are constants. \
+    The resulting DataFrame contains column ``A`` with type ``np.float64``.
+
+    >>> pd.read_csv(file_name, names=['A','B'], usecols=['A'], dtype={'A': np.float64}, \
+        delimiter=some_char, skiprows=some_int) # doctest: +SKIP
+"""
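
The inference rules stated in the docstring can be read as a small decision procedure. The following is a minimal sketch in plain Python, not SDC's implementation: `choose_inference`, `parameters_mode_is_complete`, and their boolean arguments are hypothetical names introduced here for illustration. The sketch only restates which mode the docstring says applies, given which arguments are compile-time constants.

```python
from typing import Optional


def choose_inference(filepath_is_const: bool, dtype_is_const: bool) -> Optional[str]:
    """Return the inference mode described in the docstring, or None.

    Hypothetical helper for illustration only; it is not part of SDC.
    """
    if dtype_is_const:
        # A constant dtype allows inference from parameters. The docstring
        # names this as the default when the file name is constant too.
        return "parameters"
    if filepath_is_const:
        # Only the file name is constant: column types come from the CSV
        # content as it exists at the moment of compilation.
        return "file"
    # Neither is constant: the docstring describes no way to infer the type.
    return None


def parameters_mode_is_complete(names_given: bool, usecols_given: bool) -> bool:
    """Inference from parameters also needs ``names`` or ``usecols``."""
    return names_given or usecols_given


# The three docstring examples, expressed as inputs to the sketch.
first = choose_inference(filepath_is_const=True, dtype_is_const=False)
second = choose_inference(filepath_is_const=True, dtype_is_const=False)
third = choose_inference(filepath_is_const=False, dtype_is_const=True)
print(first, second, third)
```

In the third docstring example both ``names`` and ``usecols`` are given, so `parameters_mode_is_complete(True, True)` holds. What SDC does when that requirement is not met is not stated in the docstring, so the sketch leaves it out.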
