Draft
18 commits
1bbaf6d
docs: convert restructuredText sources to MyST markdown
timsaucer Jun 5, 2026
30efd76
docs: fix Apache license header format in converted markdown files
timsaucer Jun 7, 2026
20707b8
docs: migrate from Sphinx+MyST to MkDocs+mkdocstrings
timsaucer Jun 7, 2026
3e54dad
docs: hide notebook setup cells from rendered output
timsaucer Jun 7, 2026
b3dd199
docs: scrollable notebook output, live display() repr, polish
timsaucer Jun 7, 2026
3bde9f7
docs: sweep leftover Sphinx/MyST roles, expand short autoref targets
timsaucer Jun 7, 2026
e2d0d91
docs: polish links, navigation, formatter coverage, admonitions
timsaucer Jun 7, 2026
d5579c3
docs: content cleanup, dead-link fixes, FFI page refresh
timsaucer Jun 8, 2026
168ac00
docs: switch executable code blocks from mkdocs-jupyter to markdown-exec
timsaucer Jun 8, 2026
d6ede7f
docs: centralize markdown-exec setup, fix output capture and truncation
timsaucer Jun 8, 2026
fd738cc
docs: fix duplicate TOC entries and broken user_defined autoref
timsaucer Jun 8, 2026
10af779
docs: rename `Example:` to `Examples:` in formatter docstrings
timsaucer Jun 8, 2026
6c42adc
docs: restructure API reference under datafusion package hierarchy
timsaucer Jun 8, 2026
28874fe
docs: resolve mkdocs build warnings and tighten public API surface
timsaucer Jun 8, 2026
9b28a49
docs: fix unrecognized relative links across user guide and ffi page
timsaucer Jun 8, 2026
2ebd1e5
docs: prefer DataFrame.show() over print() in user-guide examples
timsaucer Jun 8, 2026
03ab28d
Add test to ensure documentation site coverage
timsaucer Jun 8, 2026
0a67654
docs: use sphinx cross-ref roles in docstrings for IDE rendering
timsaucer Jun 8, 2026
13 changes: 10 additions & 3 deletions .github/workflows/build.yml
@@ -523,7 +523,7 @@ jobs:
enable-cache: true

# Download the Linux wheel built in the previous job.
# Docs only need the abi3 wheel — interpreter doesn't matter for sphinx.
# Docs only need the abi3 wheel — interpreter doesn't matter for mkdocs.
- name: Download pre-built Linux wheel
uses: actions/download-artifact@v8
with:
@@ -549,12 +549,19 @@ jobs:
fi

- name: Build docs
env:
DISABLE_MKDOCS_2_WARNING: "true"
run: |
set -x
cd docs
# Stage notebook data files at docs_dir root so notebooks can
# resolve relative paths like "pokemon.csv" during execution.
cd docs/source
curl -O https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv
curl -O https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet
uv run --no-project make html
cd ../..
# Verify every datafusion.__all__ entry is documented.
uv run --no-project python dev/check_api_coverage.py
uv run --no-project mkdocs build

- name: Copy & push the generated HTML
if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref_type == 'tag')
1 change: 1 addition & 0 deletions .gitignore
@@ -3,6 +3,7 @@ target
.idea
/docs/temp
/docs/build
/.cache
.DS_Store
.vscode

4 changes: 2 additions & 2 deletions AGENTS.md
@@ -84,9 +84,9 @@ Every Python function must include a docstring with usage examples.
When adding or updating an aggregate or window function, ensure the corresponding
site documentation is kept in sync:

- **Aggregations**: `docs/source/user-guide/common-operations/aggregations.rst` —
- **Aggregations**: `docs/source/user-guide/common-operations/aggregations.md` —
add new aggregate functions to the "Aggregate Functions" list and include usage
examples if appropriate.
- **Window functions**: `docs/source/user-guide/common-operations/windows.rst` —
- **Window functions**: `docs/source/user-guide/common-operations/windows.md` —
add new window functions to the "Available Functions" list and include usage
examples if appropriate.
82 changes: 82 additions & 0 deletions dev/check_api_coverage.py
@@ -0,0 +1,82 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

"""Check that every symbol in datafusion.__all__ is documented.

Walks every Markdown file under docs/source/reference/ and collects:

1. The dotted target of every ``::: <dotted.path>`` mkdocstrings directive.
2. Every Markdown heading (``##``, ``###``, etc.).

A ``__all__`` entry is considered documented if its name appears as:

- The leaf of a ``::: <...>`` directive, OR
- The leaf of a ``### name`` heading.

Run from the repo root::

python dev/check_api_coverage.py
"""

from __future__ import annotations

import re
import sys
from pathlib import Path

REPO_ROOT = Path(__file__).resolve().parents[1]
REFERENCE_DIR = REPO_ROOT / "docs" / "source" / "reference"


def collect_documented_names() -> set[str]:
    documented: set[str] = set()
    directive_re = re.compile(r"^:::\s+([A-Za-z0-9_.]+)")
    heading_re = re.compile(r"^#{1,6}\s+([A-Za-z0-9_]+)")
    for md in REFERENCE_DIR.rglob("*.md"):
        if md.stem != "index":
            documented.add(md.stem)
        for line in md.read_text().splitlines():
            m = directive_re.match(line.strip())
            if m:
                dotted = m.group(1)
                documented.add(dotted.split(".")[-1])
                documented.add(dotted)
                continue
            m = heading_re.match(line)
            if m:
                documented.add(m.group(1))
    return documented


def main() -> int:
    sys.path.insert(0, str(REPO_ROOT / "python"))
    import datafusion  # noqa: PLC0415

    documented = collect_documented_names()
    missing = sorted(name for name in datafusion.__all__ if name not in documented)
    if missing:
        print("Undocumented entries in datafusion.__all__:")
        for name in missing:
            print(f" - {name}")
        print(f"\n{len(missing)} symbol(s) missing from docs/source/reference/")
        return 1
    print(f"All {len(datafusion.__all__)} __all__ entries are documented.")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
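
A minimal sketch of how the collection step in dev/check_api_coverage.py treats a reference page, run on an in-memory string rather than on files under docs/source/reference/. The two regular expressions are copied from the script; the `names_from_markdown` helper and the sample page are hypothetical and exist only for illustration.

```python
import re

# Same patterns as in dev/check_api_coverage.py.
directive_re = re.compile(r"^:::\s+([A-Za-z0-9_.]+)")
heading_re = re.compile(r"^#{1,6}\s+([A-Za-z0-9_]+)")


def names_from_markdown(text: str) -> set:
    """Collect documented names from one page's text (hypothetical helper)."""
    documented = set()
    for line in text.splitlines():
        # A mkdocstrings directive contributes both the dotted path and its leaf.
        m = directive_re.match(line.strip())
        if m:
            dotted = m.group(1)
            documented.add(dotted.split(".")[-1])
            documented.add(dotted)
            continue
        # Any heading level contributes its first word.
        m = heading_re.match(line)
        if m:
            documented.add(m.group(1))
    return documented


# Made-up reference page, not taken from the repository.
sample = """\
# Reference

::: datafusion.dataframe.DataFrame

### col
"""

names = names_from_markdown(sample)
print(sorted(names))
```

Because the heading pattern accepts every level from `#` to `######`, the page title `Reference` is collected alongside `col` in this sample. The real script additionally adds each non-index file stem, which this sketch leaves out.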
151 changes: 151 additions & 0 deletions dev/rewrite_doc_roles.py
@@ -0,0 +1,151 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

"""Rewrite Sphinx / MyST cross-reference roles to Markdown links.

Operates on:
- python/datafusion/*.py docstrings
- docs/source/**/*.md
- docs/source/**/*.ipynb (markdown cells)

Conversions:

:py:class:`~datafusion.x.Y` -> [`Y`][datafusion.x.Y]
:py:func:`~mod.fn` -> [`fn`][mod.fn]
:py:meth:`X.do <X.do>` -> [`X.do`][X.do]
{py:class}`~datafusion.x.Y` -> [`Y`][datafusion.x.Y]
{py:func}`mod.fn` -> [`fn`][mod.fn]
{py:mod}`mod` -> [`mod`][mod]
{code}`text` -> `text`
{doc}`path/to/page` -> [path/to/page](path/to/page.md)
{doc}`Label <path/to/page>` -> [Label](path/to/page.md)
{ref}`anchor` -> [anchor](anchor) (best-effort)
{ref}`Label <anchor>` -> [Label](anchor)
(label)= (alone on a line) -> removed
"""

from __future__ import annotations

import json
import re
import sys
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]

ROLE_PATTERNS = [
    # Sphinx RST roles: :py:class:`~mod.Name`, :class:`~mod.Name`, plus
    # the `Name <mod.Name>` long form. Both `py:` and bare role names.
    (
        re.compile(
            r":(?:py:)?(?:class|func|meth|mod|attr|obj|data|exc):`~?\.?([\w.]+)`"
        ),
        lambda m: f"[`{m.group(1).split('.')[-1]}`][{m.group(1)}]",
    ),
    (
        re.compile(
            r":(?:py:)?(?:class|func|meth|mod|attr|obj|data|exc):`([^<`]+)\s*<\.?([\w.]+)>`"
        ),
        lambda m: f"[`{m.group(1).strip()}`][{m.group(2)}]",
    ),
    # MyST roles: {py:class}`~mod.Name` and the bare {class}`~mod.Name` aliases.
    (
        re.compile(
            r"\{(?:py:)?(?:class|func|meth|mod|attr|obj|data|exc)\}`~?\.?([\w.]+)`"
        ),
        lambda m: f"[`{m.group(1).split('.')[-1]}`][{m.group(1)}]",
    ),
    (
        re.compile(
            r"\{(?:py:)?(?:class|func|meth|mod|attr|obj|data|exc)\}`([^<`]+)\s*<\.?([\w.]+)>`"
        ),
        lambda m: f"[`{m.group(1).strip()}`][{m.group(2)}]",
    ),
    # {code}`text`, {file}`path`, {samp}`text`, {kbd}`keys` -> `text`
    (
        re.compile(r"\{(?:code|file|samp|kbd)\}`([^`]+)`"),
        lambda m: f"`{m.group(1)}`",
    ),
    # {doc}`Label <path>` -> [Label](path.md)
    (
        re.compile(r"\{doc\}`([^<`]+)\s*<([^>]+)>`"),
        lambda m: f"[{m.group(1).strip()}]({m.group(2)}.md)",
    ),
    # {doc}`path` -> [path](path.md)
    (re.compile(r"\{doc\}`([^`<]+)`"), lambda m: f"[{m.group(1)}]({m.group(1)}.md)"),
    # {ref}`Label <anchor>` -> [Label](anchor)
    (
        re.compile(r"\{ref\}`([^<`]+)\s*<([^>]+)>`"),
        lambda m: f"[{m.group(1).strip()}]({m.group(2)})",
    ),
    # {ref}`anchor` -> [anchor](anchor)
    (re.compile(r"\{ref\}`([^`<]+)`"), lambda m: f"[{m.group(1)}]({m.group(1)})"),
]

# Drop standalone (label)= anchor lines (MyST cross-reference targets)
ANCHOR_LINE = re.compile(r"^\([a-zA-Z0-9_-]+\)=\s*$", re.MULTILINE)


def rewrite(text: str) -> str:
    for pattern, repl in ROLE_PATTERNS:
        text = pattern.sub(repl, text)
    return ANCHOR_LINE.sub("", text)


def process_file(path: Path, *, dry_run: bool = False) -> int:
    if path.suffix == ".ipynb":
        original = path.read_text()
        nb = json.loads(original)
        changed = False
        for cell in nb.get("cells", []):
            if cell.get("cell_type") != "markdown":
                continue
            old = cell["source"]
            text = "".join(old) if isinstance(old, list) else old
            new = rewrite(text)
            if new != text:
                cell["source"] = new
                changed = True
        if changed and not dry_run:
            path.write_text(json.dumps(nb, indent=1) + "\n")
        return 1 if changed else 0

    original = path.read_text()
    new = rewrite(original)
    if new != original:
        if not dry_run:
            path.write_text(new)
        return 1
    return 0


def main() -> int:
    dry = "--dry-run" in sys.argv
    paths = (
        list((REPO / "python" / "datafusion").rglob("*.py"))
        + list((REPO / "docs" / "source").rglob("*.md"))
        + list((REPO / "docs" / "source").rglob("*.ipynb"))
    )
    changed = 0
    for p in paths:
        changed += process_file(p, dry_run=dry)
    print(f"changed: {changed} files" + (" (dry run)" if dry else ""))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
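
To make the conversion table in the module docstring concrete, here is a small sketch that applies two of the `ROLE_PATTERNS` entries to a sample sentence. The two regular expressions and their replacement lambdas are copied from the script; the names `RST_SHORT`, `MYST_DOC_LABELLED`, and `rewrite_sample`, and the sample text, are made up for illustration.

```python
import re

# Two entries from ROLE_PATTERNS, copied for illustration.
RST_SHORT = re.compile(
    r":(?:py:)?(?:class|func|meth|mod|attr|obj|data|exc):`~?\.?([\w.]+)`"
)
MYST_DOC_LABELLED = re.compile(r"\{doc\}`([^<`]+)\s*<([^>]+)>`")


def rewrite_sample(text: str) -> str:
    """Apply the two patterns in order (hypothetical helper)."""
    # Short RST role: the link label is the last dotted segment of the target.
    text = RST_SHORT.sub(
        lambda m: f"[`{m.group(1).split('.')[-1]}`][{m.group(1)}]", text
    )
    # Labelled MyST {doc} role: keep the label and append ".md" to the path.
    return MYST_DOC_LABELLED.sub(
        lambda m: f"[{m.group(1).strip()}]({m.group(2)}.md)", text
    )


# Sample text only; the page path is not taken from the repository.
before = (
    "See :py:class:`~datafusion.dataframe.DataFrame` "
    "and {doc}`the guide <user-guide/basics>`."
)
after = rewrite_sample(before)
print(after)
```

With these patterns, the short form keeps only the last dotted segment as the link label whether or not the target starts with `~`, so a sample input such as ``:func:`mod.fn` `` becomes ``[`fn`][mod.fn]``.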
5 changes: 4 additions & 1 deletion docs/.gitignore
@@ -1,4 +1,7 @@
pokemon.csv
yellow_trip_data.parquet
yellow_tripdata_2021-01.parquet

source/pokemon.csv
source/yellow_trip_data.parquet
source/yellow_tripdata_2021-01.parquet
build/
32 changes: 16 additions & 16 deletions docs/Makefile
@@ -15,24 +15,24 @@
# specific language governing permissions and limitations
# under the License.

#
# Minimal makefile for Sphinx documentation
#
# Thin wrapper. The mkdocs.yml lives at the repo root; run `mkdocs build`
# from one directory up.

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build
MKDOCS ?= mkdocs

.PHONY: help html serve clean

# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
@echo "Targets:"
@echo " html - build site to docs/build/html"
@echo " serve - serve site at http://localhost:8000"
@echo " clean - remove docs/build/"

html:
cd .. && DISABLE_MKDOCS_2_WARNING=true $(MKDOCS) build --strict

.PHONY: help Makefile
serve:
cd .. && DISABLE_MKDOCS_2_WARNING=true $(MKDOCS) serve

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) --fail-on-warning
clean:
rm -rf build/
59 changes: 59 additions & 0 deletions docs/griffe_extensions.py
@@ -0,0 +1,59 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

"""Griffe extensions for datafusion-python docs.

`SphinxRefsToAutorefs` rewrites sphinx-style cross-reference roles
(``:func:`~path`, :class:`~path``, etc.) inside docstrings into
mkdocstrings autoref syntax (``[`tail`][path]``) so that the same
docstring renders as a clickable cross-reference both in JetBrains-style
IDEs (which understand sphinx roles) and on the published docs site
(which understands mkdocstrings autorefs).
"""

from __future__ import annotations

import re
from typing import Any

from griffe import Extension, Object

_ROLE_RE = re.compile(
    r":(?:py:)?(?P<role>func|class|meth|attr|mod|obj|exc|const|data)"
    r":`(?P<tilde>~?)(?P<target>[\w.]+)`"
)


def _rewrite(text: str) -> str:
    def repl(match: re.Match[str]) -> str:
        target = match.group("target")
        tail = target.rsplit(".", 1)[-1]
        return f"[`{tail}`][{target}]"

    return _ROLE_RE.sub(repl, text)


class SphinxRefsToAutorefs(Extension):
    """Convert sphinx-style cross-references into mkdocstrings autorefs."""

    def on_object(self, *, obj: Object, **_: Any) -> None:
        docstring = obj.docstring
        if docstring is None:
            return
        new = _rewrite(docstring.value)
        if new != docstring.value:
            docstring.value = new
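
The rewrite performed by the extension can be tried without griffe installed, since `_rewrite` depends only on `re`. The sketch below copies `_ROLE_RE` and `_rewrite` from docs/griffe_extensions.py and runs them on sample docstring text; the sample strings are illustrative and are not quoted from the datafusion sources.

```python
import re

# Copied from docs/griffe_extensions.py.
_ROLE_RE = re.compile(
    r":(?:py:)?(?P<role>func|class|meth|attr|mod|obj|exc|const|data)"
    r":`(?P<tilde>~?)(?P<target>[\w.]+)`"
)


def _rewrite(text: str) -> str:
    def repl(match):
        target = match.group("target")
        tail = target.rsplit(".", 1)[-1]
        return f"[`{tail}`][{target}]"

    return _ROLE_RE.sub(repl, text)


# Short-form roles are converted, with or without the leading tilde.
short = "Wraps :py:func:`~datafusion.functions.col`; see :class:`datafusion.SessionContext`."
converted = _rewrite(short)
print(converted)

# The "label <target>" long form is not covered by this pattern,
# so the text comes back unchanged.
long_form = "Call :meth:`show <datafusion.DataFrame.show>` to print."
untouched = _rewrite(long_form)
print(untouched == long_form)
```

As written, the pattern only recognizes a bare dotted target between the backticks, so docstrings that rely on the long form would presumably need the target-only spelling to become autorefs on the published site.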