docs: convert reStructuredText sources to MyST markdown #1579
Draft
timsaucer wants to merge 2 commits into
Phase 2 of the documentation-site refresh. Run `rst2myst convert` over
every human-authored .rst file under docs/source/ and remove the
originals. The result:
- 33 .rst files become 33 .md files (user guide, contributor guide,
index, links).
- Headings, paragraphs, hyperlinks, code blocks, admonitions, and
toctree directives all map cleanly to MyST syntax.
- Cross-reference anchors round-trip through MyST as `(label)=`
blocks. The converter kebab-cased the labels (e.g. `(io-csv)=`),
but every `{ref}` target in the corpus still uses the underscore
form from the original RST (`{ref}\`CSV <io_csv>\``) and so do the
Python docstrings that AutoAPI pulls in. Rewrite the anchors back
to the underscore form so the existing references resolve.
- 86 `{eval-rst}` blocks remain — they all wrap `.. ipython::`
directives, which have no first-class MyST equivalent. They render
identically and don't block the build.
conf.py changes:
- Enable `colon_fence` and `deflist` MyST extensions (rst-to-myst
emits these on a few files, particularly execution-metrics.md).
- Keep `.rst` in `source_suffix` even though no human-authored RST
remains: sphinx-autoapi generates RST under autoapi/ at build time
and Sphinx needs the suffix registered to parse it.
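The two `conf.py` changes listed above might look roughly like the excerpt below. This is a sketch, not the actual file: the option names (`myst_enable_extensions`, `source_suffix`) are standard Sphinx / myst-parser settings, but the dict form of `source_suffix` and the surrounding layout are assumptions.

```python
# Hypothetical excerpt of docs/source/conf.py; the real file may be laid out differently.

# MyST syntax extensions that the converted files rely on.
myst_enable_extensions = [
    "colon_fence",  # allows ::: fences for directives
    "deflist",      # allows definition lists
]

# Keep .rst registered: sphinx-autoapi writes RST under autoapi/ at build
# time, and Sphinx needs the suffix registered in order to parse it.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```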
AGENTS.md: update the two .rst paths called out under "Aggregate and
Window Function Documentation" to point at the .md equivalents.
Verified by building locally — `build succeeded`, no warnings, all
internal cross-references resolve, the ipython examples on the
landing page and basics page still execute.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RST-to-MD conversion emitted MyST `%` comment syntax with a blank line between each header line, which renders as visible text. Replace it with the canonical `<!--- ... -->` HTML comment block, matching upstream apache/datafusion and this repo's existing markdown files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Which issue does this PR close?
Closes #.
Rationale for this change
Phase 2 of the documentation-site refresh started in #1578. With the
modern pydata-sphinx-theme + navigation in place, this PR moves the
content format off `.rst` and onto MyST `.md`. The motivation:
- […] on GitHub and modern docs parse Markdown reliably; reStructuredText
is a minority dialect that frequently confuses both humans editing
via PR review and agents reading the source. The Apache
`datafusion-comet` sibling project completed the same migration
recently and reported smoother contributor onboarding.
- […] Sphinx features we actually use (toctrees, cross-references,
code-blocks, admonitions, eval-rst escape hatch).
- The `myst-parser` extension is already in the docs dependency
group and was loaded by `conf.py` even before this PR, so switching
the on-disk format is a low-risk, mechanical change.
This PR stacks on #1578 (theme + navbar refresh). It should land
after #1578.
What changes are included in this PR?
Format conversion (mechanical, via `rst-to-myst`):
- 33 `.rst` files under `docs/source/` become 33 `.md` files: the user guide, contributor guide, IO subsection, common-operations subsection, dataframe subsection, top-level index, and `links`.
- […] and license headers all round-trip cleanly.
Manual fixes layered on top of the converter output:
- The converter kebab-cased every `(label)=` anchor (e.g. `(io-csv)=`), but every `{ref}` in the corpus, including the Python docstrings that `sphinx-autoapi` pulls into the API reference, still uses the underscore form (``{ref}`CSV <io_csv>` ``). Rewrite the anchors back to underscore form (`(io_csv)=`, `(window_functions)=`, `(user_guide_concepts)=`, `(execution_metrics)=`, etc.) so existing references resolve without churning every callsite.
- Enable `colon_fence` and `deflist` in `myst_enable_extensions` (the converter emits these on a few files, notably `dataframe/execution-metrics.md`).
- `source_suffix`. Keep `.rst` registered even though no human-authored RST remains: `sphinx-autoapi` generates `.rst` under `autoapi/` at build time and Sphinx needs the suffix to parse it. The comment in `conf.py` flags this so a future cleanup pass doesn't strip it again.
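The anchor rewrite described in the list above could be scripted roughly as follows. This is a minimal sketch, not the script used for this PR: the helper names `underscore_anchors` and `rewrite_tree` are hypothetical, and it assumes every MyST target sits on a line of its own and that no anchor was intentionally kebab-cased in the original RST.

```python
import re
from pathlib import Path

# Matches a MyST target such as "(io-csv)=" that occupies a whole line.
ANCHOR_RE = re.compile(r"^\(([A-Za-z0-9_-]+)\)=[ \t]*$", re.MULTILINE)


def underscore_anchors(markdown: str) -> str:
    """Rewrite kebab-case anchors to underscore form, e.g. (io-csv)= -> (io_csv)=."""
    return ANCHOR_RE.sub(
        lambda m: f"({m.group(1).replace('-', '_')})=",
        markdown,
    )


def rewrite_tree(root: Path) -> int:
    """Apply underscore_anchors to every .md file under root; return files changed."""
    changed = 0
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        new_text = underscore_anchors(text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed += 1
    return changed
```

Restricting the pattern to whole-line targets leaves hyphens in headings, prose, and `{ref}` roles untouched, so only the anchors themselves change.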
86 `{eval-rst}` blocks remain in the converted output. Every one of them wraps a `.. ipython::` directive, which has no first-class MyST equivalent in our extensions setup. The blocks render identically and don't block the build. Migrating these to a native MyST exec syntax is a follow-up that requires either `myst-nb` or a custom parser registration, so it is out of scope here.
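One way to double-check the claim that every remaining `{eval-rst}` block wraps an ipython directive is a small scan like the one below. It is a sketch under simplifying assumptions: it only recognizes blocks fenced with exactly three backticks (not colon fences or longer fences), and the function names are hypothetical.

```python
import re

# Build the fence indirectly so this snippet can itself sit inside a fence.
FENCE = "`" * 3

# Captures the body of each backtick-fenced {eval-rst} block.
EVAL_RST_RE = re.compile(
    "^" + FENCE + r"\{eval-rst\}[ \t]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def eval_rst_bodies(markdown: str) -> list[str]:
    """Return the body text of every {eval-rst} block in the document."""
    return [m.group(1) for m in EVAL_RST_RE.finditer(markdown)]


def non_ipython_blocks(markdown: str) -> list[str]:
    """Return bodies whose first non-blank line is not an ipython directive."""
    offenders = []
    for body in eval_rst_bodies(markdown):
        first = next((ln.strip() for ln in body.splitlines() if ln.strip()), "")
        if not first.startswith(".. ipython::"):
            offenders.append(body)
    return offenders
```

Summing `len(eval_rst_bodies(text))` over the converted files gives the block count, and an empty result from `non_ipython_blocks` for every file would confirm that only ipython directives remain wrapped.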
`AGENTS.md` is updated so the two `.rst` paths called out under "Aggregate and Window Function Documentation" point at the new `.md` equivalents.
Are there any user-facing changes?
No behavioral change to the `datafusion` package; only the source format of the published documentation changes. Readers of the rendered site will not notice the migration; the HTML output is unchanged. Internal cross-references resolve, and the `pokemon.csv` ipython example on the landing page and the `yellow_tripdata_2021-01.parquet` example on the basics page both still execute.
No `api change` label: public APIs untouched.
Follow-ups (out of scope for this PR)
- Migrate the `{eval-rst}` `.. ipython::` blocks to a MyST-native exec syntax. Requires either pulling in `myst-nb` or configuring a per-language parser.
- […] `asf-site` publishing workflow.