feat: expose SessionContext.write_csv, write_json, write_parquet by timsaucer · Pull Request #1569 · apache/datafusion-python

timsaucer · 2026-06-04T17:03:17Z

Which issue does this PR close?

None. There is no dedicated tracking issue; this PR is related to umbrella issue #462 (interface design / user stories).

Rationale for this change

DataFusion's SessionContext exposes write_csv, write_json, and write_parquet methods that take an already-built Arc<dyn ExecutionPlan> and a target path. These complement the existing per-DataFrame write methods and are the right entry point when a caller already holds a physical plan -- for example after running custom physical optimizer rules (recently exposed via PR #1557) or after constructing a plan directly. The Python bindings did not surface them.

What changes are included in this PR?

crates/core/src/context.rs: add write_csv, write_json, and write_parquet PyO3 methods on PySessionContext. Each accepts a PyExecutionPlan and a path, converts the plan to Arc<dyn ExecutionPlan>, and delegates to the matching upstream SessionContext method. write_parquet passes None for the WriterProperties argument; per-partition Parquet tuning remains on DataFrame.write_parquet.
python/datafusion/context.py: add Python wrappers with doctest examples that round-trip data through a temp directory. The docstrings flag DataFrame.write_* as the right entry point when callers need header control, compression, or other write options.

Are there any user-facing changes?

Yes. Three new public methods on datafusion.SessionContext:

write_csv(plan, path)
write_json(plan, path)
write_parquet(plan, path)

No breaking changes.
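
As a rough illustration of how the new methods might be called, here is a minimal sketch. It assumes a datafusion build that includes this PR, and it relies only on the `(plan, path)` signatures listed above plus the existing `DataFrame.execution_plan()` and `SessionContext.from_pydict()` calls. The helper name `materialize_plan` and the `fmt` parameter are hypothetical and are not part of the PR.

```python
import tempfile
from pathlib import Path


def materialize_plan(ctx, df, out_dir, fmt="csv"):
    """Write df's physical plan under out_dir using a SessionContext writer.

    Hypothetical helper: it assumes ctx exposes write_csv / write_json /
    write_parquet with the (plan, path) signature described in this PR.
    """
    if fmt not in ("csv", "json", "parquet"):
        raise ValueError(f"unsupported format: {fmt!r}")
    # Existing DataFrame API: obtain the physical ExecutionPlan.
    plan = df.execution_plan()
    # New in this PR: plan-level writer on the session context.
    getattr(ctx, f"write_{fmt}")(plan, str(out_dir))
    # The PR describes one output file per partition inside out_dir.
    out = Path(out_dir)
    return sorted(p.name for p in out.iterdir()) if out.is_dir() else []


def demo():
    # Requires a datafusion build that contains this PR.
    from datafusion import SessionContext

    ctx = SessionContext()
    df = ctx.from_pydict({"a": [1, 2, 3]})
    with tempfile.TemporaryDirectory() as tmp:
        print(materialize_plan(ctx, df, Path(tmp) / "out"))
```

The helper does not import datafusion itself, so it works with any objects that provide these methods. When header control, compression, or other write options matter, `DataFrame.write_*` remains the entry point the PR recommends.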

Adds three plan-level writers on SessionContext that mirror the upstream datafusion::execution::context API. Each takes an ExecutionPlan and an output directory path; the plan is executed and its results are written one partition per file inside that directory.

These complement the existing DataFrame.write_* methods, which are the right choice when callers need finer control (CSV header, Parquet compression, write options). The new SessionContext methods are the right choice when a caller already holds a physical ExecutionPlan (for example after custom physical optimizer rules or hand-built plans) and just wants the rows materialized.

Related to apache#462.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
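
Because the output is a directory of per-partition files rather than a single file, a consumer has to gather the pieces. The following standard-library sketch reads such a directory back. It assumes each file carries a `.csv` extension and its own header row; neither detail is stated in this PR. The name `read_partitioned_csv` is hypothetical.

```python
import csv
from pathlib import Path


def read_partitioned_csv(out_dir):
    """Concatenate the rows of every *.csv file found directly in out_dir.

    Assumes each partition file starts with a header row (an assumption,
    not something this PR documents).
    """
    rows = []
    # Sorting is lexicographic and only makes the result deterministic;
    # row order across partitions carries no meaning.
    for part in sorted(Path(out_dir).glob("*.csv")):
        with part.open(newline="") as fh:
            rows.extend(csv.DictReader(fh))
    return rows
```

All values come back as strings, since the `csv` module performs no type inference.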

timsaucer wants to merge 1 commit into apache:main from timsaucer:feat/df54-session-write-methods
