[DISCUSSION] Adopt datafusion-functions-json and datafusion-variant into core repo

# Introduction

As semi-structured data processing becomes more important, I hear from more and more DataFusion users that they would like better support for JSON and [Parquet Variant](https://parquet.apache.org/docs/file-format/types/variantencoding/) functions (see this [blog post](https://parquet.apache.org/blog/2026/02/27/variant-type-in-apache-parquet-for-semi-structured-data/) for more details).

Today, function libraries for these two areas live in [datafusion-contrib](https://github.com/datafusion-contrib) rather than in the [main Apache DataFusion repository](https://github.com/apache/datafusion):

- [datafusion-variant](https://github.com/datafusion-contrib/datafusion-variant)
- [datafusion-functions-json](https://github.com/datafusion-contrib/datafusion-functions-json)

Keeping them outside the main repository has benefits, such as faster iteration and not being tied to ASF release cycles.

However, it also means they are:
1. outside ASF governance and release processes
2. less discoverable to users
3. harder to integrate and version downstream alongside DataFusion releases
4. perceived as more experimental even when they solve common problems
5. potentially less likely to attract outside maintenance contributions

# Proposal

Bring these two crates into the main DataFusion repository, as we did for Spark-related functionality. The crates would remain optional: they would not become part of the main `datafusion` crate, nor would they be enabled by a default feature flag.

I think we would need buy-in from the current maintainers/authors, including @pydantic, @adriangb, @friendlymatthew, and others.

We previously did this for Spark-compatible functions by bringing [`datafusion-spark`](https://github.com/apache/datafusion/tree/main/datafusion/spark) into the core DataFusion repo because the functionality was widely useful and maintaining it in one place made contribution and coordination easier.

This topic also came up in a recent mailing list discussion about using JSON functionality in the Python bindings: https://lists.apache.org/thread/f591qmhx97wsl7h5xfoh7sfhv2gh9t2k

# Alternatives you've considered

1. Keep these crates in `datafusion-contrib` indefinitely.
   This keeps the core repo smaller and preserves flexibility, but leaves the crates outside the main project's release and governance process.
2. Keep them in `datafusion-contrib`, but improve discoverability and documentation.
   This helps users find them, but does not address governance, release coordination, or long-term maintenance.
3. Bring in only one library at a time, starting with the most mature or most widely used.
   This is likely the lowest-risk path if there is agreement in principle but uncertainty about scope.


[DISCUSSION] Adopt datafusion-functions-json and datafusion-variant into core repo #21301
