Allow using pandas as a table join engine by taldcroft · Pull Request #19860 · astropy/astropy

taldcroft · 2026-06-03T12:57:28Z

Description

Pandas has highly efficient, optimized support for table joins using a dict-like mapping implemented in C/Cython. Joining large tables with pandas is up to 20 times faster than with astropy, which uses a fairly naive implementation based on numpy sorting.
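To illustrate the difference between the two strategies, here is a minimal pure-Python sketch of a sort-based inner join and a hash-based (dict-like mapping) inner join on a single key. This is not astropy's or pandas' actual code: rows are plain dicts and both function names are hypothetical. Both return the same matches, but the sort-based version emits them in key order while the hash-based version follows the row order of the left table.

```python
from collections import defaultdict


def sort_merge_inner_join(left, right, key):
    """Inner join of two lists of dict rows by sorting both sides on `key`.

    Simplified illustration of a sort-based join (hypothetical helper).
    """
    lsorted = sorted(left, key=lambda r: r[key])
    rsorted = sorted(right, key=lambda r: r[key])
    out = []
    i = j = 0
    while i < len(lsorted) and j < len(rsorted):
        lk, rk = lsorted[i][key], rsorted[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(lsorted) and lsorted[i_end][key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(rsorted) and rsorted[j_end][key] == rk:
                j_end += 1
            for lrow in lsorted[i:i_end]:
                for rrow in rsorted[j:j_end]:
                    out.append({**lrow, **rrow})
            i, j = i_end, j_end
    return out


def hash_inner_join(left, right, key):
    """Inner join using a dict that maps each key to its matching right rows.

    Simplified illustration of a hash join (hypothetical helper).
    """
    index = defaultdict(list)
    for rrow in right:
        index[rrow[key]].append(rrow)
    out = []
    # Single pass over the left table, preserving its row order.
    for lrow in left:
        for rrow in index.get(lrow[key], ()):
            out.append({**lrow, **rrow})
    return out


left = [{"id": 3, "a": "x"}, {"id": 1, "a": "y"}, {"id": 2, "a": "z"}]
right = [{"id": 2, "b": 20}, {"id": 3, "b": 30}, {"id": 4, "b": 40}]
print([r["id"] for r in sort_merge_inner_join(left, right, "id")])
print([r["id"] for r in hash_inner_join(left, right, "id")])
```

The hash join makes one pass over each table (expected O(n + m)), while the sort-based join pays O(n log n + m log m) for the sorts; that is one plausible source of the speedup, alongside pandas doing the work in compiled code. The different output row order is also one reason results from two engines may not be identical row for row.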

This PR allows using pandas as a join engine, resulting in astropy table join performance that is nearly as fast as pandas (about 10-20% slower).

Fixes #

By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

github-actions · 2026-06-03T12:57:46Z

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

Do the proposed changes actually accomplish desired goals?
Do the proposed changes follow the Astropy coding guidelines?
Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

hamogu · 2026-06-03T13:33:53Z

How about using "pandas" as the default when it's installed and only to fall back to "astropy" if pandas is not available. In fact, you might not add a keyword at all and just take what's available.

neutrinoceros · 2026-06-03T13:58:45Z

> you might not add a keyword at all and just take what's available.

That's the approach we've been following with bottleneck-powered accelerations, but I would recommend we avoid it in the future, as it also creates confusion when the "faster" implementation is silently selected but returns incorrect results (or causes a crash).

taldcroft · 2026-06-03T14:19:40Z

> How about using "pandas" as the default when it's installed and only to fall back to "astropy" if pandas is not available. In fact, you might not add a keyword at all and just take what's available.

I'm pretty keen on maintaining explicit control. It might be that astropy is faster for small tables, or the user wants the sort order from astropy to maintain stability in their pipeline products. So I would propose an option engine="auto" to automatically select the engine based on some heuristics.
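A minimal sketch of how such an `engine` keyword could be resolved, assuming the values "astropy", "pandas" and "auto". The function name `resolve_join_engine` and the `AUTO_MIN_ROWS` threshold are hypothetical placeholders for whatever heuristics end up being chosen. An explicit `engine="pandas"` fails loudly when pandas is missing rather than silently falling back, in line with the concern about silently selected accelerations; only "auto" is allowed to pick.

```python
import importlib.util

VALID_ENGINES = ("astropy", "pandas", "auto")

# Hypothetical heuristic: below this size, assume the pure-astropy join is fast
# enough that converting to pandas and back is not worth the overhead.
AUTO_MIN_ROWS = 10_000


def _pandas_available():
    """Return True if pandas can be imported (without importing it)."""
    return importlib.util.find_spec("pandas") is not None


def resolve_join_engine(engine, n_rows, pandas_available=None):
    """Return the concrete engine name ("astropy" or "pandas") to use.

    n_rows is a hypothetical size hint, e.g. the length of the larger table.
    """
    if engine not in VALID_ENGINES:
        raise ValueError(f"engine must be one of {VALID_ENGINES}, got {engine!r}")
    if pandas_available is None:
        pandas_available = _pandas_available()
    if engine == "pandas":
        # Explicit request: never fall back silently.
        if not pandas_available:
            raise ImportError("engine='pandas' requested but pandas is not installed")
        return "pandas"
    if engine == "astropy":
        return "astropy"
    # engine == "auto": use pandas only when it is installed and the table is large.
    if pandas_available and n_rows >= AUTO_MIN_ROWS:
        return "pandas"
    return "astropy"


print(resolve_join_engine("auto", 1_000_000, pandas_available=True))
```

Keeping "astropy" as an explicit choice also preserves astropy's existing output order for users who depend on it in pipeline products.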

According to Gemini, Polars joins are even faster than pandas, by factors of 4-40x for medium to large datasets. So we want both explicit control and some logic to decide on the fastest join engine.

Initial support for pandas as a join engine

9208db5

github-actions Bot added the table label Jun 3, 2026

taldcroft wants to merge 1 commit into astropy:main from taldcroft:table-join-engine-pandas

taldcroft commented Jun 3, 2026

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

hamogu commented Jun 3, 2026

Uh oh!

neutrinoceros commented Jun 3, 2026

Uh oh!

taldcroft commented Jun 3, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

taldcroft commented Jun 3, 2026

Description

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

hamogu commented Jun 3, 2026

Uh oh!

neutrinoceros commented Jun 3, 2026

Uh oh!

taldcroft commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

taldcroft commented Jun 3, 2026 •

edited

Loading