
DEP: avoid direct dependency on html5lib as an implementation detail from pandas #19900

Open
neutrinoceros wants to merge 1 commit into astropy:main from neutrinoceros:dep/drop-html5lib

@neutrinoceros

Contributor

Description

ref #14316

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

@neutrinoceros added this to the v8.1.0 milestone Jun 11, 2026
@neutrinoceros added the no-changelog-entry-needed and dependencies (pull requests that update a dependency file) labels Jun 11, 2026
@github-actions

Contributor

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

Comment on lines -45 to -51
# Special case: pandas defaults to HTML lxml for reading, but does not attempt
# to fall back to bs4 + html5lib. So do that now for convenience if user has
# not specifically selected a flavor. If things go wrong the pandas exception
# with instruction to install a library will come up.
if pandas_fmt == "html" and "flavor" not in kwargs:
    if not HAS_LXML and HAS_HTML5LIB and HAS_BS4:
        read_kwargs["flavor"] = "bs4"

Contributor Author

As it turns out, this hack is actually obsolete with pandas>=2.2 (and we don't support older versions):

The default of None tries to use lxml to parse and if that fails it falls back on bs4 + html5lib.

source:
https://pandas.pydata.org/pandas-docs/version/2.2/reference/api/pandas.read_html.html
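
For readers following along, here is a minimal standalone sketch of what the removed special case decided. The function name `pick_html_flavor` and its boolean parameters are hypothetical stand-ins for the module-level `HAS_LXML`, `HAS_BS4` and `HAS_HTML5LIB` flags; it only mirrors the branch quoted above and does not call pandas.

```python
def pick_html_flavor(pandas_fmt, user_kwargs, has_lxml, has_bs4, has_html5lib):
    """Return the read kwargs the removed special case would have produced.

    Hypothetical helper for illustration only; the boolean arguments stand in
    for the HAS_LXML / HAS_BS4 / HAS_HTML5LIB flags in the original code.
    """
    read_kwargs = dict(user_kwargs)
    # Only step in for HTML, and only when the user did not choose a flavor.
    if pandas_fmt == "html" and "flavor" not in user_kwargs:
        # lxml is missing but bs4 + html5lib are present: force the bs4 parser.
        if not has_lxml and has_html5lib and has_bs4:
            read_kwargs["flavor"] = "bs4"
    return read_kwargs
```

With the fallback described in the pandas documentation quoted above, leaving `flavor` unset should cover the same situation, which is why the branch can go.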

@neutrinoceros marked this pull request as ready for review June 11, 2026 15:02

@mhvk left a comment

Contributor

Yes, seems nice to make the dependency implicit! Comment in-line is minor, so approving, but might be good to get some further opinion.

Comment thread pyproject.toml
ipython = [
    "ipython>=8.1.0",
]
pandas = [

Contributor

I slightly wonder why we make an explicit astropy[pandas] option rather than just listing pandas[html]>=2.2.2 where needed. Is it so that users can use it too?

Contributor Author

It's to deduplicate the lower bound: the pandas version constraint is then written in one place only.
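
A rough sketch of the deduplication idea, under stated assumptions: the table below is a hypothetical stand-in for the optional-dependencies table in pyproject.toml, the `test_all` and `docs` names and the `resolve` helper are illustrative only, and the actual contents of the PR may differ. The point is that the pandas lower bound appears once, and other extras refer to it through the `astropy[pandas]` self-reference.

```python
# Hypothetical extras table, shaped like [project.optional-dependencies].
EXTRAS = {
    "pandas": ["pandas[html]>=2.2.2"],  # the only place the bound is written
    "test_all": ["astropy[pandas]", "ipython>=8.1.0"],
    "docs": ["astropy[pandas]"],
}


def resolve(extra, extras=EXTRAS, package="astropy"):
    """Expand self-references such as astropy[pandas] into concrete requirements."""
    prefix = f"{package}["
    resolved = []
    for req in extras[extra]:
        if req.startswith(prefix) and req.endswith("]"):
            # Self-reference: substitute the requirements of the named extra.
            resolved.extend(resolve(req[len(prefix):-1], extras, package))
        else:
            resolved.append(req)
    return resolved
```

Raising the bound later then means editing the single `pandas` entry, and every extra that references it picks up the change.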
