fix: build StaticTable version-hint metadata locations with forward slashes#3604
Open
anxkhn wants to merge 1 commit into
Open
fix: build StaticTable version-hint metadata locations with forward slashes#3604anxkhn wants to merge 1 commit into
anxkhn wants to merge 1 commit into
Conversation
…lashes StaticTable._metadata_location_from_version_hint built the version-hint and resolved metadata locations with os.path.join. Iceberg locations are URIs and must always be forward-slash separated, but os.path.join uses the OS separator, so on Windows it produced backslash-separated keys such as s3://.../taxis\metadata\version-hint.text. That is an invalid object-store key, which breaks the documented "pass the table root path" usage of StaticTable.from_metadata on Windows. Join the components with explicit forward slashes, matching how locations are built elsewhere in pyiceberg (for example pyiceberg/table/locations.py). Behavior is unchanged on POSIX. The now-unused os import is removed. Add an OS-independent regression test that swaps os.path.join for the Windows join so the metadata location is asserted to stay a valid forward-slash URI on any platform (the existing test only exercised local temp paths, which use the POSIX separator on Linux CI and never caught this). Related: apache#2477
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Rationale for this change
StaticTable._metadata_location_from_version_hintbuilds theversion-hint.textlocation and the resolved metadata-file location with
os.path.join:Iceberg locations are URIs (
s3://,gs://,abfss://,file://, ...) and arealways forward-slash separated.
os.path.joinuses the host OS separator, so onWindows it produces backslash-separated keys such as
s3://warehouse/wh/nyc.db/taxis\metadata\version-hint.text. That is an invalidobject-store key, so
FileIO.new_input(...)cannot find the file and the lookupfails.
This breaks the documented "pass the table root path" usage of
StaticTable.from_metadataon Windows. The API docs advertise exactly this path:This is the only place in the codebase that builds a table/metadata location
with
os.path.join; every other location builder already uses explicit forwardslashes (for example
pyiceberg/table/locations.py, and the metadata-path helpersin
pyiceberg/catalog/__init__.py). This change makes_metadata_location_from_version_hintconsistent with them:Behavior is unchanged on POSIX. The now-unused
osimport (this was its only usein the module) is removed to keep the linter happy.
Are these changes tested?
Yes.
tests/table/test_init.pygainstest_static_table_version_hint_location_uses_forward_slashes, a regression testthat is deliberately OS-independent: it swaps
os.path.joinforntpath.join(the Windows join) via
monkeypatch, stubsload_file_iowith a small fakeFileIO, and asserts the resolved metadata location stays a valid forward-slashURI (
.../metadata/version-hint.textand.../metadata/<file>.metadata.json,with no backslash). It is parametrized over the three version-hint content forms
the method handles (full
*.metadata.jsonfilename, a numeric version, and anon-numeric value). It fails on the old
os.path.joincode (backslash separators)and passes after this change, on any platform.
The pre-existing
test_static_table_version_hint_same_as_tableonly exercises alocal temp path, where the separator is already
/on Linux CI, so it nevercaught this. Windows is not yet in CI (tracked separately in #2477), which is why
the OS-independent approach is used here.
pytest tests/table/test_init.py -k version_hint-> passes (existing + 3 new cases).pytest tests/table/test_init.py-> passes.pytest tests/test_serializers.py-> passes (exercises the realfrom_metadata->_metadata_location_from_version_hintpath over local files).make lint-> clean.The integration suite (Docker + Spark) was not run locally; this change is covered
by the unit tests above and does not alter POSIX behavior.
Are there any user-facing changes?
No API or signature changes. This is a bug fix:
StaticTable.from_metadata(<table root>)andStaticTable._metadata_location_from_version_hintnow build correctforward-slash metadata URIs on Windows. POSIX behavior is unchanged.
Related: #2477 (infra: run tests for windows).