Skip to content
This repository was archived by the owner on Mar 23, 2026. It is now read-only.

S3: fix URL encoding of ContinuationToken in ListObjectsV2#13746

Merged
bentsku merged 3 commits into
mainfrom
s3-fix-continuation
Feb 12, 2026
Merged

S3: fix URL encoding of ContinuationToken in ListObjectsV2#13746
bentsku merged 3 commits into
mainfrom
s3-fix-continuation

Conversation

@bentsku
Copy link
Copy Markdown
Contributor

@bentsku bentsku commented Feb 11, 2026

Motivation

As reported by #13745, we had an issue when dealing with keys that would require to be URL encoded.

I've tracked the issue to be because the client would automatically pass the EncodingType to url if the key had unsafe characters. But after that, we would compare the URL encoded key against the non-encoded ContinuationToken, leading to no match.

I found the issue to be true for StartAfter as well.

This PR now properly encodes the ContinuationToken (after b64 decoding) and StartAfter so that it can be properly compared against the Object Key.

Because I first thought it was an issue with the b64 encoding itself, I've moved the encoding/decoding in utils, not required but it cleans up a little bit 🧹

Changes

  • add test validating the AWS behavior / showing the issue on our side and preventing regression
  • fix the behavior by properly encoding the values so that they can properly compared, and be returned encoded too (for StartAfter)

Tests

Related

fixes #13745

@bentsku bentsku added this to the 4.14 milestone Feb 11, 2026
@bentsku bentsku self-assigned this Feb 11, 2026
@bentsku bentsku added aws:s3 Amazon Simple Storage Service semver: patch Non-breaking changes which can be included in patch releases docs: skip Pull request does not require documentation changes notes: skip Pull request does not have to be mentioned in the release notes labels Feb 11, 2026
],
"ChecksumType": "FULL_OBJECT",
"ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
"Key": "date%3D2026-03-01/",
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the fact that the key is encoded in the response is a side-effect/behavior of the Botocore client itself, I think by default it always sends EncodingType=url all the type, but if you manually specify it, it quotes the response keys on its own for some reasons.

Copy link
Copy Markdown
Contributor

@aidehn aidehn Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting, so just for some clarification for this - regardless of whether you specify EncodingType=url in the request it will always send it in the response, but it does affect whether or not the Key returned is URL encoded? Just reread the description, so it will automatically set EncodingType=url if the key has unsafe characters, but is the idea that it doesn't actually URL encode the keys in the response unless manually specified?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it seems that Boto3 (the Python SDK) will always put EncodingType=url whatever the Object Key is.

Then it seems it has some hooks, AWS will always return the keys "raw" and not encoded, but Boto3 will only runs some kind of hooks to URL encode the Keys if you manually specify the value to be EncodingType=url (so technically the same as the default). If you let Boto3 with the default value, it doesn't run the hooks. Hope it's clear, it wasn't to me when I ran those tests 😅

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 11, 2026

S3 Image Test Results (AMD64 / ARM64)

    2 files      2 suites   8m 23s ⏱️
  564 tests   512 ✅  52 💤 0 ❌
1 128 runs  1 024 ✅ 104 💤 0 ❌

Results for commit 03fe8a9.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 11, 2026

Test Results - Preflight, Unit

23 111 tests  +12   21 249 ✅ +12   6m 14s ⏱️ ±0s
     1 suites ± 0    1 862 💤 ± 0 
     1 files   ± 0        0 ❌ ± 0 

Results for commit 03fe8a9. ± Comparison against base commit 2046dff.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 11, 2026

Test Results (amd64) - Acceptance

7 tests  ±0   5 ✅ ±0   3m 1s ⏱️ -3s
1 suites ±0   2 💤 ±0 
1 files   ±0   0 ❌ ±0 

Results for commit 03fe8a9. ± Comparison against base commit 2046dff.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 11, 2026

Test Results (amd64) - Integration, Bootstrap

    5 files  ±    0      5 suites  ±0   1h 38m 1s ⏱️ - 57m 59s
2 091 tests  - 3 521  1 925 ✅  - 3 179  166 💤  - 342  0 ❌ ±0 
2 097 runs   - 3 521  1 925 ✅  - 3 179  172 💤  - 342  0 ❌ ±0 

Results for commit 03fe8a9. ± Comparison against base commit 2046dff.

This pull request removes 3526 and adds 5 tests. Note that renamed tests count towards both.
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_lambda_dynamodb
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_opensearch_crud
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_search_books
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_setup
tests.aws.scenario.kinesis_firehose.test_kinesis_firehose.TestKinesisFirehoseScenario ‑ test_kinesis_firehose_s3
tests.aws.scenario.lambda_destination.test_lambda_destination_scenario.TestLambdaDestinationScenario ‑ test_destination_sns
tests.aws.scenario.lambda_destination.test_lambda_destination_scenario.TestLambdaDestinationScenario ‑ test_infra
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_prefill_dynamodb_table
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input0-SUCCEEDED]
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input1-SUCCEEDED]
…
tests.aws.services.s3.test_s3_list_operations.TestS3ListObjectsV2 ‑ test_list_objects_v2_continuation_token_safe_chars
tests.aws.services.s3.test_s3_list_operations.TestS3ListObjectsV2 ‑ test_list_ops_encoding_type_validation[list_multipart_uploads]
tests.aws.services.s3.test_s3_list_operations.TestS3ListObjectsV2 ‑ test_list_ops_encoding_type_validation[list_object_versions]
tests.aws.services.s3.test_s3_list_operations.TestS3ListObjectsV2 ‑ test_list_ops_encoding_type_validation[list_objects]
tests.aws.services.s3.test_s3_list_operations.TestS3ListObjectsV2 ‑ test_list_ops_encoding_type_validation[list_objects_v2]

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 11, 2026

LocalStack Community integration with Pro

    2 files      2 suites   1h 15m 25s ⏱️
2 067 tests 1 897 ✅ 170 💤 0 ❌
2 069 runs  1 897 ✅ 172 💤 0 ❌

Results for commit 03fe8a9.

♻️ This comment has been updated with latest results.

@bentsku bentsku marked this pull request as ready for review February 11, 2026 17:31
Copy link
Copy Markdown
Contributor

@aidehn aidehn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you for picking this up quickly 💯 I just left a few questions mostly for understanding, but I trust the behavior is as expected with the new tests! None of the comments are blocking so feel free to merge 👍🏼

snapshot.add_transformer(
snapshot.transform.key_value("ContinuationToken", reference_replacement=False)
)
s3_client = aws_client_factory(config=Config(parameter_validation=False)).s3
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know you could do this! OOC why is it needed in this scenario, does the validation not like one of the file keys?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, added comments 👍 Boto does not accept None as a valid value for EncodingType but it is a strict enum

start_after = start_after or ""
decoded_continuation_token = decode_continuation_token(continuation_token)

if encoding_type:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Not related to this PR, but for clarity it could be worth changing this to

if encoding_type == EncodingType.url:
 ...

I don't think it's immediately obvious that S3 List Objects V2 only supports a single encoding type! Also means if it changes in the future to support more it's logic kept separate from any new encoding types

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point! Actually even if we have EncodingType.url as a string enum, the parser does not enforce it being in the Enum.

So I've added validation around that! 👍

@bentsku bentsku force-pushed the s3-fix-continuation branch from 9d70696 to 03fe8a9 Compare February 12, 2026 10:34
Copy link
Copy Markdown
Contributor Author

@bentsku bentsku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the review, it showcased an issue in how we handled the EncodingType so added additional validation 👍

start_after = start_after or ""
decoded_continuation_token = decode_continuation_token(continuation_token)

if encoding_type:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point! Actually even if we have EncodingType.url as a string enum, the parser does not enforce it being in the Enum.

So I've added validation around that! 👍

snapshot.add_transformer(
snapshot.transform.key_value("ContinuationToken", reference_replacement=False)
)
s3_client = aws_client_factory(config=Config(parameter_validation=False)).s3
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, added comments 👍 Boto does not accept None as a valid value for EncodingType but it is a strict enum

],
"ChecksumType": "FULL_OBJECT",
"ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
"Key": "date%3D2026-03-01/",
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it seems that Boto3 (the Python SDK) will always put EncodingType=url whatever the Object Key is.

Then it seems it has some hooks, AWS will always return the keys "raw" and not encoded, but Boto3 will only runs some kind of hooks to URL encode the Keys if you manually specify the value to be EncodingType=url (so technically the same as the default). If you let Boto3 with the default value, it doesn't run the hooks. Hope it's clear, it wasn't to me when I ran those tests 😅

@bentsku bentsku merged commit a02a8d9 into main Feb 12, 2026
42 checks passed
@bentsku bentsku deleted the s3-fix-continuation branch February 12, 2026 11:23
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

aws:s3 Amazon Simple Storage Service docs: skip Pull request does not require documentation changes notes: skip Pull request does not have to be mentioned in the release notes semver: patch Non-breaking changes which can be included in patch releases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: S3 list-objects ContinuationToken does not work when files' keys contain =

2 participants