S3: fix URL encoding of ContinuationToken in ListObjectsV2 #13746
Conversation
```json
],
"ChecksumType": "FULL_OBJECT",
"ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
"Key": "date%3D2026-03-01/",
```
The fact that the key is encoded in the response is a side effect/behavior of the Botocore client itself. I think by default it always sends EncodingType=url all the time, but if you manually specify it, it quotes the response keys on its own for some reason.
Ah interesting, so just for some clarification on this: regardless of whether you specify EncodingType=url in the request, it will always be sent in the response, but does it affect whether or not the returned Key is URL encoded?

Just reread the description, so it will automatically set EncodingType=url if the key has unsafe characters, but is the idea that it doesn't actually URL encode the keys in the response unless it is manually specified?
So it seems that Boto3 (the Python SDK) will always set EncodingType=url, whatever the Object Key is.
Then it seems it has some hooks: AWS will always return the keys "raw" and not encoded, but Boto3 will only run some kind of hook to URL-encode the Keys if you manually set the value to EncodingType=url (so technically the same as the default). If you leave Boto3 with the default value, it doesn't run the hooks. Hope it's clear, it wasn't to me when I ran those tests 😅
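To illustrate the encoding itself (a stdlib-only sketch, not the actual Boto3 hook or LocalStack code): the client-side hook essentially URL-quotes each returned key while leaving `/` intact, which is why a key like `date=2026-03-01/` shows up in the snapshot as `date%3D2026-03-01/`:

```python
from urllib.parse import quote, unquote

def encode_key(key: str) -> str:
    # Hypothetical sketch of what the response hook does when
    # EncodingType=url is explicitly passed: percent-encode the key,
    # keeping '/' as-is, matching how S3 encodes keys in listings.
    return quote(key, safe="/")

encoded = encode_key("date=2026-03-01/")
print(encoded)           # date%3D2026-03-01/
print(unquote(encoded))  # date=2026-03-01/
```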
S3 Image Test Results (AMD64 / ARM64): 2 files, 2 suites, 8m 23s ⏱️. Results for commit 03fe8a9.
Test Results (amd64) - Integration, Bootstrap: 5 files ±0, 5 suites ±0, 1h 38m 1s ⏱️ (-57m 59s). Results for commit 03fe8a9, compared against base commit 2046dff. This pull request removes 3526 tests and adds 5. Note that renamed tests count towards both.
LocalStack Community integration with Pro: 2 files, 2 suites, 1h 15m 25s ⏱️. Results for commit 03fe8a9.
aidehn
left a comment
Looks good! Thank you for picking this up quickly 💯 I just left a few questions mostly for understanding, but I trust the behavior is as expected with the new tests! None of the comments are blocking so feel free to merge 👍🏼
```python
snapshot.add_transformer(
    snapshot.transform.key_value("ContinuationToken", reference_replacement=False)
)
s3_client = aws_client_factory(config=Config(parameter_validation=False)).s3
```
Didn't know you could do this! OOC why is it needed in this scenario, does the validation not like one of the file keys?
Yes, added comments 👍 Boto does not accept None as a valid value for EncodingType, as it is a strict enum.
```python
start_after = start_after or ""
decoded_continuation_token = decode_continuation_token(continuation_token)

if encoding_type:
```
nit: Not related to this PR, but for clarity it could be worth changing this to

```python
if encoding_type == EncodingType.url:
    ...
```

I don't think it's immediately obvious that S3 ListObjectsV2 only supports a single encoding type! It also means that if it changes in the future to support more, its logic is kept separate from any new encoding types.
Very good point! Actually, even though we have EncodingType.url as a string enum, the parser does not enforce that the value is a member of the Enum.
So I've added validation around that! 👍
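A minimal sketch of that kind of guard (function name and error message are illustrative, not the actual LocalStack code): since the parser hands the raw string through, membership in the enum has to be checked explicitly before comparing against `EncodingType.url`:

```python
from enum import Enum

class EncodingType(str, Enum):
    url = "url"

def validate_encoding_type(value):
    # Hypothetical validation: reject any value that is not a known
    # enum member, instead of silently treating it as truthy.
    if value is not None and value not in [e.value for e in EncodingType]:
        raise ValueError(f"Invalid Encoding Method specified in Request: {value}")
    return value
```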
Force-pushed from 9d70696 to 03fe8a9
bentsku
left a comment
Thanks a lot for the review, it showcased an issue in how we handled the EncodingType, so I added additional validation 👍
Motivation
As reported in #13745, we had an issue when dealing with keys that require URL encoding.

I've tracked the issue down to the client automatically setting `EncodingType` to `url` if the key has unsafe characters. But after that, we would compare the URL-encoded key against the non-encoded `ContinuationToken`, leading to no match. I found the issue to be true for `StartAfter` as well.

This PR now properly encodes the `ContinuationToken` (after b64 decoding) and `StartAfter` so that they can be properly compared against the Object Key. Because I first thought it was an issue with the b64 encoding itself, I've moved the encoding/decoding into utils; not required, but it cleans things up a little bit 🧹
Changes

- … (StartAfter)

Tests
Related
fixes #13745