Fix Base58 dropping leading zero bytes #44
Open
gaoflow wants to merge 1 commit into
The generic base_encode/base_decode convert the whole input to a single integer, so leading null bytes (high-order zeros) were silently lost: e.g. Base58 encoded b'\x00abc' to 'ZiCa' instead of '1ZiCa', and b'\x00' to an empty string. Per the Base58 spec each leading 0x00 byte maps to a leading charset[0] character. Preserve the leading-zero count on encode and restore it on decode, so values round-trip and match reference implementations.
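The sketch below is a minimal, self-contained illustration of the approach described above. It is written in plain Python and is not codext's actual base_encode/base_decode. The names ALPHABET, int_to_b58, b58encode and b58decode are hypothetical, and the sketch assumes the Bitcoin alphabet. int_to_b58 stands in for the integer path, which has no notion of leading zeros and so reproduces the bug. The byte-level wrappers count leading zeros separately and restore them.

```python
# Hypothetical sketch, not codext's code: assumes the Bitcoin Base58 alphabet.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def int_to_b58(n: int) -> str:
    """Integer path: plain base conversion via divmod.

    Leading zero bytes have already vanished once the input is an int,
    so this alone reproduces the bug (0 encodes to an empty string).
    """
    out = []
    while n > 0:
        n, rem = divmod(n, 58)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


def b58encode(data: bytes) -> str:
    """Byte path: emit one ALPHABET[0] per leading 0x00 byte, then the integer digits."""
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return ALPHABET[0] * zeros + int_to_b58(int.from_bytes(data, "big"))


def b58decode(text: str) -> bytes:
    """Inverse: emit one 0x00 per leading ALPHABET[0], then decode the remaining digits."""
    zeros = len(text) - len(text.lstrip(ALPHABET[0]))
    n = 0
    for ch in text[zeros:]:
        # str.index raises ValueError for characters outside the alphabet.
        n = n * 58 + ALPHABET.index(ch)
    # Minimal-length big-endian bytes; n == 0 gives b"" (only the zero prefix remains).
    return b"\x00" * zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")
```

For example, int_to_b58(int.from_bytes(b'\x00abc', 'big')) yields 'ZiCa', because the zero byte is lost, while b58encode(b'\x00abc') yields '1ZiCa'. Because the leading-zero count is handled outside the integer conversion, the integer path itself stays unchanged.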
Problem
Base58 (and the other big-integer base codecs) silently drop leading null bytes:
base_encode/base_decode in src/codext/base/_base.py convert the whole input to a single integer (s2i) and back via divmod, so leading 0x00 bytes (high-order zeros) vanish. Per the Base58 specification the codec cites (and every reference implementation, e.g. the base58 PyPI library / Bitcoin Core), each leading 0x00 byte must map to a leading charset[0] character ('1' for the Bitcoin alphabet). This also broke round-tripping for any value beginning with a null byte.

Fix
Preserve the leading-zero count on encode (prepend one charset[0] per leading \x00) and restore it on decode (prepend one \x00 per leading charset[0]). Both changes are guarded so they apply only to the byte-input path, which leaves the integer recode used internally untouched.

Verified against the base58 reference library: 0 mismatches and 0 round-trip failures across random inputs (every leading-zero input failed before).

Test
Extended test_codec_base58 in tests/test_base.py with leading-null-byte encode/decode/round-trip assertions (str and bytes paths). Verified red→green: the test fails without the source change (AssertionError) and passes with it, and the full test suite stays green (103 passed).

Disclosure: I use AI assistance (under my direction) for my contributions; I review and verify every change before submitting.