Fix string_to_int radix when alphabet_index is provided#115
Merged
skorokithakis merged 2 commits intoJun 18, 2026
Conversation
string_to_int's docstring states that when an alphabet_index is passed, the alphabet argument is ignored. The implementation still used len(alphabet) as the numeric base, so passing an alphabet_index built from a longer alphabet than the alphabet argument decoded the string in the wrong radix. Derive the radix from the supplied index instead; for any index built the documented way this is identical to len(alphabet).
| alpha_len = len(alphabet) | ||
| # Derive the radix from the index so `alphabet` is truly ignored when | ||
| # alphabet_index is passed, as documented above. | ||
| alpha_len = max(alphabet_index.values()) + 1 |
Owner
There was a problem hiding this comment.
Suggested change
| alpha_len = max(alphabet_index.values()) + 1 | |
| alpha_len = len(alphabet_index) |
Contributor
Author
There was a problem hiding this comment.
Good call — applied in 517e398. len(alphabet_index) is O(1) and equivalent for any well-formed index (keys are the chars, values 0..N-1), so it's both simpler and faster. Tests still green.
Simpler and O(1); identical to max(values)+1 for any well-formed index (keys=chars, values=0..N-1).
skorokithakis
approved these changes
Jun 18, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
string_to_intdoes not honour its own docstring: when analphabet_indexis supplied, the docstring statesalphabetis ignored, but the implementation still useslen(alphabet)as the numeric base. If the caller passes analphabet_indexbuilt from a different (longer) alphabet than thealphabetargument, the result is decoded in the wrong radix.Reproduction
Expected
16(the string"10"decoded base-16, per the supplied index). Correct usage wherealphabetmatches the index is unaffected:Cause
The docstring:
But the base is taken from the
alphabetargument regardless:So when
alphabet_indexis provided andalphabetis shorter than the index's true radix, the magnitude is wrong.Fix
Derive the radix from the supplied index instead:
For any index built the documented way (
{char: idx for idx, char in enumerate(alphabet)}),max(values) + 1 == len(alphabet), so all existing behaviour is byte-for-byte unchanged. Only the mismatched case is corrected.Tests
Adds
test_string_to_int_ignores_alphabet_when_index_given, which builds a base-16 index while passing a base-10 alphabet and asserts the result uses base 16. It fails on the current code (10 != 16) and passes with the fix. Full suite: 22 tests OK.I couldn't find an existing issue or PR covering this — happy to fold it into one if I missed it.