Skip to content

Fix string_to_int radix when alphabet_index is provided#115

Merged
skorokithakis merged 2 commits into
skorokithakis:masterfrom
patchwright:fix/string-to-int-radix
Jun 18, 2026
Merged

Fix string_to_int radix when alphabet_index is provided#115
skorokithakis merged 2 commits into
skorokithakis:masterfrom
patchwright:fix/string-to-int-radix

Conversation

@patchwright

Copy link
Copy Markdown
Contributor

Summary

string_to_int does not honour its own docstring: when an alphabet_index is supplied, the docstring states alphabet is ignored, but the implementation still uses len(alphabet) as the numeric base. If the caller passes an alphabet_index built from a different (longer) alphabet than the alphabet argument, the result is decoded in the wrong radix.

Reproduction

from shortuuid.main import string_to_int

hex_index = {c: i for i, c in enumerate("0123456789abcdef")}
# alphabet_index is base-16; per the docstring `alphabet` is ignored.
string_to_int("10", alphabet="0123456789", alphabet_index=hex_index)
# -> 10   (decoded base-10, because len("0123456789") == 10)

Expected 16 (the string "10" decoded base-16, per the supplied index). Correct usage where alphabet matches the index is unaffected:

string_to_int("10", alphabet="0123456789abcdef")  # -> 16
string_to_int("1f", alphabet="0123456789abcdef")  # -> 31

Cause

The docstring:

The alphabet_index, if provided, should map each character to its index … If this is passed, alphabet is ignored.

But the base is taken from the alphabet argument regardless:

alpha_len = len(alphabet)

So when alphabet_index is provided and alphabet is shorter than the index's true radix, the magnitude is wrong.

Fix

Derive the radix from the supplied index instead:

     number = 0
-    alpha_len = len(alphabet)
+    alpha_len = max(alphabet_index.values()) + 1
     for char in string:

For any index built the documented way ({char: idx for idx, char in enumerate(alphabet)}), max(values) + 1 == len(alphabet), so all existing behaviour is byte-for-byte unchanged. Only the mismatched case is corrected.

Tests

Adds test_string_to_int_ignores_alphabet_when_index_given, which builds a base-16 index while passing a base-10 alphabet and asserts the result uses base 16. It fails on the current code (10 != 16) and passes with the fix. Full suite: 22 tests OK.

I couldn't find an existing issue or PR covering this — happy to fold it into one if I missed it.

string_to_int's docstring states that when an alphabet_index is passed,
the alphabet argument is ignored. The implementation still used
len(alphabet) as the numeric base, so passing an alphabet_index built
from a longer alphabet than the alphabet argument decoded the string in
the wrong radix. Derive the radix from the supplied index instead;
for any index built the documented way this is identical to len(alphabet).
Comment thread shortuuid/main.py Outdated
alpha_len = len(alphabet)
# Derive the radix from the index so `alphabet` is truly ignored when
# alphabet_index is passed, as documented above.
alpha_len = max(alphabet_index.values()) + 1

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
alpha_len = max(alphabet_index.values()) + 1
alpha_len = len(alphabet_index)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — applied in 517e398. len(alphabet_index) is O(1) and equivalent for any well-formed index (keys are the chars, values 0..N-1), so it's both simpler and faster. Tests still green.

Simpler and O(1); identical to max(values)+1 for any well-formed
index (keys=chars, values=0..N-1).
@skorokithakis skorokithakis enabled auto-merge (squash) June 18, 2026 21:06
@skorokithakis skorokithakis disabled auto-merge June 18, 2026 21:06
@skorokithakis skorokithakis merged commit 496cddf into skorokithakis:master Jun 18, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants