-
-
Notifications
You must be signed in to change notification settings - Fork 654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
encoding/base32: Add RFC-compliant error handling and improve reliability #4641
base: master
Are you sure you want to change the base?
Conversation
Replace assertions with proper error handling in base32.decode() to allow programs to handle invalid input gracefully rather than crashing. The function now returns ([]byte, Error) instead of just []byte.
Fix buffer allocation size calculation and add proper bounds checking to ensure output buffer has sufficient space. This fixes crashes that could occur with inputs like "AA" and other edge cases where the output buffer was too small. Remove #no_bounds_check as proper bounds checking is necessary for safe error handling. The small performance trade-off is worth the improved robustness.
Add test suite based on RFC 4648 test vectors and validation rules: - Add section 10 test vectors for valid encoding/decoding - Add test cases for invalid character handling (section 3.2) - Add test cases for padding validation (section 4) - Add test cases for length requirements (section 6) The test vectors verify that: - Empty string encodes/decodes correctly - Standard cases like "foo" -> "MZXW6===" work - Invalid characters are rejected - Missing or malformed padding is detected - Invalid lengths are caught
Rework base32.decode() to properly handle all cases per RFC 4648: - Fix error detection order: - Check minimum length first (Invalid_Length) - Check character validity (Invalid_Character) - Check padding and structure (Malformed_Input) - Fix padding validation: - Add required padding length checks (2=6, 4=4, 5=3, 7=1 chars) - Ensure padding only appears at end - Fix handling of unpadded inputs - Fix buffer handling: - Proper output buffer size calculation - Add bounds checking for buffer access - Add proper buffer validation For example: - "M" correctly returns Invalid_Length (too short) - "mzxq====" correctly returns Invalid_Character (lowercase) - "MZXQ=" correctly returns Malformed_Input (wrong padding) - Unpadded input lengths must be multiples of 8 These changes make the decode function fully compliant with RFC 4648 requirements while providing proper error handling.
Fix memory handling throughout base32 package: - Make padding map package-level constant (to avoid repeated allocs) - Use passed allocator in encode's make() call - Add defer delete for allocated memory in encode - Add proper cleanup in test cases - Fix memory cleanup of output buffers The changes ensure consistent allocator usage and cleanup in both implementation and tests.
Replace package-level map with a simple switch statement for padding validation. This eliminates allocations we can't properly free while maintaining the same validation logic.
Add support for custom alphabet validation through an optional validation function parameter. The default validation follows RFC 4648 base32 alphabet rules (A-Z, 2-7). This properly supports the documented ability to use custom alphabets.
Fix encoding to properly use provided encoding table parameter instead of hardcoded `ENC_TABLE`. This makes encode properly support custom alphabets as documented.
The decoding table was only 224 bytes which caused type mismatches when using custom alphabets, so expand with zeroes to cover full byte range while maintaining the same decoding logic.
Move existing test procedures to a dedicated test file for better code organization and maintainability.
Add test_base32_roundtrip() to verify the encode->decode roundtrip preserves data integrity. This test helps ensure our base32 implementation correctly handles the full encode->decode cycle without data loss or corruption.
Remove premature deallocation of the output buffer which was causing use-after-free behavior. The returned string needs to take ownership of this memory, but the defer delete was freeing it before the string could be used. This fixes issues with encoding that were introduced by overly aggressive memory cleanup in 93238db.
Add test case to verify custom alphabet support. The test uses a decimal-uppercase alphabet (0-9, A-V) to test both encoding and decoding with custom tables, including validation. This ensures the encode and decode functions work correctly with custom encoding tables and validation functions as documented.
Add defer delete for encoded strings across all test procedures to ensure proper cleanup and prevent memory leaks. This completes the memory management improvements started in 591dd87.
60f794a
to
d6f4412
Compare
Fix incorrect RFC 4648 section references: - Add RFC URL reference at package level - Update Error enum documentation to reference correct sections: - Invalid_Character: Section 3.3 (non-alphabet characters) - Invalid_Length: Section 6 (base32 block size requirements) - Malformed_Input: Section 3.2 (padding) - Fix test file section references to match correct sections This ensures all RFC references are accurate and adds a link to the source RFC for reference.
encoding/base64 needs the same treatment I guess? |
Yup! That is why I mentioned it under "Future Work". If this happens to get merged, I will work on encoding/base64 as well. |
Missed it ngl 🙂 |
Add `@(rodata)` attribute to `ENC_TABLE` and `DEC_TABLE` to mark them as read-only data. This places these tables in the read-only section of the executable, protecting them from modification during program execution.
// Process input in 8-byte blocks | ||
outi := 0 | ||
for i := 0; i < input_chars; i += 8 { | ||
buf: [8]byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can skip 0 initialization here by using ---
block_size := min(8, input_chars - i) | ||
|
||
// Decode block | ||
for j := 0; j < block_size; j += 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: you can use range style loop: for j in 0 ..< block_size
if you want
} | ||
|
||
// Validate characters using provided validation function | ||
for i := 0; i < len(data); i += 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: you can use range style loop: for c in data
if you want.
alternatively perhaps bytes.index_byte
from core:bytes
can be used to find the index of first PADDING
. and then use that for this loop and the one below to eliminate the PADDING
branch.
Also #no_bounds_check
can be used here
// Validate padding and length | ||
data_len := len(data) | ||
padding_count := 0 | ||
for i := data_len - 1; i >= 0; i -= 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#no_bounds_check
can be used here
// Check for proper padding and length combinations | ||
if padding_count > 0 { | ||
// Verify no padding in the middle | ||
for i := 0; i < data_len - padding_count; i += 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#no_bounds_check
can be used here
} | ||
|
||
@private | ||
_encode :: proc(out, data: []byte, ENC_TBL := ENC_TABLE, allocator := context.allocator) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can #no_bounds_check
can be used for this proc?
The current implementation of base32.decode() uses assertions for (incomplete) validation, which causes programs to crash on invalid input. This PR replaces assertions with proper error handling according to RFC 4648, allowing users of the package to handle invalid input gracefully.
Motivation: While implementing a pure Odin implementation of TOTP (Time-based One-Time Password), I discovered that base32.decode() would segfault on invalid input due to assertions. As a core library package, it should provide proper error handling to allow users to gracefully handle decode errors, which these changes enable.
Problem Description
#no_bounds_check
) prevents proper error handlingThe original implementation prioritized performance by disabling bounds checking, but this tradeoff is not appropriate for a core library package. The small or potentially negligible performance overhead from bounds checking is worth the improved safety and usability.
Implementation Details
The error enum:
Before (crashes on invalid input):
After (error handling available):
Key implementation changes:
([]byte, Error)
instead of just[]byte
#no_bounds_check
directive for safetyDesign Decisions
Bounds Checking:
#no_bounds_check
for performanceError Handling:
(nil, .None)
by conventionTesting:
Breaking Changes
Function Signature:
([]byte, Error)
instead of[]byte
Performance:
Error Handling:
Testing
Test coverage includes:
All tests pass with 100% coverage of the new functionality and error cases.
Future Work
Thank you for considering this contribution.
Happy New Year! :)