MONGOCRYPT-755 Implement StrEncode #928

Draft
wants to merge 16 commits into
base: master

marksg07
Collaborator

No description provided.

marksg07 marked this pull request as ready for review December 31, 2024 20:07
    mc_substring_set_t *substring_set;
    char *exact;
    size_t exact_len;
} mc_str_encode_sets_t;
Collaborator

Add a comment documenting each of these fields.

Collaborator Author

Done.


mc_str_encode_sets_t mc_text_search_str_encode(const mc_FLE2TextSearchInsertSpec_t *spec) {
    // TODO MONGOCRYPT-759 Implement and use CFold
    uint32_t unfolded_len = spec->len;
Collaborator

Add BSON_ASSERT_PARAM(spec) before this line, so spec is checked before it is dereferenced.

sets.substring_set = NULL;
// Base string is the folded string plus the 0xFF character
sets.base_string = make_base_string_for_str_encode(folded_str, folded_len);
sets.base_len = spec->len + 1;
Collaborator

Suggested change
- sets.base_len = spec->len + 1;
+ sets.base_len = folded_len + 1;

Prefer folded_len/unfolded_len over spec->len.

}
// Exact string is always the first len characters of the base string
sets.exact = sets.base_string;
sets.exact_len = spec->len;
Collaborator

Suggested change
- sets.exact_len = spec->len;
+ sets.exact_len = folded_len;

Same here: prefer folded_len over spec->len.
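
The two suggestions above amount to deriving every length from the folded string rather than from spec->len. A minimal sketch of that bookkeeping follows. The names `str_encode_lens_sketch_t`, `make_base_string_sketch`, and `encode_lens_sketch` are hypothetical stand-ins, not the PR's actual definitions, and the type chosen for `base_len` is an assumption. The only behavior taken from the diff is that the base string is the folded string plus one 0xFF byte and that the exact string is its first folded_len bytes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the PR's result struct; only the fields
 * discussed in this thread are modelled. */
typedef struct {
    char *base_string; /* folded bytes followed by a single 0xFF byte */
    uint32_t base_len; /* folded_len + 1 (type is an assumption) */
    char *exact;       /* aliases base_string; not separately owned */
    size_t exact_len;  /* folded_len */
} str_encode_lens_sketch_t;

/* Assumed behavior: copy the folded bytes and append 0xFF.
 * Returns NULL if allocation fails. */
static char *make_base_string_sketch(const char *folded_str, uint32_t folded_len) {
    char *out = malloc((size_t)folded_len + 1u);
    if (!out) {
        return NULL;
    }
    memcpy(out, folded_str, folded_len);
    out[folded_len] = (char)0xFF;
    return out;
}

/* Every length is derived from folded_len, never from the unfolded
 * input length. */
static str_encode_lens_sketch_t encode_lens_sketch(const char *folded_str, uint32_t folded_len) {
    str_encode_lens_sketch_t sets;
    sets.base_string = make_base_string_sketch(folded_str, folded_len);
    sets.base_len = folded_len + 1u;
    sets.exact = sets.base_string;
    sets.exact_len = folded_len;
    return sets;
}
```

Until case folding lands, folded_len and spec->len happen to be equal, which is why the original lines pass today; the sketch only shows why the folded length is the safer source once they can differ.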

// TODO MONGOCRYPT-759 This helper only exists to test folded_len != unfolded_len; make the test actually use folding
mc_str_encode_sets_t mc_text_search_str_encode_helper(const mc_FLE2TextSearchInsertSpec_t *spec,
                                                      uint32_t unfolded_len) {
    const char *folded_str = spec->v;
Collaborator

Add BSON_ASSERT_PARAM(spec) before this line.

Collaborator Author

Done.
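
For readers unfamiliar with the pattern requested in this thread, here is a rough sketch of a null-parameter guard placed before the first dereference. `ASSERT_PARAM_SKETCH` is a local stand-in written so the example compiles without libbson; it is not the BSON_ASSERT_PARAM definition itself. `text_search_spec_sketch_t` is a hypothetical reduction of the spec type to the `v` and `len` fields that appear in the diff.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in guard (assumption, not libbson's macro): abort with a
 * message when a pointer parameter is NULL. */
#define ASSERT_PARAM_SKETCH(param)                                     \
    do {                                                               \
        if ((param) == NULL) {                                         \
            fprintf(stderr, "parameter %s in %s cannot be NULL\n",     \
                    #param, __func__);                                 \
            abort();                                                   \
        }                                                              \
    } while (0)

/* Hypothetical reduced spec: only the fields used in the diff. */
typedef struct {
    const char *v;
    uint32_t len;
} text_search_spec_sketch_t;

static uint32_t unfolded_len_sketch(const text_search_spec_sketch_t *spec) {
    ASSERT_PARAM_SKETCH(spec); /* guard runs before spec is dereferenced */
    return spec->len;
}
```

Placing the guard as the first statement means a NULL spec fails with a named parameter instead of an unexplained crash at spec->v or spec->len.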

static mc_substring_set_t *generate_substring_tree(const char *base_str,
                                                   uint32_t folded_len,
                                                   uint32_t unfolded_len,
                                                   const mc_FLE2SubstringInsertSpec_t *spec) {
Collaborator

This function also needs the beta and gamma parameters, which are code-point lengths, not byte lengths.

Collaborator Author

Done (the folded code-point length is carried inside the base string).
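
The distinction raised in this thread matters whenever the string contains non-ASCII text, because one code point can occupy several UTF-8 bytes. As an illustration only, the sketch below counts code points by skipping UTF-8 continuation bytes. The function name `utf8_codepoint_count_sketch` is hypothetical and this is not the PR's implementation; it assumes the input is already valid UTF-8 and performs no validation.

```c
#include <stdint.h>

/* Counts code points in a UTF-8 byte sequence. Continuation bytes have
 * the bit pattern 10xxxxxx, so every other byte starts a new code point.
 * Assumes valid UTF-8; malformed input is not detected. */
static uint32_t utf8_codepoint_count_sketch(const char *str, uint32_t byte_len) {
    uint32_t count = 0;
    for (uint32_t i = 0; i < byte_len; i++) {
        if (((unsigned char)str[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For example, the 6-byte string "h\xC3\xA9llo" contains 5 code points, so a length parameter expressed in code points would be 5 while the byte length is 6.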

marksg07 marked this pull request as draft January 6, 2025 16:02
marksg07 requested a review from erwee January 6, 2025 21:36
2 participants