simdutf: simdutf_connector: in_tail: Implement UTF-16LE/UTF-16BE encoder #9468
Open: cosmo0920 wants to merge 24 commits into master from cosmo0920-try-to-bundle-simdutf-amalgamation
+45,269 −19
cosmo0920 force-pushed the cosmo0920-try-to-bundle-simdutf-amalgamation branch from d1b404a to 4053bbd on October 7, 2024 07:13
cosmo0920 force-pushed the cosmo0920-try-to-bundle-simdutf-amalgamation branch from 4053bbd to 2a515ea on October 7, 2024 07:17
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Supports conversion to UTF-8 from UTF-16LE, UTF-16BE, UTF-16LE with BOM, and UTF-16BE with BOM. This is useful for logs written by Windows programs that insist on "Unicode", since those usually use UTF-16LE with a BOM.
…code encoder testing
…s not fully support C++11
…stuffs
Also, wait relatively longer than for the ordinary test cases, because these Unicode test cases need to read their contents from the filesystem.
cosmo0920 force-pushed the cosmo0920-try-to-bundle-simdutf-amalgamation branch from 7242456 to cb0b6ba on December 20, 2024 02:30
On Windows, many programs write UTF-16LE, because "Unicode" on Windows means UTF-16LE with a BOM (Byte Order Mark).
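In addition, UTF-16LE/UTF-16BE differ from UTF-8 in many ways: for example, the ASCII letter "A" is the single byte 0x41 in UTF-8 but the two bytes 0x41 0x00 in UTF-16LE, so such logs cannot be handled as UTF-8 without conversion.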
I added C, J (Chinese, Japanese) and subdivision-flag test cases for converting UTF-16LE/UTF-16BE to UTF-8 to the unit tests for the in_tail plugin, because in_tail is the main plugin used to process non-UTF-8 encodings.
As a first step, this PR handles the UTF-16LE and UTF-16BE encodings.
Note that the simdutf library is written in C++, so we also provide an option (FLB_UNICODE_ENCODER) to turn this feature on or off.

Closes #9321
Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:
If this is a change to packaging of containers or native binaries, please confirm it works for all targets.
The ok-package-test label tests all targets (a maintainer must apply it).

Documentation
fluent/fluent-bit-docs#1471
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.