-
-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Short NUL-only keys incorrectly detected as duplicates in CBOR and SMILE #312
Comments
No idea how one might fix this but here's a test for it at least (against 45e8780): diff --git a/smile/src/test/java/com/fasterxml/jackson/dataformat/smile/parse/BasicParserTest.java b/smile/src/test/java/com/fasterxml/jackson/dataformat/smile/parse/BasicParserTest.java
index 26bfd377..a31feba8 100644
--- a/smile/src/test/java/com/fasterxml/jackson/dataformat/smile/parse/BasicParserTest.java
+++ b/smile/src/test/java/com/fasterxml/jackson/dataformat/smile/parse/BasicParserTest.java
@@ -478,6 +478,30 @@ public class BasicParserTest extends BaseTestForSmile
p.close();
}
+ public void testNULInFieldNames() throws IOException
+ {
+ ByteArrayOutputStream bytes = new ByteArrayOutputStream(1000);
+ JsonGenerator jgen = new SmileFactory().createGenerator(bytes);
+ jgen.writeStartObject();
+ for (String fieldName = ""; fieldName.length() < 20; fieldName += "\0") {
+ jgen.writeStringField(fieldName, "");
+ }
+ jgen.writeEndObject();
+ jgen.close();
+
+ SmileParser p = _smileParser(bytes.toByteArray());
+
+ assertToken(JsonToken.START_OBJECT, p.nextToken());
+ for (String fieldName = ""; fieldName.length() < 20; fieldName += "\0") {
+ assertToken(JsonToken.FIELD_NAME, p.nextToken());
+ assertEquals(fieldName, p.getCurrentName());
+ assertToken(JsonToken.VALUE_STRING, p.nextToken());
+ assertEquals("", p.getValueAsString());
+ }
+ assertToken(JsonToken.END_OBJECT, p.nextToken());
+ assertNull(p.nextToken());
+ p.close();
+ }
public void testBufferRelease() throws IOException
{ |
@DaveCTurner first of all, thank you for reporting this and especially contributing a test to reproduce! I have a suspicion that the issue is with zero-padding of symbol table used; I vaguely recall something similar with JSON (with escaping, null bytes are legal in JSON names too). But the gist of this is that symbol table lookups are keyed by "quads" ( Now: I hope to look into this in near future, but I do have bit of a backlog. I just wanted to add a note saying above. :) |
Ok, yes, I can reproduce this. Somehow looks like two-null-byte case gets decoded as single null byte, if I got that right. |
Interestingly enough, JSON parser from Actually, hmmh. Yes, it goes through different decoder but ends up with similar look up, with a call to and looks like JSON backend has no mismatch. Need to see if something similar would help with CBOR, Smile. |
Hmmmh. Ok... this is tricky. Code in Smile, CBOR codecs does not keep track of "unused" bytes in quads, unlike JSON codec. But we do need to distinguish between "unused" and Two ways to go about it, really:
JSON codec uses (2) for what that is worth. |
👍 that ties in with what I thought I was seeing yes. I'd completely forgotten that diff --git a/smile/src/main/java/com/fasterxml/jackson/dataformat/smile/SmileParser.java b/smile/src/main/java/com/fasterxml/jackson/dataformat/smile/SmileParser.java
index ea4fe91c..72560434 100644
--- a/smile/src/main/java/com/fasterxml/jackson/dataformat/smile/SmileParser.java
+++ b/smile/src/main/java/com/fasterxml/jackson/dataformat/smile/SmileParser.java
@@ -1825,7 +1825,7 @@ versionBits));
if (len < 5) {
int inPtr = _inputPtr;
final byte[] inBuf = _inputBuffer;
- int q = inBuf[inPtr] & 0xFF;
+ int q = inBuf[inPtr] & 0xFF | 0xFFFFFF00;
if (len > 1) {
q = (q << 8) + (inBuf[++inPtr] & 0xFF);
if (len > 2) {
@@ -1849,7 +1849,7 @@ versionBits));
q1 = (q1 << 8) | (inBuf[inPtr++] & 0xFF);
if (len < 9) {
- int q2 = (inBuf[inPtr++] & 0xFF);
+ int q2 = (inBuf[inPtr++] & 0xFF) | 0xFFFFFF00;
int left = len - 5;
if (left > 0) {
q2 = (q2 << 8) + (inBuf[inPtr++] & 0xFF);
@@ -1871,7 +1871,7 @@ versionBits));
q2 = (q2 << 8) | (inBuf[inPtr++] & 0xFF);
if (len < 13) {
- int q3 = (inBuf[inPtr++] & 0xFF);
+ int q3 = (inBuf[inPtr++] & 0xFF) | 0xFFFFFF00;
int left = len - 9;
if (left > 0) {
q3 = (q3 << 8) + (inBuf[inPtr++] & 0xFF);
@@ -1920,7 +1920,7 @@ versionBits));
} while ((len -= 4) > 3);
// and then leftovers
if (len > 0) {
- int q = inBuf[inPtr] & 0xFF;
+ int q = inBuf[inPtr] & 0xFF | 0xFFFFFF00;
if (len > 1) {
q = (q << 8) + (inBuf[++inPtr] & 0xFF);
if (len > 2) { That fixes it for me, but I see other similar code that might also have the same problem. |
@DaveCTurner ah. I was looking at CBOR end of things. There are indeed a few other code paths to consider... partly for different lengths, partly for boundary conditions. So I'll need to improve the tests I think. I'll see if I can fix this but just for sake of safety, for 2.14 branch. I have a feeling that there's non-trivial chance for regression for either correctness or performance here. |
Ok, fixed; yes, pre-padding is probably bit simpler although both work (and had to use post-padding for one case in Smile). |
The following object apparently cannot be deserialised in CBOR and SMILE:
The following are base64-encoded copies of its representation in various formats:
As far as I can tell these all look correct, but on deserialization the field names are reported as duplicates:
Relates elastic/elasticsearch#84146
The text was updated successfully, but these errors were encountered: