Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add StreamReadConstraints limit for longest textual value to allow (default: 5M) #863

Closed
cowtowncoder opened this issue Dec 14, 2022 · 26 comments
Labels
processing-limits Issues related to limiting aspects of input/output that can be processed without exception
Milestone

Comments

@cowtowncoder
Copy link
Member

cowtowncoder commented Dec 14, 2022

So, similar to #827 there should be some upper limit for length of String values. We can start with something rather long -- 5 megs -- since the handling of textual values is not nearly as sensitive to input length as that of numeric (in particular floating-point). And also because there are likely use cases that rely on ability to process sizable Strings (different from numbers once again).

Different from number constraints we can and should apply limit during accumulation of textual value, into TextBuffer (either within it, or maybe more likely, when caller fetches new local char[] buffer). The idea being that we do want to avoid processing of the whole String before failing.
This would probably mean that the limit applies to char length after entity resolution. That should be fine too; the goal is not to impose byte-accurate limits but to prevent overloading.

Note: this is also distinct from total document length limit that we likely want to impose too, eventually.

EDIT: (21-Mar-2023) 2.15.0-rc1 had limit of 1M, rc2 planned to use 5M. Updated description and title accordingly.

@cowtowncoder cowtowncoder added performance Issue related to performance problems or enhancements 2.15 labels Dec 14, 2022
@cowtowncoder
Copy link
Member Author

cc @pjfanning Just creating placeholder, based on our discussions. Even at the risk of security mongerers reading this :)

@pjfanning
Copy link
Member

this one may not be that useful - users would be better off setting a max length on the JSON stream that they will accept

@cowtowncoder
Copy link
Member Author

@pjfanning I am not sure -- I guess I am considering bigger streaming use cases (where total content length cannot be easily limited), although granted those would probably not accept external untrusted input.

@cowtowncoder cowtowncoder changed the title Add StreamReadConstraints limit for longest textual value to allow Add StreamReadConstraints limit for longest textual value to allow (default: 1M) Feb 14, 2023
cowtowncoder added a commit that referenced this issue Feb 14, 2023
@cowtowncoder cowtowncoder added this to the 2.15.0 milestone Feb 14, 2023
@cowtowncoder cowtowncoder changed the title Add StreamReadConstraints limit for longest textual value to allow (default: 1M) Add StreamReadConstraints limit for longest textual value to allow (default: 5M) Mar 22, 2023
flowertwig added a commit to FortnoxAB/reactive-wizard that referenced this issue Apr 26, 2023
@motlin
Copy link

motlin commented Apr 28, 2023

I upgraded to Jackson 2.15 and ran into the error:

Caused by: com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (X) exceeds the maximum length (5000000)

Searching for that error message led me here. It's clear this is an intentional change. It would be helpful to add instructions here on how to fix it. Am I able to configure that limit to be higher? Do I need to switch to a more efficient API?

@pjfanning
Copy link
Member

@motlin see #1001 (comment)

@motlin
Copy link

motlin commented Apr 28, 2023

Thank you @pjfanning.

That comment shows that we're able to set the max length higher with code like:

objectMapper.getFactory()
		.setStreamReadConstraints(StreamReadConstraints.builder().maxStringLength(10_000_000).build())

I'm able to effectively disable the feature by setting the max length to Integer.MAX_VALUE, but not 0.

ObjectMapper objectMapper = ...;
StreamReadConstraints streamReadConstraints = StreamReadConstraints
    .builder()
    .maxStringLength(Integer.MAX_VALUE)
    .build();
objectMapper.getFactory().setStreamReadConstraints(streamReadConstraints);

I'm going to set the max length high to unblock the upgrade. It's not clear to me if this is a good idea or if this indicates a real performance problem in my code. Are there other options I ought to consider?

@pjfanning
Copy link
Member

@motlin this is your decision - Integer.MAX_VALUE is the right value if you don't want to apply a limit.

@motlin
Copy link

motlin commented Apr 28, 2023

The json I'm parsing has huge string values inside, so I think this is the correct thing to do. What confused me is that the error mentions StreamConstraintsException which made me think of Jackson's streaming apis. I'm using objectMapper.readValue(...) and thought the error might be telling me to switch to the streaming api.

@cowtowncoder
Copy link
Member Author

Ah. No, it refers to where the constraints are enforced.

@pjfanning Remember when I mentioned it matters what exception says, wrt users trying to figure out how to change settings? :)

@pjfanning
Copy link
Member

pjfanning commented Apr 28, 2023

Exceptions should not have essay length messages. Could we create a FAQ entry that describes what to do when you get this exception? We could even put a link in the exception message. The text of how to set these constraints is going to take 10-20 lines.

@cowtowncoder
Copy link
Member Author

Just to make clear: no one is asking for essay-length messages but rather useful pointer(s): for that FAQ + link sounds like a good idea (for example). Keywords in message help too.

@motlin What do you think? What was the exact message if you still have access? (I know, it's in code too but just for convenience for readers here).

@motlin
Copy link

motlin commented Apr 28, 2023

Anonymized error below. I don't have strong opinions on what the error message ought to look like. As long as users can search for the error and find a pointer to the advice above about objectMapper.getFactory().setStreamReadConstraints(streamReadConstraints) then I think we're good. Right now, searching for "jackson String length exceeds the maximum length" in Google leads me here to this issue, but it's not the top result.

java.lang.RuntimeException: com.fasterxml.jackson.databind.JsonMappingException: String length (5000001) exceeds the maximum length (5000000) (through reference chain: MyClass["myMap"]->java.util.LinkedHashMap["myKey"]->MyClass2["myLongString"])
	at <my code>
Caused by: com.fasterxml.jackson.databind.JsonMappingException: String length (5000001) exceeds the maximum length (5000000) (through reference chain: MyClass["myMap"]->java.util.LinkedHashMap["myKey"]->MyClass2["myLongString"])
	at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:402)
	at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:361)
	at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.wrapAndThrow(BeanDeserializerBase.java:1830)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:570)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:439)
	at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1409)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:352)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:185)
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBind(MapDeserializer.java:557)
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:451)
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:32)
	at com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:545)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:568)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:439)
	at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1409)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:352)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:185)
	at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4825)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3637)
	at <my code>
	... 16 more
Caused by: com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (14408) exceeds the maximum length (10000)
	at com.fasterxml.jackson.core.StreamReadConstraints.validateStringLength(StreamReadConstraints.java:290)
	at com.fasterxml.jackson.core.util.ReadConstrainedTextBuffer.validateStringLength(ReadConstrainedTextBuffer.java:27)
	at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:482)
	at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishAndReturnString(UTF8StreamJsonParser.java:2561)
	at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:335)
	at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:42)
	at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:11)
	at com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:545)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:568)
	... 35 more

@cowtowncoder
Copy link
Member Author

Ok so message has no reference to any class or such. We could add a reference to StreamReadConstaints.[CONSTRAINT] and that would probably be more searchable.

@stolsvik
Copy link

stolsvik commented May 2, 2023

Got surprised by this. Would have appreciated more info in Exception, yes! I am not scared of essay-length exceptions - it is better with a bit on the verbose side, than leaving users (devs) completely dumbfounded.

Situation: I have a use case in https://mats3.io/ where I stick one JSON serialized DTO inside another DTO, the containing envelope "MatsTrace". Although this might not be seen as ideal from a performance perspective, I want these two to be completely separate (they are separately deser'ed, and the "outer" container doesn't know the "inner" classes, and the inner might even use another deser entirely), and it works just fine.

The 5M limit bit pretty hard, and is quite small, and I really don't quite see the need for a default limit at all - when the previous version had no limit. One can also envision large XML, other markup documents, in addition to JSON (as with me) being carried inside strings, and 5 meg is not much. Should this not be a configurable constraint which default is non-constrained? Or at least have a much higher limit as default?

I assume this is meant as a security measure. However, this gives the possibly unintended consequence: Jackson serializes an object just fine, but then refuses to deserialize its own output. This is not a good situation.

I cannot be the only one being bitten by this - and it can come at rather inopportune times, where the data can slide through in staging, but then hit this .. rather arbitrary limit .. in prod. Cc @cowtowncoder

@cowtowncoder
Copy link
Member Author

@stolsvik We had long discussions on this, and were really hoping that more developers had tried out release candidates so we'd get feedback on possible issues.
And yes, this is specifically to address security concerns (although TBH, of 3 limits imposed, String values are probably the least important due to various reasons).

For what it is worth, the original setting was lower -- 1 MB -- but based on feedback by one of developers increased it to 5 MB (was considering 10 MB but his tests found maximum size below 4MB or so for production usage).
There is currently no limit for the total document size.

As to why introduce any limits: this is a hot topic in security community and I think that in general it would make sense to start with some limits, even if high -- many would consider 5 meg Strings rather large fwtw, but of course if your use cases require higher it's not high.
I do understand it is problematic in cases like ours where formerly there was no limit aside from memory usage.

Given that 2.15.0 release is out, I am not sure there is much point in trying to tune the defaults. Problem exists for those who do need to process humongous Strings.

But just in case: what would you think would be limit that works for your use cases? How big are biggest string values seen in production (if you have such data).

@stolsvik
Copy link

stolsvik commented May 2, 2023

@cowtowncoder Thanks.
I have a "this is crazy, something must be wrong!!" hard limit on 100M total.
That is on the serialization-side.

That is what I find the most problematic here: Jackson serializes an object just fine, and then you turn around, and it crashes on deserialization.

I do not agree to you being too late with adjusting that limit. Judging by mvnrepository: https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-core, it seems like rather few have gotten around to update yet, and I bet it is much worse inside organizations. A quick 2.15.1 would most probably hinder many heavy in-prod gotchas!

PS: Wrt. to security concerns, does it not arguably make more sense to limit the total memory usage when deserializing? Have some kind of "global counter" for the current object? I mean, as a bad guy, I can possibly bypass this individual limit if I have access to a list or array of strings? Just include a ton of 4.99M strings?

@cowtowncoder
Copy link
Member Author

@stolsvik My point on "too late" is that there seem to be many devs not picking up the latest, but somehow get 2.15.0 and so on. But perhaps you are right.

As to limit, 100 MB is bit high but then again since this is NOT the limit that has been specifically requested (max number limit and nesting have specific vulns disclosed) it'd be doable.

As to total memory: things get complicated and I don't think keeping track of separate pieces is worth it. Conceptually I think combination of max-doc-length (not yet added; quite possible to be done outside Jackson until then) and longest-individual value are suitable building blocks to add service protection. So yeah, focus only on single values would be misplaced.
But I think I need to prioritize max-input-doc-length for 2.16 for completeness.

On serialization vs deserialization, I don't know -- I don't that as a strong argument mostly because there are so many tools & common case being one where library/framework reading is different from one doing writing. So Jackson applying same limits for both is probably not all that important. YMMV.
There is still an idea to add similar limits on output side, fwtw (StreamWriteConstraints), just not been as much a priority as input side.

Now: could file a new issue for raising String value max length for 2.15.1 to.. whatever you think more reasonable?
100 MB means 200 meg size chunk which is somewhat high, but maybe we could go with 20 megs?
I know it is arbitrary at any rate.

@darrenoc3
Copy link

We have been hit by this error too and it's quite frustrating to have to override the limit separately for every objectMapper we create, it should at least be possible to override the limit at a global level. Aside from that, the 5MB limit seems too low, as mentioned elsewhere Strings of this size do not pose much security risk compared to numbers so the limit could safely be set higher. A large config file can easily be over 10MB in the problem domain I'm working in.

@cowtowncoder
Copy link
Member Author

@darrenoc3 Just to make sure: when you say config files, do you have config files with individual String values that big? I can see that Document limit, if imposed, should be much bigger, but individual String value of 10 MB+ would seem unusual.

Having said that, I am still looking for suggestions on default limit to set. As I mentioned, 5 megs was based on feedback. Hence my suggestions of going with 20 megs (characters).

@darrenoc3
Copy link

@darrenoc3 Just to make sure: when you say config files, do you have config files with individual String values that big? I can see that Document limit, if imposed, should be much bigger, but individual String value of 10 MB+ would seem unusual.

Having said that, I am still looking for suggestions on default limit to set. As I mentioned, 5 megs was based on feedback. Hence my suggestions of going with 20 megs (characters).

The config files are not JSON themselves (legacy format), but they still get marshalled in and out of JSON as HTTP requests/responses. So in essence the string limit becomes the document limit for this kind of use case.

@cowtowncoder
Copy link
Member Author

cowtowncoder commented May 5, 2023

Created #1014 for possible increase of the default limit for 2.15.1; please comment there.

/cc @stolsvik

@cowtowncoder
Copy link
Member Author

@darrenoc3 Just to make sure: when you say config files, do you have config files with individual String values that big? I can see that Document limit, if imposed, should be much bigger, but individual String value of 10 MB+ would seem unusual.
Having said that, I am still looking for suggestions on default limit to set. As I mentioned, 5 megs was based on feedback. Hence my suggestions of going with 20 megs (characters).

The config files are not JSON themselves (legacy format), but they still get marshalled in and out of JSON as HTTP requests/responses. So in essence the string limit becomes the document limit for this kind of use case.

Ah. Sort of like config files embedded in K8s YAML specs (or XML in JSON or legacy format) etc.

As per my other comment, I am considering raising the default for ergonomic reasons. But would like to find a sweet spot fully knowing that no limit will work perfectly.

@davidmoten
Copy link

davidmoten commented Sep 8, 2023

I'd prefer to see the default stream limit being unlimited. Nice to be able to set it but should be the user's choice.

My argument for this is that this change will break a lot of systems at runtime in production, including the Australian Government Search and Rescue system that I work on. I was lucky to notice it in a lower environment so we will be ok but this will hurt a lot of production systems out there.

Note that I am happy to see that it was well described in the 2.15 release notes, that was on me to miss it as I review these things normally.

@cowtowncoder
Copy link
Member Author

I'd prefer to see the default stream limit being unlimited. Nice to be able to set it but should be the user's choice.

And it is unlimited, as you wish! 2.15 does not limit maximum content length at all. Limits are only for JSON values of type Text/String (this issue) and Numbers.

2.16 will add this via #1046 but will default to "unlimited".

@davidmoten
Copy link

And it is unlimited, as you wish! 2.15 does not limit maximum content length at all. Limits are only for JSON values of type Text/String (this issue) and Numbers.

Ah I needed be more precise. I was referring to JSON text values. We encounter them in JSONified emails from O365. Pretty easy to encounter >5MB fields in that context.

@cowtowncoder
Copy link
Member Author

@davidmoten As per #1014 the default was raised to 20 megs in 2.15.1 where it is planned to stay. For what that's worth.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
processing-limits Issues related to limiting aspects of input/output that can be processed without exception
Projects
None yet
Development

No branches or pull requests

6 participants