Improve handling of "mixed content" with XML module #402

Closed
cowtowncoder opened this issue May 20, 2020 · 8 comments
Labels
mixed-content Issue related to XML mixed content

Comments

@cowtowncoder
Member

This is a placeholder issue for improving how Jackson handles XML content like:

<root>
   <value>123</value>
some text
   <value>345</value>
</root>

and other cases where a nesting level has both element(s) and at least one non-blank text value (content other than whitespace).

Currently such content cannot be supported at all, and it is difficult to find a general way of mapping it into POJOs.

However, it should be possible to have some level of support, for example by allowing reading into JsonNode in some form, considering that such text content could perhaps be mapped to an empty key (assuming handling of repeated Object keys can be resolved).
There may be other approaches too; to be discussed here.

@cowtowncoder
Member Author

cowtowncoder commented May 22, 2020

Looks like JsonNode only builds mixed content that comes right after a start element, but not content that comes after an end element. This seems to be a token-stream level problem, now that JsonNode itself can express multiple values as an array.
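
To make the two positions concrete, here is a minimal sketch that uses only the JDK's StAX API (javax.xml.stream), not Jackson's XmlTokenStream. The class name MixedTextPositionSketch and its describe method are hypothetical, and trimming each segment is a simplification of the sketch. It labels every non-blank text segment by the element event that preceded it:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration built on plain StAX; not part of Jackson.
final class MixedTextPositionSketch {

    /** Returns one "label: text" entry per non-blank text segment, in document order. */
    static List<String> describe(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int prev = XMLStreamConstants.START_DOCUMENT; // last element-level event seen
            StringBuilder text = new StringBuilder();     // text may arrive in several chunks
            while (r.hasNext()) {
                int ev = r.next();
                if (ev == XMLStreamConstants.CHARACTERS || ev == XMLStreamConstants.CDATA) {
                    text.append(r.getText());
                } else if (ev == XMLStreamConstants.START_ELEMENT
                        || ev == XMLStreamConstants.END_ELEMENT) {
                    String segment = text.toString().trim();
                    text.setLength(0);
                    if (!segment.isEmpty()) { // all-whitespace segments are skipped
                        out.add(label(prev, ev) + ": " + segment);
                    }
                    prev = ev;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return out;
    }

    private static String label(int prev, int next) {
        if (prev == XMLStreamConstants.START_ELEMENT && next == XMLStreamConstants.END_ELEMENT) {
            return "leaf text"; // text is the only content of its element
        }
        if (prev == XMLStreamConstants.START_ELEMENT) {
            return "mixed, after start-element";
        }
        return "mixed, after end-element";
    }

    public static void main(String[] args) {
        String xml = "<root>\n   <value>123</value>\nsome text\n   <value>345</value>\n</root>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```

In the example from the issue description, "some text" falls in the "after end-element" position, which is the case described above as not being built.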

@cowtowncoder
Member Author

Exposing mixed content through XmlTokenStream (low-level) and FromXmlParser is complete, which allowed fixing #405, so mixed content is now available via:

  1. JsonNode, with logical key of "" (empty String) -- and as per #403 ("Make JsonNode implicitly create ArrayNodes for repeated XML Elements") the value becomes an ArrayNode if multiple mixed content segments are found
  2. "Untyped" case (java.lang.Object) as well: Map key of "" (empty String), with either single String value or List<String> for multiple mixed content segments

Note that due to a limitation in the logic, blank mixed content cannot be included, so all-whitespace segments are not retained. This is unfortunately necessary to support use of indentation for non-mixed-content cases.
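
As a rough illustration of the shape described above for the "untyped" case, the following sketch builds a Map from XML using only the JDK's StAX API. It is not Jackson's implementation and does not call Jackson at all. The class MixedContentShapeSketch and its parse method are hypothetical names, and the sketch makes simplifying assumptions of its own: attributes are ignored, each text segment is trimmed, and repeated element names are collected into a List. What it does mirror from the description is the "" key for mixed text, a single String versus a List for multiple segments, and the dropping of all-whitespace segments.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration built on plain StAX; not part of Jackson.
final class MixedContentShapeSketch {

    /** Returns a String for a text-only root element, otherwise a Map of its content. */
    static Object parse(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.next() != XMLStreamConstants.START_ELEMENT) {
                // skip anything before the root element
            }
            return readElement(r);
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    // Precondition: the reader is positioned on the element's START_ELEMENT.
    private static Object readElement(XMLStreamReader r) throws XMLStreamException {
        Map<String, Object> fields = new LinkedHashMap<>();
        StringBuilder text = new StringBuilder();
        boolean sawChild = false;
        while (true) {
            int ev = r.next();
            if (ev == XMLStreamConstants.CHARACTERS || ev == XMLStreamConstants.CDATA) {
                text.append(r.getText());
            } else if (ev == XMLStreamConstants.START_ELEMENT) {
                flushText(text, fields);          // text that preceded this child
                sawChild = true;
                String name = r.getLocalName();   // read the name before recursing
                add(fields, name, readElement(r));
            } else if (ev == XMLStreamConstants.END_ELEMENT) {
                if (!sawChild) {
                    return text.toString();       // leaf element: plain String
                }
                flushText(text, fields);          // text after the last child
                return fields;
            }
        }
    }

    // Stores a non-blank segment under the "" key; blank segments are dropped.
    private static void flushText(StringBuilder text, Map<String, Object> fields) {
        String segment = text.toString().trim();
        text.setLength(0);
        if (!segment.isEmpty()) {
            add(fields, "", segment);
        }
    }

    // First value is stored as-is; a second value for the same key promotes it to a List.
    @SuppressWarnings("unchecked")
    private static void add(Map<String, Object> fields, String key, Object value) {
        Object old = fields.get(key);
        if (old == null) {
            fields.put(key, value);
        } else if (old instanceof List) {
            ((List<Object>) old).add(value);
        } else {
            List<Object> list = new ArrayList<>();
            list.add(old);
            list.add(value);
            fields.put(key, list);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>\n   <value>123</value>\nsome text\n   <value>345</value>\n</root>";
        System.out.println(parse(xml));
    }
}
```

Because indentation between elements is plain whitespace text at this level, the sketch cannot tell it apart from intentionally blank mixed content, which is the same trade-off noted above.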

I will leave this issue open for possible future improvements regarding POJO (typed) use cases: these will not yet allow more than one @XmlText annotated segment.

alex-bel-apica pushed a commit to ApicaSystem/jackson-dataformat-xml that referenced this issue Sep 4, 2020
# Conflicts:
#	src/test/java/com/fasterxml/jackson/dataformat/xml/deser/JsonNodeBasicDeserTest.java
@Ace2Pic

Ace2Pic commented Sep 16, 2020

Could you please detail how it is supposed to be used with POJOs?

I have this problem in 2.11.2, so I tried your solution with the 2.12.0 snapshot and created a similar case with a root element containing a value field with the @JacksonXmlText annotation.
I can confirm that deserializing the element no longer throws an error, but I can only get the raw text, not any inner element ("some text" in your example). Am I missing something?

@cowtowncoder
Member Author

@Ace2Pic As things are, this would not help use with POJOs, unfortunately. It does help with JsonNode (which will retain duplicates) and, to a degree, with "untyped" (read as Object), including the case of binding into a Map.

@Ace2Pic

Ace2Pic commented Sep 17, 2020

@cowtowncoder Thank you for your answer. It helped!
Using a simple custom deserializer and with JsonNode I am now able to get the whole "mixed content".

@cowtowncoder
Member Author

@Ace2Pic ah yes, that makes sense!

@nrodriguezmore

> @cowtowncoder Thank you for your answer. It helped!
> Using a simple custom deserializer and with JsonNode I am now able to get the whole "mixed content".

can you share the custom deserializer you wrote? thanks!

@cowtowncoder
Member Author

Closing this due to following improvements that will be in 2.12:

Further improvements would be useful too but require specific ideas, and will likely need to go in 2.13 or later.
