Revised proposal for elements-to-maps function
michaelhkay committed Nov 14, 2024
1 parent 70a862c commit 8f37928
Showing 2 changed files with 1,173 additions and 179 deletions.
235 changes: 60 additions & 175 deletions specifications/xpath-functions-40/src/function-catalog.xml
@@ -25413,10 +25413,10 @@ return json-to-xml($json, $options)]]></eg>
</fos:function>


<fos:function name="element-to-map" prefix="fn">
<fos:function name="elements-to-maps" prefix="fn">
<fos:signatures>
<fos:proto name="element-to-map" return-type="xs:string">
<fos:arg name="input" type="item()*"/>
<fos:proto name="elements-to-maps" return-type="map(*)*">
<fos:arg name="elements" type="element()*"/>
<fos:arg name="options" type="map(*)" usage="inspection" default="map{}"/>
</fos:proto>
</fos:signatures>
@@ -25426,212 +25426,97 @@ return json-to-xml($json, $options)]]></eg>
<fos:property>focus-independent</fos:property>
</fos:properties>
<fos:summary>
<p>Creates a JSON representation of an arbitrary XDM value.</p>
<p>Converts a sequence of element nodes into maps that are suitable for
JSON serialization.</p>
</fos:summary>
<fos:rules>
<p>This function returns a string, in JSON format, containing a representation of the
supplied input <code>$input</code>.</p>
<p>The function can process any input sequence
whatsoever, but it is not lossless: there are cases when two different XDM values will
have the same JSON representation. For example, the sequence <code>(1, 2)</code>

and the array <code>[ 1, 2 ]</code> are both output as <code>[ 1, 2 ]</code>.</p>

<p>This function returns a sequence of maps corresponding one to one with
the element nodes supplied in <code>$elements</code>. Each map is in a form
that is suitable for JSON serialization, thus providing a mechanism for conversion
of arbitrary XML to JSON.</p>

<p>The entries that may appear in the <code>$options</code> map are as follows:</p>

<p>The entries that may appear in the <code>$options</code> map are as follows.
The <termref def="option-parameter-conventions">option parameter conventions</termref> apply.</p>

<fos:options>
<fos:option key="escape-solidus">
<fos:meaning>Determines whether, when escaping strings, the solidus (<code>/</code>)
should be escaped.</fos:meaning>
<fos:type>xs:boolean</fos:type>
<fos:default>true</fos:default>
<fos:values>
<fos:value value="false">
The solidus character is output as is.
</fos:value>
<fos:value value="true">
The solidus character is escaped with a backslash, that is, as <code>"\/"</code>.
</fos:value>
</fos:values>
</fos:option>
<fos:option key="indent">
<fos:meaning>Determines whether additional whitespace should be added to the output to improve readability.</fos:meaning>
<fos:type>xs:boolean</fos:type>
<fos:default>false</fos:default>
<fos:values>
<fos:value value="false">
The processor must not insert any insignificant whitespace between JSON tokens.
</fos:value>
<fos:value value="true">
The processor <rfc2119>may</rfc2119> insert whitespace between JSON tokens in order to improve readability.
The specification imposes no constraints on how this is done.
</fos:value>
</fos:values>
</fos:option>

<fos:option key="uniform">
<fos:meaning>Indicates that all elements with the same name (except where schema type information is available)
should use the same JSON layout. Setting this option requires the processor to analyze the entire input
<fos:meaning>Indicates that all elements with the same name, at any level in any of the
input trees, should use the same conversion rules (known as a layout).
Setting this option requires the processor to analyze the entire input
before deciding what layout to use for each element; but by ensuring consistency across elements, it may
make the resulting JSON easier to process.</fos:meaning>
make the resulting maps easier to process.</fos:meaning>
<fos:type>xs:boolean</fos:type>
<fos:default>false</fos:default>
<fos:values>
<fos:value value="false">
The layout for each element node is decided independently, based on its individual content.
</fos:value>
<fos:value value="true">
In the absence of schema type information, and of an explicit entry in the <code>layouts</code>
In the absence of schema type information, and in the absence of an explicit entry in the <code>layouts</code>
property, the layout chosen for a given element node must be the same as that for all other
elements of the same name.
</fos:value>
</fos:values>
</fos:option>
<fos:option key="attribute-marker">
<fos:meaning>A string that is prefixed to any key value in the output that represents
an XDM attribute node in the input. The string may be empty. If, after applying the requested
prefix (or no prefix) there is a conflict between the names of attributes and child elements,
then the requested prefix (or lack thereof) is ignored and the default prefix <code>"@"</code>
is used.</fos:meaning>
<fos:type>xs:string</fos:type>
<fos:default>"@"</fos:default>
</fos:option>
<fos:option key="name-format">
<fos:meaning>Indicates how the names of element and attribute nodes are handled.</fos:meaning>
<fos:type>xs:string</fos:type>
<fos:default>"default"</fos:default>
<fos:values>
<fos:value value="lexical">Names are output as lexical QNames, in the same form as they
would appear if serialized using the XML output method. The result may
contain a namespace prefix: note that the output will not contain any information
enabling such prefixes to be resolved to a namespace URI.</fos:value>
<fos:value value="local">Namespace URIs in element and attribute names are discarded; only the local
names are output. If this leads to duplicate keys in a context where the names
must be unique, then the setting is ignored and <code>"eqname"</code> is used instead.</fos:value>
<fos:value value="eqname">Names in a namespace are output in the form <code>"Q{uri}local"</code>.
Names in no namespace are output using the local name alone.</fos:value>
<fos:value value="default">Element names in the default namespace of the top-level element node
(the node supplied in the <code>$elements</code> argument), and attribute names
in no namespace, are output using the local name alone.
All other names are output in the format <code>"Q{uri}local"</code>, or <code>Q{}local</code>
in the case of a no-namespace element name where this is not the default.</fos:value>
</fos:values>
</fos:option>
<fos:option key="layouts">
<fos:meaning>A mapping from element names to layout names, used to override the default
formatting rules for a particular element name.</fos:meaning>
<fos:type>map{xs:QName, enum("empty", "empty-plus", "simple", "simple-plus", "list", "list-plus",
"record", "sequence", "mixed", "xml", "html", "xhtml")}</fos:type>
<fos:type>map(xs:QName, enum("empty", "empty-plus", "simple", "simple-plus", "list", "list-plus",
"record", "sequence", "mixed", "xml", "html", "xhtml"))</fos:type>
<fos:default>map{}</fos:default>
</fos:option>
</fos:options>




<p>An input sequence is handled as follows:</p>
<ulist>
<item><p>An empty sequence is output as the JSON value null.</p></item>
<item><p>A singleton sequence is output following the rules for processing items, below.</p></item>
<item><p>A sequence of two or more items results in a JSON array, whose members are constructed
from the items by applying the rules below.</p></item>
</ulist>

<p>Much of the complexity is concerned with the representation of element nodes: the principles are described
<p>The principles for conversion from elements to maps are described
in <specref ref="xml-to-json-mappings"/>.</p>


<p>In general, an element node maps to a key-value pair in which the key represents the element name, and the
corresponding value represents the attributes and children of the element. In the case of a top-level element
(a node directly supplied in <code>$elements</code>), the result will be a singleton map containing this key-value
pair as its only entry. In the case of a descendant element, the key-value pair for a child element will be added
to the content representing its parent element, in a way that depends on the parent element's layout.</p>

<p>Items in a supplied sequence are processed as follows:</p>
<p>The representation of any other kind of node depends on the layout chosen for its parent element.</p>

<ulist>
<item>
<p><emph>atomic items</emph></p>
<ulist>
<item><p>An <code>xs:boolean</code> value is output as the JSON value <code>true</code> or <code>false</code>.</p></item>
<item><p>A numeric value, other than <code>INF</code>, <code>-INF</code>, or <code>NaN</code>,

is output as a JSON number. The format of the number follows the rules for casting the number to
a string.</p></item>
<item><p>Any other atomic value is cast to <code>xs:string</code>, and the result is output as a JSON string,

escaped as described below.</p></item>
</ulist>
</item>
<item>
<p><emph>Nodes</emph></p>
<p>A node is represented as a JSON object containing a single key-value pair.
For elements and attributes, the key
is derived from the element name as described in ???specref ref="element-names-in-json"/>;
for all other node kinds it represents the node kind (for example,
<code>#text</code> or <code>#comment</code>).</p>
<p>In the following rules, a node is considered to be <term>top-level</term> if it
is parentless, or if its parent is not itself part of the input being converted.
For example, if <code>$input</code> is a sequence of attribute nodes, then these
are considered to be top-level nodes, whether or not they have a parent element.</p>
<ulist>
<item>
<p><emph>Document nodes</emph></p>
<p>A document node is output as a JSON object comprising a single key-value pair, where the
key is the string <code>"#document"</code> and the value represents the content, using
the <code>mixed</code> layout described (for element nodes) in
???specref ref="mixed-json-layout"/>.</p>
<p>For example, an XML document containing a single empty element <code>&lt;doc/></code>
would be represented as <code>#document: [{"doc":""}]</code></p>
<p>A document node with no children is represented by an empty JSON array: <code>#document: []</code>.</p>
<note><p>The XDM model allows a document node to have an arbitrary sequence of children, including text nodes,
comments, processing instructions, and multiple elements.</p></note>
</item>
<item>
<p><emph>Element nodes</emph></p>
<p>A top-level element node is output as a JSON object comprising a single key-value pair, where the
key is derived from the element name as described in ???specref ref="element-names-in-json"/>, and the content
follows the rules of a specific layout appropriate to that element. The available layouts
are described in ???specref ref="xml-to-json-mappings"/>. A layout is selected for each element node <var>E</var>
by applying the following rules in order:</p>
<olist>
<item><p>If the name of <var>E</var> is listed as the key of an entry in the <code>layouts</code>
option, then the corresponding named layout is used.</p></item>
<item><p>If <var>E</var> has a type annotation <var>T</var> other than <code>xs:untyped</code>
or <code>xs:anyType</code>, then the layout appropriate to its schema type
is used. Specifically, taking the layouts in the order presented in
???specref ref="json-element-layouts"/>, the first layout whose <term>XSD conditions</term>
are satisfied by <var>T</var> is chosen.</p></item>
<item><p>If the option <code>uniform</code> is set to false (which is its default value),
then the chosen layout is the first layout, in the order they are listed in
???specref ref="json-element-layouts"/>, whose <term>match pattern</term>
matches <var>E</var>.</p></item>
<item><p>If the option <code>uniform</code> is set to true, then let <var>S</var>
be the set of all element nodes in the input having the same name as <var>E</var>,
including element nodes that are not immediate items in the input sequence, but
are rather contained recursively within nodes, maps, or arrays in the input sequence.
The chosen layout is then the first layout, in the order they are listed in
???specref ref="json-element-layouts"/>, whose <term>match pattern</term> matches
every element node in <var>S</var>.</p></item>
</olist>
</item>
<item>
<p><emph>Text nodes</emph></p>
<p>A top-level text node is output as a JSON object comprising a single key-value pair.
The key is the string <code>"#text"</code>, and the value is the string value of the text node
as a JSON string.</p>
<p>Text nodes appearing as children of an element node are formatted according to the rules of
a specific element layout.</p>
</item>
<item>
<p><emph>Comment nodes</emph></p>
<p>A top-level comment node is output as a JSON object comprising a single key-value pair.
The key is the string <code>"#comment"</code>, and the value is the string value of the comment node
as a JSON string.</p>
<p>Comment nodes appearing as children of an element node are formatted according to the rules of
a specific element layout. If the layout is one that does not ignore comments, then
comments follow the above conventions.</p>
</item>
<item>
<p><emph>Processing instruction nodes</emph></p>
<p>A top-level processing-instruction node is output as a JSON object comprising a single key-value pair.
The key is the string <code>"#processing-instruction"</code>, and the value is a JSON object
with two string-valued properties: <code>"#name"</code> holding the name of the processing instruction,
and <code>"#data"</code> holding the data (in XDM terms, the string value of the node).</p>
<p>Processing instruction nodes appearing as children of an element node are formatted according to the rules of
a specific element layout. If the layout is one that does not ignore processing instructions, then
they are output following the above conventions.</p>
</item>
<item>
<p><emph>Attribute nodes</emph></p>
<p>Attribute nodes that are reached via an element node are output as described
under “element nodes”, above.</p>

<p>Free-standing attribute nodes are output as JSON objects with properties:</p>
<ulist>
<item><p><code>#attribute</code> set to the name of the attribute, as a local name
in the case of an attribute in no namespace, or in <code>Q{uri}local</code> format
otherwise.</p></item>
<item><p><code>#value</code> set to
the result of atomizing the attribute value and applying the
<code>fn:xdm-to-json</code> function to the result.</p></item>
</ulist>
</item>
<item><p><emph>Namespace nodes</emph></p>
<p>Namespace nodes that are reached via an element node result in no output.</p>
<p>A top-level namespace node is output as a JSON object comprising a single key-value pair.
The key is the string <code>"#namespace"</code>, and the value is a JSON object
with two string-valued properties: <code>"#prefix"</code> holding the name of the namespace node,
and <code>"#uri"</code> holding the namespace URI (in XDM terms, the string value of the namespace node).</p>
</item>
</ulist>
</item>
<item>

<!--<item>
<p><emph>Maps</emph></p>
<p>An XDM map is output as a JSON object with one property for each entry in the map.</p>
<p>The property name is derived from the key value by converting the value to a string
@@ -25686,7 +25571,7 @@ return json-to-xml($json, $options)]]></eg>


</item>
</ulist>
</ulist>-->
<p>Strings are escaped as follows:</p>
<olist>
<item>
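Illustrative note on the `name-format` option described in the diff above. The following is a minimal Python sketch of how the four values (`lexical`, `local`, `eqname`, `default`) might turn a node name into a map key. It is not taken from the commit or from any processor: the function `format_name` and its parameters (`default_ns`, `is_attribute`) are hypothetical, it assumes that "the default namespace of the top-level element node" can be passed in as a plain URI string, and it does not model the fallback from `local` to `eqname` when keys would collide.

```python
def format_name(local, uri="", prefix="", name_format="default",
                default_ns="", is_attribute=False):
    """Return a map key for an element or attribute name.

    Hypothetical helper: `default_ns` stands for the default namespace
    of the top-level element (an assumption of this sketch).
    """
    if name_format == "lexical":
        # Lexical QName: the prefix is kept, the namespace URI is lost.
        return f"{prefix}:{local}" if prefix else local
    if name_format == "local":
        # Namespace URI discarded. The fallback to "eqname" on duplicate
        # keys described in the text is NOT modelled here.
        return local
    if name_format == "eqname":
        # Q{uri}local for namespaced names, local name alone otherwise.
        return f"Q{{{uri}}}{local}" if uri else local
    if name_format == "default":
        if is_attribute:
            plain = (uri == "")          # no-namespace attribute names
        else:
            plain = (uri == default_ns)  # elements in the default namespace
        # Everything else uses Q{uri}local, which yields Q{}local for a
        # no-namespace element when the default namespace is non-empty.
        return local if plain else f"Q{{{uri}}}{local}"
    raise ValueError(f"unknown name-format: {name_format!r}")
```

For instance, under these assumptions, an element `item` in namespace `http://example.com/ns` would get the key `item` when that namespace is the default, and `Q{http://example.com/ns}item` otherwise.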