diff --git a/specifications/xpath-functions-40/src/function-catalog.xml b/specifications/xpath-functions-40/src/function-catalog.xml index d84deb735..16fe3ee92 100644 --- a/specifications/xpath-functions-40/src/function-catalog.xml +++ b/specifications/xpath-functions-40/src/function-catalog.xml @@ -27534,6 +27534,8 @@ declare function some( any backslashes (\), replace them with forward slashes (/).

+

Strip off the fragment identifier and any query:

+

If the string matches ^(.*)#([^#]*)$, the string is the first match group and the fragment is the second match group. Otherwise, @@ -27546,6 +27548,8 @@ declare function some( the string is unchanged and the query is the empty sequence.

+

Attempt to identify the scheme:

+

If the string matches ^[a-zA-Z][:|].*$:

@@ -27573,42 +27577,54 @@ declare function some(
-

If the scheme is the empty sequence, the - unc-path option is true, and the string - matches ^//[^/].*$, then the scheme is file - and the filepath is the string. -

+

Now that the scheme, if there is one, has been identified, + determine if the URI is hierarchical:

+ +

If the scheme is known to be hierarchical, or known not to be hierarchical, then hierarchical is set accordingly. - Exactly which schemes are known to be hierarchical or - non-hierarchical is - implementation-defined. If the implementation does not know if a scheme is or is not hierarchical, the hierarchical setting depends on the string. If the string is the empty string, hierarchical is the empty sequence (i.e. not known), otherwise hierarchical is - true if string begins with / and false otherwise.

+ true if string begins with / and + false otherwise.

+
+
-

If the scheme is not known or is known to be file and - the string matches ^//*([a-zA-Z]:.*)$, - the authority is empty and the string is - the first match group. Otherwise, if the string - matches ^///*([^/]+)(/.*)?$ then the authority - is the first match group and the string is the second - match group. If the string does not match either - regular expression, the authority is the empty sequence - and the string is unchanged.

+

Then examine the remaining parts of the string.

-

If the string matches ^//*([a-zA-Z]:.*)$, + + +

If the scheme is the empty sequence, the + unc-path option is true, and the + string matches ^//[^/].*$, then the + scheme is file, the authority is + empty, and the filepath is the + string. +

+ + +

Otherwise:

+ + + +

If the scheme is not known or is known to be file + and the string matches ^//*([a-zA-Z]:.*)$, the authority is empty and the string is - the first match group. Otherwise, if the string - matches ^///*([^/]+)(/.*)?$ then the authority + the first match group.

+

Otherwise, if the string + matches ^///*([^/]+)?(/.*)?$, the authority is the first match group and the string is the second - match group. If the string does not match either + match group.

+

Finally, if the string does not match either regular expression, the authority is the empty sequence - and the string is unchanged.

+ and the string is unchanged.

+
+
+

If the authority matches ^(([^@]*)@)(.*)(:([^:]*))?$, @@ -27657,23 +27673,23 @@ declare function some(

Similar care must be taken to match the port because an IPv6/IPvFuture address may contain a colon.

- +

If the authority matches ^(([^@]*)@)?(\[[^\]]*\])(:([^:]*))?$, - then the port is match group 5, otherwise + then the port is match group 5.

-

If the authority matches +

Otherwise, if the authority matches ^(([^@]*)@)?([^:]+)(:([^:]*))?$, - then the port is match group 5, otherwise + then the port is match group 5.

-

the port is the empty sequence.

+

Otherwise, the port is the empty sequence.
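
The port rules above can be sketched in Python as follows. This is an illustrative sketch only, not part of the specification: the function name `extract_port` is hypothetical, Python's `re` dialect is assumed to behave like the XPath regular expressions for these two patterns, and `None` stands in for the empty sequence.

```python
import re

# The two patterns quoted in the text, tried in the order given there.
# The bracketed form is tried first so that colons inside an
# IPv6/IPvFuture literal are not mistaken for the port separator.
BRACKETED_HOST = re.compile(r"^(([^@]*)@)?(\[[^\]]*\])(:([^:]*))?$")
PLAIN_HOST = re.compile(r"^(([^@]*)@)?([^:]+)(:([^:]*))?$")


def extract_port(authority):
    """Return match group 5 of the first pattern that matches.

    None models the empty sequence (assumption made for this sketch).
    """
    for pattern in (BRACKETED_HOST, PLAIN_HOST):
        match = pattern.match(authority)
        if match:
            # Group 5 is None when the optional ":port" suffix is absent.
            return match.group(5)
    # Neither pattern matched: the port is the empty sequence.
    return None
```

For example, `extract_port("user@[2001:db8::1]:8080")` yields `"8080"` via the bracketed pattern, while `extract_port("example.com")` yields `None`.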

-
+

If the omit-default-ports option is true, the port is discarded and set to the empty sequence if the port number is the same @@ -27697,20 +27713,8 @@ declare function some( separator and applying uri decoding on each token.

-

Applying uri decoding replaces all occurrences of - plus (+) with spaces and all occurrences of - %[a-fA-F0-9][a-fA-F0-9] with a single character with the - codepoint represented by the two digit hexadecimal number that - follows the % character. In other words, "A%42C" becomes - "ABC" If there are any occurrences of % followed - by up to two characters that are not hexadecimal digits, they are - replaced by the character sequence 0xef, 0xbf, 0xbd - (that is, 0xfffd, the Unicode replacement character, in UTF-8). - After replacing all of the percent-escaped characters, the character sequence is - interpreted as UTF-8 to get the string. In other words "A%XYC%Z%F0%9F%92%A9" becomes - "A�C�💩". If the character sequence is - not a valid sequence of UTF-8 characters, any invalid characters are replaced with the - 0xfffd.

+

Applying uri decoding is equivalent to + calling fn:decode-from-uri on the string.

The query separator is the value of the query-separator option. @@ -28292,20 +28296,26 @@ path with an explicit file: scheme.

The components are derived from the contents of the $parts map in the following way:

-

If the scheme key is present in the map, the URI begins - with the value of that key. A URI is considered to be non-hierarchical - if either the hierarchical key is present in the - $parts map with the value - false() or if the scheme is known to be non-hierarchical. - (In other words, schemes are hierarchical by default.)

- -

If the scheme is file and the unc-path - option is true, the scheme is delimited by a trailing :////, - otherwise, if the URI is non-hierarchical, the scheme is delimited by - a trailing :. For all other schemes, it is delimited by - a trailing ://. Exactly which schemes are known to be - non-hierarchical is - implementation-defined.

+

If the scheme key is present in the map, + the URI begins with the value of that key. A URI is considered to be + non-hierarchical if either the hierarchical key + is present in the $parts map with the value + false() or if the scheme is known to be + non-hierarchical. (In other words, schemes are hierarchical by + default.)

+ + +

If the scheme is + known to be non-hierarchical, it is delimited by a trailing + :.

+
+

Otherwise, if the scheme is file and the unc-path + option is true, the scheme is delimited by a trailing :////.

+
+

Otherwise, the scheme is delimited by + a trailing ://.

+
+

For simplicity of exposition, we take the userinfo, host, and @@ -28501,4 +28511,4 @@ path with an explicit file: scheme.

- \ No newline at end of file + diff --git a/specifications/xpath-functions-40/src/xpath-functions.xml b/specifications/xpath-functions-40/src/xpath-functions.xml index 0c9cf4e18..cd64751a9 100644 --- a/specifications/xpath-functions-40/src/xpath-functions.xml +++ b/specifications/xpath-functions-40/src/xpath-functions.xml @@ -3305,6 +3305,15 @@ It is recommended that implementers consult for inf URIs, to identify their structure, and construct URI strings from their structured representation.

+

Some URI schemes are hierarchical and some are non-hierarchical. + Implementations must treat the following schemes as non-hierarchical: + jar, mailto, news, tag, + tel, and urn. Whether additional schemes + are known to be non-hierarchical is + implementation-defined. + If a scheme is not known to be non-hierarchical, it must be + treated as hierarchical.

+

The structured representation of a URI is described by the @@ -3312,8 +3321,6 @@ It is recommended that implementers consult for inf - -

The parts of this structure are:

@@ -3361,7 +3368,7 @@ It is recommended that implementers consult for inf - + @@ -3372,39 +3379,31 @@ It is recommended that implementers consult for inf
Parsed and unescaped path segments.
query-segmentsquery-parameters Parsed and unescaped query terms

The segmented forms of the path and query parameters provide - convenient access to commonly used information. They’re represented - in the map as arrays, instead of sequences, just for the convenience - of serializing the structure.

+ convenient access to commonly used information.

The path, if there is one, is tokenized on “/” characters and - each segment is unesaped. Consider the URI http://example.com/path/to/a%2fb. The path portion has to be returned as /path/to/a%2fb because + each segment is unescaped (as per the fn:decode-from-uri function). Consider the URI + http://example.com/path/to/a%2fb. + The path portion has to be returned as /path/to/a%2fb because decoding the %2f would change the nature of the path. - The unescaped form is easily accessible from the path-segments array:

- -[ - "", - "path", - "to", - "a/b" -] + The unescaped form is easily accessible from the path-segments list:

+ + ("", "path", "to", "a/b") +

Note that the presence or absence of a leading slash on the path will affect whether or not the array begins with an empty string.

-

The query parameters are similarly decoded. Consider the URI: +

The query parameters are decoded into a map. Consider the URI: http://example.com/path?a=1&b=2%264&a=3. - Here the decoded form in the query-segments gives quick access to - the parameter values:

- - [ - { "key": "a", - "value": "1" }, - { "key": "b", - "value": "2&4" }, - { "key": "a", - "value": "3" } -] -

Note that both keys and values are unescaped and that it’s an array - of maps because key values can be repeated, as seen for a + The decoded form in the query-parameters is the following map:

+ + { "a": ("1", "3"), + "b": "2&4" +} + +

Note that both keys and values are unescaped. If a key + is repeated in the query string, the map will contain a + sequence of values for that key, as seen for a in this example.
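
A rough Python sketch of how repeated keys accumulate into one entry of the `query-parameters` map is shown below. It is an approximation under stated assumptions, not the specified behaviour: the function name `query_parameters` is hypothetical, `urllib.parse.unquote_plus` stands in for `fn:decode-from-uri` (the two may differ on malformed escapes), empty tokens are skipped, a token without `=` is given an empty value, and every value is held in a Python list, whereas XPath does not distinguish a single item from a one-item sequence.

```python
from urllib.parse import unquote_plus


def query_parameters(query, separator="&"):
    """Group decoded query values by decoded key, preserving order."""
    params = {}
    for token in query.split(separator):
        if not token:
            continue  # assumption: ignore empty tokens such as "a=1&&b=2"
        # Split on the first "=" only, so values may contain "=".
        key, _, value = token.partition("=")
        params.setdefault(unquote_plus(key), []).append(unquote_plus(value))
    return params
```

Applied to the query string of the example URI, `query_parameters("a=1&b=2%264&a=3")` groups both values of `a` under a single key and decodes `%26` in the value of `b`.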