From 94befcbdb929e956c9400f798e5258934d3c2ae4 Mon Sep 17 00:00:00 2001
From: Norman Walsh
Applying
Applying +
) with spaces and all occurrences of
%[a-fA-F0-9][a-fA-F0-9]
with a single character with the
codepoint represented by the two digit hexadecimal number that
@@ -27696,6 +27696,9 @@ declare function some(
not a valid sequence of UTF-8 characters, any invalid characters are replaced with the
0xfffd
.
Applying fn:decode-from-uri
on the string.
The query-separator
option.
A
If the true
if /
and false
otherwise.
If file
and^//*([a-zA-Z]:.*)$
,
+
If the scheme is not known or is known to be file
:
If the ^//*([a-zA-Z]:.*)$
,
the ^///*([^/]+)(/.*)?$
then the
Otherwise, if the ^///*([^/]+)(/.*)?$
, the
If the
If the ^//*([a-zA-Z]:.*)$
,
the file:
scheme.
The components are derived from the contents of the $parts
map in the following way:
If the scheme
key is present in the map, the URI begins
- with the value of that key. A URI is considered to be non-hierarchical
- if either the hierarchical
key is present in the
- $parts
map with the value
- false()
or if the scheme is known to be non-hierarchical.
- (In other words, schemes are hierarchical by default.)
If the scheme
is file
and the unc-path
- option is true
, the scheme is delimited by a trailing :////
,
- otherwise, if the URI is non-hierarchical, the scheme is delimited by
- a trailing :
. For all other schemes, it is delimited by
- a trailing ://
. Exactly which schemes are known to be
- non-hierarchical is
-
If the scheme
key is present in the map,
+ the URI begins with the value of that key. A URI is considered to be
+ non-hierarchical if either the hierarchical
key
+ is present in the $parts
map with the value
+ false()
or if the scheme is known to be
+ non-hierarchical. (In other words, schemes are hierarchical by
+ default.)
If the scheme
is
+ known to be non-hierarchical, it is delimited by a trailing
+ :
.
If the scheme
is file
and the unc-path
+ option is true
, the scheme is delimited by a trailing :////
.
Otherwise, the scheme is delimited by
+ a trailing ://
.
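The delimiter rules in the added lines above can be sketched in Python as follows. This is an illustrative reading of the patch, not the specification's algorithm: the function name `scheme_prefix`, the use of a plain dict for `$parts`, the `unc_path` keyword argument, and the exact (case-sensitive) scheme comparison are all assumptions made for the example. The set of non-hierarchical schemes is the list this patch adds elsewhere (jar, mailto, news, tag, tel, urn).

```python
# Schemes this patch requires implementations to treat as non-hierarchical.
NON_HIERARCHICAL_SCHEMES = {"jar", "mailto", "news", "tag", "tel", "urn"}


def scheme_prefix(parts: dict, unc_path: bool = False) -> str:
    """Return the scheme followed by its delimiter, or "" if there is no scheme.

    `parts` stands in for the $parts map; `unc_path` stands in for the
    unc-path option. Both names are hypothetical.
    """
    scheme = parts.get("scheme")
    if scheme is None:
        return ""

    # Non-hierarchical if the map says so explicitly, or the scheme is known
    # to be non-hierarchical. Schemes are hierarchical by default.
    non_hierarchical = (
        parts.get("hierarchical") is False
        or scheme in NON_HIERARCHICAL_SCHEMES
    )
    if non_hierarchical:
        return scheme + ":"
    if scheme == "file" and unc_path:
        return scheme + ":////"
    return scheme + "://"
```

Testing the non-hierarchical case first mirrors the order of the three added sentences, so a `file` scheme explicitly marked `hierarchical: false` would get a bare `:` in this sketch.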
For simplicity of exposition, we take the
userinfo
, host
, and
diff --git a/specifications/xpath-functions-40/src/xpath-functions.xml b/specifications/xpath-functions-40/src/xpath-functions.xml
index 55c38cc76..91b0436e8 100644
--- a/specifications/xpath-functions-40/src/xpath-functions.xml
+++ b/specifications/xpath-functions-40/src/xpath-functions.xml
@@ -3305,6 +3305,15 @@ It is recommended that implementers consult
Some URI schemes are hierarchical and some are non-hierarchical.
+ Implementations must treat the following schemes as non-hierarchical:
+ jar
, mailto
, news
, tag
,
+ tel
, and urn
. Whether additional schemes
+ are known to be non-hierarchical
+
The structured representation of a URI is described by the
@@ -3312,8 +3321,6 @@ It is recommended that implementers consult
The parts of this structure are:
Parsed and unescaped path segments. | ||
query-segments | +query-parameters | Parsed and unescaped query terms |
The segmented forms of the path and query parameters provide
- convenient access to commonly used information. They’re represented
- in the map as arrays, instead of sequences, just for the convenience
- of serializing the structure.
+ convenient access to commonly used information.
The path, if there is one, is tokenized on “/” characters and
- each segment is unesaped. Consider the URI http://example.com/path/to/a%2fb
. The path portion has to be returned as /path/to/a%2fb
because
+ each segment is unescaped (as per the fn:decode-from-uri
function). Consider the URI
+ http://example.com/path/to/a%2fb
.
+ The path portion has to be returned as /path/to/a%2fb
because
decoding the %2f
would change the nature of the path.
- The unescaped form is easily accessible from the path-segments array:
Note that the presence or absence of a leading slash on the path will affect whether or not the array begins with an empty string.
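The difference between the raw path and its decoded segments can be illustrated with a short Python sketch. It uses the standard library's `urllib.parse` as a stand-in for the spec's parsing and for `fn:decode-from-uri`; the helper name `path_segments` is hypothetical, and Python's `unquote` is not guaranteed to agree with the spec function on malformed escapes.

```python
from urllib.parse import unquote, urlsplit


def path_segments(uri: str):
    """Return (raw path, list of unescaped segments) for a URI.

    The raw path keeps %2f intact; only the individual segments,
    obtained by splitting on "/", are percent-decoded.
    """
    path = urlsplit(uri).path
    if not path:
        return path, []
    return path, [unquote(segment) for segment in path.split("/")]


raw, segments = path_segments("http://example.com/path/to/a%2fb")
# raw is "/path/to/a%2fb"; segments is ["", "path", "to", "a/b"]
```

The leading empty string in `segments` comes from the leading slash; a relative path such as `path/to` would yield `["path", "to"]`.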
-The query parameters are similarly decoded. Consider the URI:
+The query parameters are decoded into a map. Consider the URI:
http://example.com/path?a=1&b=2%264&a=3
.
- Here the decoded form in the query-segments gives quick access to
- the parameter values:
Note that both keys and values are unescaped and that it’s an array
- of maps because key values can be repeated, as seen for a
+ The decoded form in the query-parameters is the following map:
Note that both keys and values are unescaped. If a key
+ is repeated in the query string, the map will contain a
+ sequence of values for that key, as seen for a
in this example.
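As a rough analogue of the `query-parameters` map, the following Python sketch decodes the example URI with the standard library. The helper name `query_parameters` is hypothetical. One difference from the spec text is that `parse_qs` always returns a list per key, whereas the spec describes a sequence of values only where a key repeats.

```python
from urllib.parse import parse_qs, urlsplit


def query_parameters(uri: str) -> dict:
    """Decode the query string into a mapping from key to list of values.

    Keys and values are both percent-decoded; repeated keys accumulate
    their values in order of appearance.
    """
    return parse_qs(urlsplit(uri).query, keep_blank_values=True)


params = query_parameters("http://example.com/path?a=1&b=2%264&a=3")
# params is {"a": ["1", "3"], "b": ["2&4"]}
```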
If the ^//*([a-zA-Z]:.*)$
,
- the ^///*([^/]+)(/.*)?$
then the
If the ^(([^@]*)@)(.*)(:([^:]*))?$
,
then the file:
scheme.
:
.
- If the Otherwise, if the Otherwise, the scheme is delimited by
From 5112df3a88a9fbaa18f06ebbd0e829f749586819 Mon Sep 17 00:00:00 2001
From: Norman Walsh
Strip off the fragment identifier and any query:
If the
Attempt to identify the scheme:
If the
If the
Now that the scheme, if there is one, has been identified,
+ determine if the URI is hierarchical: If the scheme
is file
and the unc-path
+ scheme
is file
and the unc-path
option is true
, the scheme is delimited by a trailing :////
.
\
), replace them with forward
slashes (/
).
^(.*)#([^#]*)$
,
the ^[a-zA-Z][:|].*$
:unc-path
option is true
, and the ^//[^/].*$
, then the scheme is file
- and the true
if /
and false
otherwise.
true
if /
and
+ false
otherwise.
If the scheme is not known or is known to be file
:
Then examine the remaining parts of the string.
+ +If the unc-path
option is true
, and the
+ ^//[^/].*$
, then the
+ scheme is file
, the
Otherwise:
If the If the scheme is not known or is known to be Otherwise, if the If the Finally, if the ^//*([a-zA-Z]:.*)$
,
+ file
+ and the ^//*([a-zA-Z]:.*)$
,
the ^///*([^/]+)(/.*)?$
, the
If the
Similar care must be taken to match the port because an IPv6/IPvFuture address may contain a colon.
-If the ^(([^@]*)@)?(\[[^\]]*\])(:([^:]*))?$
,
- then the
If the
Otherwise, if the ^(([^@]*)@)?([^:]+)(:([^:]*))?$
,
- then the
the
Otherwise, the
If the omit-default-ports
option is true
, the port
is discarded and set to the empty sequence if the port number is the same
@@ -27678,22 +27697,7 @@ declare function some(
separator and applying
Applying +
) with spaces and all occurrences of
- %[a-fA-F0-9][a-fA-F0-9]
with a single character with the
- codepoint represented by the two digit hexadecimal number that
- follows the %
character. In other words, "A%42C"
becomes
- "ABC"
If there are any occurrences of %
followed
- by up to two characters that are not hexadecimal digits, they are
- replaced by the character sequence 0xef
, 0xbf
, 0xbd
- (that is, 0xfffd
, the Unicode replacement character, in UTF-8).
- After replacing all of the percent-escaped characters, the character sequence is
- interpreted as UTF-8 to get the string. In other words "A%XYC%Z%F0%9F%92%A9"
becomes
- "A�C�💩"
. 0xfffd
.
Applying
Applying fn:decode-from-uri
on the string.
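The decoding rules spelled out in the removed lines of this hunk (which the patch appears to delegate to `fn:decode-from-uri`) can be sketched in Python as below. This is an approximation under stated assumptions, not the normative definition: the function name `decode_component` is hypothetical, and the treatment of a `%` followed by only one hexadecimal digit is a guess, since the text does not spell that case out.

```python
import re

# Either a well-formed escape (%HH), or a "%" followed by up to two
# characters that are neither hexadecimal digits nor another "%".
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})|%[^0-9A-Fa-f%]{0,2}")
_REPLACEMENT = "\ufffd".encode("utf-8")  # bytes 0xef 0xbf 0xbd


def decode_component(text: str) -> str:
    """Replace "+" with a space, percent-decode, and read the result as UTF-8.

    Malformed escapes become U+FFFD, and so do byte sequences that are
    not valid UTF-8 after decoding.
    """
    text = text.replace("+", " ")
    out = bytearray()
    pos = 0
    for match in _ESCAPE.finditer(text):
        out += text[pos:match.start()].encode("utf-8")
        if match.group(1) is not None:
            out.append(int(match.group(1), 16))  # well-formed %HH
        else:
            out += _REPLACEMENT  # malformed escape
        pos = match.end()
    out += text[pos:].encode("utf-8")
    return out.decode("utf-8", errors="replace")


decode_component("A%42C")                # "ABC"
decode_component("A%XYC%Z%F0%9F%92%A9")  # "A\ufffdC\ufffd\U0001F4A9"
```

Collecting bytes first and decoding once at the end is what lets the four escapes `%F0%9F%92%A9` combine into a single character.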
The file:
scheme.
Otherwise, if the ^///*([^/]+)(/.*)?$
, the ^///*([^/]+)?(/.*)?$
, the
Finally, if the