From 2c3fe514e09ab088952bcc39b746a34e1d3f4645 Mon Sep 17 00:00:00 2001
From: Michael Kay

- Asks whether a collation URI is recognized by the implementation.
+ Asks whether a collation URI is recognized by the implementation,
+ and whether it has required properties. The first argument is a candidate collation URI.
equality
indicates that the intended purpose of the collation
- URI is to compare strings for equality, for example in functions such as
-
sort
indicates that the intended purpose of the collation
- URI is to sort or compare different strings in a collating sequence, for example
- in functions such as
compare
indicates that the intended purpose of the collation
+ URI is to compare strings for equality or ordering, for example in functions such as
+
key
indicates that the intended purpose of the collation
+ URI is to obtain collation keys for strings using the
substring
indicates that the intended purpose of the collation
URI is to establish whether one string is a substring of another, for example
in functions such as
$collation
allows the collation to be chosen dynamically.
Note that xs:base64Binary
becomes an ordered type
in XPath 3.1, making binary collation keys possible.
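Here is a minimal Python sketch of asking whether a collation URI is recognized and whether it supports a given usage. It assumes the implementation keeps a registry that maps each recognized URI to the set of usage values it supports. The registry, the `Collation` class, `collation_available`, and the `example.com` URI are hypothetical. The usage names (`compare`, `key`, `substring`) and the Unicode codepoint collation URI come from the specification. Which usages the codepoint collation supports in this sketch is an assumption.

```python
from dataclasses import dataclass

# The Unicode codepoint collation URI defined by the specification.
CODEPOINT_URI = "http://www.w3.org/2005/xpath-functions/collation/codepoint"
# Hypothetical URI for a collation that can compare strings but not produce keys.
COMPARE_ONLY_URI = "http://example.com/collation/compare-only"


@dataclass(frozen=True)
class Collation:
    uri: str
    usages: frozenset  # usage values this collation supports, e.g. "compare"


# Illustrative registry: a real processor decides which URIs it recognizes.
REGISTRY = {
    CODEPOINT_URI: Collation(CODEPOINT_URI, frozenset({"compare", "key", "substring"})),
    COMPARE_ONLY_URI: Collation(COMPARE_ONLY_URI, frozenset({"compare"})),
}


def collation_available(uri: str, *usage: str) -> bool:
    """True if `uri` is recognized and supports every requested usage value."""
    collation = REGISTRY.get(uri)
    if collation is None:
        return False  # unknown collation URI
    return set(usage) <= collation.usages


print(collation_available(CODEPOINT_URI, "key"))     # True
print(collation_available(COMPARE_ONLY_URI, "key"))  # False
```

A processor could equally compute these answers on demand. The fixed registry just keeps the sketch short.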
A collation is a specification of the manner in which xs:string
or a type derived from xs:string
are
- compared (or, equivalently, sorted), the comparisons are inherently
- performed according to some collation (even if that collation is defined
- entirely on codepoint values). The
Collations can indicate that two different codepoints are, in fact, equal
- for comparison purposes (e.g., “v” and “w” are considered equivalent in
Collations can indicate that two different codepoints are to be considered equal
+ for comparison purposes (for example, “v” and “w” are considered equivalent in
some Swedish collations). Strings can be compared codepoint-by-codepoint or in a
- linguistically appropriate manner, as defined by the collation.
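The following minimal Python sketch contrasts codepoint comparison with a collation that treats two different codepoints as equal. The `toy_v_w_compare` function is hypothetical. It folds “w” to “v” before comparing and is not a real Swedish collation. Both comparators return -1, 0 or 1.

```python
def codepoint_compare(a: str, b: str) -> int:
    """Compare by Unicode codepoint; Python's native str ordering does exactly this."""
    return (a > b) - (a < b)


def toy_v_w_compare(a: str, b: str) -> int:
    """Hypothetical toy collation in which "w" and "v" compare as equal.

    This is not a real Swedish collation; it only shows that a collation may
    treat two different codepoints as equivalent for comparison purposes.
    """
    def fold(s: str) -> str:
        return s.replace("w", "v").replace("W", "V")

    fa, fb = fold(a), fold(b)
    return (fa > fb) - (fa < fb)


print(codepoint_compare("vin", "win"))  # -1: "v" (U+0076) precedes "w" (U+0077)
print(toy_v_w_compare("vin", "win"))    # 0: equal under the toy collation
```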
-Some collations, especially those based on the
- Unicode Collation Algorithm (see
Some sources, for example
This specification defines some collation URIs that provide interoperable
+ sorting behavior across applications. Other collation URIs are defined only
+ partially (leaving some aspects implementation-defined). Implementations may
+ define further collation URIs, or may allow users or third parties to define them.
Collations may or may not perform Unicode normalization on strings before comparing them.
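This Python sketch shows why normalization matters. “é” can be written as one codepoint or as “e” followed by a combining accent. Whether a particular collation normalizes, and to which form, is up to that collation. NFC in `nfc_equal` is just one assumed choice, and both function names are illustrative.

```python
import unicodedata

precomposed = "\u00e9"   # "é" as one codepoint (LATIN SMALL LETTER E WITH ACUTE)
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def raw_equal(a: str, b: str) -> bool:
    """A collation that does not normalize compares the codepoint sequences as given."""
    return a == b


def nfc_equal(a: str, b: str) -> bool:
    """A collation that normalizes to NFC (one possible choice) before comparing."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(raw_equal(precomposed, decomposed))  # False
print(nfc_equal(precomposed, decomposed))  # True
```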
-This specification assumes that collations are named and that the collation
- name may be provided as an argument to string functions. Functions that
- allow specification of a collation do so with an argument whose type is
- xs:string
but whose lexical form must conform to an
- xs:anyURI
.
- This specification also defines the manner in which a
- default collation is determined if the collation argument is not specified
- in calls of functions that use a collation but allow it to be omitted.
This specification allows a collation
+ name to be provided as an argument to many string functions. Although
+ collations are defined to be URIs, they are supplied as instances of
+ xs:string
.
The XQuery/XPath static context supplies a default collation
+ for use when the collation argument is not specified.
+ (see
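Here is a minimal sketch of how a string function might choose its collation when the argument is optional. It assumes a static context that carries nothing but a default collation. `StaticContext` and `effective_collation` are hypothetical names. Using the Unicode codepoint collation as the default is an assumption of the sketch; the real default is whatever the static context supplies.

```python
from dataclasses import dataclass
from typing import Optional

CODEPOINT_URI = "http://www.w3.org/2005/xpath-functions/collation/codepoint"


@dataclass
class StaticContext:
    """Hypothetical stand-in holding only the default collation property."""
    default_collation: str = CODEPOINT_URI


def effective_collation(ctx: StaticContext, collation: Optional[str] = None) -> str:
    """Pick the collation name a string function should use.

    The collation arrives as a plain string (mirroring xs:string), even though
    its value is meant to be a URI; if it is omitted, the default applies.
    """
    return ctx.default_collation if collation is None else collation


ctx = StaticContext()
print(effective_collation(ctx))                             # the default (codepoint URI)
print(effective_collation(ctx, "http://example.com/coll"))  # the explicit argument
```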
If the collation is specified using a relative URI reference,
it is resolved relative to an
+ Previous versions of this specification stated that it must
be resolved against the
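Resolving a relative collation URI reference is ordinary URI resolution. The sketch below uses Python's `urllib.parse.urljoin`. The base URI and the helper `resolve_collation_uri` are hypothetical, and the sketch does not model which base URI a processor actually resolves against.

```python
from urllib.parse import urljoin

# Hypothetical base URI; the choice of base URI is not modeled here.
BASE_URI = "http://example.com/app/queries/"


def resolve_collation_uri(collation: str, base: str = BASE_URI) -> str:
    """Resolve a possibly relative collation URI reference against `base`."""
    return urljoin(base, collation)


print(resolve_collation_uri("collations/phonebook"))
# http://example.com/app/queries/collations/phonebook
print(resolve_collation_uri("http://www.w3.org/2005/xpath-functions/collation/codepoint"))
# absolute URIs are returned unchanged
```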
This specification does not define whether or not the collation URI is
dereferenced. The collation URI may be an abstract identifier, or it may
refer to an actual resource describing the collation. If it refers to a
@@ -2629,7 +2643,7 @@ string conversion of the number as obtained above, and the appropriate suff
One possible candidate is that the resource is a locale description
expressed using the Locale Data Markup Language: see
Functions such as
XML allows elements to specify the xml:lang
attribute to
indicate the language associated with the content of such an element.
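An application might use xml:lang to pick a language-specific collation for an element's content. This sketch assumes the UCA collation URI family from XPath and XQuery Functions and Operators 3.1 (`http://www.w3.org/2013/collation/UCA` with a `lang` parameter). The function `collation_for` and the policy of copying xml:lang into `lang` are hypothetical illustrations, not behavior the specification requires.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# ElementTree reports xml:lang under the predeclared XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
UCA_URI = "http://www.w3.org/2013/collation/UCA"
CODEPOINT_URI = "http://www.w3.org/2005/xpath-functions/collation/codepoint"


def collation_for(elem: ET.Element, default: str = CODEPOINT_URI) -> str:
    """Hypothetical policy: derive a UCA collation URI from the element's xml:lang.

    Only the element's own attribute is read; inheriting xml:lang from
    ancestors is not handled in this sketch.
    """
    lang = elem.get(XML_LANG)
    if lang is None:
        return default
    return UCA_URI + "?" + urlencode({"lang": lang})


doc = ET.fromstring('<names><n xml:lang="sv">Wallin</n><n>Smith</n></names>')
for n in doc:
    print(n.text, collation_for(n))
# Wallin http://www.w3.org/2013/collation/UCA?lang=sv
# Smith http://www.w3.org/2005/xpath-functions/collation/codepoint
```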
@@ -2660,6 +2669,27 @@ string conversion of the number as obtained above, and the appropriate suff
when a string is multilingual.
All collations support the ability to compare two strings to decide
+ whether they are equal, and if not, which one should sort first. The comparison
+ must always define a total ordering; in particular, it must be transitive.
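The requirement that comparison define a total ordering can be spot-checked on a sample. This Python sketch uses a hypothetical case-blind comparator and brute-force checks that swapping arguments flips the result and that the comparison is transitive. Passing on a finite sample does not prove the property for all strings. `functools.cmp_to_key` then turns the comparator into a sort key.

```python
from functools import cmp_to_key
from itertools import product


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def case_blind_compare(a: str, b: str) -> int:
    """Hypothetical collation: ignore case first, then break ties by codepoint.

    Returns -1, 0 or 1, like a C-style comparator.
    """
    ka, kb = (a.casefold(), a), (b.casefold(), b)
    return (ka > kb) - (ka < kb)


def looks_like_total_order(cmp, sample) -> bool:
    """Brute-force check on a finite sample (not a proof for all strings)."""
    for a, b in product(sample, repeat=2):
        # Swapping the arguments must flip the sign of the result.
        if sign(cmp(a, b)) != -sign(cmp(b, a)):
            return False
    for a, b, c in product(sample, repeat=3):
        # Transitivity: a <= b and b <= c must imply a <= c.
        if cmp(a, b) <= 0 and cmp(b, c) <= 0 and cmp(a, c) > 0:
            return False
    return True


words = ["banana", "Apple", "apple", "Cherry"]
print(looks_like_total_order(case_blind_compare, words))  # True
print(sorted(words, key=cmp_to_key(case_blind_compare)))
# ['Apple', 'apple', 'banana', 'Cherry']
```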
+A collation may (or may not) support the ability to derive a
Furthermore, a collation may (or may not) support the ability to determine whether
+ one string is a substring of another under that collation. The use of collations
+ in substring matching is described in
The capabilities of a collation may be determined using the
+
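This sketch illustrates the two optional capabilities, collation keys and substring matching, for a hypothetical case-insensitive toy collation. A collation key is modeled here as a binary value (Python `bytes`, loosely analogous to an ordered binary key) whose ordering agrees with the collation's comparison. `toy_compare`, `toy_key` and `toy_contains` are illustrative names, not a real API.

```python
def toy_compare(a: str, b: str) -> int:
    """Hypothetical case-insensitive toy collation: compare case-folded codepoints."""
    fa, fb = a.casefold(), b.casefold()
    return (fa > fb) - (fa < fb)


def toy_key(s: str) -> bytes:
    """Collation key for the toy collation.

    UTF-8 byte order agrees with codepoint order, so comparing two keys gives
    the same result as toy_compare on the original strings.
    """
    return s.casefold().encode("utf-8")


def toy_contains(haystack: str, needle: str) -> bool:
    """Substring test under the same toy collation."""
    return needle.casefold() in haystack.casefold()


# Comparing keys must agree with comparing the strings themselves.
for a, b in [("apple", "APPLE"), ("Apple", "banana"), ("zebra", "Ant")]:
    ka, kb = toy_key(a), toy_key(b)
    assert toy_compare(a, b) == (ka > kb) - (ka < kb)

print(toy_contains("Hello World", "WORLD"))  # True
```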
While the Unicode codepoint collation does not produce results suitable for quality publishing of
printed indexes or directories, it is adequate for many purposes where a restricted alphabet
is used, such as sorting of vehicle registrations. The Unicode codepoint collation differs from the
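+ default sort order used in programming languages that compare strings
+ by UTF-16 code units: the two orders can differ when strings contain characters
+ outside the Basic Multilingual Plane, which UTF-16 represents as surrogate pairs.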
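The difference between codepoint order and UTF-16 code-unit order shows up with characters outside the Basic Multilingual Plane. In this Python sketch, native `str` comparison gives codepoint order. Encoding to big-endian UTF-16 emulates the code-unit order used by, for example, Java's `String.compareTo`. The two characters are just one illustrative pair, and `utf16_key` is a hypothetical helper.

```python
def utf16_key(s: str) -> bytes:
    """Sort key matching UTF-16 code-unit order (big-endian bytes keep unit order)."""
    return s.encode("utf-16-be")


# U+FF21 FULLWIDTH LATIN CAPITAL LETTER A (inside the BMP) and U+1F600 GRINNING FACE
# (outside the BMP, so UTF-16 stores it as the surrogate pair D83D DE00).
words = ["\uFF21", "\U0001F600"]

by_codepoint = sorted(words)             # Python compares str by codepoint
by_utf16 = sorted(words, key=utf16_key)  # code-unit order

print([f"U+{ord(w):04X}" for w in by_codepoint])  # ['U+FF21', 'U+1F600']
print([f"U+{ord(w):04X}" for w in by_utf16])      # ['U+1F600', 'U+FF21']
```

Every surrogate code unit (U+D800 to U+DFFF) is smaller than the BMP characters U+E000 to U+FFFF. So when the first difference between two strings is a supplementary character against a character in that range, UTF-16 order puts the supplementary character first, while codepoint order puts it last.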