Merge pull request #1068 from Arithmeticus/xqfo-graphemes
73 fn:graphemes
ndw authored May 14, 2024
2 parents 6568558 + b295d90 commit d35163f
Showing 2 changed files with 101 additions and 1 deletion.
86 changes: 86 additions & 0 deletions specifications/xpath-functions-40/src/function-catalog.xml
@@ -28676,6 +28676,92 @@ return every($dl/*, fn($elem, $pos) {
<fos:version version="4.0">New in 4.0. Accepted 2022-09-20, subject to improving the description.</fos:version>
</fos:history>
</fos:function>

<fos:function name="graphemes" prefix="fn">
<fos:signatures>
<fos:proto name="graphemes" return-type="xs:string*">
<fos:arg name="value" type="xs:string?"/>
</fos:proto>
</fos:signatures>
<fos:properties arity="1">
<fos:property>deterministic</fos:property>
<fos:property>context-independent</fos:property>
<fos:property>focus-independent</fos:property>
</fos:properties>
<fos:summary>
<p>Splits the supplied string into a sequence of single-grapheme strings.</p>
</fos:summary>
<fos:rules>
<p>The function returns a sequence of strings. Each string in the sequence contains one or
more <termref def="character">character</termref>s that collectively constitute a single
extended grapheme cluster, as defined by <bibref ref="UNICODE-TR29"/>.</p>
<p>If <code>$value</code> is a zero-length string or the empty sequence, the function returns
the empty sequence.</p>

</fos:rules>
<fos:notes>
<p>The strings in the result are extended grapheme clusters, not legacy grapheme clusters (see
<bibref ref="UNICODE-TR29"/>).</p>
</fos:notes>
<fos:examples>
<fos:variable name="crlf" id="escaped-crlf-2"
><![CDATA[char('\r') || char('\n')]]></fos:variable>
<fos:example>
<fos:test>
<fos:expression>graphemes("a" || char(0x308) || "b")</fos:expression>
<fos:result>("a" || char(0x308), "b")</fos:result>
<fos:postamble>a + ◌̈ + b, three characters, two graphemes</fos:postamble>
</fos:test>
</fos:example>
<fos:example>
<fos:test>
<fos:expression>graphemes("")</fos:expression>
<fos:result>()</fos:result>
</fos:test>
</fos:example>
<fos:example>
<fos:test>
<fos:expression>graphemes(())</fos:expression>
<fos:result>()</fos:result>
</fos:test>
</fos:example>
<fos:example>
<fos:test use="escaped-crlf-2">
<fos:expression>graphemes($crlf)</fos:expression>
<fos:result>$crlf</fos:result>
<fos:postamble>Carriage return + line feed, two characters, one
grapheme</fos:postamble>
</fos:test>
</fos:example>
<fos:example>
<fos:test>
<fos:expression>graphemes(char(0x1F476) || char(0x200D)
|| char(0x1F6D1))</fos:expression>
<fos:result>char(0x1F476) || char(0x200D) || char(0x1F6D1)</fos:result>
<fos:postamble>👶 + ZWJ + 🛑, three characters, one grapheme</fos:postamble>
</fos:test>
</fos:example>
<fos:example>
<fos:test>
<fos:expression>graphemes("कत")</fos:expression>
<fos:result>("क", "त")</fos:result>
<fos:postamble>क + त, two characters, two graphemes</fos:postamble>
</fos:test>
</fos:example>
<fos:example>
<fos:test>
<fos:expression>graphemes("क" || char(0x93C) || char(0x200D)
|| char(0x94D) || "त")</fos:expression>
<fos:result>"क" || char(0x93C) || char(0x200D)
|| char(0x94D) || "त"</fos:result>
<fos:postamble>क + ◌़ + ZWJ + ◌् + त, five characters, one grapheme</fos:postamble>
</fos:test>
</fos:example>
</fos:examples>
<fos:history>
<fos:version version="4.0">New in 4.0.</fos:version>
</fos:history>
</fos:function>
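
Reviewer's sketch (not part of the commit): the simpler `fos:examples` above can be approximated in Python with only the standard library. This is a rough illustration under stated assumptions, not an implementation of UAX #29. The helper names `is_extend` and `approx_graphemes` are hypothetical, general categories Mn/Me/Mc stand in for the Extend and SpacingMark properties, and general category So stands in for Extended_Pictographic. It does not implement the conjunct rule that the final Devanagari example relies on, so it would split that string into two clusters.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def is_extend(ch: str) -> bool:
    # Assumption: marks (Mn, Me, Mc) and ZWJ approximate the
    # Extend / SpacingMark / ZWJ classes of UAX #29.
    return ch == ZWJ or unicodedata.category(ch) in ("Mn", "Me", "Mc")


def approx_graphemes(value):
    """Approximate split of a string into grapheme clusters.

    Returns an empty list for None or a zero-length string,
    mirroring the empty-sequence rule in the function description.
    """
    if not value:
        return []
    clusters = [value[0]]
    for prev, ch in zip(value, value[1:]):
        joins = (
            # CR followed by LF stays together.
            (prev == "\r" and ch == "\n")
            # A mark or ZWJ attaches to the preceding character,
            # except after a line-break control.
            or (is_extend(ch) and prev not in "\r\n")
            # Assumption: a symbol (So) right after ZWJ continues
            # an emoji sequence.
            or (prev == ZWJ and unicodedata.category(ch) == "So")
        )
        if joins:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters
```

A real implementation would use the Grapheme_Cluster_Break, Extended_Pictographic, and Indic_Conjunct_Break property data of the chosen Unicode version rather than general categories.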



16 changes: 15 additions & 1 deletion specifications/xpath-functions-40/src/xpath-functions.xml
@@ -2687,6 +2687,9 @@ string conversion of the number as obtained above, and the appropriate <var>suff
<div3 id="func-characters" diff="add" at="A">
<head><?function fn:characters?></head>
</div3>
<div3 id="func-graphemes" diff="add" at="A">
<head><?function fn:graphemes?></head>
</div3>
<div3 id="func-concat">
<head><?function fn:concat?></head>
</div3>
@@ -10925,7 +10928,7 @@ Organization for Standardization, 2012. Available from: <loc href="http://www.is
<bibl id="fips180-4" key="FIPS 180-4">National Institute of Standards and Technology.
<emph>Secure Hash Standard (SHS)</emph>. FIPS PUB 180-4. August 2015.
See <loc href="http://dx.doi.org/10.6028/NIST.FIPS.180-4">http://dx.doi.org/10.6028/NIST.FIPS.180-4</loc>. </bibl>

<bibl id="UNICODE-TR15"
key="UAX #15"><emph>Unicode Standard Annex #15: Unicode Normalization
Forms</emph>.
@@ -10936,6 +10939,15 @@ Organization for Standardization, 2012. Available from: <loc href="http://www.is
<loc href="http://www.unicode.org/reports/tr15/">http://www.unicode.org/reports/tr15/</loc>.
</bibl>

<bibl id="UNICODE-TR29"
key="UAX #29"><emph>Unicode Standard Annex #29: Unicode Text Segmentation</emph>.
Ed. Josh Hadley, Unicode Consortium.
The current version is 15.1.0, dated 2023-08-16.
As with <bibref ref="Unicode"/>, the version to be used is <termref def="implementation-defined"/>.
Available at:
<loc href="http://www.unicode.org/reports/tr29/">http://www.unicode.org/reports/tr29/</loc>.
</bibl>

<bibl id="Unicode"
key="The Unicode Standard">
The Unicode Consortium, Reading, MA, Addison-Wesley, 2016. <emph>The Unicode Standard</emph>
@@ -11800,6 +11812,8 @@ ISBN 0 521 77752 6.</bibl>
<item><p><code>fn:expanded-QName</code></p></item>
<item><p><code>fn:foot</code></p></item>
<item><p><code>fn:function-annotations</code></p></item>
<item><p><code>fn:graphemes</code></p></item>
<item><p><code>fn:hash</code></p></item>
<item><p><code>fn:highest</code></p></item>
<item><p><code>fn:identity</code></p></item>
<item><p><code>fn:in-scope-namespaces</code></p></item>