Fix CSV parsing as per issue 1675
michaelhkay committed Jan 4, 2025
1 parent 4eaeba1 commit 6dba8f3
Showing 2 changed files with 23 additions and 32 deletions.
42 changes: 14 additions & 28 deletions specifications/xpath-functions-40/src/function-catalog.xml
@@ -226,7 +226,7 @@

<ulist>
<item><p>With <code>"header":false()</code> (which is the default),
- then the value is an empty sequence.</p></item>
+ the value is an empty sequence.</p></item>
<item><p>With <code>"header":true()</code>, the value is a sequence
of strings taken from the first row of the data. The strings have
leading and trailing whitespace trimmed, regardless of the value of the
@@ -294,10 +294,10 @@
option is <code>true</code>. If there are no data rows in the CSV, the
value will be an empty sequence.</p></fos:meaning>
</fos:field>
- <fos:field name="get" type="function(xs:positiveInteger, (xs:positiveInteger | xs:string)) as xs:string?" required="true">
+ <fos:field name="get" type="function(xs:positiveInteger, (xs:positiveInteger | xs:string)) as xs:string" required="true">
<fos:meaning><p>A function providing ready access to a given field in a given
row. The <code>get</code> function has signature:</p>
- <eg>function($row as xs:integer, $column as union(xs:string, xs:integer)) as xs:string?</eg>
+ <eg>function($row as xs:positiveInteger, $column as (xs:positiveInteger | xs:string)) as xs:string</eg>
<p>The function takes two arguments: the first is an
integer giving the row number (1-based), the second
identifies a column either by its name or by its 1-based
@@ -307,7 +307,7 @@
the function call <code>$csv?get($R, $C)</code>, where <code>$C</code>
is an integer, returns the value of <code>$csv?rows[$R] => array:get($C, fn { "" })</code>,
and the function call <code>$csv?get($R, $K)</code>, where <code>$K</code>
- is a string, returns the value of <code>$csv?get($R, $csv?column-numbers($K))</code>.</p>
+ is a string, returns the value of <code>$csv?get($R, $csv?column-index($K))</code>.</p>

<p>The properties of the function are as follows:</p>
<glist>
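Editorial note on the hunk above: the revised `get` field always returns a single string, because a missing column falls back to `""`. The rule can be sketched in Python as below. This is an illustration of the described semantics, not the specification's definition. The names `make_get`, `rows` and `column_index` are hypothetical stand-ins for `$csv?get`, `$csv?rows` and `$csv?column-index`. The behaviour for an out-of-range row or an unknown column name is not stated in this excerpt, so the sketch simply raises.

```python
from typing import Callable, Dict, List, Union


def make_get(rows: List[List[str]],
             column_index: Dict[str, int]) -> Callable[[int, Union[int, str]], str]:
    """Build a get(row, column) accessor using 1-based positions."""

    def get(row: int, column: Union[int, str]) -> str:
        # A column given by name is first resolved to its 1-based position.
        # Assumption: an unknown name raises KeyError (not covered by the excerpt).
        if isinstance(column, str):
            column = column_index[column]
        # Assumption: an out-of-range row raises IndexError (not covered by the excerpt).
        if not 1 <= row <= len(rows):
            raise IndexError(f"no such row: {row}")
        fields = rows[row - 1]
        # Mirrors array:get($C, fn { "" }): a missing field yields the empty string.
        if 1 <= column <= len(fields):
            return fields[column - 1]
        return ""

    return get


# Hypothetical sample data: the second row is shorter than the first.
sample_get = make_get([["Alice", "2023-07-12"], ["Bob"]], {"name": 1, "date": 2})
```

With this sample data, `sample_get(2, "date")` resolves `"date"` to column 2, finds no such field in the second row, and returns the empty string rather than an empty sequence, which is why the return type no longer needs the `?` occurrence indicator.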
@@ -321,7 +321,7 @@
</gitem>
<gitem>
<label>Signature</label>
- <def><p><code>(xs:positiveInteger, (xs:positiveInteger | xs:string)) => xs:string?</code></p></def>
+ <def><p><code>(xs:positiveInteger, (xs:positiveInteger | xs:string)) => xs:string</code></p></def>
</gitem>
<gitem>
<label>Non-local variable bindings</label>
@@ -26019,26 +26019,10 @@ return json-to-xml($json, $options)]]></eg>
</fos:value>
</fos:values>
</fos:option>
- <!--<fos:option key="normalize-newlines">
- <fos:meaning>Determines whether CR and CRLF character sequences
- are treated as equivalent to NL characters.</fos:meaning>
- <fos:type>xs:boolean</fos:type>
- <fos:default>false</fos:default>
- <fos:values>
- <fos:value value="false">No normalization takes place.
- </fos:value>
- <fos:value value="true">The character sequences CR (<char>U+000D</char>)
- and CRLF (<char>U+000D</char>, <char>U+000A</char>) are treated as equivalent to the
- character NL (<char>U+000A</char>), except when they appear within a quoted field.
- The normalization is done prior to recognition of row delimiters, and happens
- whether or not NL is used as the row delimiter.
- </fos:value>
- </fos:values>
- </fos:option>-->
<fos:option key="header">
<fos:meaning>Determines whether the first row of the CSV should be treated as a list
of column names, or whether column names are being supplied by the caller.
- The value must either be a single boolean, or a sequence of one or more strings.
+ The value must either be a single boolean, or a sequence of zero or more strings.
</fos:meaning>
<fos:type>item()*</fos:type>
<fos:default>false</fos:default>
@@ -26047,7 +26031,7 @@ return json-to-xml($json, $options)]]></eg>
first row of the CSV data.</fos:value>
<fos:value value="false">Column names are not available; all references
to columns are by ordinal position.</fos:value>
- <fos:value value="xs:string+">Supplies explicit names for the columns. The <var>N</var>th
+ <fos:value value="xs:string*">Supplies explicit names for the columns. The <var>N</var>th
name in the list applies to the <var>N</var>th column after any filtering or rearrangement.
A zero-length string can be used when there is a column that requires no name.
</fos:value>
@@ -26395,7 +26379,7 @@ return (
the CSV string. An instance of
<code>xs:string</code> whose length is exactly one.
Defaults to a single newline character (<char>U+000A</char>).</fos:meaning>
- <fos:type>xs:string+</fos:type>
+ <fos:type>xs:string</fos:type>
<fos:default>char('\n')</fos:default>
</fos:option>
<fos:option key="quote-character">
@@ -26406,14 +26390,14 @@ return (
</fos:option>
<fos:option key="trim-whitespace">
<fos:meaning>Determines whether leading and trailing whitespace
- is removed from the content of fields.</fos:meaning>
+ is removed from the content of unquoted fields.</fos:meaning>
<fos:type>xs:boolean</fos:type>
<fos:default>false</fos:default>
<fos:values>
- <fos:value value="false">Fields will be returned with any leading or trailing
+ <fos:value value="false">Unquoted fields will be returned with any leading or trailing
whitespace intact.
</fos:value>
- <fos:value value="true">Fields will be returned with leading or trailing
+ <fos:value value="true">Unquoted fields will be returned with leading or trailing
whitespace removed, and all other whitespace preserved.
</fos:value>
</fos:values>
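Editorial note on the hunk above: after this change, `trim-whitespace` applies to unquoted fields only. A minimal Python sketch of that rule follows. It assumes the fields have already been tokenized into `(text, quoted)` pairs, and it assumes that "whitespace" means the four XML whitespace characters. The name `apply_trim` and the pair representation are hypothetical and do not come from the specification.

```python
from typing import List, Tuple

# Assumption: whitespace here means space, tab, carriage return and newline.
XML_WHITESPACE = " \t\r\n"


def apply_trim(fields: List[Tuple[str, bool]], trim_whitespace: bool = False) -> List[str]:
    """Return field values, trimming only unquoted fields when the option is set."""
    result = []
    for text, quoted in fields:
        if trim_whitespace and not quoted:
            # Leading and trailing whitespace is removed; interior whitespace is kept.
            text = text.strip(XML_WHITESPACE)
        result.append(text)
    return result
```

For example, with the option set to true, an unquoted field `"  a b  "` becomes `"a b"`, while a quoted field with the same content keeps its surrounding spaces.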
@@ -26659,7 +26643,9 @@ return document {
}</csv>
}]]></eg>

- <p>The namespace prefix used in the names of elements (or its absence) is
+ <p>The elements in the returned XML are in the namespace
+ <code>http://www.w3.org/2005/xpath-functions</code>;
+ the namespace prefix that is used (or its absence) is
<termref def="implementation-dependent"/>.</p>

<p>If the function is called twice with the same arguments, it is <termref
13 changes: 9 additions & 4 deletions specifications/xpath-functions-40/src/xpath-functions.xml
@@ -7155,10 +7155,15 @@ Bob,2023-07-14,2.34
</csv>
]]></eg>

- <p>If column names were not extracted, then implementations <rfc2119>should
- not</rfc2119> include the <code><![CDATA[<header>]]></code> element, and
- <code><![CDATA[<field>]]></code> elements <rfc2119>should not</rfc2119> have
- the <code>column</code> attribute:</p>
+ <p>If no non-empty column names are available, then the <code>columns</code>
+ element and all <code>column</code> attributes are absent.
+ If non-empty column names are available for some columns but not for others,
+ then (a) an empty <code>column</code> element is included
+ within the <code>columns</code> element if and only if there is a subsequent
+ column with a non-empty name, and (b) the <code>column</code> attribute
+ for the corresponding <code>field</code> elements is absent.</p>
+
+ <p>For example (when no column names are available):</p>

<eg><![CDATA[
<csv xmlns="http://www.w3.org/2005/xpath-functions">
Expand Down
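Editorial note on the hunk above: the new wording for the `columns` element and the `column` attribute amounts to dropping trailing empty names and omitting the attribute wherever the name is empty. The Python sketch below restates that rule under the assumption that column names arrive as a list of strings, with `""` standing for "no name". The function names are hypothetical, and the sketch works on plain lists rather than building XML.

```python
from typing import List, Optional


def column_entries(names: List[str]) -> Optional[List[str]]:
    """Contents of the <column> children, or None when <columns> is absent."""
    # Zero-based position of the last non-empty name, or -1 if there is none.
    last = max((i for i, name in enumerate(names) if name), default=-1)
    if last < 0:
        # No non-empty names at all: the <columns> element is absent.
        return None
    # An empty entry is kept only if a later column has a non-empty name,
    # so trailing empty names are dropped.
    return names[: last + 1]


def field_column_attribute(names: List[str], position: int) -> Optional[str]:
    """Value of the column attribute for the field at a 1-based position, or None if absent."""
    name = names[position - 1] if 1 <= position <= len(names) else ""
    return name or None
```

For the names `["name", "", "amount", ""]`, the `columns` element would hold three `column` children (the second one empty), and fields in positions 2 and 4 would carry no `column` attribute.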