Fix CSV parsing as per issue 1675

michaelhkay · Jan 4, 2025 · 6dba8f3
1 parent 4eaeba1
commit 6dba8f3
Showing 2 changed files with 23 additions and 32 deletions.
diff --git a/specifications/xpath-functions-40/src/function-catalog.xml b/specifications/xpath-functions-40/src/function-catalog.xml
@@ -226,7 +226,7 @@
 
                   <ulist>
                      <item><p>With <code>"header":false()</code> (which is the default),
-                     then the value is an empty sequence.</p></item>
+                     the value is an empty sequence.</p></item>
                      <item><p>With <code>"header":true()</code>, the value is a sequence
                      of strings taken from the first row of the data. The strings have
                      leading and trailing whitespace trimmed, regardless of the value of the
@@ -294,10 +294,10 @@
                   option is <code>true</code>. If there are no data rows in the CSV, the
                   value will be an empty sequence.</p></fos:meaning>
       </fos:field>
-      <fos:field name="get" type="function(xs:positiveInteger, (xs:positiveInteger | xs:string)) as xs:string?" required="true">
+      <fos:field name="get" type="function(xs:positiveInteger, (xs:positiveInteger | xs:string)) as xs:string" required="true">
          <fos:meaning><p>A function providing ready access to a given field in a given
       row. The <code>get</code> function has signature:</p>
-                     <eg>function($row as xs:integer, $column as union(xs:string, xs:integer)) as xs:string?</eg>
+                     <eg>function($row as xs:positiveInteger, $column as (xs:positiveInteger | xs:string)) as xs:string</eg>
                      <p>The function takes two arguments: the first is an
                      integer giving the row number (1-based), the second
                      identifies a column either by its name or by its 1-based
@@ -307,7 +307,7 @@
                   the function call <code>$csv?get($R, $C)</code>, where <code>$C</code>
                   is an integer, returns the value of <code>$csv?rows[$R] => array:get($C, fn { "" })</code>,
                   and the function call <code>$csv?get($R, $K)</code>, where <code>$K</code>
-                  is a string, returns the value of <code>$csv?get($R, $csv?column-numbers($K))</code>.</p>
+                  is a string, returns the value of <code>$csv?get($R, $csv?column-index($K))</code>.</p>
 
                <p>The properties of the function are as follows:</p>
                   <glist>
@@ -321,7 +321,7 @@
                      </gitem>
                      <gitem>
                         <label>Signature</label>
-                        <def><p><code>(xs:positiveInteger, (xs:positiveInteger | xs:string)) => xs:string?</code></p></def>
+                        <def><p><code>(xs:positiveInteger, (xs:positiveInteger | xs:string)) => xs:string</code></p></def>
                      </gitem>
                      <gitem>
                         <label>Non-local variable bindings</label>
@@ -26019,26 +26019,10 @@ return json-to-xml($json, $options)]]></eg>
                   </fos:value>
                </fos:values>
             </fos:option>
-            <!--<fos:option key="normalize-newlines">
-               <fos:meaning>Determines whether CR and CRLF character sequences
-                  are treated as equivalent to NL characters.</fos:meaning>
-               <fos:type>xs:boolean</fos:type>
-               <fos:default>false</fos:default>
-               <fos:values>
-                  <fos:value value="false">No normalization takes place.
-                  </fos:value>
-                  <fos:value value="true">The character sequences CR (<char>U+000D</char>)
-                  and CRLF (<char>U+000D</char>, <char>U+000A</char>) are treated as equivalent to the
-                  character NL (<char>U+000A</char>), except when they appear within a quoted field. 
-                  The normalization is done prior to recognition of row delimiters, and happens
-                  whether or not NL is used as the row delimiter.
-                  </fos:value>
-               </fos:values>
-            </fos:option>-->
             <fos:option key="header">
                <fos:meaning>Determines whether the first row of the CSV should be treated as a list
                   of column names, or whether column names are being supplied by the caller. 
-                  The value must either be a single boolean, or a sequence of one or more strings.
+                  The value must either be a single boolean, or a sequence of zero or more strings.
                </fos:meaning>
                <fos:type>item()*</fos:type>
                <fos:default>false</fos:default>
@@ -26047,7 +26031,7 @@ return json-to-xml($json, $options)]]></eg>
                      first row of the CSV data.</fos:value>
                   <fos:value value="false">Column names are not available; all references
                      to columns are by ordinal position.</fos:value>
-                  <fos:value value="xs:string+">Supplies explicit names for the columns. The <var>N</var>th
+                  <fos:value value="xs:string*">Supplies explicit names for the columns. The <var>N</var>th
                      name in the list applies to the <var>N</var>th column after any filtering or rearrangement.
                      A zero-length string can be used when there is a column that requires no name.
                   </fos:value>
@@ -26395,7 +26379,7 @@ return (
                   the CSV string. An instance of
                   <code>xs:string</code> whose length is exactly one.
                   Defaults to a single newline character (<char>U+000A</char>).</fos:meaning>
-               <fos:type>xs:string+</fos:type>
+               <fos:type>xs:string</fos:type>
                <fos:default>char('\n')</fos:default>
             </fos:option>
             <fos:option key="quote-character">
@@ -26406,14 +26390,14 @@ return (
             </fos:option>
             <fos:option key="trim-whitespace">
                <fos:meaning>Determines whether leading and trailing whitespace
-                  is removed from the content of fields.</fos:meaning>
+                  is removed from the content of unquoted fields.</fos:meaning>
                <fos:type>xs:boolean</fos:type>
                <fos:default>false</fos:default>
                <fos:values>
-                  <fos:value value="false">Fields will be returned with any leading or trailing
+                  <fos:value value="false">Unquoted fields will be returned with any leading or trailing
                      whitespace intact.
                   </fos:value>
-                  <fos:value value="true">Fields will be returned with leading or trailing
+                  <fos:value value="true">Unquoted fields will be returned with leading or trailing
                      whitespace removed, and all other whitespace preserved.
                   </fos:value>
                </fos:values>
@@ -26659,7 +26643,9 @@ return document {
    }</csv> 
 }]]></eg>
 
-         <p>The namespace prefix used in the names of elements (or its absence) is 
+         <p>The elements in the returned XML are in the namespace
+            <code>http://www.w3.org/2005/xpath-functions</code>;
+            the namespace prefix that is used (or its absence) is 
             <termref def="implementation-dependent"/>.</p>
 
          <p>If the function is called twice with the same arguments, it is <termref

diff --git a/specifications/xpath-functions-40/src/xpath-functions.xml b/specifications/xpath-functions-40/src/xpath-functions.xml
@@ -7155,10 +7155,15 @@ Bob,2023-07-14,2.34
 </csv>
 ]]></eg>
 
-               <p>If column names were not extracted, then implementations <rfc2119>should
-                  not</rfc2119> include the <code><![CDATA[<header>]]></code> element, and
-                  <code><![CDATA[<field>]]></code> elements <rfc2119>should not</rfc2119> have
-                  the <code>column</code> attribute:</p>
+               <p>If no non-empty column names are available, then the <code>columns</code> 
+                  element and all <code>column</code> attributes are absent. 
+                  If non-empty column names are available for some columns but not for others,
+                  then (a) an empty <code>column</code> element is included
+                  within the <code>columns</code> element if and only if there is a subsequent
+                  column with a non-empty name, and (b) the <code>column</code> attribute 
+                  for the corresponding <code>field</code> elements is absent.</p>
+
+               <p>For example (when no column names are available):</p>
 
                <eg><![CDATA[
 <csv xmlns="http://www.w3.org/2005/xpath-functions">