From be078f510ee3b0fbaf065e9065920afaff200d70 Mon Sep 17 00:00:00 2001
From: Michael Kay
Date: Sat, 4 Jan 2025 18:31:30 +0000
Subject: [PATCH] Fix CSV parsing as per issue 1675

---
 .../src/function-catalog.xml | 40 ++++++-------------
 .../src/xpath-functions.xml | 13 ++++--
 2 files changed, 22 insertions(+), 31 deletions(-)

diff --git a/specifications/xpath-functions-40/src/function-catalog.xml b/specifications/xpath-functions-40/src/function-catalog.xml
index 0d282d22a..c55026873 100644
--- a/specifications/xpath-functions-40/src/function-catalog.xml
+++ b/specifications/xpath-functions-40/src/function-catalog.xml
@@ -317,10 +317,10 @@
 option is true. If there are no data rows in the CSV, the value will be an empty sequence.

- +

A function providing ready access to a given field in a given row. The get function has signature:

- function($row as xs:integer, $column as union(xs:string, xs:integer)) as xs:string? + function($row as xs:positiveInteger, $column as (xs:positiveInteger | xs:string)) as xs:string

The function takes two arguments: the first is an integer giving the row number (1-based), the second identifies a column either by its name or by its 1-based
@@ -330,7 +330,7 @@
 the function call $csv?get($R, $C), where $C is an integer, returns the value of $csv?rows[$R] => array:get($C, fn { "" }), and the function call $csv?get($R, $K), where $K
- is a string, returns the value of $csv?get($R, $csv?column-numbers($K)).

+ is a string, returns the value of $csv?get($R, $csv?column-index($K)).
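
The equivalence stated above for the get function can be sketched in Python as a reading aid. This is not part of the patch or the specification: `make_get`, `rows`, and `column_index` are hypothetical names for illustration, and the behaviour for an unknown column name or an out-of-range row is not covered by the text shown here, so the sketch simply raises an error in those cases.

```python
from typing import Callable, Union


def make_get(rows: list[list[str]],
             column_index: dict[str, int]) -> Callable[[int, Union[int, str]], str]:
    """Build a get(row, column) function mirroring the stated equivalence.

    `rows` stands in for $csv?rows and `column_index` for $csv?column-index
    (both hypothetical stand-ins, using 1-based positions as in the text).
    """
    def get(row: int, column: Union[int, str]) -> str:
        if isinstance(column, str):
            # $csv?get($R, $K) is defined as $csv?get($R, $csv?column-index($K)).
            # Assumption: an unknown name raises KeyError here.
            return get(row, column_index[column])
        if row < 1 or column < 1:
            raise ValueError("row and column are positive integers")
        fields = rows[row - 1]  # 1-based row number; IndexError if out of range
        # array:get($C, fn { "" }): fall back to "" when the field is missing.
        return fields[column - 1] if column <= len(fields) else ""

    return get
```

For instance, with rows `[["a", "b"], ["c"]]`, asking for column 2 of row 2 yields the zero-length string rather than an empty result, which matches the change of the return type from `xs:string?` to `xs:string`.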

The properties of the function are as follows:

@@ -344,7 +344,7 @@ -

(xs:positiveInteger, (xs:positiveInteger | xs:string)) => xs:string?

+

(xs:positiveInteger, (xs:positiveInteger | xs:string)) => xs:string

@@ -26072,26 +26072,10 @@ return json-to-xml($json, $options)]]>
- Determines whether the first row of the CSV should be treated as a list of column names, or whether column names are being supplied by the caller.
- The value must either be a single boolean, or a sequence of one or more strings.
+ The value must either be a single boolean, or a sequence of zero or more strings.
 item()* false
@@ -26100,7 +26084,7 @@ return json-to-xml($json, $options)]]>
 first row of the CSV data. Column names are not available; all references to columns are by ordinal position.
- Supplies explicit names for the columns. The Nth
+ Supplies explicit names for the columns. The Nth
 name in the list applies to the Nth column after any filtering or rearrangement. A zero-length string can be used when there is a column that requires no name.
@@ -26448,7 +26432,7 @@ return (
 the CSV string. An instance of xs:string whose length is exactly one. Defaults to a single newline character (U+000A).
- xs:string+
+ xs:string
 char('\n')
@@ -26459,14 +26443,14 @@ return (
 Determines whether leading and trailing whitespace
- is removed from the content of fields.
+ is removed from the content of unquoted fields.
 xs:boolean false
- Fields will be returned with any leading or trailing
+ Unquoted fields will be returned with any leading or trailing
 whitespace intact.
- Fields will be returned with leading or trailing
+ Unquoted fields will be returned with leading or trailing
 whitespace removed, and all other whitespace preserved.
@@ -26712,7 +26696,9 @@ return document { } }]]>
-

The namespace prefix used in the names of elements (or its absence) is +

The elements in the returned XML are in the namespace + http://www.w3.org/2005/xpath-functions; + the namespace prefix that is used (or its absence) is .

If the function is called twice with the same arguments, it is ]]> -

If column names were not extracted, then implementations should
- not include the ]]> element, and
- ]]> elements should not have
- the column attribute:

+

If no non-empty column names are available, then the columns
+ element and all column attributes are absent.
+ If non-empty column names are available for some columns but not for others,
+ then (a) an empty column element is included
+ within the columns element if and only if there is a subsequent
+ column with a non-empty name, and (b) the column attribute
+ for the corresponding field elements is absent.
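
The rule added above can be illustrated with a small Python sketch. It is one reading of the wording, not text from the patch: the helper names `column_elements` and `column_attribute` are hypothetical, and column names are modelled as a plain list of strings in which `""` means "no name".

```python
from typing import Optional


def column_elements(names: list[str]) -> Optional[list[str]]:
    """Contents of the column elements, or None if the columns element is absent."""
    named = [i for i, name in enumerate(names) if name != ""]
    if not named:
        # No non-empty column names: no columns element at all.
        return None
    # An empty column element appears only when a later column has a
    # non-empty name, so unnamed columns after the last named one are dropped.
    return names[: named[-1] + 1]


def column_attribute(names: list[str], position: int) -> Optional[str]:
    """Value of the column attribute for the field at a 1-based position,
    or None when the attribute is absent."""
    if 1 <= position <= len(names) and names[position - 1] != "":
        return names[position - 1]
    return None
```

Under this reading, the names `["a", "", "b", ""]` produce three column elements (the second one empty), and only the fields in positions 1 and 3 carry a column attribute.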

+ +

For example (when no column names are available):