Dev normalize predicate #804

Merged 16 commits on Jan 22, 2024
Changes from 13 commits
29 changes: 29 additions & 0 deletions modules/ROOT/pages/clauses/where.adoc
@@ -338,6 +338,35 @@ The `name` and `age` for `Peter` are returned because his name contains "ete
|===


[[match-string-is-normalized]]
=== Checking if a `STRING` `IS NORMALIZED`
Review comment (Contributor): "STRING" has mostly been written as "string" in other headers.


The `IS NORMALIZED` operator (introduced in Neo4j 5.17) checks whether the given `STRING` is in the `NFC` Unicode normalization form:

.Query
[source, cypher]
----
MATCH (n:Person)
WHERE n.name IS NORMALIZED
RETURN n.name AS normalizedNames
----

The given `STRING` values contain only normalized Unicode characters, therefore all the matched `name` properties are returned.
For more information, see the section about the xref:syntax/operators.adoc#match-string-is-normalized[normalization operator].

.Result
[role="queryresult",options="header,footer",cols="1*<m"]
|===
| normalizedNames
| 'Andy'
| 'Timothy'
| 'Peter'
1+|Rows: 3
|===

Note that the `IS NORMALIZED` operator returns `null` when used on a non-`STRING` value.
For example, `RETURN 1 IS NORMALIZED` returns `null`.
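
The filtering behaviour described above can be cross-checked outside Neo4j. The following is a minimal sketch in Python, not Cypher, assuming Python 3.8 or later (for `unicodedata.is_normalized`). The `keep_normalized` helper and the extra entries in the `names` list are hypothetical; they only mimic how `WHERE` keeps rows for which the predicate is `true` and discards rows for which it is `false` or `null`:

```python
import unicodedata


def keep_normalized(values, form="NFC"):
    """Keep only str values that are already in the given normalization form.

    Non-str values are dropped, mirroring how WHERE discards rows
    whose predicate evaluates to null (hypothetical helper).
    """
    return [
        v for v in values
        if isinstance(v, str) and unicodedata.is_normalized(form, v)
    ]


# Three names from the example plus two illustrative extras:
# a string containing U+212B (not NFC-normalized) and a non-string value.
names = ["Andy", "Timothy", "Peter", "\u212Bngstr\u00F6m", 1]
normalized_names = keep_normalized(names)
print(normalized_names)  # ['Andy', 'Timothy', 'Peter']
```

Python's `unicodedata` follows the Unicode version bundled with the interpreter, which may differ from the one used by Neo4j, so treat this as an approximation rather than a statement about the server's behaviour.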

[[match-string-negation]]
=== String matching negation

@@ -38,6 +38,23 @@ RETURN normalize("string", NFC)
| Introduction of a xref::functions/string.adoc#functions-normalize[normalize()] function.
This function normalizes a `STRING` according to the specified normalization form, which can be of type `NFC`, `NFD`, `NFKC`, or `NFKD`.

a|
label:functionality[]
label:new[]

[source, cypher, role=noheader]
----
IS [NOT] [NFC \| NFD \| NFKC \| NFKD] NORMALIZED
----

[source, cypher, role=noheader]
----
RETURN "string" IS NORMALIZED
----

| Introduction of an xref::syntax/operators.adoc#match-string-is-normalized[IS NORMALIZED] operator.
The operator can be used to check if a `STRING` is normalized according to the specified normalization form, which can be of type `NFC`, `NFD`, `NFKC`, or `NFKD`.

|===

[[cypher-deprecations-additions-removals-5.16]]
1 change: 1 addition & 0 deletions modules/ROOT/pages/functions/index.adoc
@@ -511,6 +511,7 @@ These functions are used to manipulate strings or to create a string representat
| `normalize(input :: STRING, normalForm = NFC :: [NFC, NFD, NFKC, NFKD]) :: STRING`
| Returns the given `STRING` normalized according to the specified normalization form. label:new[Introduced in 5.17]


1.1+| xref::functions/string.adoc#functions-replace[`replace()`]
| `replace(original :: STRING, search :: STRING, replace :: STRING) :: STRING`
| Returns a `STRING` in which all occurrences of a specified search `STRING` in the given `STRING` have been replaced by another (specified) replacement `STRING`.
4 changes: 4 additions & 0 deletions modules/ROOT/pages/functions/string.adoc
@@ -228,6 +228,8 @@ RETURN normalize('\u212B') = '\u00C5' AS result

======

To check if a `STRING` is normalized, use the xref:syntax/operators.adoc#match-string-is-normalized[`IS NORMALIZED`] operator.


[[functions-normalize-with-normal-form]]
== normalize(), with specified normal form
@@ -319,6 +321,8 @@ RETURN normalize('\uFE64', NFKC) = '\u003C' AS result

======

To check if a `STRING` is normalized in a specific Unicode normal form, use the xref:syntax/operators.adoc#match-string-is-normalized-specified-normal-form[`IS NORMALIZED`] operator with a specified normalization form.
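
The difference between the canonical forms (`NFC`, `NFD`) and the compatibility forms (`NFKC`, `NFKD`) can be illustrated with the `\uFE64` character used in the example above. This is a minimal sketch in Python, not Cypher, assuming Python 3.8 or later; it relies on Python's `unicodedata` module rather than on Neo4j, and the variable names are illustrative only:

```python
import unicodedata

small_less_than = "\uFE64"  # SMALL LESS-THAN SIGN, compatibility variant of '<'

# The character has no canonical decomposition, so it is already in NFC ...
in_nfc = unicodedata.is_normalized("NFC", small_less_than)
# ... but it has a compatibility mapping to U+003C, so it is not in NFKC.
in_nfkc = unicodedata.is_normalized("NFKC", small_less_than)

print(in_nfc)   # True
print(in_nfkc)  # False
print(unicodedata.normalize("NFKC", small_less_than) == "\u003C")  # True
```

A string can therefore pass a check against one normal form and fail a check against another, which is why the form has to be stated when anything other than the default `NFC` is intended.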

[[functions-replace]]
== replace()

101 changes: 100 additions & 1 deletion modules/ROOT/pages/syntax/operators.adoc
@@ -16,7 +16,7 @@ This page contains an overview of the available Cypher operators.
| xref::syntax/operators.adoc#query-operators-comparison[Comparison operators] | `+=+`, `+<>+`, `+<+`, `+>+`, `+<=+`, `+>=+`, `IS NULL`, `IS NOT NULL`
| xref::syntax/operators.adoc#query-operators-comparison[String-specific comparison operators] | `STARTS WITH`, `ENDS WITH`, `CONTAINS`, `=~` (regex matching)
| xref::syntax/operators.adoc#query-operators-boolean[Boolean operators] | `AND`, `OR`, `XOR`, `NOT`
- | xref::syntax/operators.adoc#query-operators-string[String operators] | `+` (string concatenation)
+ | xref::syntax/operators.adoc#query-operators-string[String operators] | `+` (string concatenation), `IS NORMALIZED`
| xref::syntax/operators.adoc#query-operators-temporal[Temporal operators] | `+` and `-` for operations between durations and temporal instants/durations, `*` and `/` for operations between durations and numbers
| xref::syntax/operators.adoc#query-operators-map[Map operators] | `.` for static value access by key, `[]` for dynamic value access by key
| xref::syntax/operators.adoc#query-operators-list[List operators] | `+` (list concatenation), `IN` to check existence of an element in a list, `[]` for accessing element(s) dynamically
@@ -543,6 +543,7 @@ RETURN number
The string operators comprise:

* concatenating strings: `+`
* checking if a string is normalized: `IS NORMALIZED`


[[syntax-concatenating-two-strings]]
@@ -563,6 +564,104 @@ RETURN 'neo' + '4j' AS result
|===


[[match-string-is-normalized]]
=== Checking if a `STRING` `IS NORMALIZED`

_This feature was introduced in Neo4j 5.17._

The `IS NORMALIZED` operator checks whether the given `STRING` is in the `NFC` Unicode normalization form.

[NOTE]
====
Unicode normalization is a process that transforms different representations of the same string into a standardized form.
For more information, see the documentation for link:https://unicode.org/reports/tr15/#Norm_Forms[Unicode normalization forms].
====

.Query
[source, cypher]
----
RETURN "the \u212B char" IS NORMALIZED AS normalized
----

.Result
[role="queryresult",options="header,footer",cols="1*<m"]
|===
| normalized
| false
1+|Rows: 1
|===

Because the given `STRING` contains a non-normalized Unicode character (`\u212B`), `false` is returned.

To normalize a `STRING`, use the xref:functions/string.adoc#functions-normalize[normalize()] function.

Note that the `IS NORMALIZED` operator returns `null` when used on a non-`STRING` value.
For example, `RETURN 1 IS NORMALIZED` returns `null`.
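
The behaviour described in this section (a `false` result for `\u212B`, a `true` result after normalization, and `null` for non-`STRING` input) can be approximated outside Neo4j. The following is a minimal sketch in Python, not Cypher, assuming Python 3.8 or later. The `is_normalized` wrapper is hypothetical and uses `None` as a stand-in for Cypher's `null`:

```python
import unicodedata


def is_normalized(value, form="NFC"):
    """Return True/False for str input, None (standing in for null) otherwise."""
    if not isinstance(value, str):
        return None
    return unicodedata.is_normalized(form, value)


original = "the \u212B char"                    # contains U+212B ANGSTROM SIGN
fixed = unicodedata.normalize("NFC", original)  # analogous to normalize()

print(is_normalized(original))      # False
print(is_normalized(fixed))         # True
print(fixed == "the \u00C5 char")   # True: U+212B is replaced by U+00C5
print(is_normalized(1))             # None
```

The wrapper checks the type first because `unicodedata.is_normalized` raises a `TypeError` for non-string arguments, whereas the operator is documented to return `null`.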

[[match-string-is-not-normalized]]
=== Checking if a `STRING` `IS NOT NORMALIZED`

_This feature was introduced in Neo4j 5.17._

The `IS NOT NORMALIZED` operator checks whether the given `STRING` is not in the `NFC` Unicode normalization form:

.Query
[source, cypher]
----
RETURN "the \u212B char" IS NOT NORMALIZED AS notNormalized
----

.Result
[role="queryresult",options="header,footer",cols="1*<m"]
|===
| notNormalized
| true
1+|Rows: 1
|===

Because the given `STRING` contains a non-normalized Unicode character (`\u212B`), `true` is returned.

To normalize a `STRING`, use the xref:functions/string.adoc#functions-normalize[normalize()] function.

Note that the `IS NOT NORMALIZED` operator returns `null` when used on a non-`STRING` value.
For example, `RETURN 1 IS NOT NORMALIZED` returns `null`.


[[match-string-is-normalized-specified-normal-form]]
==== Using `IS NORMALIZED` with a specified normalization type

It is possible to specify which Unicode normalization form is checked (the default is `NFC`).

The available normalization forms are:

* `NFC`
* `NFD`
* `NFKC`
* `NFKD`

.Query
[source, cypher]
----
WITH "the \u00E4 char" AS myString
RETURN myString IS NFC NORMALIZED AS nfcNormalized,
       myString IS NFD NORMALIZED AS nfdNormalized
----

The given `STRING` contains the Unicode character `\u00E4`, which is normalized in `NFC` form but not in `NFD` form.

.Result
[role="queryresult",options="header,footer",cols="2*<m"]
|===
| nfcNormalized | nfdNormalized
| true | false
2+|Rows: 1
|===

It is also possible to specify the normalization form when using the negated normalization operator.
For example, `RETURN "string" IS NOT NFD NORMALIZED`.
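
To see how the same `STRING` fares against all four forms, the check can be repeated per form. This is a minimal sketch in Python, not Cypher, assuming Python 3.8 or later; `normalization_report` is a hypothetical helper built on Python's `unicodedata` module and is not part of Neo4j:

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalization_report(text):
    """Map each normalization form to whether `text` is already in that form."""
    return {form: unicodedata.is_normalized(form, text) for form in FORMS}


report = normalization_report("the \u00E4 char")
print(report)
# {'NFC': True, 'NFD': False, 'NFKC': True, 'NFKD': False}

# Counterpart of `IS NOT NFD NORMALIZED` for a string input.
not_nfd_normalized = not report["NFD"]
print(not_nfd_normalized)  # True
```

The precomposed character `\u00E4` is accepted by the composed forms (`NFC`, `NFKC`) and rejected by the decomposed forms (`NFD`, `NFKD`), which expect the base letter followed by a combining diaeresis.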



[[query-operators-temporal]]
== Temporal operators
