Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement multiple label #2082

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

rafsun42
Copy link
Member

@rafsun42 rafsun42 commented Aug 29, 2024

Add support for multiple labels in CREATE, MERGE and MATCH clause.

Important Notes:

  • Before merging, please rerun the workflow again. Any tests that were added after this branch was last updated will fail because this PR makes minor changes in how vertex and edge objects are output.

Update:

  • Base of this PR is changed to a new remote branch multiple_label_16, which is based off master.

It represents label expression of different type: empty, single or multiple.
Previously, label field was char* type.

The change affected the type cypher_node, cypher_relationship
and cypher_target_node. As well as, any places where these
types are used.
Supports queries like-
    MATCH (v:A|B|C) RETURN v
    MATCH ()-[e:A|B|C]->() RETURN v
Some examples of supported multiple label queries:
	CREATE (:a:b)
	MERGE  (:a:b)
	MATCH  (:a:b)
	MATCH  (:a|b)

See regress/sql/multiple_label.sql for more details on what kind
of queries are supported.

Change summary:
---------------
* A new column `allrelations` is added to ag_label catalog
* Change in creating AGE relations logic
* Change in MATCH's transformation logic (related to building parse
  namespace item)
The logic for building vertex objects is updated. Agtype vertex objects can be
built from either a single label (as a cstring) or multiple labels (as an
agtype array). The following functions are updated to reflect this-
agtype_typecast_vertex, agtype_in and _agtype_build_vertex. if
_agtype_build_vertex is called from SQL, its label argument must be explicitly
cast to avoid ambiguity in function overload.

The `_label_names` function is added to extract label names from a vertex ID
as a list of string. It is used as a helper function to build vertex objects.
A new cache called `allrelations` is also added. This is used by _label_names
to search for all labels that are related to a given relation.

Multiple helper functions are added to extract label infromation from an entity
ID. For example, entity's relation ID, relation name, label names. These are
used by CREATE, DELETE, MERGE, VLE and SET executors for building a vertex's
object or updating its relation.

All test files are updated to show the label field as an array in the output.
In all test SQLs, _agtype_build_vertex's label argument is explicity cast.
It updates the function filter_vertices_on_label_id().

Additional changes:
-------------------
 - Add internal function _label_ids
Cache issues fixed:
-------------------
  - Use of wrong data type for cache entry in label relation cache (pre-existing)
  - Use of wrong update function for catalog table (related to multiple label)

Other changes:
--------------
  - The function _label_name() is unsupported for vertices
Changes:
--------
 - Update create_label_expr_relations() to return RangeVar. It removes
   redundant call to label_expr_relname() in the code that also calls
   this function.

 - Use deconstruct_array() to convert ArrayType* to List*

 - Update test files after rebase
This fixes some compile-time errors that occur if
PostgreSQL is configured with the --with-llvm option.
Changes:
-------
  * Include missing header files
  * Update newly added tests
  * Other minor changes
Following PRs are reapplied: 1465, 1509, 1514, and 1518.
@github-actions github-actions bot added the override-stale To keep issues/PRs untouched from stale action label Aug 29, 2024
@rafsun42 rafsun42 changed the base branch from multiple_label_16 to master August 29, 2024 19:52
@rafsun42 rafsun42 changed the base branch from master to multiple_label_16 August 29, 2024 19:53
@rafsun42 rafsun42 changed the base branch from multiple_label_16 to master August 29, 2024 19:56
* Regression tests
* Comments
* Minor overlook during rebasing
@MuhammadTahaNaveed
Copy link
Member

@rafsun42 The second CREATE command has AB and C as labels but the relation for AB and _agr_AB_C was not created.

multi-label-PR-review=# CREATE EXTENSION age;
CREATE EXTENSION
multi-label-PR-review=# SET search_path=ag_catalog;
SET
multi-label-PR-review=# SELECT create_graph('graph');
NOTICE:  graph "graph" has been created
 create_graph 
--------------
 
(1 row)

multi-label-PR-review=# SELECT * FROM cypher('graph', $$CREATE (x:A:B:C)$$) as (a agtype);
 a 
---
(0 rows)

multi-label-PR-review=# \dt+ graph.*
                                            List of relations
 Schema |       Name       | Type  |   Owner    | Persistence | Access method |    Size    | Description 
--------+------------------+-------+------------+-------------+---------------+------------+-------------
 graph  | A                | table | taha-linux | permanent   | heap          | 8192 bytes | 
 graph  | B                | table | taha-linux | permanent   | heap          | 8192 bytes | 
 graph  | C                | table | taha-linux | permanent   | heap          | 8192 bytes | 
 graph  | _ag_label_edge   | table | taha-linux | permanent   | heap          | 8192 bytes | 
 graph  | _ag_label_vertex | table | taha-linux | permanent   | heap          | 8192 bytes | 
 graph  | _agr_ABC         | table | taha-linux | permanent   | heap          | 16 kB      | 
(6 rows)

multi-label-PR-review=# SELECT * FROM cypher('graph', $$CREATE (x:AB:C)$$) as (a agtype);
 a 
---
(0 rows)

multi-label-PR-review=# \dt+ graph.*
                                            List of relations
 Schema |       Name       | Type  |   Owner    | Persistence | Access method |    Size    | Description 
--------+------------------+-------+------------+-------------+---------------+------------+-------------
 graph  | A                | table | taha-linux | permanent   | heap          | 8192 bytes | 
 graph  | B                | table | taha-linux | permanent   | heap          | 8192 bytes | 
 graph  | C                | table | taha-linux | permanent   | heap          | 8192 bytes | 
 graph  | _ag_label_edge   | table | taha-linux | permanent   | heap          | 8192 bytes | 
 graph  | _ag_label_vertex | table | taha-linux | permanent   | heap          | 8192 bytes | 
 graph  | _agr_ABC         | table | taha-linux | permanent   | heap          | 16 kB      | 
(6 rows)

@MuhammadTahaNaveed
Copy link
Member

@rafsun42 The second CREATE command has AB and C as labels but the relation for AB and _agr_AB_C was not created.

With merge it caused an assertion failure

multi-label-PR-review=# CREATE EXTENSION age;
CREATE EXTENSION
multi-label-PR-review=# SET search_path=ag_catalog;
SET
multi-label-PR-review=# SELECT create_graph('graph');
NOTICE:  graph "graph" has been created
 create_graph 
--------------
 
(1 row)

multi-label-PR-review=# SELECT * FROM cypher('graph', $$CREATE (x:A:B:C)$$) as (a agtype);
 a 
---
(0 rows)

multi-label-PR-review=# SELECT * FROM cypher('graph', $$MERGE (x:AB:C)$$) as (a agtype);
TRAP: failed Assert("IsA(sss, SubqueryScanState)"), File: "src/backend/executor/cypher_merge.c", Line: 312, PID: 42605
postgres: taha-linux multi-label-PR-review [local] SELECT(ExceptionalCondition+0x6e)[0x56187d565e11]
../postgres-16/pgsql/lib/age.so(+0x17d16)[0x7fb668ebdd16]
../postgres-16/pgsql/lib/age.so(+0x17fc7)[0x7fb668ebdfc7]

@rafsun42
Copy link
Member Author

@MuhammadTahaNaveed Thanks for reporting this. The logic for generating intersection relation was shortsighted. It appends the individual label names together without any separator. So, both CREATE (:A:B:C) and CREATE (:AB:C) generates the same relation name _agr_ABC. The expectation is they should generate different relation name.

One solution can be using a special separator character and disallowing it in label names. For example, the % character (it is already disallowed I believe). So, the intersection relations will be _agr_A%B%C and _agr_AB%C respectively.

What's your thoughts?

@MuhammadTahaNaveed
Copy link
Member

MuhammadTahaNaveed commented Sep 11, 2024

@MuhammadTahaNaveed Thanks for reporting this. The logic for generating intersection relation was shortsighted. It appends the individual label names together without any separator. So, both CREATE (:A:B:C) and CREATE (:AB:C) generates the same relation name _agr_ABC. The expectation is they should generate different relation name.

One solution can be using a special separator character and disallowing it in label names. For example, the % character (it is already disallowed I believe). So, the intersection relations will be _agr_A%B%C and _agr_AB%C respectively.

What's your thoughts?

@rafsun42 Yeah, I agree that there should be a seperator. Maybe hyphen(-) will be more appropriate than (%)? Hyphen is also not allowed in label names afaik.

This will create a distinction in intersection relation names between the following
combinations: (MN:P) and (M:N:P). The credit for finding out this edge case goes
to Muhammad Taha Naveed.

Additionally, a parameter is added to the create_label function to skip checking
valid label names. This is useful when creating intersection relations (the name
can be invalid due to containing the separator symbol).

Co-authored-by: Muhammad Taha Naveed <mtaha@apache.org>
@rafsun42
Copy link
Member Author

@MuhammadTahaNaveed Updated the PR to fix the issues.

@jrgemignani
Copy link
Contributor

@rafsun42 The PR has conflicts as well as not running the 7 checks for some reason.

Copy link
Member

@MuhammadTahaNaveed MuhammadTahaNaveed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a single comment. Looks good to me otherwise.

ag_label = table_open(ag_label_relation_id(), AccessShareLock);

/* TODO: for now, do seq scan */
scan_desc = systable_beginscan(ag_label, InvalidOid, false, NULL, 0, NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a small todo here. Scan key is not setup.

@Mocuto Mocuto mentioned this pull request Dec 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
master override-stale To keep issues/PRs untouched from stale action
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants