Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand functions while resolving dependencies #5744

Merged
merged 1 commit into from
Mar 1, 2022

Conversation

velioglu
Copy link
Contributor

@velioglu velioglu commented Feb 24, 2022

Expand functions while resolving dependencies.

In order to handle function dependencies with relations, we need to expand relation graph to include functions on relation nodes as there is no direct dependency between relation and functions. We follow a similar logic to sequences.

  • Add tests

@velioglu velioglu force-pushed the velioglu/function_expand branch 9 times, most recently from 1db9b25 to b5e1eb9 Compare February 24, 2022 14:28
@velioglu velioglu requested a review from onderkalaci February 24, 2022 14:29
@velioglu velioglu force-pushed the velioglu/function_expand branch from b5e1eb9 to 6535298 Compare February 24, 2022 14:35
@codecov
Copy link

codecov bot commented Feb 24, 2022

Codecov Report

Merging #5744 (04b92aa) into master (b825232) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #5744   +/-   ##
=======================================
  Coverage   92.56%   92.56%           
=======================================
  Files         223      223           
  Lines       47315    47363   +48     
=======================================
+ Hits        43797    43843   +46     
- Misses       3518     3520    +2     

*/
if (deprec->classid == AttrDefaultRelationId &&
deprec->objsubid == 0 &&
deprec->refobjsubid != 0 &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the meaning of deprec->objsubid == 0 && deprec->refobjsubid != 0 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

objsubid is nonzero only for table columns and zero for anything else. Since we are trying to find a dependency from the column of a table to function we've added such a check. We may omit deprec->objsubid == 0 part , but I'm more inclined to keep it to not to get any unintentional dependency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you maybe add this to code? Because this is something we'd forget :)

bool IsColumnDependency = deprec->refobjsubid != 0

@velioglu velioglu force-pushed the velioglu/function_expand branch from 6535298 to 9d5ea21 Compare February 25, 2022 10:24
/*
* Add columns with constraint check
*/
if (deprec->classid == ConstraintRelationId &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I check pg_catalog, I see that "table constraint", "default value", "rule", "index" can be possible when refobjsubid != 0 .

Do we need anything for the latter two?

select pg_identify_object_as_address(classid, objid, objsubid), pg_identify_object_as_address(refclassid, refobjid, refobjsubid) from pg_depend where refobjsubid != 0  ORDER BY objid DESC;

Copy link
Member

@onderkalaci onderkalaci Feb 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we could add statistics here?

-- create statistics based on a valid function
BEGIN;

	CREATE OR REPLACE FUNCTION non_sense_func_for_index()
	RETURNS int
	LANGUAGE plpgsql IMMUTABLE AS
	$$
	BEGIN
	    return 1;
	END; 
	$$;

	CREATE TABLE t3 (
	    a   int,
	    b int
	);

	-- build ndistinct statistics on the pair of expressions (per-expression
	-- statistics are built automatically)
	CREATE STATISTICS s3 ON (a > non_sense_func_for_index()) FROM t3;

	SELECT create_distributed_table('t3', 'a');
ERROR:  function public.non_sense_func_for_index() does not exist
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
CONTEXT:  while executing command on localhost:9700

Where we seem to have a dependency:

│ pg_identify_object_as_address │ ("statistics object","{public,s3}",{})                                                                │
│ pg_identify_object_as_address │ ("table column","{public,t3,a}",{})                                                                   │

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly GENERATED ALWAYS:

-- generated always
BEGIN;

	CREATE OR REPLACE FUNCTION non_sense_func_for_generated_always()
	RETURNS int
	LANGUAGE plpgsql IMMUTABLE AS
	$$
	BEGIN
	    return 1;
	END; 
	$$;

	CREATE TABLE people (
	id int,
    height_cm numeric,
    height_in numeric GENERATED ALWAYS AS (height_cm / non_sense_func_for_generated_always()) STORED
);

SELECT create_distributed_table('people', 'id');
DETAIL:  on server onderkalaci@localhost:9701 connectionId: 8
ERROR:  function public.non_sense_func_for_generated_always() does not exist
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
CONTEXT:  while executing command on localhost:9700

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic for finding dependent functions has been updated and tests above has been added.

@velioglu velioglu force-pushed the velioglu/function_expand branch from 9d5ea21 to 5ac0d4c Compare February 25, 2022 10:43
Copy link
Member

@onderkalaci onderkalaci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you maybe go over the CREATE TABLE syntax once more and check if we need any more objects that might depend on a function?

My notes are the following, which you might double check:

  1. Partition by a function such as use something other than lower
CREATE TABLE measurement (                                                               city_id         int not null,                                                                                                  logdate         date not null,                                                                                                 peaktemp        int,                                                                                                           unitsales       int                                                                                                        ) PARTITION BY RANGE (lower(peaktemp::text));
  1. function on primary key:
BEGIN;
	CREATE TABLE TABLE (
	column_1 data_type,
	column_2 data_type,
	… 
        PRIMARY KEY (column_1, column_2 * func())
);
  1. Does this work fine with Citus local tables and/or foreign tables?
  2. Exclude constraints (seems to work fine)
  3. Indexes depending on functions (seems to work fine, would be good to have a test)
  4. Can we have functions as part of foreign keys? (e.g., ALTER TABLE table ADD CONSTRAINT fkey FOREIGN KEY (measureid) REFERENCES colocated_partitioned_table(func(measureid));
  5. Can we have unique constraints with function? Probably works fine, good to manaully test:
CREATE TABLE products (
    product_no integer,
    name text,
    price numeric,
    UNIQUE (func(product_no))
);

/*
* Add columns with constraint check
*/
if (deprec->classid == ConstraintRelationId &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly GENERATED ALWAYS:

-- generated always
BEGIN;

	CREATE OR REPLACE FUNCTION non_sense_func_for_generated_always()
	RETURNS int
	LANGUAGE plpgsql IMMUTABLE AS
	$$
	BEGIN
	    return 1;
	END; 
	$$;

	CREATE TABLE people (
	id int,
    height_cm numeric,
    height_in numeric GENERATED ALWAYS AS (height_cm / non_sense_func_for_generated_always()) STORED
);

SELECT create_distributed_table('people', 'id');
DETAIL:  on server onderkalaci@localhost:9701 connectionId: 8
ERROR:  function public.non_sense_func_for_generated_always() does not exist
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
CONTEXT:  while executing command on localhost:9700

@velioglu velioglu force-pushed the velioglu/function_expand branch 9 times, most recently from 63f44d5 to 201d6a8 Compare February 28, 2022 09:46
@velioglu velioglu force-pushed the velioglu/function_expand branch 9 times, most recently from 7bf3629 to 5ec5e6a Compare February 28, 2022 13:35
INSERT INTO table_1_for_rule VALUES(7,7);
CREATE TABLE table_2_for_rule(id int, col_1 int);
INSERT INTO table_2_for_rule VALUES(7,7);
CREATE RULE rule_1 AS ON UPDATE TO table_1_for_rule DO ALSO UPDATE table_2_for_rule SET col_1 = col_1 * func_for_rule();
Copy link
Member

@onderkalaci onderkalaci Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess with MX the rule should also exists on the workers (which does not happen now). Not related to here, just noting which is already listed here: #4812

It is good that we are prepared from the function propagation front

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems rules show up as Normal dependency. Lets fix that and improve the test here:

SELECT objid, pg_identify_object_as_address(refclassid, refobjid, refobjsubid),deptype from pg_depend where objid = 17001;
┌───────┬───────────────────────────────────────────────────────┬─────────┐
│ objid │ pg_identify_object_as_address │ deptype │
├───────┼───────────────────────────────────────────────────────┼─────────┤
│ 17001 │ (table,"{public,table_1_for_rule}",{}) │ a │
│ 17001 │ ("table column","{public,table_2_for_rule,col_1}",{}) │ n │
│ 17001 │ (table,"{public,table_1_for_rule}",{}) │ n │
│ 17001 │ (function,"{public,func_for_rule}",{}) │ n │
└───────┴───────────────────────────────────────────────────────┴─────────┘
(4 rows)
Time: 2.108 ms

---------------------------------------------------------------------
(0 rows)

CREATE TABLE table_to_check_func_index_dep (id int, col_2 int);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly sharing for reference.

As far as I can tell, this test already works on the master branch via the text search configuration changes.

And, note that the dependency type is Normal.

 SELECT pg_identify_object_as_address(classid, objid, objsubid),deptype from pg_depend where refobjid = 'func_for_index_predicate'::regproc;
┌───────────────────────────────────────────────────────────────┬─────────┐
│                 pg_identify_object_as_address                 │ deptype │
├───────────────────────────────────────────────────────────────┼─────────┤
│ (index,"{public,table_to_check_func_index_dep_col_2_idx}",{}) │ n       │
└───────────────────────────────────────────────────────────────┴─────────┘
(1 row)

Copy link
Member

@onderkalaci onderkalaci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe one more test idea, other than this, looks pretty good:

BEGIN;
 CREATE OR REPLACE FUNCTION func_in_transaction_def_with_seq(val bigint)
RETURNS int
LANGUAGE SQL AS
$$
SELECT 2 * val;
$$;
 CREATE OR REPLACE FUNCTION func_in_transaction_def_with_func(val bigint)
RETURNS int
LANGUAGE SQL AS
$$
SELECT func_in_transaction_def_with_seq(val);
$$;
 -- Function shouldn't be propagated within transaction
SELECT pg_identify_object_as_address(classid, objid, objsubid) from citus.pg_dist_object where objid = 'func_in_transaction_def_with_seq'::regproc::oid;
 CREATE SEQUENCE myseq;
 CREATE TABLE table_to_prop_func(id int, col_1 int default func_in_transaction_def_with_func(func_in_transaction_def_with_seq(nextval('myseq'))));
SELECT create_distributed_table('table_to_prop_func','id');
 -- Function should be marked as distributed after distributing the table that depends on it
SELECT pg_identify_object_as_address(classid, objid, objsubid) from citus.pg_dist_object where objid = 'func_in_transaction_def_with_seq'::regproc::oid;
COMMIT;

* function we've added deprec->refobjsubid != 0 check.
*/
if (deprec->refobjsubid != 0 &&
deprec->deptype == DEPENDENCY_AUTO)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for reference, we discussed that we can also follow DEPENDENCY_NORMAL here. DEPENDENCY_AUTO will follow all the things that are dependent via column. Still, there seems no harm following DEPENDENCY_NORMAL just in case there are some complex SQL that we don't know

INSERT INTO table_1_for_rule VALUES(7,7);
CREATE TABLE table_2_for_rule(id int, col_1 int);
INSERT INTO table_2_for_rule VALUES(7,7);
CREATE RULE rule_1 AS ON UPDATE TO table_1_for_rule DO ALSO UPDATE table_2_for_rule SET col_1 = col_1 * func_for_rule();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems rules show up as Normal dependency. Lets fix that and improve the test here:

SELECT objid, pg_identify_object_as_address(refclassid, refobjid, refobjsubid),deptype from pg_depend where objid = 17001;
┌───────┬───────────────────────────────────────────────────────┬─────────┐
│ objid │ pg_identify_object_as_address │ deptype │
├───────┼───────────────────────────────────────────────────────┼─────────┤
│ 17001 │ (table,"{public,table_1_for_rule}",{}) │ a │
│ 17001 │ ("table column","{public,table_2_for_rule,col_1}",{}) │ n │
│ 17001 │ (table,"{public,table_1_for_rule}",{}) │ n │
│ 17001 │ (function,"{public,func_for_rule}",{}) │ n │
└───────┴───────────────────────────────────────────────────────┴─────────┘
(4 rows)
Time: 2.108 ms

@velioglu velioglu force-pushed the velioglu/function_expand branch 7 times, most recently from 04b92aa to d484744 Compare March 1, 2022 12:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants