Add SPI for delegating row expression optimizer #24144

Open
wants to merge 3 commits into
base: master
from

tdcmeehan
Contributor

@tdcmeehan tdcmeehan commented Nov 25, 2024

Description

As part of RFC-0006, we need to support out-of-process expression evaluation. This PR adds an SPI for pluggable expression optimization, plus planner support that uses the new SPI.

Motivation and Context

RFC-0006. See #24126 for larger changes that include the Presto sidecar as described in the RFC.

Impact

No impact by default: the existing in-memory evaluation remains the default.

Test Plan

Tests have been added.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== RELEASE NOTES ==

SPI Changes
* Add ``CoordinatorPlugin#getExpressionOptimizerFactories`` to customize expression evaluation in the Presto coordinator. :pr:`24144`

@tdcmeehan tdcmeehan requested a review from presto-oss November 25, 2024 18:56
@prestodb-ci prestodb-ci added the from:IBM label Nov 25, 2024
@prestodb-ci prestodb-ci requested review from a team, sh-shamsan and pdabre12 and removed request for a team November 25, 2024 18:56
Contributor

@pdabre12 pdabre12 left a comment

Thanks @tdcmeehan.
Just some initial comments.

@tdcmeehan tdcmeehan force-pushed the expr-spi-init branch 12 times, most recently from 17fd8e1 to 563611e Compare December 2, 2024 17:34
github-actions bot commented Dec 2, 2024

Codenotify: Notifying subscribers in CODENOTIFY files for diff 0720f17...8c8d1f4.

No notifications.

Contributor

@ZacBlanco ZacBlanco left a comment

At a high level, the changes make sense. I had a few questions about the configuration and how we might simplify it to be more straightforward for users.

@@ -385,6 +388,7 @@ public TestingPrestoServer(
eventListenerManager = ((TestingEventListenerManager) injector.getInstance(EventListenerManager.class));
clusterStateProvider = null;
planCheckerProviderManager = injector.getInstance(PlanCheckerProviderManager.class);
expressionManager.loadExpressionOptimizerFactory();
Contributor

I feel like having an additional method to configure/load the expression optimizer is a bit of an anti-pattern. It creates the potential for bugs if someone is trying to properly create the expression manager by requiring this additional method to be called. I haven't dug much deeper, but is there a reason we can't perform the loading of the factory in the constructor of the expressionManager?

Contributor Author

I believe the reason for this convention is that these loading methods might do something expensive, or something that could fail, and it's helpful to do such things outside of Guice's bootstrapping process. Failures during bootstrapping are often very verbose because Guice reports the error in the context of where it failed, so I think it's cleaner to do the loading outside of the Guice dependency graph, where the failure is clear and isolated.
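
A minimal sketch of the convention described in this reply, not the PR's actual code: `OptimizerManager`, `loadFactory`, and the string-valued "optimizer" are hypothetical stand-ins. The constructor only stores references, so injector wiring cannot fail on it, and the fallible lookup runs as an explicit step afterwards, where an error surfaces as a single plain exception.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a manager that the injector constructs.
// The constructor does no expensive or fallible work.
class OptimizerManager
{
    private final Map<String, Supplier<String>> factories;
    private String optimizer;

    OptimizerManager(Map<String, Supplier<String>> factories)
    {
        this.factories = factories;
    }

    // Explicit load step, meant to be called once after bootstrapping.
    // A bad configuration fails here, outside the dependency graph.
    void loadFactory(String name)
    {
        Supplier<String> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown expression optimizer factory: " + name);
        }
        optimizer = factory.get();
    }

    // Returns null until loadFactory has succeeded.
    String getOptimizer()
    {
        return optimizer;
    }
}
```

The cost of this split is the one raised in the review: a caller can construct the manager and forget the load step.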

Contributor

Would it be possible to load it more lazily, for example with Guava's Suppliers.memoize?

Otherwise, if we don't want to defer loading the configuration like that, I think we should add a runtime assertion when ExpressionOptimizerManager.getExpressionOptimizer is called, to verify that the loadExpressionOptimizerFactory method has been called first.
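
The two alternatives suggested here can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `Memoized` is a hand-rolled stand-in for Guava's `Suppliers.memoize` so the example needs only the JDK, and `LazyManager`, `GuardedManager`, and the string-valued "optimizer" are hypothetical.

```java
import java.util.function.Supplier;

// Hand-rolled memoizing supplier (stand-in for Guava's Suppliers.memoize).
// The delegate runs on first use; its result is cached for later calls.
final class Memoized<T> implements Supplier<T>
{
    private final Supplier<T> delegate;
    private boolean loaded;
    private T value;

    Memoized(Supplier<T> delegate)
    {
        this.delegate = delegate;
    }

    @Override
    public synchronized T get()
    {
        if (!loaded) {
            value = delegate.get();
            loaded = true;
        }
        return value;
    }
}

// Option 1: lazy loading. No separate load call can be forgotten.
class LazyManager
{
    private final Supplier<String> optimizer;

    LazyManager(Supplier<String> loader)
    {
        this.optimizer = new Memoized<>(loader);
    }

    String getOptimizer()
    {
        return optimizer.get();
    }
}

// Option 2: keep the explicit load step, but fail fast if it was skipped.
class GuardedManager
{
    private String optimizer;

    void load(String value)
    {
        optimizer = value;
    }

    String getOptimizer()
    {
        if (optimizer == null) {
            throw new IllegalStateException("load must be called before getOptimizer");
        }
        return optimizer;
    }
}
```

Option 1 moves any load failure to the first query that needs the optimizer, while option 2 keeps the failure at startup and turns a forgotten call into an immediate, descriptive error.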

}

@Override
public Object optimize(RowExpression rowExpression, Level level, ConnectorSession session, Function<VariableReferenceExpression, Object> variableResolver)
Contributor

I'm not a big fan of the duplicated code in these methods. Consider refactoring it into a private method?

Contributor Author

It's difficult as it stands because these two methods return different types. I raised #24287 which would make it easier to share code between the two methods.

@tdcmeehan tdcmeehan force-pushed the expr-spi-init branch 3 times, most recently from a7b6402 to 2d609d0 Compare December 19, 2024 15:51
@tdcmeehan tdcmeehan mentioned this pull request Dec 19, 2024
6 tasks
Contributor

@pdabre12 pdabre12 left a comment

Some comments.

@tdcmeehan tdcmeehan force-pushed the expr-spi-init branch 2 times, most recently from 97d0c3b to 8c8d1f4 Compare December 20, 2024 20:57
The runtime should consolidate to the `ExpressionOptimizerProvider` factory
so that it can be customized without significant refactoring.
Labels
from:IBM PR from IBM

5 participants