Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor path performance of JsonPathMatcher for Yaml ChangeKey and ChangeValue on large files #4650

Open
jrgleason opened this issue Nov 6, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@jrgleason
Copy link

What problem are you trying to solve?

I have a project with a bunch of YAML files. When I try to add a recipe that runs ChangeKey against a complex oldKeyPath it causes huge lags (in my case a dry run goes from about 6 mins to over 1 hour). To get around this I started creating a simple precondition that scans the whole file for a string (typically the key or another key that is more unique). I understand this has some pitfalls and also doesn't help is a 3rd party recipe uses it and doesn't upgrade so I was hoping we could come up with a better solution.

Describe the solution you'd like

Maybe a parameter to the recipe that allows us to pass a search string as well to run the regex before the matcher. It should be done before the matcher is run here...

Have you considered any alternatives or workarounds?

My work around, as I said, is to use a precondition to check the file for some text before running the matcher

Additional context

I found this issue through my company and I am working on trying to get approved to contribute. When this happens I should be able to provide an example. Till then I will try to recreate one on my own but in theory this should be true of any project with a lot of yaml files (maybe even just a lot of files).

Are you interested in contributing this feature to OpenRewrite?

Sure but need to finish clearance first

@jrgleason jrgleason added the enhancement New feature or request label Nov 6, 2024
@jrgleason
Copy link
Author

Problem is also true of ChangeValue

@jrgleason
Copy link
Author

Ok more info, it wasnt the amount of files but rather 1 file at 1.6mb. I would still recommend the optional regex param as a way to mitigate it as it tears through the file way faster.

@timtebeek
Copy link
Contributor

Thanks for the added details here @jrgleason ! Let us know if you're indeed cleared to contribute. Ideally we improve performance of that JsonPathMatcher.matches without any additional arguments, but I'm not sure if that has been profiled and tried before:

public boolean matches(Cursor cursor) {
Object cursorValue = new ReplaceAliasWithAnchorValueVisitor<Integer>().visit((Tree)cursor.getValue(), 0);
List<Object> cursorPath = cursor.getPathAsStream()
.map(cp -> {
if (cp instanceof Yaml) {
cp = new ReplaceAliasWithAnchorValueVisitor<Integer>().visit((Yaml) cp, 0);
}
return cp;
})
.collect(Collectors.toList());
return find(cursor).map(o -> {
if (o instanceof List) {
//noinspection unchecked
List<Object> l = (List<Object>) o;
return !disjoint(l, cursorPath) && l.contains(cursorValue);
} else {
return Objects.equals(o, cursorValue);
}
}).orElse(false);
}

@timtebeek timtebeek moved this to Backlog in OpenRewrite Nov 11, 2024
@timtebeek timtebeek changed the title Poor path performance of ChangeKey needs the ability to pass a variable to allow for file filtering before Matcher Poor path performance of JsonPathMatcher for Yaml ChangeKey and ChangeValue on large files Nov 11, 2024
@timtebeek
Copy link
Contributor

For context: did your large yaml file perhaps contain anchors? I see those are expanded, so they can also add to the complexity and run duration.

@knutwannheden
Copy link
Contributor

The support to resolve references to YAML anchors was added way back in #2272. I think this will definitely cause performance issues (for multiple reasons). I am thinking we should maybe just remove support for YAML references for now, because I get the impression that YAML references aren't correctly implemented anyway. Alternatively, we could look into fixing the performance issues it is causing.

knutwannheden added a commit that referenced this issue Nov 17, 2024
Only apply `ReplaceAliasWithAnchorValueVisitor` to `Yaml.Document` elements.

See: #4650
@knutwannheden
Copy link
Contributor

knutwannheden commented Nov 17, 2024

@jrgleason I've improved the performance of the JsonPathMatcher in the following commits: 0f76203, f71dc0d, ae90839, 0f8e755, f1f48c0, and f30cd55. These changes should make the matching quite a bit faster. Are you able to check if this improves things for you? There is still more that can be done.

MBoegers pushed a commit to MBoegers/rewrite that referenced this issue Dec 18, 2024
Only apply `ReplaceAliasWithAnchorValueVisitor` to `Yaml.Document` elements.

See: openrewrite#4650
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Backlog
Development

No branches or pull requests

3 participants