Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add regex stuff #529

Merged
merged 12 commits into from
Jul 11, 2023
Merged

Add regex stuff #529

merged 12 commits into from
Jul 11, 2023

Conversation

ketkarameya
Copy link
Contributor

@ketkarameya ketkarameya commented Jul 5, 2023

Implement the logic for matching code with regex.

Add tests for :

  • query as regex
  • filter as regex
  • propagate when hole is captured by regex
  • propagate built in rules when query is regex

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]
ketkarameya added a commit that referenced this pull request Jul 5, 2023
ghstack-source-id: f3f20b58e8f3c0b8442e40cfa0f284b91410c7b1
Pull Request resolved: #529
[ghstack-poisoned]
ketkarameya added a commit that referenced this pull request Jul 6, 2023
ghstack-source-id: 828e81fead753bffffdf904767ed9673068f5e9e
Pull Request resolved: #529
[ghstack-poisoned]
ketkarameya added a commit that referenced this pull request Jul 6, 2023
ghstack-source-id: aaad598e2655a3ba76e46fc0dc969c0701fdaa5c
Pull Request resolved: #529
[ghstack-poisoned]
ketkarameya added a commit that referenced this pull request Jul 6, 2023
ghstack-source-id: 3f22a0a9c54a2a002591025d5c8b08e827f0e92c
Pull Request resolved: #529
[ghstack-poisoned]
ketkarameya added a commit that referenced this pull request Jul 6, 2023
ghstack-source-id: 1f8b03f76102b2b6b97797b0d1cdc2ff126cb366
Pull Request resolved: #529
[ghstack-poisoned]
ketkarameya added a commit that referenced this pull request Jul 6, 2023
ghstack-source-id: 6d0afcb97e8cd0580679d906f0147aafa7f9acb1
Pull Request resolved: #529
Implement the logic for matching code with regex. 

Add tests for : 
* query as regex 
* filter as regex 
* propagate when hole is captured by regex 
* propagate built in rules when query is regex 

-------




[ghstack-poisoned]
ketkarameya added a commit that referenced this pull request Jul 7, 2023
ghstack-source-id: ff60d400e06ec0ec0fb0e0c357f513f750360963
Pull Request resolved: #529
}

impl Validator for CGPattern {
fn validate(&self) -> Result<(), String> {
if self.pattern().starts_with("rgx ") {
panic!("Regex not supported")
let mut _val = &self.pattern()[4..];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut _val = &self.pattern()[4..];
let mut _val = &self.extract_regex();


# The below three rules do a dummy type migration from List<Integer> to NewList

# Updates the import statement from `java.util.List` to `com.uber.NEwList`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Updates the import statement from `java.util.List` to `com.uber.NEwList`
# Updates the import statement from `java.util.List` to `com.uber.NewList`

replace_node = "m_name"
replace = "addToNewList"
holes = ["name"]
is_seed_rule = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every replace_node in this test suite is a constant, even though the surrounding context is matched by more complex regexps. For generality, we might want a regex that matches a more general pattern on the replaced node. How about converting OurMapOfX to HashMap<X> for various Xs (i.e. the code could have OurMapOfInteger, OurMapOfString, and OurMapOfOurMapOfLong)

Or is something like that not meant to be supported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats a brilliant idea ! We can absolutely do that!

Done ✅


// Will not get updated
List<String> b = getListStr();
Integer item = getItemStr();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Integer item = getItemStr();
String itemStr = getItemStr();

Or something like that. I know this code doesn't get compiled, but it still should make sense types-wise :)

/// * `replace_node` - node to replace
///
/// # Returns
/// The range of the match in the source code and the corresponding mapping from tags to code snippets.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? Return seems to just be a vector of matches. Where is this mapping encoded?

pub(crate) fn get_all_matches_for_regex(
node: &Node, source_code: String, regex: &Regex, recursive: bool, replace_node: Option<String>,
) -> Vec<Match> {
// let code_snippet = node.utf8_text(source_code.as_bytes()).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete commented out line

all_matches
}

// Creates an hashmap from the capture group(name) to the corresponding code snippet.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Creates an hashmap from the capture group(name) to the corresponding code snippet.
// Creates a hashmap from the capture group (name) to the corresponding code snippet.

let range_matches_inside_node = node.start_byte() <= captures.get(0).unwrap().start()
&& node.end_byte() >= captures.get(0).unwrap().end();
if (recursive && range_matches_inside_node) || range_matches_node {
let group_by_tag = if let Some(ref rn) = replace_node {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Group by tag is either the matched code string (plus location info) corresponding to the match group for the replace node if present or a the matched code string matching the full regex match if not, correct? Why is the name group_by_tag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! your understanding is correct. It represents the match corresponding to the replace node (if present) or the entire match.
renamed to - replace_node_match

range: Range::from_regex_match(mtch, source_code),
matches,
associated_comma: None,
associated_comments: Vec::new(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we populate associated comments later and does that work for regex matches?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.
But for testing purposes, I added a scenario where we expect the associated comment to be cleaned up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking, but still curious about this.

}

impl Validator for CGPattern {
fn validate(&self) -> Result<(), String> {
if self.pattern().starts_with("rgx ") {
panic!("Regex not supported")
let mut _val = &self.pattern()[4..];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have this logic here and in extract_regex() both? This probably should not be duplicated in case we ever change the identifier, and also should be something like len(REGEX_QUERY_PREFIX) and that prefix used elsewhere to match "rgx "

Implement the logic for matching code with regex. 

Add tests for : 
* query as regex 
* filter as regex 
* propagate when hole is captured by regex 
* propagate built in rules when query is regex 

-------




[ghstack-poisoned]
ketkarameya added a commit that referenced this pull request Jul 7, 2023
ghstack-source-id: f18d07f89f9933171677aadfa5b5bb55f8bde4ae
Pull Request resolved: #529
Implement the logic for matching code with regex. 

Add tests for : 
* query as regex 
* filter as regex 
* propagate when hole is captured by regex 
* propagate built in rules when query is regex 

-------




[ghstack-poisoned]
ketkarameya added a commit that referenced this pull request Jul 9, 2023
ghstack-source-id: d5311a1f54e83bbfe77612055a7e773cfd60be43
Pull Request resolved: #529
Implement the logic for matching code with regex. 

Add tests for : 
* query as regex 
* filter as regex 
* propagate when hole is captured by regex 
* propagate built in rules when query is regex 

-------




[ghstack-poisoned]
ketkarameya added a commit that referenced this pull request Jul 10, 2023
ghstack-source-id: 1c70c5cc2e25f3d9d07bd6829d1d6ac8460c0e31
Pull Request resolved: #529
Implement the logic for matching code with regex. 

Add tests for : 
* query as regex 
* filter as regex 
* propagate when hole is captured by regex 
* propagate built in rules when query is regex 

-------




[ghstack-poisoned]
ketkarameya added a commit that referenced this pull request Jul 10, 2023
ghstack-source-id: 257afebe517dc459c20aefa3624a830d3e583e55
Pull Request resolved: #529
@ketkarameya ketkarameya mentioned this pull request Jul 10, 2023
Copy link
Contributor

@lazaroclapp lazaroclapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of documentation/comment nits, but LGTM once those are addressed!


# Updates the import statement from `java.util.List` to `com.uber.NEwList`
# Updates the import statement from `java.util.List` to `com.uber.NewList`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Updates the import statement from `java.util.List` to `com.uber.NewList`
# Updates the import statement from `com.uber.OurListOfInteger` to `java.util.List`

is_seed_rule = false


# The below three rules do a dummy type migration like - from OurMapOfStringInteger to HashMap<String, Integer>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# The below three rules do a dummy type migration like - from OurMapOfStringInteger to HashMap<String, Integer>
# The below three rules do a dummy type migration from OurMapOf{T1}{T2} to HashMap<T1, T2>. For example, from OurMapOfStringInteger to HashMap<String, Integer>. This is to exercise non-constant regex matches for replace_node.

replace_node = "n"
replace = ""

# Adds Import to java.util.hashmap if absent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Adds Import to java.util.hashmap if absent
# Adds Import to java.util.HashMap if absent

range: Range::from_regex_match(mtch, source_code),
matches,
associated_comma: None,
associated_comments: Vec::new(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking, but still curious about this.

Copy link
Contributor Author

@ketkarameya ketkarameya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah forgot to push my comments

range: Range::from_regex_match(mtch, source_code),
matches,
associated_comma: None,
associated_comments: Vec::new(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.
But for testing purposes, I added a scenario where we expect the associated comment to be cleaned up.

let range_matches_inside_node = node.start_byte() <= captures.get(0).unwrap().start()
&& node.end_byte() >= captures.get(0).unwrap().end();
if (recursive && range_matches_inside_node) || range_matches_node {
let group_by_tag = if let Some(ref rn) = replace_node {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! your understanding is correct. It represents the match corresponding to the replace node (if present) or the entire match.
renamed to - replace_node_match

replace_node = "m_name"
replace = "addToNewList"
holes = ["name"]
is_seed_rule = false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats a brilliant idea ! We can absolutely do that!

Done ✅

Implement the logic for matching code with regex. 

Add tests for : 
* query as regex 
* filter as regex 
* propagate when hole is captured by regex 
* propagate built in rules when query is regex 

-------




[ghstack-poisoned]
ketkarameya added a commit that referenced this pull request Jul 11, 2023
ghstack-source-id: 78529c182f9c258ff3cf09f10a72856d1857e6f6
Pull Request resolved: #529
@ketkarameya
Copy link
Contributor Author

Addressed comments 3ac21b0 and rebased

@ketkarameya ketkarameya merged commit 3ac21b0 into gh/ketkarameya/26/base Jul 11, 2023
ketkarameya added a commit that referenced this pull request Jul 11, 2023
ghstack-source-id: 78529c182f9c258ff3cf09f10a72856d1857e6f6
Pull Request resolved: #529
@ketkarameya ketkarameya deleted the gh/ketkarameya/26/head branch July 11, 2023 02:31
crystalninja5 added a commit to crystalninja5/solid-tribble that referenced this pull request Aug 11, 2024
ghstack-source-id: 78529c182f9c258ff3cf09f10a72856d1857e6f6
Pull Request resolved: uber/piranha#529
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants