-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add regex stuff #529
Add regex stuff #529
Conversation
[ghstack-poisoned]
ghstack-source-id: f3f20b58e8f3c0b8442e40cfa0f284b91410c7b1 Pull Request resolved: #529
[ghstack-poisoned]
ghstack-source-id: 828e81fead753bffffdf904767ed9673068f5e9e Pull Request resolved: #529
[ghstack-poisoned]
ghstack-source-id: aaad598e2655a3ba76e46fc0dc969c0701fdaa5c Pull Request resolved: #529
[ghstack-poisoned]
ghstack-source-id: 3f22a0a9c54a2a002591025d5c8b08e827f0e92c Pull Request resolved: #529
[ghstack-poisoned]
ghstack-source-id: 1f8b03f76102b2b6b97797b0d1cdc2ff126cb366 Pull Request resolved: #529
[ghstack-poisoned]
ghstack-source-id: 6d0afcb97e8cd0580679d906f0147aafa7f9acb1 Pull Request resolved: #529
Implement the logic for matching code with regex. Add tests for : * query as regex * filter as regex * propagate when hole is captured by regex * propagate built in rules when query is regex ------- [ghstack-poisoned]
ghstack-source-id: ff60d400e06ec0ec0fb0e0c357f513f750360963 Pull Request resolved: #529
src/models/capture_group_patterns.rs
Outdated
} | ||
|
||
impl Validator for CGPattern { | ||
fn validate(&self) -> Result<(), String> { | ||
if self.pattern().starts_with("rgx ") { | ||
panic!("Regex not supported") | ||
let mut _val = &self.pattern()[4..]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut _val = &self.pattern()[4..]; | |
let mut _val = &self.extract_regex(); |
|
||
# The below three rules do a dummy type migration from List<Integer> to NewList | ||
|
||
# Updates the import statement from `java.util.List` to `com.uber.NEwList` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Updates the import statement from `java.util.List` to `com.uber.NEwList` | |
# Updates the import statement from `java.util.List` to `com.uber.NewList` |
replace_node = "m_name" | ||
replace = "addToNewList" | ||
holes = ["name"] | ||
is_seed_rule = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every replace_node
in this test suite is a constant, even though the surrounding context is matched by more complex regexps. For generality, we might want a regex that matches a more general pattern on the replaced node. How about converting OurMapOfX
to HashMap<X>
for various X
s (i.e. the code could have OurMapOfInteger
, OurMapOfString
, and OurMapOfOurMapOfLong
)
Or is something like that not meant to be supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats a brilliant idea ! We can absolutely do that!
Done ✅
|
||
// Will not get updated | ||
List<String> b = getListStr(); | ||
Integer item = getItemStr(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integer item = getItemStr(); | |
String itemStr = getItemStr(); |
Or something like that. I know this code doesn't get compiled, but it still should make sense types-wise :)
src/utilities/regex_utilities.rs
Outdated
/// * `replace_node` - node to replace | ||
/// | ||
/// # Returns | ||
/// The range of the match in the source code and the corresponding mapping from tags to code snippets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? Return seems to just be a vector of matches. Where is this mapping encoded?
src/utilities/regex_utilities.rs
Outdated
pub(crate) fn get_all_matches_for_regex( | ||
node: &Node, source_code: String, regex: &Regex, recursive: bool, replace_node: Option<String>, | ||
) -> Vec<Match> { | ||
// let code_snippet = node.utf8_text(source_code.as_bytes()).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete commented out line
src/utilities/regex_utilities.rs
Outdated
all_matches | ||
} | ||
|
||
// Creates an hashmap from the capture group(name) to the corresponding code snippet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Creates an hashmap from the capture group(name) to the corresponding code snippet. | |
// Creates a hashmap from the capture group (name) to the corresponding code snippet. |
src/utilities/regex_utilities.rs
Outdated
let range_matches_inside_node = node.start_byte() <= captures.get(0).unwrap().start() | ||
&& node.end_byte() >= captures.get(0).unwrap().end(); | ||
if (recursive && range_matches_inside_node) || range_matches_node { | ||
let group_by_tag = if let Some(ref rn) = replace_node { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Group by tag is either the matched code string (plus location info) corresponding to the match group for the replace node if present or a the matched code string matching the full regex match if not, correct? Why is the name group_by_tag
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! your understanding is correct. It represents the match
corresponding to the replace node (if present) or the entire match
.
renamed to - replace_node_match
range: Range::from_regex_match(mtch, source_code), | ||
matches, | ||
associated_comma: None, | ||
associated_comments: Vec::new(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we populate associated comments later and does that work for regex matches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.
But for testing purposes, I added a scenario where we expect the associated comment to be cleaned up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not blocking, but still curious about this.
src/models/capture_group_patterns.rs
Outdated
} | ||
|
||
impl Validator for CGPattern { | ||
fn validate(&self) -> Result<(), String> { | ||
if self.pattern().starts_with("rgx ") { | ||
panic!("Regex not supported") | ||
let mut _val = &self.pattern()[4..]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have this logic here and in extract_regex()
both? This probably should not be duplicated in case we ever change the identifier, and also should be something like len(REGEX_QUERY_PREFIX)
and that prefix used elsewhere to match "rgx "
Implement the logic for matching code with regex. Add tests for : * query as regex * filter as regex * propagate when hole is captured by regex * propagate built in rules when query is regex ------- [ghstack-poisoned]
ghstack-source-id: f18d07f89f9933171677aadfa5b5bb55f8bde4ae Pull Request resolved: #529
Implement the logic for matching code with regex. Add tests for : * query as regex * filter as regex * propagate when hole is captured by regex * propagate built in rules when query is regex ------- [ghstack-poisoned]
ghstack-source-id: d5311a1f54e83bbfe77612055a7e773cfd60be43 Pull Request resolved: #529
Implement the logic for matching code with regex. Add tests for : * query as regex * filter as regex * propagate when hole is captured by regex * propagate built in rules when query is regex ------- [ghstack-poisoned]
ghstack-source-id: 1c70c5cc2e25f3d9d07bd6829d1d6ac8460c0e31 Pull Request resolved: #529
Implement the logic for matching code with regex. Add tests for : * query as regex * filter as regex * propagate when hole is captured by regex * propagate built in rules when query is regex ------- [ghstack-poisoned]
ghstack-source-id: 257afebe517dc459c20aefa3624a830d3e583e55 Pull Request resolved: #529
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bunch of documentation/comment nits, but LGTM once those are addressed!
|
||
# Updates the import statement from `java.util.List` to `com.uber.NEwList` | ||
# Updates the import statement from `java.util.List` to `com.uber.NewList` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Updates the import statement from `java.util.List` to `com.uber.NewList` | |
# Updates the import statement from `com.uber.OurListOfInteger` to `java.util.List` |
is_seed_rule = false | ||
|
||
|
||
# The below three rules do a dummy type migration like - from OurMapOfStringInteger to HashMap<String, Integer> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# The below three rules do a dummy type migration like - from OurMapOfStringInteger to HashMap<String, Integer> | |
# The below three rules do a dummy type migration from OurMapOf{T1}{T2} to HashMap<T1, T2>. For example, from OurMapOfStringInteger to HashMap<String, Integer>. This is to exercise non-constant regex matches for replace_node. |
replace_node = "n" | ||
replace = "" | ||
|
||
# Adds Import to java.util.hashmap if absent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Adds Import to java.util.hashmap if absent | |
# Adds Import to java.util.HashMap if absent |
range: Range::from_regex_match(mtch, source_code), | ||
matches, | ||
associated_comma: None, | ||
associated_comments: Vec::new(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not blocking, but still curious about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aah forgot to push my comments
range: Range::from_regex_match(mtch, source_code), | ||
matches, | ||
associated_comma: None, | ||
associated_comments: Vec::new(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.
But for testing purposes, I added a scenario where we expect the associated comment to be cleaned up.
src/utilities/regex_utilities.rs
Outdated
let range_matches_inside_node = node.start_byte() <= captures.get(0).unwrap().start() | ||
&& node.end_byte() >= captures.get(0).unwrap().end(); | ||
if (recursive && range_matches_inside_node) || range_matches_node { | ||
let group_by_tag = if let Some(ref rn) = replace_node { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! your understanding is correct. It represents the match
corresponding to the replace node (if present) or the entire match
.
renamed to - replace_node_match
replace_node = "m_name" | ||
replace = "addToNewList" | ||
holes = ["name"] | ||
is_seed_rule = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats a brilliant idea ! We can absolutely do that!
Done ✅
Implement the logic for matching code with regex. Add tests for : * query as regex * filter as regex * propagate when hole is captured by regex * propagate built in rules when query is regex ------- [ghstack-poisoned]
ghstack-source-id: 78529c182f9c258ff3cf09f10a72856d1857e6f6 Pull Request resolved: #529
Addressed comments 3ac21b0 and rebased |
ghstack-source-id: 78529c182f9c258ff3cf09f10a72856d1857e6f6 Pull Request resolved: #529
ghstack-source-id: 78529c182f9c258ff3cf09f10a72856d1857e6f6 Pull Request resolved: uber/piranha#529
Implement the logic for matching code with regex.
Add tests for :
Stack from ghstack (oldest at bottom):