Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trie-based matcher for Alt(Equals.(a_long_list)...) #20

Open
oxinabox opened this issue Oct 28, 2016 · 2 comments
Open

Trie-based matcher for Alt(Equals.(a_long_list)...) #20

oxinabox opened this issue Oct 28, 2016 · 2 comments

Comments

@oxinabox
Copy link
Collaborator

Hi,
In the grammar I am writing,
there are a few places where I need to match against one of a large selection of constants.
Which I will call a_long_list, it might have 20 elements, it might have 200.
In theory, I'm sure there are use cases for matching against one of thousands, or tens of thousands.

Alt(Equals.(a_long_list)...) works.
But I understand that it will be O(n*m)
for n the length of the list, and m the maximum length of any element of that list
Which honestly isn't too bad, I think.

But I figure it can be done better.
If a Trie is used, I think this becomes just O(m).
I might be screwing up my math here, but I think that in the process of finding the longest match, one automatically finds all the shorter matches, which can be saved for if backoff is required.
So there is no need to re-step through the source, if one fails.

I've started working on this, I have the Trie stuff working to return what strings match, I just need to like it into a matcher, with the trampoline stuffs, Success/Fail/Execute.

What I currently propose is a matcher:

EqualsOneOf(values; greedy::Bool=true)

Where values are the values that it could be equal to.
If greedy is false then this matches shortest-first.
And if it has to back-off, then gives the second shortest string in values that matches,
etc.
This is the natural order for a Trie.

If greedy is true, then it matches longest-first (the longest string that is in values first), and then if that needs to be backedoff from, match's the second longest, and so forth.
The is accomplished by collecting, all the values that match as Trie keys,
and then reverseing the order. (Which i guess does make this O(n+m))

This is a bit less expressive than Alt(Equals.(a_long_list)...) since that lets you choose a priority for the matchers, not just longestfirst or shortestfirst

What do you think?
When I have it working, should I make a PR?

One issue is that right now, Tries only support strings.
JuliaCollections/DataStructures.jl#220

@andrewcooke
Copy link
Owner

hi. i think that makes sense. if tries can only support strings then the matcher would need to have a more restricted type, but i don't see why it wouldn't still work for strings (but honestly it's been ages since i last used this library, or even julia). if you submit a pr i will certainly try to include it (but can't promise a fast response).

@oxinabox
Copy link
Collaborator Author

oxinabox commented Dec 10, 2016

Finally came back to this yesterday, and finished my prototype.
It took me a long time to get my head around your code.
(Things you call iter I would call iter_state)

Prototype is at https://github.com/oxinabox/ColoringNames.jl/blob/master/proto/TrieMatchers.ipynb

I want to make it lazy in the case of shortest first matching.
But that means not using Task based generator expressions.

I probably won't poke this code again for another month or so.

Right now it is very tied to strings.
And even onces the requirement on Tries is relaxed, it is going to take some work to make if functional for things that are not index/view-able.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants