Restrict Motif discovery to subsequences starting at specific locations #49

Open
peterdhansen opened this issue Sep 16, 2020 · 12 comments
Assignees
Labels
feature New feature or request

Comments

@peterdhansen

When I analyze data that has daily fluctuations, I create an annotation vector that is 1 at midnight of each day and 0 everywhere else. This helps prioritize subsequences that start at midnight, so that each set of motifs has the same 24-hour structure.
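For illustration, a minimal sketch of such an annotation vector, assuming hourly samples and a series whose first sample falls on midnight. The helper name `midnight_annotation_vector` and its parameters are hypothetical and are not part of the library.

```python
import numpy as np


def midnight_annotation_vector(n_samples, window_size,
                               samples_per_day=24, first_midnight=0):
    """Hypothetical helper: 1.0 where a subsequence starts at midnight, else 0.0.

    Assumes regularly spaced samples and that index `first_midnight`
    is the first sample taken at midnight.
    """
    # One entry per subsequence start, like a matrix profile.
    n_subsequences = n_samples - window_size + 1
    av = np.zeros(n_subsequences)
    av[first_midnight::samples_per_day] = 1.0
    return av


# One week of hourly data, 24-hour window.
av = midnight_annotation_vector(n_samples=24 * 7, window_size=24)
```

The vector has one entry per subsequence start, so its length is `n_samples - window_size + 1` rather than `n_samples`.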

The issue is that applying an annotation vector does not prevent the motif algorithm from picking a motif pair where one starts at midnight and the other does not. A new mechanism would have to be defined to restrict these.
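One possible shape for such a mechanism is sketched below: a brute-force search that only compares subsequences whose start indices are in an allowed set, so both members of the pair are guaranteed to start there. This is an assumption about how the restriction could work, not the library's motif algorithm, and `restricted_motif_pair` is a hypothetical name.

```python
import numpy as np


def znorm(x):
    """Z-normalize a subsequence; flat subsequences become all zeros."""
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)


def restricted_motif_pair(ts, window_size, allowed_starts):
    """Hypothetical helper: closest pair with both starts in allowed_starts.

    Returns (distance, start_a, start_b), or (inf, None, None) if no
    valid pair exists. Brute force, so only suitable for few starts.
    """
    ts = np.asarray(ts, dtype=float)
    starts = [int(s) for s in allowed_starts
              if 0 <= s and s + window_size <= len(ts)]
    subs = [znorm(ts[s:s + window_size]) for s in starts]
    best = (np.inf, None, None)
    for a in range(len(starts)):
        for b in range(a + 1, len(starts)):
            # Simple trivial-match guard: skip overlapping subsequences.
            if abs(starts[a] - starts[b]) < window_size:
                continue
            dist = float(np.linalg.norm(subs[a] - subs[b]))
            if dist < best[0]:
                best = (dist, starts[a], starts[b])
    return best


# Five days of hourly noise; days 2 and 4 share the same daily shape.
rng = np.random.default_rng(0)
ts = rng.normal(size=24 * 5)
pattern = np.sin(np.linspace(0, 2 * np.pi, 24))
ts[24:48] = pattern
ts[72:96] = pattern
dist, first, second = restricted_motif_pair(ts, 24, range(0, len(ts), 24))
```

Because only midnight starts are compared, the cost depends on the number of days rather than on the length of the series.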

Also, the distance profiles that are calculated inside the motif algorithm do not apply the annotation vector. This could be added and triggered when use_cmp = True without any new mechanisms.
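As a rough sketch of what applying the annotation vector to a distance profile could look like, the snippet below reuses the commonly cited corrected matrix profile adjustment, `MP + (1 - AV) * max(MP)`, on a single distance profile. Whether the library would apply exactly this formula is an assumption, and `corrected_distance_profile` is a hypothetical name.

```python
import numpy as np


def corrected_distance_profile(dp, av):
    """Hypothetical helper: penalize distances where the annotation is low.

    Entries with av == 1 are unchanged; entries with av == 0 are pushed
    up by the largest distance in the profile (NaNs ignored).
    """
    dp = np.asarray(dp, dtype=float)
    av = np.asarray(av, dtype=float)
    return dp + (1.0 - av) * np.nanmax(dp)


dp = np.array([1.0, 0.5, 2.0, 0.2])
av = np.array([1.0, 0.0, 1.0, 0.0])
cdp = corrected_distance_profile(dp, av)
```

Here the uncorrected minimum sits at index 3, while the corrected minimum moves to index 0, the best match that also starts at an annotated location.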

I can write custom motif-finding code that does this, but if others would like the functionality, I'd be happy to contribute.

@vanbenschoten
Contributor

@peterdhansen I think that'd be a great contribution! We're looking to grow out more of our utility functions that go beyond the core algorithms. Feel free to make a PR and we can collaborate.

@tylerwmarrs
Contributor

I think it would be good to have this functionality. As you mention, it seems fairly trivial to implement. The "harder" thing to do is to write a blog post explaining when the approach is useful. Are you interested in adding the code, unit tests, and a blog post? @peterdhansen

@peterdhansen
Author

Sounds good. I'll give it a shot.
I should be able to contribute a blog post too. 😄

@peterdhansen
Author

So, I just had another idea.
Would it be possible/useful to restrict the MP calculation to only consider certain indices?
I'm not sure what you would return for the MP then for the other indices.

@peterdhansen
Author

Or is the snippets algorithm doing this (for regularly spaced index selection)?

(Sorry for triple post)

@vanbenschoten
Contributor

vanbenschoten commented Sep 17, 2020 via email

@tylerwmarrs
Contributor

tylerwmarrs commented Sep 17, 2020

> So, I just had another idea.
> Would it be possible/useful to restrict the MP calculation to only consider certain indices?
> I'm not sure what you would return for the MP then for the other indices.

We could use an approach similar to how missing data is handled. The stomp implementation handles this right now, and we are working on adding similar functionality to mpx. Essentially, the user provides a boolean array of indices to process or skip. I envision it working like the annotation vector. Distances at all skipped indices in the profile can simply be returned as nan.
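A minimal brute-force sketch of that idea follows. It is not the stomp or mpx code: `masked_matrix_profile` is a hypothetical function, and restricting the nearest-neighbor candidates to the masked indices as well is an assumption made to match the goal of this issue.

```python
import numpy as np


def masked_matrix_profile(ts, window_size, mask, exclusion=None):
    """Hypothetical sketch: matrix profile only at indices where mask is True.

    Skipped indices get nan in the profile and -1 in the profile index.
    Neighbors are also limited to masked indices (an assumption).
    """
    ts = np.asarray(ts, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    n_sub = len(ts) - window_size + 1
    if mask.shape != (n_sub,):
        raise ValueError("mask needs one entry per subsequence start")
    if exclusion is None:
        exclusion = max(1, window_size // 2)  # illustrative exclusion zone

    idx = np.flatnonzero(mask)
    subs = {}
    for i in idx:
        s = ts[i:i + window_size]
        std = s.std()
        subs[i] = (s - s.mean()) / std if std > 0 else np.zeros(window_size)

    mp = np.full(n_sub, np.nan)
    mpi = np.full(n_sub, -1, dtype=int)
    for i in idx:
        for j in idx:
            if abs(i - j) < exclusion:
                continue  # skip self and trivial matches
            d = np.linalg.norm(subs[i] - subs[j])
            if np.isnan(mp[i]) or d < mp[i]:
                mp[i], mpi[i] = d, j
    return mp, mpi


# Five days of hourly noise; days 2 and 4 share the same daily shape.
rng = np.random.default_rng(0)
ts = rng.normal(size=24 * 5)
pattern = np.sin(np.linspace(0, 2 * np.pi, 24))
ts[24:48] = pattern
ts[72:96] = pattern

mask = np.zeros(len(ts) - 24 + 1, dtype=bool)
mask[::24] = True  # only midnight starts are processed
mp, mpi = masked_matrix_profile(ts, 24, mask)
```

Returning nan keeps the profile the same length as an unrestricted one, so downstream code that uses nan-aware reductions such as `np.nanargmin` can consume it unchanged.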

Another approach could be to require users to supply a valid time index, for example through a pandas time series. This way users could specify intervals of interest directly.
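A sketch of how intervals of interest on a pandas `DatetimeIndex` could be turned into the boolean array described above. The function name `mask_from_intervals` and the half-open `[begin, end)` convention are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def mask_from_intervals(index, window_size, intervals):
    """Hypothetical helper: True where a subsequence start lies in an interval.

    `index` is a pandas DatetimeIndex aligned with the time series, and
    `intervals` is an iterable of (begin, end) pairs, treated as [begin, end).
    """
    n_sub = len(index) - window_size + 1
    starts = index[:n_sub]  # timestamp of each subsequence start
    mask = np.zeros(n_sub, dtype=bool)
    for begin, end in intervals:
        mask |= (starts >= pd.Timestamp(begin)) & (starts < pd.Timestamp(end))
    return mask


# Three days of hourly timestamps; keep only starts on the second day.
index = pd.date_range("2020-09-01", periods=72, freq=pd.Timedelta(hours=1))
mask = mask_from_intervals(index, 24, [("2020-09-02", "2020-09-03")])
```

A midnight-only selection could be expressed in the same way by testing `starts.hour == 0` instead of an interval.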

@tylerwmarrs
Contributor

> Or is the snippets algorithm doing this (for regularly spaced index selection)?
>
> (Sorry for triple post)

Snippets does not do this. It identifies k representative snippets and n neighbors. It helps to answer what is common in the series of interest.

@tylerwmarrs
Contributor

@peterdhansen Any updates on this?

@peterdhansen
Author

Sorry, not yet. I'll take a look in the next week or so.

@peterdhansen
Author

Got my environment set up 😄

@vanbenschoten
Contributor

@peterdhansen just wanted to circle back and see if you're still interested in contributing.

Happy Holidays!

@tylerwmarrs added the "feature" (New feature or request) label on Jan 7, 2021
3 participants