FASTA files with very large unwrapped records generate exceptions in agat_sp_extract_sequences.pl #150

Closed
pmagwene opened this issue Jul 15, 2021 · 4 comments · Fixed by #152


@pmagwene

I've encountered a bug in AGAT's agat_sp_extract_sequences.pl script (latest version from Conda as of 15 July 2021): a FASTA input file containing a large record with unwrapped lines generates the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Failed validation of sequence '[unidentified sequence]'. Invalid characters 

When I reformatted the FASTA file so that lines have a maximum of 70 characters, the run completed without exception.
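For anyone hitting the same error, the re-wrapping workaround can be sketched as below. This is a minimal sketch, not part of AGAT: the function name `wrap_fasta` and its `width` parameter are hypothetical, and it assumes a plain FASTA file in which header lines start with `>` and every other non-blank line is sequence.

```python
from typing import Iterable, Iterator


def wrap_fasta(lines: Iterable[str], width: int = 70) -> Iterator[str]:
    """Yield FASTA lines with each record's sequence re-wrapped to `width` characters.

    `wrap_fasta` is a hypothetical helper written for illustration; it is not an AGAT API.
    """
    if width < 1:
        raise ValueError("width must be a positive integer")

    chunks = []  # sequence pieces collected for the current record

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header: emit the previous record's sequence first.
            sequence = "".join(chunks)
            chunks = []
            for start in range(0, len(sequence), width):
                yield sequence[start:start + width]
            yield line
        else:
            chunks.append(line)

    # Emit the sequence of the final record.
    sequence = "".join(chunks)
    for start in range(0, len(sequence), width):
        yield sequence[start:start + width]
```

To use it on a file, open the input, pass the file object to `wrap_fasta`, and write each yielded line followed by a newline to a new file. Only one record's sequence is held in memory at a time.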

Could this be hitting a character limit on Perl string size, or something similar?

I can provide sample files that raise this exception, but I haven't had a chance to trim them down to a MWE, so they're somewhat large.

@pmagwene
Author

I'm guessing this is the same issue discussed in #56.

@Juke34
Collaborator

Juke34 commented Jul 19, 2021

Difficult to say. If it is, I don't see why the BioPerl message is not "MSG: Each line of the file must be less than 65,536 characters. Line 2 is 67824 chars."
The problem may still be related.
Someone had a similar issue; see #37.
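To check whether a given file would run into the line-length limit quoted in that message, something like the following could be run first. It is only a sketch under stated assumptions: `find_long_lines` is a hypothetical helper, not part of AGAT or BioPerl, and it assumes the limit applies to the line length without the trailing newline, which is not confirmed here.

```python
from typing import Iterable, Iterator, Tuple

# Limit taken from the BioPerl message quoted above
# ("must be less than 65,536 characters").
MAX_LINE_CHARS = 65_536


def find_long_lines(
    lines: Iterable[str], limit: int = MAX_LINE_CHARS
) -> Iterator[Tuple[int, int]]:
    """Yield (line_number, length) for every line that is not shorter than `limit`.

    Hypothetical helper for illustration. Assumes the trailing newline
    does not count toward the limit.
    """
    for number, raw in enumerate(lines, start=1):
        length = len(raw.rstrip("\r\n"))
        if length >= limit:
            yield number, length
```

Passing an open file object to `find_long_lines` and printing the results lists the lines that would need wrapping. An empty result would suggest that the exception comes from something other than this limit.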

Happy to hear you found a way to solve this.

@Juke34 Juke34 closed this as completed Jul 19, 2021
@pmagwene
Author

It would be useful to add a note to the documentation about wrapping FASTA files, as #37 and #56 suggest that this isn't a rare hiccup.

@Juke34
Collaborator

Juke34 commented Jul 19, 2021

Good point, something to add in the Troubleshooting section.

Thank you for your feedback

@Juke34 Juke34 reopened this Jul 19, 2021
@Juke34 Juke34 mentioned this issue Jul 27, 2021
Merged
Juke34 added a commit that referenced this issue Jul 27, 2021
* fix #150 and move troubleshooting section from README to doc

* add how to cite section in doc