ERROR: line with too many characters #51

Closed
pgoikoetxea opened this issue Dec 22, 2021 · 2 comments
Comments

@pgoikoetxea

Dear Dr. Derrien: I find your pipeline very attractive for several reasons. I'm trying to run it on transcriptome data from a gymnosperm megagenome that is only partially sequenced. I successfully ran the first module in the pipeline (FILTER), but an error emerged when running the CODPOT module. The error is given in the title; I paste the first few lines here, and I can send the complete error output if you wish.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Each line of the file must be less than 65,536 characters. Line 3510498 is 351472 chars.
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/pablo/Apps/Miniconda3/envs/feelnc/lib/site_perl/5.26.2/Bio/Root/Root.pm:447

Presumably the error originates from my genome file (FASTA), from which I have extracted several lines around the relevant one with head and tail. I would like to know whether this can be fixed, and how. It strikes me, though, that the previous sequence has 322377 bp but did not trigger the error.
Thank you very much
Pablo

@tderrien
Owner

Dear @pgoikoetxea

Thank you for using FEELnc!

Actually, it could be because each sequence in your genome .fasta file is written on one (very long) line, which exceeds the line-length limit reported in the error.

The best option may be to reformat your .fasta file before running feelnc_codpot.pl, for example:

cat genome.fa
>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEGLVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHKIPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTLMGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQIATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH

fold -b -w 70 genome.fa
>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEGLVSVKVSDDF
TIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHKIPQFASRKQLSDAILKEAEE
KIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTLMGQFYVMDDKKTVEQVIAEKEKEFGGKIKI
VEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQIATIGENLVVR
RFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH
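
Two things to keep in mind with the `fold` command above: it writes the wrapped text to standard output rather than changing genome.fa, and it wraps every line, including any header line longer than the chosen width. A minimal sketch that wraps only the sequence lines might look like this. The function name `wrap_fasta` and the output file name in the usage comment are hypothetical, and this is not part of FEELnc.

```shell
# Wrap sequence lines of a FASTA file to a fixed width (default 70).
# Header lines (starting with ">") are printed unchanged; empty lines are dropped.
# Usage (file names are placeholders): wrap_fasta genome.fa 70 > genome.wrapped.fa
wrap_fasta() {
  awk -v width="${2:-70}" '
    /^>/ { print; next }                      # keep headers intact
    {
      # emit the line in consecutive chunks of at most `width` characters
      for (i = 1; i <= length($0); i += width)
        print substr($0, i, width)
    }
  ' "$1"
}
```

The result goes to a new file, which would then be passed to feelnc_codpot.pl in place of the original genome file.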

Hope this helps.

Best regards,

Thomas

@pgoikoetxea
Author

Thank you very much for your fast answer. And YES, my fasta file, downloaded from treegenes.db.org, is formatted as in your first example.
Thank you very much for the code to format the lines.
Best wishes
Pablo
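
For anyone hitting the same error, one way to see which lines of a FASTA file are too long, before or after reformatting, is sketched below. The function name `long_lines` is hypothetical. The default threshold of 65536 is taken from the error message, and whether the tool counts the trailing newline is an assumption left open here.

```shell
# Print "line_number<TAB>length" for every line whose length (excluding the
# newline) is at least `limit` characters (default 65536, from the error message).
# Usage (file name is a placeholder): long_lines genome.fa
long_lines() {
  awk -v limit="${2:-65536}" '
    length($0) >= limit { printf "%d\t%d\n", NR, length($0) }
  ' "$1"
}
```

If the command prints nothing for the reformatted file, no line reaches the threshold.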
