ERROR: line with too many characters #51

pgoikoetxea · 2021-12-22T10:03:43Z

Dear Dr. Derrien: I find your pipeline very attractive for several reasons. I'm trying to run it on transcriptome data from a gymnosperm megagenome that is only partially sequenced. I successfully ran the first module in the pipeline (FILTER), but an error emerged when running the CODPOT module. The error is given in the title, and I paste the first few lines here, although I can send the complete error output if you wish:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Each line of the file must be less than 65,536 characters. Line 3510498 is 351472 chars.
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/pablo/Apps/Miniconda3/envs/feelnc/lib/site_perl/5.26.2/Bio/Root/Root.pm:447

Logically, the error must originate from my genome file (FASTA), from which I have extracted several lines around the relevant one with head and tail. I would like to know whether this can be fixed, and how. It strikes me, though, that the previous sequence has 322377 bp but did not trigger the error.
Thank you very much
Pablo

tderrien · 2021-12-22T10:23:45Z

Dear @pgoikoetxea

Thank you for using FEELnc!

Actually, it could be related to your genome .fasta file being formatted with one (big) line per sequence instead of wrapped lines.
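
To confirm this before reformatting, a quick check along these lines lists every line that reaches the 65,536-character limit quoted in the error message. It is only a sketch: `long_lines` is a helper name made up for this example, `genome.fa` stands in for your own file, and whether the parser counts the trailing newline toward the limit is not something checked here.

```shell
# Print "line_number length" for every line at or over a length limit.
# The default limit (65536) is taken from the error message above.
long_lines() {
    awk -v max="${1:-65536}" 'length($0) >= max { print NR, length($0) }'
}

# Usage (genome.fa is a placeholder file name):
#   long_lines < genome.fa
#   long_lines 100000 < genome.fa    # use a different limit
```

If this prints nothing, line length is probably not the cause and the problem lies elsewhere.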

The best option is probably to reformat your .fasta file before running feelnc_codpot.pl, for example:

cat genome.fa
>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEGLVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHKIPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTLMGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQIATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH

fold -b -w 70 genome.fa
>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEGLVSVKVSDDF
TIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHKIPQFASRKQLSDAILKEAEE
KIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTLMGQFYVMDDKKTVEQVIAEKEKEFGGKIKI
VEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQIATIGENLVVR
RFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH
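
Two caveats with `fold`: it prints to standard output rather than modifying the file, so the result has to be redirected to a new file, and it wraps every line longer than 70 bytes, header lines included. If some headers are long, an awk-based sketch like the one below wraps only the sequence lines. `wrap_fasta` is a name made up for this example and `genome.wrapped.fa` is a placeholder output file. Each input line is wrapped independently, so it suits files with one line per sequence, as here.

```shell
# Wrap FASTA sequence lines at a fixed width, leaving header lines untouched.
# Reads FASTA on standard input and writes the wrapped result to standard output.
wrap_fasta() {
    awk -v w="${1:-70}" '
        /^>/ { print; next }                     # header lines pass through unchanged
        {
            # emit the sequence line in chunks of w characters
            for (i = 1; i <= length($0); i += w)
                print substr($0, i, w)
        }
    '
}

# Usage (file names are placeholders):
#   wrap_fasta < genome.fa > genome.wrapped.fa
#   wrap_fasta 60 < genome.fa > genome.wrapped.fa    # wrap at 60 instead of 70
```

The wrapped file is then the one to pass to feelnc_codpot.pl in place of the original.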

Hope this helps.

Best regards,

Thomas

pgoikoetxea · 2021-12-22T10:30:45Z

Thank you very much for your fast answer. And YES, my fasta file, downloaded from treegenes.db.org, is formatted as in your first example.
Thank you very much for the code to format the lines.
Best wishes
Pablo

pgoikoetxea closed this as completed Dec 22, 2021
