Hi Ben and Kasper,

We are currently working on a project that uses data from the GTEx project. We are particularly interested in the resource presented in recount3 and would like clarification on two specific points:

1. In your methods, you mention that "When STAR performs spliced alignment, it outputs a high-confidence collection of splice-junction calls in a file named (SJ.out.tab)". From recount3, we can get the Matrix Market files. Can you confirm whether these aggregated files contain the information found in the last three columns of the SJ.out.tab file?
2. If so, is there a way to convert the Matrix Market file back to a BED file with the junction read counts?
Your prompt response to these inquiries would be greatly appreciated. Thank you for your attention to this matter.
Best,
Jiapei
For 1., the recount3 Matrix Market files are derived from the aggregated SJ.out.tab files across the samples for a particular study (or tissue, in the case of GTEx v8). I'll have to double-check whether we did any additional filtering (it's been a while), but the contents should be the vast majority of what was in the SJ.out.tab files.
For 2., given that you want the splice junctions in a BED file of counts, you're probably best off using Snaptron's re-formatted version of the GTEx v8 junctions in recount3:
where the rail_id column (first column) is the sample ID that appears in the comma-delimited nested list (the samples field in the junctions file) for each junction. That list defines which GTEx samples the junction appears in (i.e. has at least one supporting read). The field also contains the spliced read count of the junction for each sample, e.g. <sample_id>:<spliced_read_count>,...
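To make the `<sample_id>:<spliced_read_count>,...` format concrete, here is a minimal Python sketch of how the samples field could be unpacked into per-sample, BED-like rows. The function names are hypothetical. The coordinate handling assumes the junction coordinates are 1-based inclusive (so only the start shifts for BED), and skipping empty tokens assumes the field may carry a leading or trailing comma. Check both against the actual file before relying on this.

```python
def parse_samples_field(samples):
    """Parse '<sample_id>:<count>,...' into a dict {sample_id: count}."""
    counts = {}
    for token in samples.split(","):
        if not token:  # assumption: tolerate a leading or trailing comma
            continue
        sample_id, count = token.split(":")
        counts[sample_id] = int(count)
    return counts


def junction_to_bed_rows(chrom, start, end, strand, samples):
    """Yield one BED-like tuple per sample supporting the junction.

    Assumption: start/end are 1-based inclusive, so only the start is
    shifted to get 0-based, half-open BED coordinates.
    """
    for sample_id, count in parse_samples_field(samples).items():
        yield (chrom, start - 1, end, sample_id, count, strand)
```

The sample IDs produced here are the rail_id values, so they would still need to be joined against the samples table to get GTEx sample names.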
Also, I should point out, the .bgz file is a gzip-compatible block-gzip format that can be read by gzip or pigz. But there's also the Tabix index file: https://snaptron.cs.jhu.edu/data/gtexv2/junctions.bgz.tbi which you can use to quickly query a genomic coordinate range of junctions as well.
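As a rough illustration of the gzip compatibility mentioned above: a block-gzip file is a series of concatenated gzip members, which Python's standard `gzip` module reads transparently. The sketch below streams a locally downloaded `.bgz` file as tab-separated rows. The function name is hypothetical, and skipping lines that start with `#` is an assumption about how a header, if any, is marked.

```python
import gzip


def iter_tsv_rows(path):
    """Stream tab-separated rows from a gzip or block-gzip (.bgz) file."""
    # gzip.open handles multi-member gzip streams, which is what BGZF is.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):  # assumption: '#' marks header/comment lines
                continue
            yield line.rstrip("\n").split("\t")
```

This reads the whole file sequentially. For pulling out only a genomic coordinate range, the Tabix index is the faster route.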