This repository details the cleaning process carried out for the Melastomataceae probe set.
See the associated article: An updated and extended version of the Melastomataceae probe set for target capture (Dagallier & Michelangeli 2024).
The cleaning process is detailed here.
The new, clean probe set is available here: PROBE_SET_CLEAN.FNA (nucleotide version) and PROBE_SET_CLEAN_prot.FAA (amino acid version).
N.B. This clean probe set is intended for bioinformatic use, i.e. to recover targeted sequences from sequencing reads. It is not intended to physically capture the DNA in vitro.
Additional note. It may be worth removing short sequences from the probe set, e.g. with hybpiper fix_targetfile (see https://github.com/mossmatters/HybPiper/wiki/Troubleshooting,-common-issues,-and-recommendations#14-fixing-and-filtering-your-target-file).
See details here.
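As an illustration of what removing short sequences means in practice, here is a minimal Python sketch that drops FASTA records below a length cutoff. It is not a replacement for hybpiper fix_targetfile, which performs additional checks. The function names, the example file names, and the cutoff value are assumptions made for this example only and are not part of this repository.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_short(records: Iterable[Record], min_length: int) -> List[Record]:
    """Keep only the records whose sequence has at least min_length characters."""
    return [(h, s) for h, s in records if len(s) >= min_length]


def filter_fasta_file(in_path: str, out_path: str, min_length: int) -> int:
    """Write the filtered records to out_path and return how many were kept."""
    with open(in_path) as handle:
        kept = filter_short(parse_fasta(handle), min_length)
    with open(out_path, "w") as out:
        for header, seq in kept:
            out.write(f">{header}\n{seq}\n")
    return len(kept)
```

For instance, filter_fasta_file("PROBE_SET_CLEAN.FNA", "PROBE_SET_CLEAN_filtered.FNA", 150) would write only the sequences of at least 150 nucleotides. The output file name and the cutoff of 150 are hypothetical; choose a cutoff suited to your data.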
Figure 1. Summary of recovery statistics computed with HybPiper for the assemblies with the old probe set (blue), the new probe set in nucleotide format (yellow), and the new probe set in amino acid format (orange). A: number of loci with mapped reads, B: number of loci with assembled sequences, and C: number of loci with assembled sequences at least 75% as long as their locus reference in the probe set. The Burrows-Wheeler Aligner (BWA) was used to map the reads with the nucleotide probe sets, and DIAMOND was used for the amino acid probe set. Numbers to the right of the boxplots are the median values.
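To make the criterion of panel C concrete, the following Python sketch counts the loci whose assembled length reaches a given fraction of the reference length. HybPiper computes these statistics itself (hybpiper stats); this is only an illustration of the definition, and the function name, the dictionary inputs, and the locus names are hypothetical.

```python
from typing import Dict


def count_loci_at_fraction(
    assembled: Dict[str, int],
    reference: Dict[str, int],
    fraction: float = 0.75,
) -> int:
    """Count loci whose assembled length is >= fraction * reference length.

    assembled maps locus name to assembled sequence length (missing = 0),
    reference maps locus name to the length of its reference in the probe set.
    """
    count = 0
    for locus, ref_len in reference.items():
        if ref_len <= 0:
            continue  # ignore loci without a usable reference length
        if assembled.get(locus, 0) >= fraction * ref_len:
            count += 1
    return count


# Made-up lengths for three hypothetical loci.
example_reference = {"locus1": 1000, "locus2": 500, "locus3": 800}
example_assembled = {"locus1": 750, "locus2": 300}
example_count = count_loci_at_fraction(example_assembled, example_reference)
```

With these made-up values, only locus1 meets the 75% criterion: 750 is at least 0.75 × 1000, whereas 300 is below 0.75 × 500 = 375 and locus3 has no assembled sequence.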
Please cite as: Dagallier L-PMJ, Michelangeli FA. 2024. An updated and extended version of the Melastomataceae probe set for target capture. Applications in Plant Sciences 12: e11564. https://doi.org/10.1002/aps3.11564