A few updates and fixes.
- Improve transcripts table creation
- Improve Docker image:
  - Use Micromamba instead of mamba
  - Reduce size by combining layers and cleaning caches
- Fix tag of howard tool and docker images
This release introduces 'BigWig' annotation, prioritization options and a transcripts view, and improves sample management, INFO/tags renaming, annotation database generation and operations, configuration files in YAML format, and Python package stability.
- Add annotation method 'BigWig'
- Add prioritization options:
  - SQL syntax available to define filters
  - New 'Class' prioritization field
- New transcripts view:
  - Create a transcript view, using a structure from multiple source types (e.g. snpEff, external annotation databases)
  - Mapping between multiple transcript ID sources (e.g. refSeq, Ensembl)
  - Transcripts prioritization, using the same prioritization process as variants
  - Export transcripts table as a file, in multiple formats such as VCF, TSV, Parquet
- Export with a specific sample list
- Rename or remove INFO/tags before exporting
- Configuration and parameters files in YAML format allowed
- Add dynamic transcript column for NOMEN calculation (using transcript prioritization column)
- Add plugins:
  - 'update_databases'
- Improve snpEff annotations operations
- New option 'uniquify' for dbNSFP generation, identification of column types
- Management, check and export of sample columns
- Improve query type mode
- Improve splice annotation
- Improve NOMEN generation
- Genotype format detection
- Fix packages releases
- Fix parameters and configuration files options
- Fix calculations list and parametrization
- Fix empty file export
- Fix BED annotation with parquet method
- More explicit log messages
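As an illustration of the "Rename or remove INFO/tags before exporting" item above, here is a minimal sketch of the idea applied to a single VCF INFO string. The function name `rename_info_tags` and its `renames` mapping are hypothetical and are not part of the HOWARD API; the actual implementation may work differently (for example at table-column level).

```python
def rename_info_tags(info, renames):
    """Rename or drop tags in a VCF INFO string (hypothetical helper).

    `renames` maps an existing tag to its new name, or to None to remove it.
    Tags absent from `renames` are kept unchanged.
    """
    if info in ("", "."):
        return "."
    kept = []
    for entry in info.split(";"):
        # Flag tags (e.g. "DB") have no "=value" part; partition handles both.
        tag, sep, value = entry.partition("=")
        if tag in renames:
            new_tag = renames[tag]
            if new_tag is None:
                continue  # tag is removed from the output
            tag = new_tag
        kept.append(f"{tag}{sep}{value}")
    # An INFO column with no remaining tag is written as "."
    return ";".join(kept) if kept else "."


print(rename_info_tags("DP=10;AF=0.5;DB", {"DP": "DEPTH", "AF": None}))
```

With this mapping, `DP` is renamed to `DEPTH`, `AF` is dropped, and the flag `DB` is left untouched.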
This release introduces a splice annotation tool and updates the DuckDB Python package to improve stability.
- Add splice tool with docker image
- Add plugins:
  - 'genebe' (GeneBe annotation using REST API)
  - 'minimalize' (Minimalize a VCF file, such as removing INFO/Tags or samples)
- DuckDB 1.0.0 stable Snow Duck (Anas Nivis) release
- Add API Documentation
- Improve tests
- Paths parameters check fixed (genome and genomes-folders)
- Fix snpEff download error with databases list
This release is a refactor of HOWARD (Highly Open Workflow for Annotation & Ranking toward genomic variant Discovery) in Python, using Parquet and DuckDB.
HOWARD annotates and prioritizes genetic variations, calculates and normalizes annotations, translates files in multiple formats (e.g. vcf, tsv, parquet) and generates variant statistics.
See README and GitHub for more explanations.
See HOWARD GitHub for more information about previous releases.
- Script creation
- Add Prioritization and Translation
- Add snpEff annotation and stats
- Add Multithreading on Prioritization and Translation
- Add Calculation step
- Add generic file annotation through --annotation option
  - No need to be in configuration file
  - Need to be in ANNOVAR database folder (file 'ASSEMBLY_ANN.txt' for annotation 'ANN')
- Add options: --force , --split
- Add options for VCFannotation.pl: --show_annotations, --show_annotations_full
- Add database download option nowget in VCFannotation.pl
- Fixes: multithreading, VAF calculation, configuration and check dependencies
- Replace VCFTOOLS command with BCFTOOLS command
- Release added into the output VCF
- Update snpEff options
- Add VARTYPE, CALLING_QUALITY and CALLING_QUALITY_EXPLODE option on calculation
- Add description on calculations
- Improve VCF validation
- Fix snpEff annotation bug
- Add --vcf input vcf file option
- Create Output file directory automatically
- Improve Multithreading
- Improve Multithreading
- Input VCF compressed with BGZIP accepted
- Output VCF compression level
- Add VCF input sorting and multiallele split step (by default)
- Add VCF input normalization step with option --norm
- Bug fixes
- Multithreading improved
- Change default output vcf
- Input vcf without samples allowed
- VCF Validation with contig check
- Add multi VCF in input option
- Add --annotate option for BCFTOOLS annotation with a VCF and TAG (beta)
- Remove the non-multithreaded code path (runs without multithreading now use multithreading with 1 thread)
- Remove --multithreading parameter, only --thread parameter to deal with multithreading
- Replace --filter and --format parameters with --prioritization and --translation parameters
- Add snpeff options to VCFannotation.pl
- Reorganization of folders (bin, config, docs, toolbox...).
- Improve Translation (TSV or VCF, sort on fields, selection of fields, filtering on fields), especially memory efficiency
- Change Number/Type/Description of new INFO/FORMAT header generated
- Remove snpEff option --snpeff and --snpeff_hgvs. SnpEff is used through --annotation option
- Add '#' to the TAB/TSV delimiter format header
- Update dbNSFP config annotation file script
- Change default configuration files for annotation (add dbNSFP 3.5a, update mcap and regsnpintron) and prioritization
- Bug fixed: file identification in annotation configuration
- Bug fixed: calculation INFO fields header, snpeff parameters options on multithreading
- Bug fixed: snpeff parameters in command line
- Rename HOWARD.sh to HOWARD.
- Add --nomen_fields parameter and update NOMEN calculation.
- Add --bcftools_stats and --stats parameter.
- Change PZScore, PZFlag, PZComment and PZInfos generation, adding default PZ and all PZ filters.
- Bug fixed: translation fields identification.
- Bug fixed: NOMEN calculation clear previous NOMEN values if using force option.
- Change --norm parameter by adding '--check-ref=s' in bcftools command.
- Add --norm_options parameter.
- Force translation VCF by default.
- Change VAF_stats and add DP_stats.
- Remove --snpeff_threads parameter (for snpEff 5.0e compatibility) and improve --snpeff_stats.
- Add a check and rehead INFO fields if necessary (prevent some incorrect INFO header format).
- Fix --compress parameter and add --index parameter.
- Add INFO description Type option, with autodetection.
- Add prioritization mode 'VaRank'/'max' for score calculation.
- Add calculation DP, AD, GQ and associated stats.
- Fix INFO field type for VCF.
- Structural variant compatibility.
- NOMEN extraction and generation improved, with --nomen_pattern option (only for SNV and InDel).
- Improve translation with variant sorting.
- Improve error catching.
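To make the "prioritization mode 'VaRank'/'max' for score calculation" item above more concrete, the sketch below contrasts two ways of combining the scores of the filters a variant matches. It assumes that the 'max' mode keeps the highest matching score and that the other mode adds the scores together; the summing behavior, the function name `combine_scores` and the mode labels used here are assumptions for illustration, not HOWARD's actual code.

```python
def combine_scores(filter_scores, mode="sum"):
    """Combine the scores of all filters matched by one variant.

    mode="sum": add every matching score (assumed default behavior).
    mode="max": keep only the highest matching score ('VaRank'-like mode).
    """
    if not filter_scores:
        return 0  # no filter matched: neutral score
    if mode == "max":
        return max(filter_scores)
    if mode == "sum":
        return sum(filter_scores)
    raise ValueError(f"unknown mode: {mode!r}")


matched = [5, 100, 10]  # hypothetical scores of three matching filters
print(combine_scores(matched, mode="sum"))
print(combine_scores(matched, mode="max"))
```

Under these assumptions, a 'max' mode prevents many weak criteria from outranking one strong criterion, while a summing mode rewards variants that satisfy several criteria at once.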