All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Changed filter-bam function into filter-file; it now accepts BED and CRAM in addition to BAM
- filter-file command can now accept a blacklist file
- Update various docstrings and help statements to no longer mention SAM files as an accepted format
- Changed default args for `wps`; the previous defaults would lead to errors. Now `wps` defaults to LWPS fragment lengths (120-180nt).
- Made `finaletoolkit.utils.typing` public. This is a module containing some useful type aliases.
- Minor formatting and typing changes
- Renamed `fraction_low` and `fraction_high` to `min_length` and `max_length` for `.utils.frag_array` and `.frag.wps`
- `.wps` retains the deprecated arg names but issues a warning.
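
The deprecation pattern described above (old argument names still accepted, but with a warning) can be sketched as follows. This is a minimal illustration and not FinaleToolkit's actual implementation; `resolve_length_args` is a hypothetical helper name.

```python
import warnings


def resolve_length_args(min_length=None, max_length=None,
                        fraction_low=None, fraction_high=None):
    """Hypothetical helper: map deprecated arg names onto the new ones."""
    if fraction_low is not None:
        # Old name still works, but the caller is told to migrate.
        warnings.warn("fraction_low is deprecated; use min_length",
                      DeprecationWarning, stacklevel=2)
        if min_length is None:
            min_length = fraction_low
    if fraction_high is not None:
        warnings.warn("fraction_high is deprecated; use max_length",
                      DeprecationWarning, stacklevel=2)
        if max_length is None:
            max_length = fraction_high
    return min_length, max_length
```

In this sketch the new names take precedence when both the old and new names are supplied; that precedence is an assumption.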
- Added missing `-n` arg to `end-motifs`.
- Fixed incorrect `ValueError` regarding the `negative_strand` arg.
- Fixed incorrect function name for `wps`, which led to errors when called from the CLI.
- Additional tests for the CLI lazy loading implementation
- Several modules containing implementations of fragmentomic features or utility functions have been made internal. This means there is now only one obvious import for each function. For example, `multi_wps` is imported from `finaletoolkit.frag`, and can no longer be imported from `finaletoolkit.frag.multi_wps`.
- The CLI now uses lazy importing, drastically speeding up finaletoolkit when called from a command line.
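
A general sketch of lazy importing in a command-line entry point is shown below. It is not FinaleToolkit's code: the `COMMANDS` table, `resolve`, and `main` are hypothetical names, and standard-library functions stand in for the real subcommands so the example runs anywhere.

```python
import argparse
import importlib

# Hypothetical mapping from subcommand name to (module, function).
# Nothing in this table is imported until the subcommand is chosen.
COMMANDS = {
    "dumps": ("json", "dumps"),
    "sqrt": ("math", "sqrt"),
}


def resolve(command):
    """Import the module backing a subcommand only when it is requested."""
    module_name, func_name = COMMANDS[command]
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


def main(argv=None):
    parser = argparse.ArgumentParser(prog="lazy-demo")
    parser.add_argument("command", choices=sorted(COMMANDS))
    parser.add_argument("value", type=float)
    args = parser.parse_args(argv)
    return resolve(args.command)(args.value)
```

Because argument parsing happens before any heavy module is imported, calls such as `--help` do not pay the import cost of every subcommand.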
- Added `negative_strand` option for end motifs related functions. When used in conjunction with `both_strands`, only end motifs on the negative (Crick) strand are considered in calculations.
- Renamed `fraction_high` and `fraction_low` in `utils.utils.frag_generator` to `min_length` and `max_length`.
- Deprecated arguments for `end-motifs` had default values which could lead to an error. This is fixed.
- CLI no longer prints an error message if `finaletoolkit` is called without args.
- `frag-length-bins`, when writing a file, now writes the interval between `min` and `max` as inclusive. That is, previously when `min=1` and `max=2`, only fragments of length 1 were reported. Now when such a result is calculated, the interval given is `min=1` and `max=1`.
- Updated some descriptions and docstrings.
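
The change to inclusive intervals amounts to the following conversion. This is a minimal sketch assuming integer fragment lengths; `to_inclusive` is a hypothetical helper, not part of FinaleToolkit.

```python
def to_inclusive(start, stop):
    """Convert a half-open bin [start, stop) to an inclusive (min, max) pair.

    Assumes integer fragment lengths, so the largest length in the bin
    is stop - 1.
    """
    if stop <= start:
        raise ValueError("bin must contain at least one length")
    return start, stop - 1
```

For example, the half-open bin `min=1`, `max=2` contains only length 1, so it is written as `min=1`, `max=1`.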
- `adjust-wps` now has an option `-S` or `--exclude-savgol` to not perform Savitzky-Golay filtering.
- Several CLI options were renamed so that underscores become hyphens. This is for consistency and to simplify writing commands.
- `strand_location` arg from `agg_bigwig`
- `cli_hist` module
- fixed bug involving tqdm progress bar in `frag_length_intervals`
- some code formatting
- fixed bug involving arg names in `filter-bam`
- add some missing args to CLI
- issues with running `cleavage-profile` (#115)
- issues with writing to bigwig with `wps`
- change default of arg `both_strands` of `end_motifs` to True to match behavior of original scripts
- rename `fraction_high` and `fraction_low` to `min_length` and `max_length` for all features, deprecating old args as aliases if needed.
- numpy 2 compatible
- fragmentomics functions assume Tabix indexed files all follow the FinaleDB Frag.gz file format. That is, columns are `chrom`, `start`, `stop`, `score`, and `strand`. If more columns are detected, a warning is issued, and FinaleToolkit will attempt to parse the file as a BED6 format.
- renamed `genome_file` to `chrom_sizes` for most functions.
- `multi_wps` and `multi_cleavage_profile` no longer return a value due to memory issues when attempting to calculate these genomewide. Instead, users should refer to the file specified with `output_file`.
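
The column handling described above for Frag.gz versus BED6 lines can be sketched as follows. This is an illustration only: `parse_fragment_line` is a hypothetical function, and the real parser may differ in how it validates or converts fields.

```python
import warnings


def parse_fragment_line(line):
    """Hypothetical parser for one tab-separated fragment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError("expected at least 5 tab-separated columns")
    if len(fields) == 5:
        # Frag.gz layout: chrom, start, stop, score, strand
        chrom, start, stop, score, strand = fields
    else:
        warnings.warn("more than 5 columns found; parsing as BED6")
        # BED6 layout: chrom, start, stop, name, score, strand
        chrom, start, stop, _name, score, strand = fields[:6]
    # The score is left as a string here; converting it is an open choice.
    return chrom, int(start), int(stop), score, strand
```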
- internal `utils._typing` and `utils._deprecation` modules
- test for `delfi`
- `delfi-gc-correct` command. GC-correction is performed automatically by `delfi` already.
- `finaletoolkit.frag.frag_length_bins` no longer has the `contig_by_contig` option. This never had any functionality.
- `finaletoolkit.frag.frag_length_bins` no longer generates a text-based histogram.
- `contig_sizes` option included for `cleavage-profile` CLI command.
- `normalize` option for `coverage` fixed so it no longer normalizes twice
- `normalize=False` for `coverage` runs much faster
- misc typehints and docstrings
- `finaletoolkit.frag.frag_length_bins` uses a dict based implementation that is more memory efficient.
- `finaletoolkit.frag.frag_length_bins` and `finaletoolkit.frag.frag_length_intervals` now take `min_length` and `max_length` keyword args to only consider fragments of certain lengths.
- flags for `frag-length-bins` and `frag-length-intervals` CLI commands updated to match Python API
- `coverage` default argument for `normalize` changed to `False`
- `coverage` default argument for `scale_factor` changed to 1.
- `finaletoolkit.frag.frag_length_bins` can generate a histogram figure
- tests for `frag_length` module
- update docs, docstring, and help message for wps to mention that `site_bed` must be sorted.
- `normalize` keyword argument and `--normalize` flag to `finaletoolkit.frag.coverage` function and `finaletoolkit coverage` subcommand, respectively. Setting this argument/flag to true results in the output being normalized by the total coverage, ignoring `scale_factor` if specified.
- `--intersect-policy` or `-p` flag added to `finaletoolkit coverage` subcommand.
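
The interaction between `normalize` and `scale_factor` can be pictured with the sketch below. It is not the library's implementation: `scale_coverage` is a hypothetical function, the total is assumed to be the sum of the given counts, and `scale_factor` is assumed to be applied by multiplication.

```python
def scale_coverage(counts, scale_factor=1.0, normalize=False):
    """Hypothetical sketch: normalize by total coverage or apply a scale factor."""
    if normalize:
        # scale_factor is ignored when normalizing, as the entry above states.
        total = sum(counts)
        if total == 0:
            return [0.0 for _ in counts]
        return [count / total for count in counts]
    # Assumption: scale_factor multiplies each raw count.
    return [count * scale_factor for count in counts]
```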
- subpackages can now be accessed when importing `finaletoolkit`. Previously, the following code resulted in an error:
    >>> import finaletoolkit as ftk
    >>> help(ftk.frag)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: module 'finaletoolkit' has no attribute 'frag'
Now this is a valid way to access subpackages `cli`, `frag`, `genome`, and `utils`.
- indexing issue in region_end_motifs that would misread strand information when calculating end motifs on forward-strand only.
- frag_generator now accepts fragment coordinates in bed.gz files
- `delfi` accepts `gap_file=None`
- update prog for `delfi` to reflect compatibility with reference genomes other than hg19
- Added many tests for util functions
- Changed a nopython function to use numba compatible indexing
- Used "not" instead of "~" in an if statement
- Added a test for the coverage function
- Ensured that the coverage value returns the expected value (previously returned an empty generator)
- Included `output_file` as required argument for `finaletoolkit cleavage-profile`.
- Fixed incompatible types in min function through an explicit cast of chrom_sizes to integers.
- include `chrom_sizes` file as required argument for `finaletoolkit cleavage-profile`
- Numpy dependency version set to <2 to avoid breaking changes from numpy 2. This will change in the future as we migrate to use numpy 2.
- Replaced all instances of `np.NaN` with `np.nan`.
- Fixed minor issues with typing in `finaletoolkit.genome.gaps`
- Fixed issue where data files are not packaged with FinaleToolkit
- Brief description of modules in documentation under structure page
- Docstring
- `finaletoolkit.version` module containing single-source `__version__` variable
- `remove_nocov` option in `finaletoolkit.frag.delfi` to toggle dropping two bins with low coverage. These bins are dropped in delfi_scripts but may not apply to fragment files not aligned to hg19.
- tests for `finaletoolkit.frag.delfi_merge_bins`
- `finaletoolkit.frag.delfi` changed to accept files aligned to almost any reference genome.
- `finaletoolkit.frag.delfi_merge_bins` algorithm changed to be reference genome-agnostic and consistent with delfi_scripts
- `finaletoolkit delfi` options `-G`, `-M`, and `-R` to drop gc-correction, merging, and remove nocov bins, respectively.
- unused flags for `finaletools delfi`: `-W`, `--window-size`
- redundant flags for `finaletools delfi`: `-gc`, `--gc-correct`, `-m`, `--merge-bins`
- `utils.agg_bw` now supports `PathLike` for input
- docstrings for `frag.end_motifs.EndMotifsIntervals` changed to be compatible with Sphinx
- added missing `gzip` import for `utils.agg_bw`
- tests for `utils.agg_bw`
- `interval_size` argument for `adjust_wps`
- `adjust_wps` checks if `median_window` is larger than interval
- remove default options from some private helper functions for better error catching/predictable behavior.
- `wps` related functions and subcommands
- fixed writing to `bed.gz` files when using `coverage`
- adjusted handling of contig, start, stop for `frag_generator` so that `coverage` does not throw exceptions for genomewide intervals.
- test for `single_coverage`
- add `__version` attribute
- `finaletoolkit --version` displays package version
- update PyPI page to include links
- Fixed intersect policy for `cleavage_profile`. Now it calls `frag_generator` with a policy of `any`.
- Clean up some comments and docstrings
- Fixed logging from coverage function
- Added numerous util functions
- Added `left` and `right` options to `cleavage_profile` and CLI `cleavage-profile`.
- Added tests for cleavage profile and WPS.
- Minimum Python version 3.9
- Changed `filter_bam` to have same filters as FinaleDB
- `utils.frag_generator` raises `ValueError` if `start` or `stop` are specified without `contig`
- Type hints changed to use literals when possible
- Removed `utils.get_contig_lengths`
- Removed `data`, `conda_envs`, and `figs` directories
- Removed unused dependencies `click`, `pybedtools`, and `cython`
- Remove some unused imports from module files
- `interval-mds` CLI subcommand calculates correctly without large negative values.
- `interval-mds` CLI subcommand now correctly parses tsv files.
- Most end-motif related Python functions accept Path instances as inputs for files.
- Unit and function tests, especially for end-motif related functions.
- All instances of finaletools have been renamed to finaletoolkit
- All default tabular files are now TSV
- Update contacts in TOML
- `interval-mds` and `mds` both calculate correctly when one motif has a frequency of 0
- Added `finaletools.interval_end_motifs` function to calculate end-motifs over genomic intervals. Stores results in an IntervalEndMotifs object.
- Added CLI subcommand `interval-end-motifs` to calculate end-motifs over genomic intervals.
- Added CLI subcommand `interval-mds` to calculate MDS over intervals from interval end-motifs table.
- Added `gc_correct` option to `delfi_merge_bins` so that merging is possible without GC correction
- `delfi` can now be run with `gc_correct=false` and `merge_bins=true`
- fixed `cleavage_profile` import in `frag`
- Added `CHANGELOG.md`
- Fixed bug in coverage where writing to non-bedgraph files would result in an error
- `finaletools.frag.coverage` accepts Frag.gz format files
- update CLI help messages and docstrings for coverage and DELFI to reflect current and previous changes
- update docs
- Updated emails in `pyproject.toml`