v2.1.0
Added
New fuzzy rules for cheatography.com (#342), der-postillon.com (#330) and iranwire.com (#363)
Properly rewrite the redirect target URL when it is present in an HTML tag (#237)
New --encoding-aliases argument to pass encoding/charset aliases (#331)
Add support for SVG favicons (#148)
Automatically index PDF content and use the PDF title (#289 and #290)
Changed
Upgrade to python-scraperlib 4.0.0
Generate fuzzy rule tests in Python and JavaScript (#284)
Refactor the HTML rewriter class to make it more extensible and expressive (#305)
Detect the charset in the document header only for HTML documents (#331)
Use the software property from the warcinfo record to set the ZIM Scraper metadata (#357)
Store ContentDate as metadata, based on WARC-Date (#358)
Remove domain-specific rules (#328)
Revisit retrieve_illustration logic to prefer the best favicons (#352 and #369)
Upgrade dependencies (zimscraperlib 4.0.0, wombat.js 3.7.12 and others) (#376)
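The charset change in #331 can be illustrated with a minimal Python sketch. It is hypothetical and not warc2zim's actual code: the function name detect_header_charset, the regular expression and the 1024-byte scan window are all assumptions. The point it shows is that a <meta> charset declaration is only looked for when the mimetype is text/html, so CSS, JavaScript or JSON payloads never get a charset guessed from HTML-looking text.

```python
import re
from typing import Optional

# Hypothetical sketch, not warc2zim's implementation.
# Matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET_RE = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def detect_header_charset(
    content: bytes, mimetype: str, max_bytes: int = 1024
) -> Optional[str]:
    """Return the charset declared in the document header, for HTML only."""
    # Ignore any parameters such as "; charset=utf-8" when checking the type.
    base_type = mimetype.split(";")[0].strip().lower()
    # Non-HTML documents have no <meta> header to inspect: skip detection.
    if base_type != "text/html":
        return None
    # Only scan the beginning of the document (assumed window size).
    match = META_CHARSET_RE.search(content[:max_bytes])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

For a stylesheet that happens to contain the text <meta charset="utf-8"> in a comment, the function returns None instead of misreading it as a declaration.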
Fixed
Handle the case where the redirect target is bad or unsupported (#332 and #356)
Process WARC files in the order they were created (#366)
Remove repeated consecutive slashes in URLs, both in Python and JS (#365)
Ignore non-HTTP(S) WARC records (#351)
Fix the vimeo_cdn_fix fuzzy rule so it works properly in JavaScript (#348)
Fix a performance issue linked to the new "extensible" HTML rewriting rules (#370)
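The URL and record filtering fixes (#365, #351) can be sketched in Python as follows. Both helpers, collapse_slashes and is_http_record, are hypothetical. The sketch assumes that only the URL path is normalized (query strings are left alone) and that records are filtered on the scheme of their WARC-Target-URI header; the notes do not spell out either detail. It covers only the Python side, not the JavaScript one.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical helpers, not warc2zim's implementation.


def collapse_slashes(url: str) -> str:
    """Collapse runs of consecutive slashes in the URL path into one slash."""
    parts = urlsplit(url)
    # Assumption: only the path is normalized; the query and fragment are kept.
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def is_http_record(target_uri: str) -> bool:
    """Keep only records whose WARC-Target-URI uses http or https."""
    # urlsplit already lowercases the scheme, so "HTTP://..." is accepted too.
    return urlsplit(target_uri).scheme in ("http", "https")
```

With these helpers, https://example.com//a///b/c?x=//y becomes https://example.com/a/b/c?x=//y, and a record such as dns:example.com (the kind of DNS record Heritrix writes) is skipped.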