ArchiveBox
was previously named Pocket Archive Stream
and then Bookmark Archiver
.
THIS PAGE HAS BEEN MOVED:
See the releases page for versioned source downloads and full changelog.
🍰 Many thanks to our 100+ contributors and everyone in the web archiving community! 🏛
Expand old release notes...
- v0.4.9 released
pip install archivebox
https://pypi.org/project/archivebox/docker run archivebox/archivebox
https://hub.docker.com/r/archivebox/archivebox- https://archivebox.readthedocs.io/en/latest/
- https://github.com/ArchiveBox/ArchiveBox/releases
- easy migration from previous versions
cd path/to/your/archive/folder archivebox init archviebox add 'https://example.com' archviebox add 'https://getpocket.com/users/USERNAME/feed/all' --depth=1
- full transition to Django Sqlite DB with migrations (making upgrades between versions much safer now)
- maintains an intuitive and helpful CLI that's backwards-compatible with all previous archivebox data versions
- uses argparse instead of hand-written CLI system: see
archivebox/cli/archivebox.py
- new subcommands-based CLI for
archivebox
(see below) - new Web UI with pagination, better search, filtering, permissions, and more
- 30+ assorted bugfixes, new features, and tickets closed
- for more info, see: https://github.com/ArchiveBox/ArchiveBox/releases/tag/v0.4.9
- v0.2.4 released
- better archive corruption guards (check structure invariants on every parse & save)
- remove title prefetching in favor of new FETCH_TITLE archive method
- slightly improved CLI output for parsing and remote url downloading
- re-save index after archiving completes to update titles and urls
- remove redundant derivable data from link json schema
- markdown link parsing support
- faster link parsing and better symbol handling using a new compiled URL_REGEX
- v0.2.3 released
- fixed issues with parsing titles including trailing tags
- fixed issues with titles defaulting to URLs instead of attempting to fetch
- fixed issue where bookmark timestamps from RSS would be ignored and current ts used instead
- fixed issue where ONLY_NEW would overwrite existing links in archive with only new ones
- fixed lots of issues with URL parsing by using
urllib.parse
instead of hand-written lambdas - ignore robots.txt when using wget (ssshhh don't tell anyone 😁)
- fix RSS parser bailing out when there's whitespace around XML tags
- fix issue with browser history export trying to run ls on wrong directory
- v0.2.2 released
- Shaarli RSS export support
- Fix issues with plain text link parsing including quotes, whitespace, and closing tags in URLs
- add USER_AGENT to archive.org submissions so they can track archivebox usage
- remove all icons similar to archive.org branding from archive UI
- hide some of the noisier youtubedl and wget errors
- set permissions on youtubedl media folder
- fix chrome data dir incorrect path and quoting
- better chrome binary finding
- show which parser is used when importing links, show progress when fetching titles
- v0.2.1 released with new logo
- ability to import plain lists of links and almost all other raw filetypes
- WARC saving support via wget
- Git repository downloading with git clone
- Media downloading with youtube-dl (video, audio, subtitles, description, playlist, etc)
- v0.2.0 released with new name
- renamed from Bookmark Archiver -> ArchiveBox
- v0.1.0 released
- support for browser history exporting added with
./bin/archivebox-export-browser-history
- support for chrome
--dump-dom
to output full page HTML after JS executes
- v0.0.3 released
- support for chrome
--user-data-dir
to archive sites that need logins - fancy individual html & json indexes for each link
- smartly append new links to existing index instead of overwriting
- v0.0.2 released
- proper HTML templating instead of format strings (thanks to https://github.com/bardisty!)
- refactored into separate files, wip audio & video archiving
- v0.0.1 released
- Index links now work without nginx url rewrites, archive can now be hosted on github pages
- added setup.sh script & docstrings & help commands
- made Chromium the default instead of Google Chrome (yay free software)
- added env-variable configuration (thanks to https://github.com/hannah98!)
- renamed from Pocket Archive Stream -> Bookmark Archiver
- added Netscape-format export support (thanks to https://github.com/ilvar!)
- added Pinboard-format export support (thanks to https://github.com/sconeyard!)
- front-page of HN, oops! apparently I have users to support now 😁?
- added Pocket-format export support
- v0.0.0 released: created Pocket Archive Stream 2017/05/05