A text-file-to-WARC pipeline for grab-site (and, technically, ArchiveBot).
The prototype was written on Windows and requires Python, 7-Zip, and Docker. It is untested on other platforms.
- Download and install Docker.
- Grab the Dockerfile from Nold360/docker-grab-site and place it in a folder (e.g. `D:\grab-site-data` or `/home/user/grab-site-data/`).
  - This will become the data folder for the Docker containers, where the WARCs will be saved. It's recommended to use a root directory with no spaces in its path.
- Build the image with `docker build -t grab-site .` (the Docker image is around 500 MB).
  - If you are on an ARM system (or Apple Silicon), it is recommended to add `--platform=linux/amd64` to all of the docker commands you run, to avoid issues with wget's WARC creation.
- Spin the container up with `docker run -d --rm -p29000:29000 -v DATA_FOLDER:/data --name grab-site-container grab-site`
  - Set `DATA_FOLDER` to the path of the above directory.
- Create a text file listing the IDs you want the script to archive.
  - To see what this program supports, see SUPPORTED.md.
- Open a terminal in this repo directory.
- Run `python . DATA_FOLDER TEXTFILE ITEM_TYPE`
  - `DATA_FOLDER` is the directory above, `TEXTFILE` is the text file, and `ITEM_TYPE` is the type of the items in the text file.
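
The steps above can be sketched as a small helper that assembles the three commands for a given data folder, adding `--platform=linux/amd64` to the docker commands when the machine looks like ARM. This is only an illustration and not part of this repo: the function names, the `ids.txt` file name, and the `ITEM_TYPE` placeholder are assumptions, and the script prints the commands rather than running them.

```python
import platform
import shlex


def needs_amd64_flag(machine):
    """Return True for ARM machine names (e.g. 'arm64' on Apple Silicon, 'aarch64' on Linux)."""
    m = machine.lower()
    return m.startswith("arm") or m == "aarch64"


def build_commands(data_folder, textfile, item_type, machine=None):
    """Assemble the README's commands as argument lists (hypothetical helper)."""
    if machine is None:
        machine = platform.machine()
    # Only added on ARM hosts, per the recommendation in the build step.
    platform_args = ["--platform=linux/amd64"] if needs_amd64_flag(machine) else []
    return {
        # Run from the data folder, where the Dockerfile was placed.
        "build": ["docker", "build", *platform_args, "-t", "grab-site", "."],
        # Mounts the data folder at /data inside the container.
        "run": [
            "docker", "run", *platform_args, "-d", "--rm", "-p29000:29000",
            "-v", f"{data_folder}:/data",
            "--name", "grab-site-container", "grab-site",
        ],
        # Run from this repo's directory.
        "pipeline": ["python", ".", data_folder, textfile, item_type],
    }


if __name__ == "__main__":
    # Example values only; replace ITEM_TYPE with a type listed in SUPPORTED.md.
    commands = build_commands("/home/user/grab-site-data/", "ids.txt", "ITEM_TYPE")
    for step, argv in commands.items():
        print(f"{step}: {shlex.join(argv)}")
```

Keeping each command as an argument list means it could later be passed to `subprocess.run` without shell quoting problems, which matters if the data folder path ever does contain spaces.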