Home
Simple Archive Format Builder, SAFBuilder, is a tool to package your content into a form suitable for batch import into DSpace.
INPUT: Directory containing CSV with metadata, and files. OUTPUT: Simple Archive Format package
To get started, read the README. https://github.com/peterdietz/SAFBuilder/blob/master/README
Getting Started / Installing on: Linux or Windows
There is also a Usage Guide on Simple Archive Format Packager at Duraspace.
git clone git://github.com/DSpace-Labs/SAFBuilder.git
cd SAFBuilder
./recompile.sh
./safbuilder.sh
It should return the usage syntax:
USAGE: BatchProcess /path/to/directory metadatafilename.csv
Hint -- directory: Use absolute path and no trailing slashes
Hint -- metadatafilename: needs to be in the directory, as do the content files
./safbuilder.sh /path/to/SAFBuilder/src/edu/osu/kb/sample_data AAA_batch-metadata.csv
This runs SAFBuilder over the sample data included with the package. You can then inspect the SimpleArchiveFormat directory it creates, which is suitable input for batch import into DSpace.
Here is the syntax for importing SAF packages into DSpace using ItemImport (see Basic-DSpace-Import-Process):
sudo /dspace/bin/dspace import -a \
  -e dietz.72@osu.edu \
  -c 1811/49710 \
  -s /home/peterdietz/Desktop/MelanieSeedsBatch/SimpleArchiveFormat/ \
  -m /home/peterdietz/Desktop/MelanieSeedsBatch/seedsbatch1.map
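If you script your imports, the command above can be assembled as an argument list. The following is a minimal sketch, not part of SAFBuilder or DSpace: the function name `build_import_command` and its parameter names are hypothetical, and only the flags shown in the example above (`-a`, `-e`, `-c`, `-s`, `-m`) are used.

```python
import shlex


def build_import_command(eperson, collection, source_dir, mapfile,
                         dspace_bin="/dspace/bin/dspace"):
    """Return the import command from the example above as an argv list.

    The parameter names are illustrative; the flags are the ones shown
    in the example (-a, -e, -c, -s, -m).
    """
    return [
        "sudo", dspace_bin, "import", "-a",
        "-e", eperson,
        "-c", collection,
        "-s", source_dir,
        "-m", mapfile,
    ]


if __name__ == "__main__":
    command = build_import_command(
        "dietz.72@osu.edu",
        "1811/49710",
        "/home/peterdietz/Desktop/MelanieSeedsBatch/SimpleArchiveFormat/",
        "/home/peterdietz/Desktop/MelanieSeedsBatch/seedsbatch1.map",
    )
    # Print a shell-quoted form for review; nothing is executed here.
    print(shlex.join(command))
```

Building a list rather than a single string keeps paths with spaces intact if you later pass the command to `subprocess.run`.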
Make a CSV with the column headers 'filename', 'dc.title', 'dc.creator', 'dc.date.issued', 'dc.description.abstract'.
For each row, put in your content.
- filename contains the path to the file, e.g. ARC_0112.pdf or ARC/001.pdf, depending on how your files are organized.
- dc.something is your metadata using the Dublin Core namespace. Other metadata namespaces are allowed. You can add or change the metadata fields.
- If a field has multiple values, such as multiple authors, separate the entries with two pipe characters (||). This separator was chosen because it is unlikely to appear in your content.
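As an illustration of the CSV layout described above, here is a minimal Python sketch that writes one metadata row and joins two authors with the `||` separator. The helper names (`join_values`, `build_metadata_csv`) and the sample record are hypothetical; only the column headers and the separator come from the text above.

```python
import csv
import io

# Separator described above for multiple values in one field.
MULTI_VALUE_SEPARATOR = "||"

FIELDNAMES = [
    "filename",
    "dc.title",
    "dc.creator",
    "dc.date.issued",
    "dc.description.abstract",
]


def join_values(values):
    """Join several values for one metadata field into a single CSV cell."""
    return MULTI_VALUE_SEPARATOR.join(values)


def build_metadata_csv(rows, fieldnames):
    """Return CSV text: one header row, then one row per item."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample record, for illustration only.
SAMPLE_ROWS = [
    {
        "filename": "ARC_0112.pdf",
        "dc.title": "Example Title",
        "dc.creator": join_values(["Doe, Jane", "Roe, Richard"]),
        "dc.date.issued": "2012",
        "dc.description.abstract": "An example abstract.",
    }
]

if __name__ == "__main__":
    print(build_metadata_csv(SAMPLE_ROWS, FIELDNAMES), end="")
```

Using the `csv` module rather than joining strings by hand means cells that contain commas, such as "Doe, Jane", are quoted correctly.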
You specify content files with a column header of "filename". To send the files in a column to a specific bundle, use the header "filename__bundle:BUNDLENAME", where BUNDLENAME is whatever you want. This is useful when you have to upload files that are not destined for public consumption, for example custom proxy licenses that are PDFs and don't go into the system license bundle. SAFBuilder will automatically add the tab separator.
The import tool additionally allows the following as extra parameters for a file.
bundle:BUNDLENAME
permissions:PERMISSIONS
description:DESCRIPTION
primary:true
BUNDLENAME is the name of the bundle to which the bitstream should be added. Without specifying the bundle, items will go into the default bundle, ORIGINAL.
PERMISSIONS is text with the following format: -[r|w] 'group name'
DESCRIPTION is the text of the file's description.
PRIMARY is used to specify the primary bitstream.
In SAFBuilder, you can use these by separating them with double underscore.
For example, to set a description and specify which bundle to put the bitstream into, use:
filename__bundle:MySpecialBundle__primary:true__description:Something really cool
To have the bitstreams in that column be restricted to Administrator group READ-ONLY (i.e. no anonymous read):
filename__permissions:-r 'Administrator'
Thus, you can have multiple file columns, with some objects going into the main bundle and others going into a custom bundle that is restricted to administrators.
filename | filename__bundle:PROXY-LICENSE__permissions:-r 'Administrator' |
---|---|
student-thesis.pdf | University-Legal-Proxy-License-signed.pdf |
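If you generate your CSV programmatically, the column headers above can be composed from their parts. This is a minimal sketch, assuming only the double-underscore convention and the four parameter names described above. The function `build_file_column_header` is hypothetical and not part of SAFBuilder, and the order in which it emits the parameters is an arbitrary choice.

```python
def build_file_column_header(base="filename", bundle=None, permissions=None,
                             description=None, primary=False):
    """Compose a header such as filename__bundle:NAME__primary:true.

    Each optional parameter is appended as name:value, and the parts
    are joined with a double underscore.
    """
    parts = [base]
    if bundle is not None:
        parts.append(f"bundle:{bundle}")
    if permissions is not None:
        parts.append(f"permissions:{permissions}")
    if description is not None:
        parts.append(f"description:{description}")
    if primary:
        parts.append("primary:true")
    return "__".join(parts)


if __name__ == "__main__":
    # Reproduces the restricted proxy-license column from the table above.
    print(build_file_column_header(bundle="PROXY-LICENSE",
                                   permissions="-r 'Administrator'"))
```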
One of our largest content producers likes to give us about 500 records at a time, with each record's content already packed into a ZIP file. Instead of doing crazy amounts of manual processing, we have adjusted SAFBuilder to accept content in ZIP files, unpack it, and add each file within the ZIP to the record.
Instead of using a header of filename, use filegroup. The double-underscore parameters described above can be combined with it, e.g. filegroup__bundle:THUMBNAIL
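Before packaging, it can help to check which files a ZIP will contribute to a record. The sketch below is an illustration using Python's standard `zipfile` module, not SAFBuilder's own unpacking code. The function name `list_zip_members` and the sample file names are hypothetical.

```python
import io
import zipfile


def list_zip_members(zip_source):
    """Return the names of the files inside a ZIP, skipping directory entries.

    zip_source may be a path or a file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return [info.filename for info in archive.infolist() if not info.is_dir()]


if __name__ == "__main__":
    # Build a small in-memory archive so the example needs no files on disk.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("thesis.pdf", b"placeholder")
        archive.writestr("figures/plot.jpg", b"placeholder")
    buffer.seek(0)
    for name in list_zip_members(buffer):
        print(name)
```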