We currently have a very flat format, i.e., `job.<jobid>_script` and `job.<jobid>_environment`. While this suffices for finding job scripts, it has several drawbacks:

- there can be many jobs in the archive, so the number of entries in the single archival directory will become quite large;
- users may not always recall the exact job ID (there might be a few), and searching by time might help pin down the problematic job.
A better archive could be organised by

- user
- cluster
- timestamps (e.g., yearly, monthly, daily, ...)
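For illustration, such a hierarchy might look like the sketch below; the nesting order and date granularity are assumptions, not a decided layout:

```
/archive/<cluster>/<user>/<YYYY>/<MM>/<DD>/job.<jobid>_script
/archive/<cluster>/<user>/<YYYY>/<MM>/<DD>/job.<jobid>_environment
```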
Not all environment files contain information about the user, due to the `--export=NONE` setting when calling `sbatch`. This means we cannot reliably place the user name in the archived file name or directory structure.
For the job archival system we've developed locally, we use a multi-level hierarchy based on the job ids, not too different from what Slurm does in `StateSaveLocation` with the `hash.{0..9}` directories. That's the only way we found to store tens of millions of job scripts in a POSIX filesystem.
The idea is to reverse the (zero-padded) job id and slice it into two-digit components, like this: jobid 67043328 -> `/archive/82/33/40/76/`, jobid 10123 -> `/archive/32/10/10/00/`.
This ensures that consecutive job ids are spread evenly across the end-level archive directories without overloading any particular one.
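The scheme above can be sketched as follows; the function name, the 8-digit zero-padding, and the default root are assumptions for illustration, not sarchive code:

```python
def archive_dir(jobid: int, root: str = "/archive", width: int = 8) -> str:
    """Map a job id to a multi-level archive directory.

    The id is zero-padded to `width` digits, reversed, and split into
    two-digit components, so consecutive ids land in different leaves.
    (Hypothetical sketch of the reversed-job-id scheme described above.)
    """
    digits = str(jobid).zfill(width)[::-1]
    parts = [digits[i:i + 2] for i in range(0, width, 2)]
    return "/".join([root, *parts])

print(archive_dir(67043328))  # -> /archive/82/33/40/76
print(archive_dir(10123))     # -> /archive/32/10/10/00
```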
Maybe something similar could be used for sarchive?
That's a nice suggestion, thanks. I would suggest not taking it all the way down to the lowest level, so perhaps not starting with /82/33 as in your example. You would then have multiple consecutive jobs in the same dir, but limited to e.g., 10K files, or even 1K files if we use jobid div 1000.
In our usage, we do stick them in YYYYMMDD subdirs, which then get tarred and zipped after 7 days or so. So that may also avoid overloading, even though this lacks an equal distribution in numbers of files across the days.
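The coarser grouping discussed above could be sketched like this: bucket jobs by `jobid // 1000` (at most 1000 consecutive job ids per directory) under a YYYYMMDD subdir. The function name and path layout are assumptions, not an agreed design:

```python
from datetime import date

def bucket_dir(jobid: int, day: date, root: str = "/archive") -> str:
    """Hypothetical sketch: group jobs per day, then per 1000 job ids."""
    bucket = jobid // 1000  # at most 1000 consecutive job ids per bucket
    return f"{root}/{day:%Y%m%d}/{bucket}"

print(bucket_dir(67043328, date(2019, 3, 18)))  # -> /archive/20190318/67043
```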