We currently have a very flat format, i.e., `job.<jobid>_script` and `job.<jobid>_environment`. While this suffices for finding job scripts, it has several drawbacks:

- there can be many jobs in the archive, so the number of entries in the single archival directory will become quite large;
- users may not always recall the exact job ID (there might be a few), and searching by time might help pin down the problematic job.
A better archive could be organised by

- user
- cluster
- timestamps (e.g., yearly, monthly, daily, ...)
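For illustration, such a hierarchy might look like the sketch below; the nesting order and date granularity are assumptions, not a decided layout:

```
/archive/<cluster>/<user>/<YYYY>/<MM>/<DD>/job.<jobid>_script
/archive/<cluster>/<user>/<YYYY>/<MM>/<DD>/job.<jobid>_environment
```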
Not all environment files contain information about the user, due to the `--export=NONE` setting when calling `sbatch`. This means we cannot reliably place the user name in the archived file name or directory structure.
For the job archival system we've developed locally, we use a multi-level hierarchy based on the job ids, not too different from what Slurm does in `StateSaveLocation` with the `hash.{0..9}` directories. That's the only way we found to store tens of millions of job scripts in a POSIX filesystem.
The idea is to reverse the (zero-padded) job id and slice it into two-digit components, like this: jobid 67043328 -> `/archive/82/33/40/76/`, jobid 10123 -> `/archive/32/10/10/00/`.
This ensures that consecutive job ids are spread evenly across the end-level archive directories without overloading any particular one.
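The scheme above can be sketched as follows; the function name, the 8-digit zero-padding, and the default root are assumptions for illustration, not sarchive code:

```python
def archive_dir(jobid: int, root: str = "/archive", width: int = 8) -> str:
    """Map a job id to a multi-level archive directory.

    The id is zero-padded to `width` digits, reversed, and split into
    two-digit components, so consecutive ids land in different leaves.
    (Hypothetical sketch of the reversed-job-id scheme described above.)
    """
    digits = str(jobid).zfill(width)[::-1]
    parts = [digits[i:i + 2] for i in range(0, width, 2)]
    return "/".join([root, *parts])

print(archive_dir(67043328))  # -> /archive/82/33/40/76
print(archive_dir(10123))     # -> /archive/32/10/10/00
```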
Maybe something similar could be used for sarchive?
That's a nice suggestion, thanks. I would suggest not taking it all the way down to the lowest level, so perhaps not starting with /82/33 as in your example. You would then have multiple consecutive jobs in the same dir, but limited to e.g., 10K files, or even 1K files if we use jobid div 1000.
In our usage, we do stick them in YYYYMMDD subdirs, which then get tarred and zipped after 7 days or so. So that may also avoid overloading, even though this lacks an equal distribution in numbers of files across the days.
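The coarser grouping discussed above could be sketched like this: bucket jobs by `jobid // 1000` (at most 1000 consecutive job ids per directory) under a YYYYMMDD subdir. The function name and path layout are assumptions, not an agreed design:

```python
from datetime import date

def bucket_dir(jobid: int, day: date, root: str = "/archive") -> str:
    """Hypothetical sketch: group jobs per day, then per 1000 job ids."""
    bucket = jobid // 1000  # at most 1000 consecutive job ids per bucket
    return f"{root}/{day:%Y%m%d}/{bucket}"

print(bucket_dir(67043328, date(2019, 3, 18)))  # -> /archive/20190318/67043
```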