Segment entry checksum mismatch (no I/O errors) #8576
If it complains about segment entry checksum mismatches, it means that the crc32 stored with the data does not match the data anymore. So there is corruption in these segment files. Segments 3, 4, 8 are old segment files (numbering starts from 1 and only increases), likely written within the first (or first few) backup(s). That could likely be a hw issue: …
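To illustrate what that means concretely, here is a minimal sketch (hypothetical layout, not borg's actual segment format) of the kind of check that fails: each entry is stored together with a CRC32 of its bytes, and a mismatch on read means the bytes changed after they were written.

```python
import zlib

def verify_entry(stored_crc32: int, data: bytes) -> bool:
    """Return True if the data still matches the CRC32 stored with it."""
    return zlib.crc32(data) == stored_crc32

# At write time the CRC32 is computed over the entry's bytes and stored
# alongside them; on read, any flipped bit in either the data or the
# stored CRC makes the comparison fail ("segment entry checksum mismatch").
data = b"chunk payload"
crc = zlib.crc32(data)
assert verify_entry(crc, data)
assert not verify_entry(crc, b"chunk pAyload")  # one corrupted byte
```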
In any case, do at least that before trying …

The many index mismatches (missing entries in the rebuilt index) that borg check complains about are likely caused by corrupted segment entries or completely missing segment files. The reason it only complains on borg prune or borg check and not on create is that create does not read that old data and thus does not detect the issue; even if it reuses such an old chunk, it determines its presence from the index and does not read it from disk.

A sw issue can't be ruled out completely, of course, and you have a relatively complex stack of sw in the I/O path. I don't think it is related to compression. If you did recompress / rewrite the data contained in the borg segment files in the past, it could be that a malfunction in the involved hw back then corrupted the file contents while doing that, and it even could be that the hw issue is not present anymore. If you run …
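A hedged sketch of that dedup behaviour (a simplified stand-in, not borg's real chunk index or repository code): create decides chunk presence purely from the index and never re-reads the stored bytes, so corruption in an already-stored chunk stays invisible until check or prune actually touches the data.

```python
# Simplified stand-in for a deduplicating store (hypothetical names,
# not borg's actual data structures).
chunk_index = {}   # chunk_id -> refcount; "is it present?" lives here
chunk_store = {}   # chunk_id -> bytes actually on disk

def create_add_chunk(chunk_id: bytes, data: bytes) -> None:
    if chunk_id in chunk_index:
        # Dedup hit: presence is decided from the index alone.
        # The old on-disk bytes are NOT read, so corruption there
        # goes undetected during create.
        chunk_index[chunk_id] += 1
        return
    chunk_store[chunk_id] = data
    chunk_index[chunk_id] = 1

def check_read_chunk(chunk_id: bytes) -> bytes:
    # check/prune actually read the stored bytes, which is where a
    # checksum mismatch finally surfaces.
    return chunk_store[chunk_id]
```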
Tagged it hardware-issue for now; the probability for that is rather high. Very likely an issue "below" borg.
We shut down the server, booted into rescue, and performed an extensive hardware check for hours. Everything seems fine (unfortunately). Starting to think that we have something related specifically to bcachefs and borg here.
Be careful with such "overall" tests (especially if it is unknown how thoroughly they perform each of the tests they run). If these are optimized to not take extremely long, the individual tests might be rather on the quick side. I've had long-term good experiences with … On some Linux dists, …
We completed some extra tests besides those we did last week. We did one complete pass on … No errors; we got a PASS on all of them.
I am reporting the above for the sake of completeness. I am worried that we might have something due to bcachefs and borg here, but again, no problems have arisen besides the borg errors on the same storage.
OK, so memory and disks on the server look OK (now). borg is just doing fs API calls, so there shouldn't be "borg specific" issues in the layers below borg. BUT, due to the amount of I/O borg is doing, some rare issues might get triggered by borg rather than by less intensive usage.
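If it helps to exercise those layers with comparable I/O pressure but without borg, a rough read-back test like this (my own sketch; the target directory and sizes are placeholders) writes files with a strong hash, fsyncs, and re-reads them looking for silent corruption:

```python
import hashlib
import os

TARGET = "/mnt/bcachefs/stress"      # placeholder: directory on the suspect fs
COUNT, SIZE = 100, 64 * 1024 * 1024  # placeholder workload: 100 x 64 MiB

os.makedirs(TARGET, exist_ok=True)
expected = {}
for i in range(COUNT):
    data = os.urandom(SIZE)  # incompressible; use repetitive data instead
                             # if you want to exercise the compression path
    path = os.path.join(TARGET, f"f{i:04d}")
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    expected[path] = hashlib.sha256(data).hexdigest()

# Re-read and verify; for a stricter test, drop the page cache or
# remount between the write and the read phase so reads hit the disks.
for path, digest in expected.items():
    with open(path, "rb") as f:
        if hashlib.sha256(f.read()).hexdigest() != digest:
            print("MISMATCH:", path)
```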
Have you checked borgbackup docs, FAQ, and open GitHub issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
Bug/Issue
Your borg version (borg -V).
borg 1.2.8
Operating system (distribution) and version.
Fedora Linux 40
Hardware / network configuration, and filesystems used.
AMD Ryzen 7 3700X 8-Core Processor
64GB RAM
using bcachefs filesystem with encryption and compression, on mdadm block devices
(storage is using rotational md raid10 devices as slow storage and NVMe md raid1 devices as a fast storage)
How much data is handled by borg?
23 TB in about 160 borg repos
Full borg commandline that led to the problem (leave away excludes and passwords)
Describe the problem you're observing.
Checksum mismatches happen on all repos, but only when running borg prune and borg check (no problem when creating a new backup).
There are no I/O errors at all.
Also fsck has been run on storage mount.
(the same storage is also in use for a different purpose with no errors)
The error shows up when prune is about to act.

Note that we are using lz4 compression on bcachefs and zstd as background compression on bcachefs. Borg is also using lz4 on backup creation. Recently we disabled all compression on bcachefs to test whether this is related; we are waiting for enough days to pass to have pruning act on backups created after compression was disabled.
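Since this shows up across all repos, a sweep along these lines (a sketch; the repo layout under REPO_ROOT and passphrase handling via the environment are assumptions) runs the same verification as the reproduction step below over every repo and collects the failures:

```python
import subprocess
from pathlib import Path

REPO_ROOT = Path("/srv/borg-repos")  # assumption: one subdirectory per repo

failed = []
for repo in sorted(p for p in REPO_ROOT.iterdir() if p.is_dir()):
    # `borg check --verify-data` re-reads and re-hashes every chunk, so it
    # surfaces exactly these checksum mismatches; assumes BORG_PASSPHRASE
    # (or keyfiles) are already set up in the environment.
    result = subprocess.run(["borg", "check", "--verify-data", str(repo)])
    if result.returncode != 0:
        failed.append(repo.name)

print("repos with errors:", failed if failed else "none")
```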
Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.
Yes, running borg check --verify-data shows the problem.

Include any warning/errors/backtraces from the system logs
Two examples:
Running borg check on borg server
While pruning from a client (running for example borg 1.2.1 or borg 1.1.18):