
Segment entry checksum mismatch (no I/O errors) #8576

Open
gecon opened this issue Dec 3, 2024 · 6 comments

@gecon

gecon commented Dec 3, 2024

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

Bug/Issue

Your borg version (borg -V).

borg 1.2.8

Operating system (distribution) and version.

Fedora Linux 40

# uname -r
6.12.0-364.vanilla.fc40.x86_64

Hardware / network configuration, and filesystems used.

AMD Ryzen 7 3700X 8-Core Processor
64GB RAM
using bcachefs filesystem with encryption and compression, on mdadm block devices
(the storage uses rotational md raid10 devices as slow storage and NVMe md raid1 devices as fast storage)

How much data is handled by borg?

23 TB in about 160 borg repos

Full borg commandline that led to the problem (leave out excludes and passwords)

borg check --verify-data /data/REPONAME/

Describe the problem you're observing.

Checksum mismatches happen on all repos, but only when running borg prune and borg check (no problem when creating a new backup).
There are no I/O errors at all.
fsck has also been run on the storage mount.
(the same storage is also in use for a different purpose with no errors)

The error shows up when prune is about to act.

Note that we are using lz4 compression and zstd background compression on bcachefs.
Borg is also using lz4 on backup creation.
Recently we disabled all compression on bcachefs to test whether this is related. We are waiting a few days so that pruning runs on backups created after compression was disabled.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Yes, running borg check --verify-data shows the problem.

Include any warning/errors/backtraces from the system logs

Two examples:

Running borg check on borg server

# borg check --verify-data /data/REPO/
Killed stale lock HOSTNAME@88974377020853.2151601-0.
Removed stale exclusive roster lock for host HOSTNAME@88974377020853 pid 2151601 thread 0.
Removed stale exclusive roster lock for host HOSTNAME@88974377020853 pid 2151601 thread 0.
Data integrity error: Segment entry checksum mismatch [segment 3, offset 32562947]
Data integrity error: Segment entry checksum mismatch [segment 4, offset 138697937]
Index object count mismatch.
committed index: 74491 objects
rebuilt index:   67613 objects
ID: 3d755895921b2b339a99d569b9d8c695178a07351e809862e0697656f9dd8eff rebuilt index: <not found>      committed index: (4, 485889376)
ID: 4150284c604f64b27dfab33bcbaaadddd109ab49cc72fbc05c52e3eb48dccc4a rebuilt index: <not found>      committed index: (3, 427174282)
ID: f95dc35ea3a05c26262d77f126a25b9d03242bacb68227a40b1e00bed9806383 rebuilt index: <not found>      committed index: (4, 121038035)
ID: f5cb0743a3e47349b582fa3bd159834e9aaf2f1e8434a14ab656e144553b4dd3 rebuilt index: <not found>      committed index: (3, 131977378)
ID: d5f2578c0f0b393f4eee8df3e700428e51ebce068ef306fa7008499d58b79109 rebuilt index: <not found>      committed index: (3, 22605730)
ID: 49da9cbf0de145071ba89ca98d0f86dfa7527990835c5b05529c42e24d5dc246 rebuilt index: <not found>      committed index: (3, 21443603)
ID: 8594ccba168211384216f8bb09d8afc024174cb665bd7a780dc426627a2cf059 rebuilt index: <not found>      committed index: (3, 354985830)
ID: 1ecd04e201df15c09b9d01b3c5b6f944c6ddea4721b84ba255af551555628ab8 rebuilt index: <not found>      committed index: (4, 410433675)
ID: 5594300a0a7d889efc9e73b1e985c9bd3d71d1e0aa1eae61c83b1e8e08007618 rebuilt index: <not found>      committed index: (3, 72490141)
ID: f12d5fc56e7de514d68ed4432a3f104d332c185d9e927c09baa4c8551854470a rebuilt index: <not found>      committed index: (3, 151498567)
ID: 31befae03bee564b1128007b7a28e18b0aef7914037d6389dae7128eb9bb68ce rebuilt index: <not found>      committed index: (3, 291581395)
ID: dffe841110ff967aab923781a40bf6065af2d4d50fe2d81a46960d806e49e8b4 rebuilt index: <not found>      committed index: (3, 522419408)
ID: cc856f61bdfbc6b370473e45dad5c31da5f38016dacb6b6180011e49aa3acbdd rebuilt index: <not found>      committed index: (4, 291221170)
ID: b6d593adc630da02eaa9808526ede38becbd50a27e9436dc053c1c2d4c61c410 rebuilt index: <not found>      committed index: (4, 179729197)
ID: 5d6b4271a3818f72e2ffba88d659d5b6a3982e20982ff92ccea6dd066f9ba1da rebuilt index: <not found>      committed index: (4, 107289980)
ID: 4c27e08f4137d9af74b7c088bf033d37388dc88a06777142fe478272d2808451 rebuilt index: <not found>      committed index: (4, 367697211)
ID: 9d133736aee4727b3887ee0904588fb38a6001e42bd455272e9177b27e281154 rebuilt index: <not found>      committed index: (4, 2527836)
ID: fa5abeb655ee3dc29a1163f11d2c1f0480fe626f83fc2939c13e23ba21ac82e9 rebuilt index: <not found>      committed index: (3, 291471619)
ID: 7f4482d824b03f2146bf35d9040d130f988f8b1d3e2d1f53360402a708a7fa40 rebuilt index: <not found>      committed index: (4, 149870567)
ID: 6085bb1a05269fd6ca4fa872ba59f6d8f0bcd5cbd62d382ac8607674241d0946 rebuilt index: <not found>      committed index: (4, 415591989)
ID: 77cc39d0b46712aa0abab7042b77d2218d7752769c041c1640b1fb43f9f6c084 rebuilt index: <not found>      committed index: (4, 173543148)
ID: 1405dd11824bcdeb37289ce7d78c184b094d05e19d36e1c836458e26bf623b62 rebuilt index: <not found>      committed index: (3, 179611156)
ID: 255aa8b6f7f282b1161470ba627881b390fc09dfea236daea7ffcccb8b4bbfb6 rebuilt index: <not found>      committed index: (3, 161062396)
ID: 028717a863e342a398172176897c18906c52c2816401b1d2ca582b5a7150f7e6 rebuilt index: <not found>      committed index: (3, 480370460)
ID: 6571f55f32de16e8d5e5b1264459ae6f147c8b73942880a47ffdd4bcd397d821 rebuilt index: <not found>      committed index: (4, 277248785)
ID: 1a57d33a882d84e2b6b002072cba3dd96bb44db0bd1b0a651d43577e589da282 rebuilt index: <not found>      committed index: (3, 69051143)
ID: bde987f57a8db53f94fd303c159c23dfd2c68293cab8da23a756a91fd06ceb7e rebuilt index: <not found>      committed index: (4, 420151169)
ID: 11a2040bcea3347232ad687e9485820c0157a941b1cf1e925fb251937302d1dd rebuilt index: <not found>      committed index: (3, 262373979)
ID: 96f3f0332d3260ec6ad7d634ce6e169c3db10571823e768495ac5437e2d42a1e rebuilt index: <not found>      committed index: (4, 109203706)
ID: 7770e22aac2cb6118c071d0d5a2c2474951e69dc59e2d62f59ee57751c9a2996 rebuilt index: <not found>      committed index: (4, 275161019)
ID: f98b6a73782665ad32fe15ea765661d43bcd10d6f9fb7d12e8e128cbdc517307 rebuilt index: <not found>      committed index: (3, 392840345)
ID: 14e371446eec305a9fe8aac55688ec80f9ec37ca53861dc7d5d53e048d4f94f4 rebuilt index: <not found>      committed index: (4, 56966114)
ID: 43aacbcd3c13f5d6b944bd657729d29523b2eaff7a36a423b96291cf5cfda942 rebuilt index: <not found>      committed index: (3, 22529316)
ID: 8d7f80dcbbc413ba2d79405426710ea46541647c8ddcdbf6fb33f22c7bc278aa rebuilt index: <not found>      committed index: (4, 265425042)
ID: af283f148c89287385d03b0bb6293deb27fd5991b2c89a907f45eb8d319ca899 rebuilt index: <not found>      committed index: (4, 28232981)
ID: a662e629a59b0d59dc44def8db25709c08f1a7dbb8d827722610a256abcc0f8a rebuilt index: <not found>      committed index: (4, 136466467)
ID: ccf9a45772b04850b7a83b31ff047e530a2757d15d7a5a13900c9f23db63630f rebuilt index: <not found>      committed index: (3, 157121256)
ID: cd5a4e93a1f5e3424fa33c218c1401c4323110aeed49d6c3149af5c1263b47f2 rebuilt index: <not found>      committed index: (4, 113405716)
ID: c1de703288e092b6a67c3c9c361b0b76c00cb03a56f906d90f2fb1e16702733b rebuilt index: <not found>      committed index: (3, 22644657)
ID: e5fe4a7b24c28042a603f92ffa6fb484b77bef41bcbc6b27a588baa04cfc3dc9 rebuilt index: <not found>      committed index: (4, 265500138)
ID: f13d0604280adfe69f1f5bf80ce3c88b60ebfbb2c98266979ff2d1058a58b1ca rebuilt index: <not found>      committed index: (3, 22547721)
ID: b764042a804716e20bd05f826026b748790d19e725008d8df855006286ea625d rebuilt index: <not found>      committed index: (4, 273512714)
ID: 28b35bce6fa2d8988ca13814e54414ef5b06d8d5ec55553276df78524655a2ee rebuilt index: <not found>      committed index: (3, 218837391)
ID: 8f4908fc01aa49885801e0a75f236ba87de6bdad81ad019251e76683b4b70b3d rebuilt index: <not found>      committed index: (4, 167907316)
ID: 35802f84cb7c93faa3a7cb9ec2724a404a4afea1a3eedf35856634961f7d141e rebuilt index: <not found>      committed index: (4, 211399573)
ID: ef0cddc261fe7e6a1e80cbc03f0a2d71be14c10dd5841acd691070083239ef1a rebuilt index: <not found>      committed index: (4, 164660497)
ID: dd724d1986d73a449fa12d1bc4847fa718c5aea86aac93fb4fab3d693037e179 rebuilt index: <not found>      committed index: (3, 68494635)
ID: 672706b30630917a7c0f33fa11fd6d5632ce5c8b29f66959c7df4246c8bd19ae rebuilt index: <not found>      committed index: (3, 22632076)
ID: 28495ab55d7dd0d14b2f5293acb0a0a49c44c06e69c7c05741fb1b6a48f4406a rebuilt index: <not found>      committed index: (4, 276998367)
ID: 7d1bee2486a2602e37cb0581b940f9c0dcab0b41cc73fa53f7810c1c362faafc rebuilt index: <not found>      committed index: (3, 246888243)
ID: 13508292dbbdb244e1a9ae5f37eeb182f5f197b198f7a69288158d365812cf9a rebuilt index: <not found>      committed index: (3, 387751507)
ID: 0b2b5714640b68016a5271fbbc4d24e219a36ebfc678a330b185c39e3301bfe3 rebuilt index: <not found>      committed index: (4, 350425976)
ID: dbbba541682c59cac5d5c0ad569630ce658ffc44cbf661e7cdde04d62425baa3 rebuilt index: <not found>      committed index: (4, 279036419)
ID: a27a253d2f5384559cbc23e0786f51943b616df18bbc5a191cd43e468ef99d03 rebuilt index: <not found>      committed index: (4, 199643539)
ID: 6941c9e9c968977ee2e62b3f6d991ce5ad290e8ee14626eef1d57ad855900d54 rebuilt index: <not found>      committed index: (4, 478424121)
ID: 05f6498c81c94c08b8987ec07e1166d76f0aaf21ab5c910709ba828e3b164bf4 rebuilt index: <not found>      committed index: (3, 21758474)
ID: 10dded43c83da53d87a9b5346a311f5bb31e03050b7815cda9c71754f3abec22 rebuilt index: <not found>      committed index: (3, 70720220)
ID: 3311342473700a6dc7b3d9d6572144d3a20985b684d19ede25dd3c1e699399c2 rebuilt index: <not found>      committed index: (4, 275936761)
ID: 45417891bb9311663f1a39beaa2139949103af0d72afb331d66fbc808db9d981 rebuilt index: <not found>      committed index: (4, 277100259)
ID: d52197e74c8dbd9befb33b0341f96fa0a022db48a2897fc4f95cb1cf26e1f1ad rebuilt index: <not found>      committed index: (4, 172784)
ID: 577b6cffce9ef772574aed84c66ef52e7c9881b60fd5512e572e3db790c9e400 rebuilt index: <not found>      committed index: (3, 414133415)
ID: 9287a0b9ed9a64c4ee16cb427d5abab624a8573317e6d10417cfad648b53d43b rebuilt index: <not found>      committed index: (3, 62539773)
ID: 518cf98496c479eb3a758deddf686d95fdc1fe229358878c52a175bc38bb12a4 rebuilt index: <not found>      committed index: (3, 357142353)
ID: 8339a4038572b857c038ca6b6597ad2697cecc706eb6659ec9e7f0d8dfba5d5f rebuilt index: <not found>      committed index: (4, 65027294)
ID: 3ab8045318d1e5f9a3e15b49124b3ebf782f3ea1d606ea8b67776f9bf770d0b9 rebuilt index: <not found>      committed index: (4, 203315452)
ID: f6604ba5493097534d1d583d3c65e643d535f2e4650d7fb54bad41250e0739a6 rebuilt index: <not found>      committed index: (3, 433087484)
ID: 2e8d0f1b97748208fa497cd6df4349cc4e296a11d075950054e92ae2d5aedbc4 rebuilt index: <not found>      committed index: (4, 71012965)
ID: 2e2b35e8536c150887ca53e3c87b87dc844760204caa2638f19a9edc56768133 rebuilt index: <not found>      committed index: (4, 44031323)
ID: 5288a1eee54896748274bd15fc601582f09e102afc39b471c25c0f85150559e0 rebuilt index: <not found>      committed index: (3, 15489617)
ID: 2a7666b85ebbb521abff2d5f3b678ee7d57ffc42ecc2e9d83b46a0ec0d35c145 rebuilt index: <not found>      committed index: (3, 415013731)
ID: b1dea18a2519217ca26ecc8b0ac11374f1b27674c5c9cf391bedcc1e06e9fda8 rebuilt index: <not found>      committed index: (4, 232890394)
ID: 4b098bc2936a65f84068340ce0bb194562251c233e6ed5c95c5dd9f9357b9227 rebuilt index: <not found>      committed index: (4, 256010712)
ID: 7bb84dab3a99ad20c5883c26e98c846a37a0ef0deab0624aa67d8a3da40ebd26 rebuilt index: <not found>      committed index: (4, 477831989)
ID: 9a8f2985967b6dcb4d8c1ca9e49b611b8992719bacbdb92e70e7bb24778e0642 rebuilt index: <not found>      committed index: (4, 271110529)
ID: 020afd118b17a540fbaa63d000d9cc6581df026982fc85e320441c4bba236d0c rebuilt index: <not found>      committed index: (3, 250267486)
ID: 31ede9ae5c85c0537f2de738b7b1fa2b05811326261eccae4eb7bea04d03ddc3 rebuilt index: <not found>      committed index: (4, 302028075)
ID: a544fc4aadca1c1d31fc99c92abb4a3d3c79bab3b824b590fe949ebe96f648a8 rebuilt index: <not found>      committed index: (4, 321235456)
ID: 75406027c0c5946b93edacb47c34db63da110487811809628c509f50ff142dac rebuilt index: <not found>      committed index: (3, 72486651)
ID: 6576c13c4a2cb93984be705a2fd6387c703eb6d362881b0ac64b49723c569a51 rebuilt index: <not found>      committed index: (4, 507500593)
ID: 6a0bb158c3e3a253bc8ef1f9679b7bb1ec3f8d0e28af2c185a1e194cc9e48716 rebuilt index: <not found>      committed index: (3, 160974271)
ID: 08ab55aaabbbdc64fc5431adecdb55d80df886ef224ad9dd9ebf93344ede5885 rebuilt index: <not found>      committed index: (4, 163588750)
ID: 9d4fb580ffa90784fb5e8fd1d9b2514d21305de1e07a8b25a848a1b3a50be098 rebuilt index: <not found>      committed index: (3, 156650330)
ID: 958117ee0f74b65c976b2b972ba0700ac111addaf4ea1ea1689a715fc741d1f5 rebuilt index: <not found>      committed index: (3, 70266108)
ID: c83362fa1814d735211053db1bb8c04104f73ba00ae0782b183ea55cefaf08b3 rebuilt index: <not found>      committed index: (4, 24153297)
ID: 71c97a0d44a3efe360f0cc827acd7353bcf9e7d99d04acc8855656af7fe66555 rebuilt index: <not found>      committed index: (4, 186931413)
ID: 674d2bc748d5be929f47fcf04c9b2876ae46f95336509eccf2084b2699cc0ef2 rebuilt index: <not found>      committed index: (4, 515211010)
ID: 398e51a8cbfb092d38f999ce8573a333b6097fc4a846e68a33f4a4b147044033 rebuilt index: <not found>      committed index: (3, 70274054)
ID: 64b11c2f9903cee5d6d4105ca262a7dfc84263a7c1c834976a4ae96368ff5d2b rebuilt index: <not found>      committed index: (4, 190366332)
ID: 33b49987fd6697a888d6befc7d7c7a34f04f5860b73956e68ac4705d4f98728a rebuilt index: <not found>      committed index: (4, 466116741)
ID: d551d64b96c967b7eba4f3eb65a043a6d02d986f58c39e62994f546efd5f0188 rebuilt index: <not found>      committed index: (4, 280099299)
ID: 634485c6370630d91575f90866bd5632c317be7824c973ee017786494c860a68 rebuilt index: <not found>      committed index: (4, 118890493)
ID: 7ca537cd8c9d3316cf4b1cd4f37187920d1254e293a05f25c1b151d7fe8f3385 rebuilt index: <not found>      committed index: (3, 243460264)
ID: 7ea20d633598b51188c671022d938523f4423dc9197359ebbc62d9196240b2dd rebuilt index: <not found>      committed index: (3, 432062889)
ID: f457d6fc9263513714152e82e9fc241dea1643f5c953eafa6efbd8126f15bb39 rebuilt index: <not found>      committed index: (4, 317518304)
ID: 198b193e9d80bbc51f64684d4b609c6f4e532fd8273b02a193e1b0389b00c1a9 rebuilt index: <not found>      committed index: (3, 380040122)
ID: c4731f83b165e897c670cdd4b9c6a5974cb5e904f6735f12b96663bdfd7ce937 rebuilt index: <not found>      committed index: (4, 496262601)
ID: df3b93a297c9b8835561dfc65edd228c1d5491838f2de19a7352bcb4e3d5b433 rebuilt index: <not found>      committed index: (3, 429927242)
ID: 7212657c9bf81a9a98ca06d9853fb761423a3dcfd3fb33f4afebdc63b8137121 rebuilt index: <not found>      committed index: (4, 223955328)
ID: 63ad60df07b4fea4842f761be2384ca515e4bf210fa325b2b24ee7ecd2c6fa30 rebuilt index: <not found>      committed index: (4, 241356903)
ID: 4288e60816e5433d8689d2ef811cdd843757a08b2f8f03c5d771c7b77f34b3d5 rebuilt index: <not found>      committed index: (3, 506544930)
ID: de5eb317b36a1ed6050455c0749d6aaf06dda72f6edd548824170869f4e0a8f0 rebuilt index: <not found>      committed index: (3, 447379889)
ID: 1fac0052c90f9edf6e511c0442da6f43214c4dc307968f379b4cd35a00790099 rebuilt index: <not found>      committed index: (4, 474218836)
ID: 50c849427573497958f20ad6f1651bec4b61f04fca69e208f6ef1a14b4cf5c27 rebuilt index: <not found>      committed index: (3, 302193533)
ID: b4556605ab52516c273fbd9ce956c62229062b7727ef30f4c7d832a5ea1a6a08 rebuilt index: <not found>      committed index: (3, 490367157)
ID: f05447a44ad84b20ba7f376e68bab318dc7d28c6fc7cbd2e04f9ecea93646669 rebuilt index: <not found>      committed index: (4, 199357078)
ID: 6a39c22685563afff23295ff01d166d80f3b84722e8902c1d6cd76e1927c7e83 rebuilt index: <not found>      committed index: (4, 392784132)
ID: 1c78142b0f920b5f5f1dbb444d8ebd99c03308890af0b07ee3509ae3a871d290 rebuilt index: <not found>      committed index: (4, 369015550)
ID: b55992d99e4c8080e3d2afaadd7d2fb25437b4be11432eff6ded5c8d875c6cea rebuilt index: <not found>      committed index: (4, 316513787)
ID: 7a1922d597f75f1414563a747bccc9d034524c9627dd8521743d3f43a4a4a066 rebuilt index: <not found>      committed index: (4, 179005677)
ID: ecda900e5e37a3890e5e35fa72c9976b35bdc9580a81cbbcb817635689a12e93 rebuilt index: <not found>      committed index: (3, 154470304)
ID: 09ea2bf52c8aea6d90c040dd4752781e4c77af7c062b72b43faff2a718d5260e rebuilt index: <not found>      committed index: (4, 483815017)
ID: 5f2c1cb301006c34e9651cb11e6077407e4993a04328a50b77ec25bf7f1089c6 rebuilt index: <not found>      committed index: (3, 145569455)
ID: d3d1d4e1124278cebb4242dc45446ca81f944e88fdb884403a88c40487c3cf1b rebuilt index: <not found>      committed index: (4, 170586873)
ID: 9b23335a8a0aecf8ae1fcda944e30c4b930a42bf0d4bcfb58ba5ed1a2553eb52 rebuilt index: <not found>      committed index: (3, 172177812)
ID: ad775919dec4ff4304daa3357d577085ca2e224b1d199df0a6c1712d14c9d275 rebuilt index: <not found>      committed index: (3, 371703773)
ID: ee0e6f35831d917da97e251a7e65c4b02294064f869f7032aeaeb992cb6f4d84 rebuilt index: <not found>      committed index: (4, 71565994)
ID: 09b63e99dd3b13198654565cf7514be38a24334f807b930923baad8f23d7e98a rebuilt index: <not found>      committed index: (3, 21555390)
ID: c6261602adbd5061a8f1e9e1fff29f35b572bb959eafb105d233b14501d68f2d rebuilt index: <not found>      committed index: (4, 277561540)
ID: 26ed6d215f1fdb0ebcd9324c450b73e1f68ad3a90a2b6d07fd4ad1ce5c2a7f7c rebuilt index: <not found>      committed index: (4, 30380972)
ID: 62a900df7974e236ac4fa69d8554ecb7718fed11524ebf52c8e236c1a5c5e7a6 rebuilt index: <not found>      committed index: (4, 442930166)
ID: f64ba404a85f1b080bc5224417deef8802b5f25483ffb34f6218a4c2587e67f0 rebuilt index: <not found>      committed index: (3, 501016972)
ID: 3bd1c1ac366ac5f3ab1dbe8a6193ab0fbbf235fe63a19516ea9653903f5b310b rebuilt index: <not found>      committed index: (3, 155081188)
ID: b0778cdbcd6de8f77cf17e8f7159e923fa6b0a5a206d2a390686be7f37578ff5 rebuilt index: <not found>      committed index: (3, 62450223)
ID: cc219ed594c2f9c95dc9333a4425d5c83f4e112d17a0f274c678115397366b2d rebuilt index: <not found>      committed index: (4, 226856072)
ID: 58b4222f2416d141d9d51e4cdefb733aacd54819c53872cde18cff4e64800e4a rebuilt index: <not found>      committed index: (4, 40492097)
ID: 159b01a9393ad55c7f5114646f7f7ce0a06a7ecbd56e5b34f89b45c09715d7eb rebuilt index: <not found>      committed index: (3, 501834374)
ID: ae1d50d5f0b035ee33c235416e533bc581f9b9a67589f7bce28cc0863dc30397 rebuilt index: <not found>      committed index: (3, 409198737)
ID: 95d835adff563ca6b6d03f2415a74f4958e44944815805a0d5ac98e437ed3bfd rebuilt index: <not found>      committed index: (3, 70740667)
ID: 64e74079a1dcba7af6e115ea8503d4afd88acfc2ab036e428061991c24abe4c2 rebuilt index: <not found>      committed index: (3, 379085808)
ID: 32d5fe777ba26fd4ef51006e2fc75876236514b03d2e7e35c1b3242eba7fcafa rebuilt index: <not found>      committed index: (4, 274245004)
ID: 1d8077603ff9eed3d4ffdf16a2924cce26cecaef4ad3b245404d18aebeaeb0ac rebuilt index: <not found>      committed index: (3, 64189720)
ID: ecef8256b00b0ed0d2dfc57ee5aa601dfa454e79b1cac550072c819e31155ea8 rebuilt index: <not found>      committed index: (3, 22373647)
ID: e7eb980f5338c83bde23a4b0030cbc0f3e733f5e83bac516527849278f300a14 rebuilt index: <not found>      committed index: (4, 318903292)
ID: 87d385892254efa0f1537c3d200a10edad4bcefb0f96223c45cb38886a9d043e rebuilt index: <not found>      committed index: (4, 279120577)
ID: 06fd1634aa4a476faeacc3a50427b90a61c4577de930ac0bd4a483a3a7153742 rebuilt index: <not found>      committed index: (4, 475937993)
ID: 3f9bf87328cb1ea7c9e9d16eac83cb3d0cb33d3654993226a52588b77def2303 rebuilt index: <not found>      committed index: (3, 21532564)
ID: c9f8138cd0f357949e7d40bbefddd75d96db24ad5cc993f7224dbceba58f2001 rebuilt index: <not found>      committed index: (4, 338826452)
ID: ee9ff850a9cc08943b34a392bb38645f15f33342caebda60e5098de94a34a573 rebuilt index: <not found>      committed index: (3, 70714036)
ID: d734dfa1b16b787d635bc41e0a7de86e61d614b508e737024d9035c3d85ffbde rebuilt index: <not found>      committed index: (4, 269809977)
ID: 761ea5c372a6bcb0731b67cff6a6fb2148654614b3a009c71af4d88fcbdf0de0 rebuilt index: <not found>      committed index: (4, 273967746)
ID: 2d5175eaecb96983ef779675bf1b377e27c6f7586c1465f0472ef32272979011 rebuilt index: <not found>      committed index: (3, 15741929)
ID: aac5e076644382b179a68ec39e94ca06ce29b2c9cce1abe30f25b9e24e7863bc rebuilt index: <not found>      committed index: (3, 405014587)
ID: 3d17864738082677d2cab1c5dbb0792d977c3b5a5889fb25eabb47bc3f820ba4 rebuilt index: <not found>      committed index: (4, 161118072)
ID: 3305e3c1b5b01088bd26cf7ec45ad2d9a05045ee204b5d8b7cffdad038c04c11 rebuilt index: <not found>      committed index: (3, 486103507)
ID: 1c674cda114183b38f4af9bf2c5cd0185c6a79a927e5723a0d78178c66e64119 rebuilt index: <not found>      committed index: (4, 391727238)
ID: 6a369eb3542ab64bbcccdf7897f00cebb4c61d4290bd9489d8a0a8dc97f87ff7 rebuilt index: <not found>      committed index: (4, 208236964)
ID: a96a2fec22369dc8a966fa81a582c3958009cb719aca9c1d4e316733a63f3e8c rebuilt index: <not found>      committed index: (3, 269377440)
ID: d507128dd792df5df0c46ffd452e0f04e48567c333d78d4dd1e78d5278858975 rebuilt index: <not found>      committed index: (3, 21738911)
ID: 7b2569e72a47dacb2fdbae65ca8e9ce19c562dbe7c29bd5f699b77dbdc193251 rebuilt index: <not found>      committed index: (3, 70248925)
ID: 378d4496de4ec5292c5e0bc5a05aad0d13a62f294037e3bd81f071f1b64277af rebuilt index: <not found>      committed index: (4, 186760844)
ID: f35911813a8bf8c5d3e10fd94c3fe4bbba4da172be247cf2897e0ff40d5e1ea4 rebuilt index: <not found>      committed index: (4, 343514998)
... (redacted) ...
ID: 58246ed67c3b119e4727146d43df90c111fc8e6b2f0179e4fa5019b19a7b5d71 rebuilt index: <not found>      committed index: (3, 158276185)
ID: eb9bbd3f71f8909555ec28ec24d03fbc6ea76a5cd195e53e6bf08150427096b5 rebuilt index: <not found>      committed index: (4, 491439149)
ID: 0d077735a44adc2e87045029483f7ab4194f037bd9ce237845c4d7add143ee3a rebuilt index: <not found>      committed index: (4, 303670071)
Finished full repository check, errors found.

While pruning from a client (running for example borg 1.2.1 or borg 1.1.18):

Warning: Data integrity error: Segment entry checksum mismatch [segment 8, offset 216518218]
Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/borg/archiver.py", line 4886, in main
    exit_code = archiver.run(args)
  File "/usr/lib64/python3.6/site-packages/borg/archiver.py", line 4818, in run
    return set_ec(func(args))
  File "/usr/lib64/python3.6/site-packages/borg/archiver.py", line 178, in wrapper
    return method(self, args, repository=repository, **kwargs)
  File "/usr/lib64/python3.6/site-packages/borg/archiver.py", line 1705, in do_prune
    repository.commit(save_space=args.save_space)
  File "/usr/lib64/python3.6/site-packages/borg/remote.py", line 477, in do_rpc
    return self.call(f.__name__, named, **extra)
  File "/usr/lib64/python3.6/site-packages/borg/remote.py", line 712, in call
    for resp in self.call_many(cmd, [args], **kw):
  File "/usr/lib64/python3.6/site-packages/borg/remote.py", line 780, in call_many
    handle_error(unpacked)
  File "/usr/lib64/python3.6/site-packages/borg/remote.py", line 740, in handle_error
    raise IntegrityError(args[0].decode())
borg.helpers.IntegrityError: Data integrity error: Segment entry checksum mismatch [segment 8, offset 216518218]

Platform: Linux HOSTNAME 5.15.0-210.163.7.el8uek.x86_64 #2 SMP Tue Sep 10 18:31:09 PDT 2024 x86_64
Linux: Oracle Linux Server 8.10
Borg: 1.1.18  Python: CPython 3.6.8 msgpack: 0.5.6.+borg2
PID: 1731465  CWD: /root
sys.argv: ['/bin/borg', 'prune', '--keep-weekly=1', '--keep-monthly=2', '--keep-yearly=1', '--keep-within=12d', 'ssh://adminbu@HOSTNAME/data/REPO']
SSH_ORIGINAL_COMMAND: None
Warning: Failed removing old backups.

@ThomasWaldmann
Member

ThomasWaldmann commented Dec 3, 2024

If it complains about segment entry checksum mismatches, it means that the crc32 stored with the data no longer matches the data. So there is corruption in these segment files.
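To make the crc32 check concrete, here is a minimal sketch of recomputing the checksum of one segment entry. It is a simplified model based on our reading of the borg 1.x repository code, not borg itself: the 9-byte header layout (crc32, total entry size, tag), the `TAG_PUT` value, the `BORG_SEG` file magic and the helper names are assumptions to verify against the borg source. The point it shows is that the crc covers everything after the crc field, so a single flipped bit anywhere in the entry makes the recomputed value differ from the stored one.

```python
import struct
import zlib

# ASSUMPTION: simplified model of a borg 1.x segment entry; verify against borg's source.
# Header: crc32 (uint32 LE) | size (uint32 LE, whole entry incl. header) | tag (uint8)
HEADER = struct.Struct("<IIB")
TAG_PUT = 0            # assumed tag value for a PUT entry
MAGIC = b"BORG_SEG"    # assumed 8-byte magic at the start of each segment file


def make_put_entry(chunk_id: bytes, data: bytes) -> bytes:
    """Build a PUT entry with the assumed layout (test helper only)."""
    size = HEADER.size + len(chunk_id) + len(data)
    body = struct.pack("<IB", size, TAG_PUT) + chunk_id + data
    # The crc32 covers everything after the crc field itself.
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) + body


def entry_crc_ok(segment: bytes, offset: int) -> bool:
    """Recompute the crc32 of the entry starting at `offset` and compare it to the stored one."""
    stored_crc, size, _tag = HEADER.unpack_from(segment, offset)
    body = segment[offset + 4 : offset + size]
    return (zlib.crc32(body) & 0xFFFFFFFF) == stored_crc


if __name__ == "__main__":
    segment = MAGIC + make_put_entry(b"\x00" * 32, b"chunk payload")
    print(entry_crc_ok(segment, len(MAGIC)))          # True
    damaged = bytearray(segment)
    damaged[-1] ^= 0x01                               # flip one bit in the payload
    print(entry_crc_ok(bytes(damaged), len(MAGIC)))   # False
```

crc32 only detects accidental damage; it says nothing about where in the I/O stack the damage happened.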

Segments 3, 4 and 8 are old segment files (numbering starts from 1 and only ever increases), likely written during the first (or first few) backup(s).
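The age of the affected segments can be checked on the server. This sketch maps a segment number to its file and reports the file's modification time. It assumes the borg 1.x layout `<repo>/data/<segment // segments_per_dir>/<segment>` and that `segments_per_dir` is stored in the `[repository]` section of the repo's `config` file. Please verify both against your repository before relying on it. The helper names are hypothetical, and mtime is only a hint: it records the last write to the file, not necessarily when borg first created it.

```python
import configparser
import datetime
import os


def segment_path(repo: str, segment: int, segments_per_dir: int) -> str:
    """Path of a segment file, ASSUMING <repo>/data/<segment // segments_per_dir>/<segment>."""
    return os.path.join(repo, "data", str(segment // segments_per_dir), str(segment))


def read_segments_per_dir(repo: str) -> int:
    """Read segments_per_dir from the repo config ([repository] section, assumed)."""
    cfg = configparser.ConfigParser()
    cfg.read(os.path.join(repo, "config"))
    return cfg.getint("repository", "segments_per_dir")


def segment_mtime(repo: str, segment: int) -> datetime.datetime:
    """Last-modification time of a segment file (read-only, nothing is changed)."""
    path = segment_path(repo, segment, read_segments_per_dir(repo))
    return datetime.datetime.fromtimestamp(os.path.getmtime(path))
```

For example, `for seg in (3, 4, 8): print(seg, segment_mtime("/data/REPO", seg))` would show whether those files date back to the first backups.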

That is likely a hw issue:

  • check memory, run memtest86+ for at least one full pass
  • run smartctl -t long on the storage devices and afterwards check the results with smartctl -a

In any case, do at least that before trying borg check --repair, to make sure your hw works correctly!

The many index mismatches (entries missing from the rebuilt index) that borg check complains about are likely caused by corrupted segment entries or completely missing segment files.
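One way to test that explanation is to count, per segment, the entries that are missing from the rebuilt index. This is a heuristic sketch that parses the `ID: ... rebuilt index: <not found> committed index: (segment, offset)` lines exactly as they appear in the output pasted in this issue. The regexes and the function name are assumptions based on that output, not a borg API.

```python
import re
from collections import Counter

# Matches the mismatch lines printed by borg check in this issue (format assumed from that output).
LINE = re.compile(
    r"^ID: (?P<id>[0-9a-f]+) rebuilt index: (?P<rebuilt>.+?)\s+committed index: (?P<committed>.+)$"
)
LOCATION = re.compile(r"\((\d+), (\d+)\)")  # "(segment, offset)"


def missing_per_segment(check_output: str) -> Counter:
    """Count entries missing from the rebuilt index, grouped by their committed segment."""
    counts = Counter()
    for line in check_output.splitlines():
        m = LINE.match(line.strip())
        if not m or m.group("rebuilt") != "<not found>":
            continue
        loc = LOCATION.fullmatch(m.group("committed").strip())
        if loc:
            counts[int(loc.group(1))] += 1
    return counts
```

In the output shown in this issue, every listed entry points into segment 3 or 4, the same two segments that report checksum mismatches. This fits the idea that the missing index entries come from the corrupted segments.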

It only complains with borg prune or borg check, not with borg create, because create does not read that old data and thus does not detect the issue. Even if create reuses such an old chunk, it determines the chunk's presence from the index and does not read it from disk.
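Why a corrupted old chunk goes unnoticed during `create` can be shown with a hypothetical toy model. This is not borg's code: `ToyRepository` and its methods are made up for this sketch, and plain SHA-256 stands in for borg's (possibly keyed) chunk id function. Once a chunk id is in the index, adding the same data again is answered from the index alone, so the damaged bytes on "disk" are never read.

```python
import hashlib


class ToyRepository:
    """HYPOTHETICAL toy model of index-based deduplication; not borg's implementation."""

    def __init__(self) -> None:
        self.index = {}       # chunk id -> (segment, offset)
        self.store = {}       # (segment, offset) -> stored bytes, standing in for the disk
        self.disk_reads = 0

    def add_chunk(self, data: bytes, segment: int) -> bytes:
        chunk_id = hashlib.sha256(data).digest()
        if chunk_id in self.index:
            # Presence is decided from the index only; the stored copy is not read.
            return chunk_id
        location = (segment, len(self.store))
        self.store[location] = data
        self.index[chunk_id] = location
        return chunk_id

    def read_chunk(self, chunk_id: bytes) -> bytes:
        # Only operations that actually read data (check, extract, ...) reach this.
        self.disk_reads += 1
        return self.store[self.index[chunk_id]]


if __name__ == "__main__":
    repo = ToyRepository()
    cid = repo.add_chunk(b"old data", segment=3)
    repo.store[repo.index[cid]] = b"bit-rotted"   # simulate on-disk corruption
    repo.add_chunk(b"old data", segment=42)       # a later "create" reuses the chunk
    print(repo.disk_reads)                        # 0
```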

A sw issue can't be ruled out completely, of course, and you have a relatively complex stack of sw in the I/O path. I don't think it is related to compression.

If you recompressed / rewrote the data contained in the borg segment files in the past, a malfunction in the involved hw back then could have corrupted the file contents during that process, and it could even be that the hw issue is not present anymore.

If you run borg check with or without --verify-data periodically, you should notice such problems though. Even "without", it does a crc32 check of all data stored in the segment files.
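With about 160 repositories, periodic checks are easiest to script. This is a hypothetical sketch: the helper names are made up, and the rule that a directory containing a `config` file and a `data/` directory is a borg repository is an assumption. It also assumes that any needed passphrase is already available (for example via `BORG_PASSPHRASE`). Any non-zero exit code is treated as "needs a look", because borg uses non-zero codes for both warnings and errors.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def check_command(repo: str, verify_data: bool = False) -> List[str]:
    """Build the borg check command line for one repository."""
    cmd = ["borg", "check"]
    if verify_data:
        # Much slower: additionally reads and cryptographically verifies all archive data.
        cmd.append("--verify-data")
    cmd.append(repo)
    return cmd


def find_repos(root: str) -> List[str]:
    """ASSUMPTION: a subdirectory of `root` with a `config` file and a `data/` dir is a repo."""
    return sorted(
        str(cfg.parent)
        for cfg in Path(root).glob("*/config")
        if (cfg.parent / "data").is_dir()
    )


def check_all(root: str, verify_data: bool = False) -> List[Tuple[str, int]]:
    """Run borg check on every repo found; return (repo, exit code) for non-zero exits."""
    problems = []
    for repo in find_repos(root):
        result = subprocess.run(check_command(repo, verify_data))
        if result.returncode != 0:
            problems.append((repo, result.returncode))
    return problems
```

Running `check_all("/data")` from a weekly timer, and `check_all("/data", verify_data=True)` less often, would surface new corruption close to when it appears rather than at prune time.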

@ThomasWaldmann
Member

Tagged it hardware-issue for now; the probability of that is rather high.
Could be also some software sitting in the I/O path between the hw and borg.

Very likely an issue "below" borg.

@gecon
Author

gecon commented Dec 5, 2024

We shut down the server, booted into rescue mode and ran an extensive hardware check for hours.

Everything seems fine (unfortunately).

We are starting to think that this is something specific to bcachefs and borg.

Hardware Check report

CPU check: OK
  CPU 1: OK
    Temperature: OK
    Clock speed: OK
Memory module check: OK
  DIMM 1 `026C4688`: OK
  DIMM 2 `026C3F05`: OK
Disk check: OK
  NVMe SSD `S676NU0W570341`: OK
    S.M.A.R.T Tests: OK
    Error counters: OK
  NVMe SSD `S676NU0W570354`: OK
    S.M.A.R.T Tests: OK
    Error counters: OK
  SATA HDD `2GJ2KENS`: OK
    S.M.A.R.T Tests: OK
    S.M.A.R.T Self-Test: OK
    S.M.A.R.T Health self assessment: OK
    Error counters: OK
  SATA HDD `ZGG4ZGTA`: OK
    S.M.A.R.T Tests: OK
    S.M.A.R.T Self-Test: OK
    S.M.A.R.T Health self assessment: OK
    Error counters: OK
  SATA HDD `ZGG4LS4B`: OK
    S.M.A.R.T Tests: OK
    S.M.A.R.T Self-Test: OK
    S.M.A.R.T Health self assessment: OK
    Error counters: OK
  SATA HDD `ZGG4ZL5A`: OK
    S.M.A.R.T Tests: OK
    S.M.A.R.T Self-Test: OK
    S.M.A.R.T Health self assessment: OK
    Error counters: OK
NIC check: OK
  PCI-E NIC `50:eb:f7:22:ed:b5`: OK
    Negotiated speed: OK
    Error counters: OK
    PCI error counters: OK
Stresstest: OK
System log check: OK
-------------------------------------------------

@ThomasWaldmann
Member

ThomasWaldmann commented Dec 5, 2024

Be careful with such "overall" tests, especially if it is unknown how thorough each individual test is. If they are optimized not to take extremely long, the individual tests might be rather on the quick side.

I've had good long-term experience with memtest86+ and smartctl -t long, so that's why I recommended them.

On some linux dists, memtest86+ is available from the bootloader menu (but if one boots via UEFI, one needs a rather recent version, the older ones only support legacy booting). Recent versions are available from https://memtest.org .

@gecon
Author

gecon commented Dec 11, 2024

We completed some extra tests, besides those we did last week.

We did one complete pass on memtest86+ and smartctl -t long on all disks.

No errors, we got a PASS on all of them.

* smartctl long test completed:

sda, sdb, sdc, sdd:
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4879         -
# 1  Extended offline    Completed without error       00%      4878         -
# 1  Extended offline    Completed without error       00%      4963         -
# 1  Extended offline    Completed without error       00%      4963         -

nvme0, nvme1:
Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID Seg SCT Code
 0   Extended          Completed without error                2708            -     -   -   -    -
 0   Extended          Completed without error                2677            -     -   -   -    -
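When the same checks have to be repeated on several disks, the self-test log can be scanned mechanically. This is a heuristic sketch based only on the output format pasted above. `parse_self_tests` and `failed_self_tests` are hypothetical helpers, not part of smartmontools. It should be fed only the self-test log section (for example from `smartctl -l selftest`), because other tables in `smartctl -a` output also have rows starting with numbers. Any status other than "Completed without error", including a test that is still running, is reported.

```python
import re
from typing import List, Tuple

# One self-test log row, columns separated by two or more spaces, e.g.
#   "# 1  Extended offline    Completed without error       00%      4879         -"   (ATA)
#   " 0   Extended          Completed without error                2708  ..."       (NVMe)
ROW = re.compile(r"^\s*#?\s*\d+\s+(?P<desc>.+?)\s{2,}(?P<status>.+?)\s{2,}")

OK_STATUS = "Completed without error"


def parse_self_tests(log_text: str) -> List[Tuple[str, str]]:
    """Return (test description, status) for every self-test row found."""
    rows = []
    for line in log_text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group("desc"), m.group("status")))
    return rows


def failed_self_tests(log_text: str) -> List[Tuple[str, str]]:
    """Rows whose status is anything other than a clean completion."""
    return [row for row in parse_self_tests(log_text) if row[1] != OK_STATUS]
```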


Memtest86+: [screenshot]

I am reporting the above for the sake of completeness.

I am worried that this might be an issue between bcachefs and borg, but again, no problems have arisen on the same storage apart from the borg errors.

@ThomasWaldmann
Member

OK, so memory and disks on the server look ok (now).

borg is just doing fs API calls, so there shouldn't be "borg specific" issues in the layers below borg. BUT, due to the amount of I/O borg is doing, some rare issues are more likely to be triggered by borg than by less intensive usage.
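A borg-independent way to probe the storage stack under sustained I/O is a write and read-back check on the same filesystem. This is a hypothetical sketch; the helper names and the JSON manifest format are made up. It writes random files, records their crc32 in a manifest, and later re-reads and compares them. Reads issued right after writing are likely served from the page cache. For a meaningful result, run the verification after a remount or reboot, and write enough data that background work such as compression or tiering has a chance to run.

```python
import json
import os
import zlib
from typing import Dict, List


def write_test_files(directory: str, count: int, size: int) -> Dict[str, int]:
    """Write `count` random files of `size` bytes, fsync each, return path -> crc32."""
    crcs = {}
    for i in range(count):
        data = os.urandom(size)
        path = os.path.join(directory, f"stress-{i:06d}.bin")
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        crcs[path] = zlib.crc32(data) & 0xFFFFFFFF
    return crcs


def save_manifest(crcs: Dict[str, int], manifest: str) -> None:
    """Persist the expected checksums so verification can happen after a reboot."""
    with open(manifest, "w") as f:
        json.dump(crcs, f)


def verify_test_files(manifest: str) -> List[str]:
    """Re-read every file and return the paths whose crc32 no longer matches."""
    with open(manifest) as f:
        crcs = json.load(f)
    bad = []
    for path, expected in crcs.items():
        with open(path, "rb") as f:
            if (zlib.crc32(f.read()) & 0xFFFFFFFF) != expected:
                bad.append(path)
    return sorted(bad)
```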
