-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bcachefs: fix initial page state after falloc #249
base: testing
Are you sure you want to change the base?
Conversation
An option was added to control whether reflink support was on or off because for a long time, reflink + inline data extent support was missing - but that's since been fixed, so we can drop the option now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
this is used in only one place now, so just inline it into the caller. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Change fsck code to always put btree iterators - also, make some flow control improvements to deal with lock restarts better, and refactor check_extents() to not walk extents twice for counting/checking i_sectors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This is a bit clearer than using bch2_btree_iter_free(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We keep running into occasional bugs with btree transaction iterators overflowing - this will make those bugs more visible. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This code used to be used for running some assertions on alloc info at runtime, but it long predates fsck and hasn't been good for much in ages - we can delete it now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The superblock version fields need to be accurate to know whether a filesystem is supported, thus we should be verifying them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This is mkfs's job. Also, clean up the handling of feature bits some. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
comparison was wrong Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Prep work for snapshots Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
…dvance The way btree iterators work internally has been changing, particularly with the iter->real_pos changes, and bch2_btree_iter_next() is no longer hyper optimized - it's just advance followed by peek, so it's more efficient to just call advance where we're not using the return value of bch2_btree_iter_next(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
btree node iterators need to obey the regular btree node invarionts w.r.t. iter->real_pos; once they do, bch2_btree_iter_traverse will have less that it needs to check. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This means bch2_btree_iter_traverse_one() can be made more efficient. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Since we're no longer doing next() immediately followed by peek(), this optimization isn't doing anything anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This just gives some internal helpers some better names. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Ideally we'll be getting rid of peek_with_updates(), but the callers will need to be checked. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
peek() has to update iter->real_pos - there's no need for bch2_btree_iter_set_pos() to update it as well. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
More prep work for snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
It was using the method for btree_ptr_v1, but that wasn't checking all the fields. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
It had some silly redundancies. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
External (to the btree iterator code) users of bch2_btree_iter_traverse expect that on success the iterator will be pointed at iter->pos and have that position locked - but since we split iter->pos and iter->real_pos, that means it has to update iter->real_pos if necessary. Internal users don't expect it to modify iter->real_pos, so we need two separate functions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
After the v5.12 rebase, we started oopsing when truncate was passed ATTR_MODE, due to not passing mnt_userns to setattr_copy(). This refactors things so that truncate/extend finish by using bch2_setattr_nonsize(), which solves the problem. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- We no longer mark subsets of extents, they're marked like regular keys now - which means we can drop the offset & sectors arguments to trigger functions - Drop other arguments that are no longer needed anymore in various places - fs_usage - Drop the logic for handling extents in bch2_mark_update() that isn't needed anymore, to match bch2_trans_mark_update() - Better logic for hanlding the BTREE_ITER_CACHED_NOFILL case, where we don't have an old key to mark Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Small improvements to some percpu utility code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Ensure that iter->should_be_locked value is set to true before we call bch2_trans_update in ec_stripe_update_ptrs. Signed-off-by: Dan Robertson <dan@dlrobertson.com>
1778607
to
6ac3b1f
Compare
6ac3b1f
to
b755ab3
Compare
Updated and seems to pass xfstests. |
It's unhelpful if we see "Halting mark and sweep to start topology repair" but we don't see the error that triggered it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Especially in userspace, we sometime run into resource exhaustion issues with starting up threads after mark and sweep/fsck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We weren't checking bch2_btree_node_read_done() for errors, oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We need to ensure that packed formats can't represent fields larger than the unpacked format, which is a bit tricky since the calculations can also overflow a u64. This patch fixes a shift and simplifies the overall calculations. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The value of f_bfree and f_bavail should be the same. The value of f_bfree is not currently scaled by the availability factor. Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Avoid calling kfree on the returned error pointer if bch2_acl_from_disk fails. Signed-off-by: Dan Robertson <dan@dlrobertson.com>
When initializing the page state, set the page sectors state to that of what exists in the btree. This allows us to check if the backing extent was already allocated before adding to the inode i_blocks value. Signed-off-by: Dan Robertson <dan@dlrobertson.com>
b755ab3
to
3fe6e97
Compare
Updated to only run the btree check if the page is not uptodate. I kept the check in the page state creation to ensure this check isn't run often. |
I think i have an idea for a better implementation of this. We don't seem to call write_begin for a buffered write. I think if we add that, i might be able to work something out there so that we essentially run the extra code added to |
…frontend" As reported by Thomas Voegtle <tv@lio96.de>, sometimes a DVB card does not initialize properly booting Linux 6.4-rc4. This is not always, maybe in 3 out of 4 attempts. After double-checking, the root cause seems to be related to the UAF fix, which is causing a race issue: [ 26.332149] tda10071 7-0005: found a 'NXP TDA10071' in cold state, will try to load a firmware [ 26.340779] tda10071 7-0005: downloading firmware from file 'dvb-fe-tda10071.fw' [ 989.277402] INFO: task vdr:743 blocked for more than 491 seconds. [ 989.283504] Not tainted 6.4.0-rc5-i5 #249 [ 989.288036] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 989.295860] task:vdr state:D stack:0 pid:743 ppid:711 flags:0x00004002 [ 989.295865] Call Trace: [ 989.295867] <TASK> [ 989.295869] __schedule+0x2ea/0x12d0 [ 989.295877] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 [ 989.295881] schedule+0x57/0xc0 [ 989.295884] schedule_preempt_disabled+0xc/0x20 [ 989.295887] __mutex_lock.isra.16+0x237/0x480 [ 989.295891] ? dvb_get_property.isra.10+0x1bc/0xa50 [ 989.295898] ? dvb_frontend_stop+0x36/0x180 [ 989.338777] dvb_frontend_stop+0x36/0x180 [ 989.338781] dvb_frontend_open+0x2f1/0x470 [ 989.338784] dvb_device_open+0x81/0xf0 [ 989.338804] ? exact_lock+0x20/0x20 [ 989.338808] chrdev_open+0x7f/0x1c0 [ 989.338811] ? generic_permission+0x1a2/0x230 [ 989.338813] ? link_path_walk.part.63+0x340/0x380 [ 989.338815] ? exact_lock+0x20/0x20 [ 989.338817] do_dentry_open+0x18e/0x450 [ 989.374030] path_openat+0xca5/0xe00 [ 989.374031] ? terminate_walk+0xec/0x100 [ 989.374034] ? path_lookupat+0x93/0x140 [ 989.374036] do_filp_open+0xc0/0x140 [ 989.374038] ? __call_rcu_common.constprop.91+0x92/0x240 [ 989.374041] ? __check_object_size+0x147/0x260 [ 989.374043] ? __check_object_size+0x147/0x260 [ 989.374045] ? alloc_fd+0xbb/0x180 [ 989.374048] ? do_sys_openat2+0x243/0x310 [ 989.374050] do_sys_openat2+0x243/0x310 [ 989.374052] do_sys_open+0x52/0x80 [ 989.374055] do_syscall_64+0x5b/0x80 [ 989.421335] ? __task_pid_nr_ns+0x92/0xa0 [ 989.421337] ? syscall_exit_to_user_mode+0x20/0x40 [ 989.421339] ? do_syscall_64+0x67/0x80 [ 989.421341] ? syscall_exit_to_user_mode+0x20/0x40 [ 989.421343] ? do_syscall_64+0x67/0x80 [ 989.421345] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 989.421348] RIP: 0033:0x7fe895d067e3 [ 989.421349] RSP: 002b:00007fff933c2ba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101 [ 989.421351] RAX: ffffffffffffffda RBX: 00007fff933c2c10 RCX: 00007fe895d067e3 [ 989.421352] RDX: 0000000000000802 RSI: 00005594acdce160 RDI: 00000000ffffff9c [ 989.421353] RBP: 0000000000000802 R08: 0000000000000000 R09: 0000000000000000 [ 989.421353] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000001 [ 989.421354] R13: 00007fff933c2ca0 R14: 00000000ffffffff R15: 00007fff933c2c90 [ 989.421355] </TASK> This reverts commit 6769a0b. Fixes: 6769a0b ("media: dvb-core: Fix use-after-free on race condition at dvb_frontend") Link: https://lore.kernel.org/all/da5382ad-09d6-20ac-0d53-611594b30861@lio96.de/ Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
When e.g. 8 bytes are to be read, sgm->consumed equals 8 immediately after sg_miter_next() call. The driver then increments it as bytes are read, so sgm->consumed becomes 16 and this warning triggers in sg_miter_stop(): WARN_ON(miter->consumed > miter->length); WARNING: CPU: 0 PID: 28 at lib/scatterlist.c:925 sg_miter_stop+0x2c/0x10c CPU: 0 PID: 28 Comm: kworker/0:2 Tainted: G W 6.9.0-rc5-dirty #249 Hardware name: Generic DT based system Workqueue: events_freezable mmc_rescan Call trace:. unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x44/0x5c dump_stack_lvl from __warn+0x78/0x16c __warn from warn_slowpath_fmt+0xb0/0x160 warn_slowpath_fmt from sg_miter_stop+0x2c/0x10c sg_miter_stop from moxart_request+0xb0/0x468 moxart_request from mmc_start_request+0x94/0xa8 mmc_start_request from mmc_wait_for_req+0x60/0xa8 mmc_wait_for_req from mmc_app_send_scr+0xf8/0x150 mmc_app_send_scr from mmc_sd_setup_card+0x1c/0x420 mmc_sd_setup_card from mmc_sd_init_card+0x12c/0x4dc mmc_sd_init_card from mmc_attach_sd+0xf0/0x16c mmc_attach_sd from mmc_rescan+0x1e0/0x298 mmc_rescan from process_scheduled_works+0x2e4/0x4ec process_scheduled_works from worker_thread+0x1ec/0x24c worker_thread from kthread+0xd4/0xe0 kthread from ret_from_fork+0x14/0x38 This patch adds initial zeroing of sgm->consumed. It is then incremented as bytes are read or written. Signed-off-by: Sergei Antonov <saproj@gmail.com> Cc: Linus Walleij <linus.walleij@linaro.org> Fixes: 3ee0e7c ("mmc: moxart-mmc: Use sg_miter for PIO") Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20240422153607.963672-1-saproj@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
On a new page reservation check if the backing extent was already
allocated before adding to the number of sectors we will allocate.
It is possible to write to a page that is beyond the inode i_size
that is also backed by an allocated extent. This can happen when
fallocate is used with FALLOC_FL_KEEP_SIZE.
Signed-off-by: Dan Robertson dan@dlrobertson.com