Initial reboot succeeds, leaves console hung. Successive reboots fail and hang system in unusable state. #9

Open
haroules opened this issue Dec 20, 2024 · 2 comments

@haroules

This is an awesome idea and is likely to be a very desirable feature, especially for platform engineers, DevOps, and IT admins. Thank you for publishing.

Environment:
OS: Ubuntu Server 24.04 (Noble), minimal install.
Physical hardware (Intel-based).

Ran the following to install (0 issues reported):
sudo apt install --no-install-recommends cryptsetup-initramfs kexec-tools ruby strace systemd
sudo gem install crypt_reboot

First pass:

sudo cryptreboot
[sudo] password for name: 
Extracting initramfs... To speed things up, future versions will employ cache.
Please unlock disk pvpart-crypt: 

Broadcast message from root@host.example.com on pts/1 (Thu 2024-12-19 18:40:50 EST):

The system will kexec now!

The system reboots as expected, but the console (a directly attached KVM) shows nothing. The system is still accessible via ssh, and checking uptime confirms that it was rebooted.
Checking dmesg over ssh then shows a page fault:

[Thu Dec 19 18:40:59 2024] BUG: unable to handle page fault for address: ffffa07500900000
[Thu Dec 19 18:40:59 2024] #PF: supervisor read access in kernel mode
[Thu Dec 19 18:40:59 2024] #PF: error_code(0x0000) - not-present page
[Thu Dec 19 18:40:59 2024] PGD 100000067 P4D 100000067 PUD 10029f067 PMD 10a650067 PTE 0
[Thu Dec 19 18:40:59 2024] Oops: 0000 [#1] PREEMPT SMP PTI
[Thu Dec 19 18:40:59 2024] CPU: 7 PID: 602 Comm: (udev-worker) Not tainted 6.8.0-49-generic #49-Ubuntu
[Thu Dec 19 18:40:59 2024] RIP: 0010:ioread32+0x3a/0x80
[Thu Dec 19 18:40:59 2024] Code: 76 0e 89 fa ed 31 d2 31 f6 31 ff c3 cc cc cc cc 8b 05 da e2 e4 01 85 c0 75 1d b8 ff ff ff ff 31 d2 31 f6 31 ff c3 cc cc cc cc <8b> 07 31 d2 31 f6 31 ff c3 cc cc cc cc 55 83 e8 01 48 89 fe 48 c7
[Thu Dec 19 18:40:59 2024] RSP: 0018:ffffa07501e63178 EFLAGS: 00010292
[Thu Dec 19 18:40:59 2024] RAX: ffffffffc14e6150 RBX: ffff88e5c5fb4600 RCX: ffff88e5c5fb46b0
[Thu Dec 19 18:40:59 2024] RDX: 0000000000000000 RSI: ffffa07500900000 RDI: ffffa07500900000
[Thu Dec 19 18:40:59 2024] RBP: ffffa07501e63180 R08: 0000000000000000 R09: 0000000000000000
[Thu Dec 19 18:40:59 2024] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa07501e631c4
[Thu Dec 19 18:40:59 2024] R13: ffffa07501e632c0 R14: ffff88e5c5fb4608 R15: ffffa07501e631d0
[Thu Dec 19 18:40:59 2024] FS:  00007bd5006628c0(0000) GS:ffff88ecfe380000(0000) knlGS:0000000000000000
[Thu Dec 19 18:40:59 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Thu Dec 19 18:40:59 2024] CR2: ffffa07500900000 CR3: 0000000103fc6004 CR4: 00000000003706f0
[Thu Dec 19 18:40:59 2024] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Thu Dec 19 18:40:59 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Thu Dec 19 18:40:59 2024] Call Trace:
[Thu Dec 19 18:40:59 2024]  <TASK>
[Thu Dec 19 18:40:59 2024]  ? show_regs+0x6d/0x80
[Thu Dec 19 18:40:59 2024]  ? __die+0x24/0x80
[Thu Dec 19 18:40:59 2024]  ? page_fault_oops+0x99/0x1b0
[Thu Dec 19 18:40:59 2024]  ? kernelmode_fixup_or_oops.isra.0+0x69/0x90
[Thu Dec 19 18:40:59 2024]  ? __bad_area_nosemaphore+0x19d/0x2c0
[Thu Dec 19 18:40:59 2024]  ? bad_area_nosemaphore+0x16/0x30
[Thu Dec 19 18:40:59 2024]  ? do_kern_addr_fault+0x7b/0xa0
[Thu Dec 19 18:40:59 2024]  ? exc_page_fault+0x1a4/0x1b0
[Thu Dec 19 18:40:59 2024]  ? asm_exc_page_fault+0x27/0x30
[Thu Dec 19 18:40:59 2024]  ? __pfx_nv50_instobj_rd32+0x10/0x10 [nouveau]
[Thu Dec 19 18:40:59 2024]  ? ioread32+0x3a/0x80
[Thu Dec 19 18:40:59 2024]  ? nv50_instobj_rd32+0x15/0x20 [nouveau]
[Thu Dec 19 18:40:59 2024]  gp102_acr_wpr_patch+0xc3/0x1f0 [nouveau]
[Thu Dec 19 18:40:59 2024]  nvkm_acr_oneinit+0x41f/0x6c0 [nouveau]
[Thu Dec 19 18:40:59 2024]  nvkm_subdev_oneinit_+0x53/0x130 [nouveau]
[Thu Dec 19 18:40:59 2024]  nvkm_subdev_init_+0x40/0x150 [nouveau]
[Thu Dec 19 18:40:59 2024]  nvkm_subdev_init+0x50/0x70 [nouveau]
[Thu Dec 19 18:40:59 2024]  nvkm_device_init+0x17c/0x310 [nouveau]
[Thu Dec 19 18:40:59 2024]  nvkm_udevice_init+0x50/0x60 [nouveau]
[Thu Dec 19 18:40:59 2024]  nvkm_object_init+0x3f/0x1e0 [nouveau]
[Thu Dec 19 18:40:59 2024]  nvkm_ioctl_new+0x192/0x2e0 [nouveau]
[Thu Dec 19 18:40:59 2024]  ? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
[Thu Dec 19 18:40:59 2024]  ? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
[Thu Dec 19 18:40:59 2024]  nvkm_ioctl+0x132/0x2b0 [nouveau]
[Thu Dec 19 18:40:59 2024]  nvkm_client_ioctl+0xe/0x20 [nouveau]
[Thu Dec 19 18:40:59 2024]  nvif_object_ctor+0x10a/0x1a0 [nouveau]
[Thu Dec 19 18:40:59 2024]  nvif_device_ctor+0x22/0x90 [nouveau]
[Thu Dec 19 18:40:59 2024]  nouveau_cli_init+0x163/0x650 [nouveau]
[Thu Dec 19 18:40:59 2024]  ? nouveau_drm_device_init+0x5e/0x370 [nouveau]
[Thu Dec 19 18:40:59 2024]  nouveau_drm_device_init+0xba/0x370 [nouveau]
[Thu Dec 19 18:40:59 2024]  nouveau_drm_probe+0x137/0x280 [nouveau]
[Thu Dec 19 18:40:59 2024]  local_pci_probe+0x44/0xb0
[Thu Dec 19 18:40:59 2024]  pci_call_probe+0x55/0x1a0
[Thu Dec 19 18:40:59 2024]  pci_device_probe+0x84/0x120
[Thu Dec 19 18:40:59 2024]  really_probe+0x1c4/0x410
[Thu Dec 19 18:40:59 2024]  __driver_probe_device+0x8c/0x180
[Thu Dec 19 18:40:59 2024]  driver_probe_device+0x24/0xd0
[Thu Dec 19 18:40:59 2024]  __driver_attach+0x10b/0x210
[Thu Dec 19 18:40:59 2024]  ? __pfx___driver_attach+0x10/0x10
[Thu Dec 19 18:40:59 2024]  bus_for_each_dev+0x8a/0xf0
[Thu Dec 19 18:40:59 2024]  driver_attach+0x1e/0x30
[Thu Dec 19 18:40:59 2024]  bus_add_driver+0x14e/0x290
[Thu Dec 19 18:40:59 2024]  driver_register+0x5e/0x130
[Thu Dec 19 18:40:59 2024]  ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
[Thu Dec 19 18:40:59 2024]  __pci_register_driver+0x5e/0x70
[Thu Dec 19 18:40:59 2024]  nouveau_drm_init+0x177/0xff0 [nouveau]
[Thu Dec 19 18:40:59 2024]  do_one_initcall+0x5b/0x340
[Thu Dec 19 18:40:59 2024]  do_init_module+0x97/0x290
[Thu Dec 19 18:40:59 2024]  load_module+0xba1/0xcf0
[Thu Dec 19 18:40:59 2024]  init_module_from_file+0x96/0x100
[Thu Dec 19 18:40:59 2024]  ? init_module_from_file+0x96/0x100
[Thu Dec 19 18:40:59 2024]  idempotent_init_module+0x11c/0x2b0
[Thu Dec 19 18:40:59 2024]  __x64_sys_finit_module+0x64/0xd0
[Thu Dec 19 18:40:59 2024]  x64_sys_call+0x1d6e/0x25c0
[Thu Dec 19 18:40:59 2024]  do_syscall_64+0x7f/0x180
[Thu Dec 19 18:40:59 2024]  ? vfs_read+0x2c7/0x390
[Thu Dec 19 18:40:59 2024]  ? vfs_read+0x2c7/0x390
[Thu Dec 19 18:40:59 2024]  ? rseq_get_rseq_cs+0x22/0x280
[Thu Dec 19 18:40:59 2024]  ? rseq_ip_fixup+0x90/0x1f0
[Thu Dec 19 18:40:59 2024]  ? syscall_exit_to_user_mode+0x86/0x260
[Thu Dec 19 18:40:59 2024]  ? do_syscall_64+0x8c/0x180
[Thu Dec 19 18:40:59 2024]  ? vfs_read+0x2c7/0x390
[Thu Dec 19 18:40:59 2024]  ? vfs_read+0x2c7/0x390
[Thu Dec 19 18:40:59 2024]  ? rseq_get_rseq_cs+0x22/0x280
[Thu Dec 19 18:40:59 2024]  ? rseq_ip_fixup+0x90/0x1f0
[Thu Dec 19 18:40:59 2024]  ? syscall_exit_to_user_mode+0x86/0x260
[Thu Dec 19 18:40:59 2024]  ? do_syscall_64+0x8c/0x180
[Thu Dec 19 18:40:59 2024]  ? irqentry_exit+0x43/0x50
[Thu Dec 19 18:40:59 2024]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[Thu Dec 19 18:40:59 2024] RIP: 0033:0x7bd50052725d
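
The faulting path in the trace above runs through frames tagged `[nouveau]` (`gp102_acr_wpr_patch` → `nv50_instobj_rd32` → `ioread32`). As a rough aid for reading long traces like this one, the sketch below counts how often each module is named. The function name `trace_modules` is made up for illustration, and the pattern simply matches any line that ends in a bracketed token, so unrelated dmesg lines may be counted too.

```shell
# Count how often each kernel module is named in dmesg-style text on stdin.
# In a call trace, frames from a loadable module end in "[module_name]".
trace_modules() {
    grep -oE '\[[A-Za-z0-9_]+\]$' | sort | uniq -c | sort -rn
}

# Hypothetical usage on the affected host:
#   sudo dmesg | trace_modules
```

A result dominated by a single module suggests where to look first, but it does not by itself prove that the module is the root cause.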

The system otherwise seems functional, aside from the hung, unresponsive console. After a second reboot attempt, the system becomes entirely inaccessible (both console and ssh).

After a hard reboot (power button), the system is back to normal.

I was hoping to use this for automated reboots as part of an Ansible playbook, for which I already have an install task defined, plus an async reboot/poll task that appears to work only once.

This occurs whether I type the reboot command manually over ssh or have Ansible run it.
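
For anyone scripting the "did it really reboot?" check described above, one possible approach is to compare the kernel's boot ID before and after the reboot instead of reading uptime. This is only a sketch: `boot_changed` and the host name `target` are placeholders, and it relies on `/proc/sys/kernel/random/boot_id` being regenerated each time a kernel boots.

```shell
# Succeed (exit 0) only when both boot IDs are non-empty and differ,
# i.e. the host is now running a different kernel instance than before.
boot_changed() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

# Hypothetical usage from a control machine ("target" is a placeholder host):
#   before=$(ssh target cat /proc/sys/kernel/random/boot_id)
#   ... trigger the reboot and wait for ssh to come back ...
#   after=$(ssh target cat /proc/sys/kernel/random/boot_id)
#   boot_changed "$before" "$after" && echo "target rebooted"
```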

@pepawel
Collaborator

pepawel commented Dec 21, 2024

Hi @haroules, thank you for the detailed bug report and the kind words. I really appreciate that!

I'm busy with other things right now, so I don't have much time to debug this issue.
However, what you described suggests a kernel-level issue. Cryptreboot doesn't do any fancy kernel-level stuff. It just appends a cpio archive with 2 or 3 files to the initramfs, which is a standard way of extending it (most initramfs images are composed of at least 2 cpio archives).

Therefore I suspect a general kexec failure. I mean that performing a raw kexec (without patching the initramfs):

kexec -al /boot/vmlinuz --initrd /boot/initrd.img --reuse-cmdline

will lead to the same issues, except that the system will require you to provide the passphrase during boot.

If that's the case, the task is to find the matching kexec/kernel bug report and check what can be done or, if there is no report, to create one. As I said, I can't do it right now, but I would be grateful for any info on this.
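
A minimal sketch of how the raw kexec test could be scripted is below. The load command is the one quoted above. The `systemctl kexec` step, which asks systemd to shut down and jump into the loaded kernel, is an assumption added here and is not part of the original suggestion. The names `run`, `raw_kexec_test`, and `DRY_RUN` are made up for illustration. By default the script only prints what it would do.

```shell
# Dry-run by default; set DRY_RUN=0 to actually execute the commands.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Load the stock kernel and unpatched initramfs, then reboot into them.
# Expect a passphrase prompt during boot, since nothing is patched in.
raw_kexec_test() {
    kernel=${1:-/boot/vmlinuz}
    initrd=${2:-/boot/initrd.img}
    run sudo kexec -al "$kernel" --initrd "$initrd" --reuse-cmdline &&
    run sudo systemctl kexec
}

# Hypothetical usage:
#   raw_kexec_test              # print the two commands
#   DRY_RUN=0; raw_kexec_test   # really reboot via kexec
```

If the console hang and the nouveau page fault also appear after this unpatched kexec, that would point at kexec or the kernel rather than at cryptreboot.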

@haroules
Author

haroules commented Dec 30, 2024 via email
