[WIP]: Xavier AGX, NX eMMC and NX SD on L4T 35.4.1 #424

Closed

Collaborator

@acostach acostach commented Feb 19, 2024

Jetson migration testing

The sources in this PR can be used for testing the takeover project to migrate the following device-types:

  • Jetson AGX Xavier - yocto machine: jetson-xavier
  • Jetson Xavier NX Devkit eMMC - yocto machine: jetson-xavier-nx-devkit-emmc
  • Jetson Xavier NX Devkit SD-CARD - yocto machine: jetson-xavier-nx-devkit

The following steps build a Jetson Xavier AGX balenaOS image based on L4T 35.4.1. The resulting build artifact can then be used as the target OS image for migrating a device from balenaOS running L4T 32.X to balenaOS running L4T 35.4.1.

The following instructions assume that:

  • A Jetson Xavier Devkit running balenaOS is used as a lab device for testing the migration process
  • You have a setup for building balenaOS yocto images
  • Power is not cut at any time from your device, until the migration is completed successfully and the device boots into the new OS
  • All user data has been backed up before the migration. No application data stored on the internal medium is preserved during the migration. The supervisor will re-download the application once the migration is complete and the device connects to the cloud

Building balenaOS for your Jetson device

Clone this repository:

git clone --recurse-submodules git@github.com:balena-os/balena-jetson-orin.git

Enter the repository and check out the branch used in this PR:

cd balena-jetson-orin
git checkout wip_add_xavier_agx_nxsd_nxemmc_35_4_1_non_flasher
git submodule update --init --recursive

Build a development Yocto image (use -m jetson-xavier-nx-devkit for the Xavier NX Devkit SD-CARD and -m jetson-xavier-nx-devkit-emmc for the eMMC device-type):

./balena-yocto-scripts/build/barys -b build_agx_xavier_35_4 -m jetson-xavier --shared-sstate sstatedir/ --shared-downloads /yocto-downloads -d
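The three device-types differ only in the machine name passed to barys, so the choice can be sketched as a small lookup. This is a minimal sketch, not part of the PR: the short labels (agx, nx-emmc, nx-sd) and the build directory naming `build_<machine>` are assumptions made for illustration, while the machine names and the remaining barys arguments are taken from the text above. The function only prints the command, it does not run the build.

```shell
#!/bin/sh
# Map a device-type label to the Yocto machine name listed in this PR.
# The labels agx, nx-emmc and nx-sd are hypothetical, chosen for this sketch.
machine_for() {
    case "$1" in
        agx)     echo "jetson-xavier" ;;
        nx-emmc) echo "jetson-xavier-nx-devkit-emmc" ;;
        nx-sd)   echo "jetson-xavier-nx-devkit" ;;
        *)       echo "unknown device type: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the barys command for a device-type label.
# The build directory name "build_<machine>" is an assumption.
barys_cmd() {
    machine=$(machine_for "$1") || return 1
    echo "./balena-yocto-scripts/build/barys -b build_${machine} -m ${machine} --shared-sstate sstatedir/ --shared-downloads /yocto-downloads -d"
}
```

For example, `barys_cmd nx-sd` prints the build command for the Xavier NX Devkit SD-CARD machine, which can be reviewed before running it by hand.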

Once the build finishes successfully, the balenaOS Yocto image will be generated in your build directory. Please note that the build timestamp will differ from one build to another:

balena-jetson-orin$ ls -l build_agx_xavier_35_4/tmp/deploy/images/jetson-xavier/balena-image-jetson-xavier-20240517145940.rootfs.balenaos-img

Compress this image using gzip and place it in the /mnt/data/ directory on your Jetson device running balenaOS. The image must be gzipped for takeover to be able to use it:

root@ba8de4f:/mnt/data# gzip balena-image-jetson-xavier-20240517145940.rootfs.balenaos-img
root@ba8de4f:/mnt/data# ls balena-image-jetson-xavier-20240517145940.rootfs.balenaos-img.gz
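Because takeover depends on a valid gzipped image, it may be worth verifying the archive before starting the migration. The following is a minimal sketch under stated assumptions: the helper name `compress_image` is hypothetical, and unlike the plain `gzip` call above it keeps the uncompressed image, so it needs enough free space for both files.

```shell
#!/bin/sh
# Compress an image next to the original and verify the resulting archive.
# compress_image is a hypothetical helper, not part of takeover or balenaOS.
compress_image() {
    img="$1"
    [ -f "$img" ] || { echo "no such image: $img" >&2; return 1; }
    # -c writes to stdout, so the original image is left untouched.
    gzip -c "$img" > "$img.gz" || return 1
    # -t tests the integrity of the compressed file without extracting it.
    gzip -t "$img.gz"
}
```

The function returns non-zero if the image is missing, compression fails, or the integrity test fails, so it can be chained with `&&` before invoking takeover.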

Obtaining takeover

Download the aarch64 takeover archive, extract the binary, and copy it to your Jetson device, placing it in the /mnt/data/ directory.

Triggering the migration process

Execute the takeover binary on your Jetson board:

./takeover -i balena-image-jetson-xavier-20240517145940.rootfs.balenaos-img.gz --no-nwmgr-check --no-os-check --log-file /mnt/data/stage1.log --log-level trace -l /dev/sdb1 --s2-log-level trace

If you want to preserve the stage2 migration logs, insert a USB stick in the Jetson board and pass its partition using `-l <usb_key_partition>` (/dev/sdb1 in the command above).
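Since the `-l` argument is optional, the command line can be assembled with or without it. The sketch below is illustrative only: `takeover_cmd` is a hypothetical helper, it uses only the flags that appear in this PR, and it prints the command instead of executing it. Paths containing spaces are not quoted in the printed string.

```shell
#!/bin/sh
# Print (do not run) the takeover command from this PR for a gzipped image.
# An optional second argument is passed as the stage2 log partition via -l.
# takeover_cmd is a hypothetical helper written for this sketch.
takeover_cmd() {
    image="$1"
    log_part="${2:-}"
    cmd="./takeover -i $image --no-nwmgr-check --no-os-check --log-file /mnt/data/stage1.log --log-level trace --s2-log-level trace"
    if [ -n "$log_part" ]; then
        cmd="$cmd -l $log_part"
    fi
    echo "$cmd"
}
```

Calling `takeover_cmd image.gz` omits `-l`, while `takeover_cmd image.gz /dev/sdb1` appends `-l /dev/sdb1`.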

If the migration completed successfully, you should see the new L4T version in the kernel version string on your device:

root@ba8de4f:~# uname -r
5.10.120-l4t-r35.4.ga
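The same check can be scripted, for instance when validating several lab devices. This is a minimal sketch that assumes the kernel release string always contains `-l4t-r35.` after a successful migration, as in the output above; `is_l4t_35` is a hypothetical helper name.

```shell
#!/bin/sh
# Return 0 if a kernel release string reports an L4T 35.x kernel.
# Assumes the "-l4t-r35." marker seen in the uname output above.
is_l4t_35() {
    case "$1" in
        *-l4t-r35.*) return 0 ;;
        *)           return 1 ;;
    esac
}
```

On the device it could be used as `is_l4t_35 "$(uname -r)" && echo "running L4T 35.x"`.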

Args description:

  • --no-nwmgr-check ensures that no new NetworkManager connection files are created from the currently active NetworkManager connections. All connection files in /mnt/boot/system-connections/ must also exist in /etc/NetworkManager/system-connections, otherwise takeover will refuse to continue. By default, balenaOS copies these connections from the boot partition to the NetworkManager directory at startup. However, if you manually created a connection in /etc/NetworkManager and have not transferred it to the boot partition, takeover will refuse to run. This requirement ensures that the device can connect to the cloud after the migration completes.
  • --no-os-check allows the migration to start regardless of the version currently running
  • --log-file /mnt/data/stage1.log is the path to where the stage1 migration log will be stored
  • -l /dev/sdb1 indicates the path to the external partition where the stage2 logs will be stored. Can be omitted.
  • --log-level and --s2-log-level set the logging level to trace for both stages of the migration

Use ./takeover --help for more details on the supported takeover arguments.
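The connection-file requirement described for --no-nwmgr-check can be checked by hand before running takeover. The sketch below is not takeover's own logic; it is an assumed, simplified comparison by file name only, and `missing_in` is a hypothetical helper. It lists files present in one directory but absent from the other, and can be run in both directions since the description mentions both cases.

```shell
#!/bin/sh
# Print the names of regular files found in the first directory
# but missing from the second. Comparison is by file name only.
# missing_in is a hypothetical helper, not part of takeover.
missing_in() {
    src_dir="$1"
    dst_dir="$2"
    for f in "$src_dir"/*; do
        # Skip non-files, including the unexpanded glob of an empty directory.
        [ -f "$f" ] || continue
        name=$(basename "$f")
        [ -f "$dst_dir/$name" ] || echo "$name"
    done
    return 0
}
```

On the device, `missing_in /mnt/boot/system-connections /etc/NetworkManager/system-connections` and the reverse call should both print nothing if the two directories hold the same connection files.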

@acostach acostach force-pushed the wip_add_xavier_agx_nxsd_nxemmc_35_4_1_non_flasher branch from d87011a to 354b31e Compare February 19, 2024 16:33
@flowzone-app flowzone-app bot enabled auto-merge February 19, 2024 16:36
@acostach acostach changed the title from "[WIP]: Xavier AGX, NX eMMC an NX SD 35.4.1" to "[WIP]: Xavier AGX, NX eMMC and NX SD on L4T 35.4.1" Feb 28, 2024
acostach added 27 commits March 6, 2024 10:15
All 27 commits carry "Signed-off-by: Alexandru Costache <alexandru@balena.io>". Commit message fragments shown on the page:

  • Changelog-entry: resin-init-flasher: Update recipes
  • … and also add kernel cmdline arguments to prevent boot crashes
  • … to cleanup duplicate code
  • … as well as notes on how it has been generated
acostach added 4 commits March 6, 2024 12:04
All 4 commits carry "Signed-off-by: Alexandru Costache <alexandru@balena.io>". Commit message fragments shown on the page:

  • …nd NX eMMC
  • …evkit eMMC
  • Because we now use a single UEFI capsule for the AGX Xavier, NX SD and NX eMMC.
@acostach acostach force-pushed the wip_add_xavier_agx_nxsd_nxemmc_35_4_1_non_flasher branch from 354b31e to 84dde00 Compare March 6, 2024 13:00

cbecker commented Jul 17, 2024

@acostach I'd be happy to give this a try on Xavier AGX and Xavier NX.

Should it all work if I burn balenaOS directly, instead of using takeover?

@acostach
Collaborator Author

@cbecker yes, for that can you try using this PR balena-os/jetson-flash#163 in jetson-flash?


cbecker commented Jul 19, 2024

@acostach I managed to build the balenaOS image and proceeded to flash it with jetson-flash (the branch you mentioned above), but it's hanging. Maybe I'm doing something wrong?

The command I run is ./bin/cmd.js -m jetson-xavier-nx-devkit-emmc -f /data/images/balena-image-jetson-xavier-nx-devkit-emmc.balenaos-img -l yes

and this is the last part of the output:

[   4.7580 ] Warning: pub_key.key is not found
[   4.7443 ] tegrahost_v2 --chip 0x19 0 --updatesigheader blob_tegra194-p3668-0001-p3509-0000_aligned_sigheader.dtb.encrypt blob_tegra194-p3668-0001-p3509-0000_aligned_sigheader.dtb.hash zerosbk
[   4.7536 ] tegrahost_v2 --chip 0x19 --generateblob blob.xml blob.bin
[   4.7553 ] number of images in blob are 11
[   4.7556 ] blobsize is 6500424
[   4.7556 ] Added binary blob_nvtboot_recovery_cpu_t194_sigheader.bin.encrypt of size 232976
[   4.7599 ] Added binary blob_nvtboot_recovery_t194_sigheader.bin.encrypt of size 206016
[   4.7603 ] Added binary blob_preboot_c10_prod_cr_sigheader.bin.encrypt of size 24016
[   4.7606 ] Added binary blob_mce_c10_prod_cr_sigheader.bin.encrypt of size 145184
[   4.7610 ] Added binary blob_mts_c10_prod_cr_sigheader.bin.encrypt of size 3430416
[   4.7630 ] Added binary blob_bpmp-2_t194_sigheader.bin.encrypt of size 1007392
[   4.7643 ] Added binary blob_tegra194-a02-bpmp-p3668-a00_lz4_sigheader.dtb.encrypt of size 36176
[   4.7648 ] Added binary blob_spe_t194_sigheader.bin.encrypt of size 95232
[   4.7654 ] Added binary blob_tos-optee_t194_sigheader.img.encrypt of size 977664
[   4.7657 ] Added binary blob_eks_t194_sigheader.img.encrypt of size 5136
[   4.7659 ] Added binary blob_tegra194-p3668-0001-p3509-0000_sigheader.dtb.encrypt of size 340032
[   4.7696 ] Sending bootloader and pre-requisite binaries
[   4.7728 ] tegrarcm_v2 --download blob blob.bin
[   4.7746 ] Applet version 01.00.0000
[   4.7960 ] Sending blob
[   4.7962 ] [................................................] 100%
[   5.7563 ] tegrarcm_v2 --boot recovery
[   5.7580 ] Applet version 01.00.0000
[   6.7864 ] tegrarcm_v2 --isapplet

@acostach
Collaborator Author

@cbecker do you see any logs on the serial console of the device, do they show where the process stops?


cbecker commented Jul 22, 2024

@acostach thanks, it was my bad: I was trying to burn the eMMC version to a Jetson NX devkit, and also did not realize that build.sh in jetson-flash needs the device passed as an argument.

Right now I finished flashing and registering a device with the Jetson NX devkit and seems to work well so far,

~# uname -a
Linux 171f17a 5.10.120-l4t-r35.4.ga #1 SMP PREEMPT Fri Aug 4 11:16:53 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

will do a few more tests, thanks for taking care of the port!


cbecker commented Aug 12, 2024

@acostach I managed to test this successfully, and was able to use the NX GPU for processing.

Thanks for taking care of this :)

Do you have an estimate of when this will be merged/available officially?

Collaborator Author

acostach commented Aug 13, 2024

Hi @cbecker and thanks for the update! Could you please let me know if you managed to use takeover too and migrate from the previous release? How did it work for you?


cbecker commented Aug 14, 2024

@acostach I flashed the board directly with jetson-flash.

From what I understand, takeover is useful if I have an already-flashed board with the sdkmanager, to install balena on it, is that correct? In our case, we wanted to flash straight away, thus I used jetson-flash.


cbecker commented Aug 15, 2024

@acostach, I see that for Xavier NX, the kernel is the one released with JetPack 5.1.2, if I'm not mistaken. Is it straightforward/possible to update it to the 5.1.3 one, which is the latest supported for Xavier NX?

@dadaroce

Hi @acostach, I've tested it on a Xavier AGX running the Balena version 2.99.27+rev1, and it looks like the supervisor versions should match the BalenaOS versions. So, I upgraded the supervisor version to 16.1.0 (I am unsure if this is the correct version to set it). After some failure at the initialization (the device kept restarting and got unexpected behavior), it worked as expected, so we ran some ML models on it. However, whenever we try to "Reboot" the Device from the Balena button, the Device remains in a fail state. Sometimes, it starts and goes offline after 1-2 minutes.

  • Do you have any idea about it? Right now, I don't have physical access to the Device, but I request to extract the debug logs from it
  • Do you think a fresh flash better fits this approach?


dadaroce commented Aug 15, 2024

@acostach it also happens with a fresh flash; using the reboot button causes a fail status. I don't have the converter to read the debug port yet, but once I get the logs I'll show you.

@acostach
Collaborator Author

Hi @dadaroce , can you please give me the full steps you used for testing? Also, can you please let me know what type of Xavier AGX you are using? RAM, eMMC size and if it's a Devkit or if it's using a custom carrier board. Thank you

@acostach
Collaborator Author

> @acostach I flashed the board directly with jetson-flash.
>
> From what I understand, takeover is useful if I have an already-flashed board with the sdkmanager, to install balena on it, is that correct? In our case, we wanted to flash straight away, thus I used jetson-flash.

@cbecker takeover is useful for the AGX Xavier, Orin NX eMMC and Orin NX SD if you want to migrate from balenaOS running Jetpack 4 - L4T 32.7 - to a balenaOS running Jetpack 5 - L4T 35.X. This is because there is a partition layout change between the two Jetpack versions.


cbecker commented Aug 16, 2024

> > @acostach I flashed the board directly with jetson-flash.
> > From what I understand, takeover is useful if I have an already-flashed board with the sdkmanager, to install balena on it, is that correct? In our case, we wanted to flash straight away, thus I used jetson-flash.
>
> @cbecker takeover is useful for the AGX Xavier, Orin NX eMMC and Orin NX SD if you want to migrate from balenaOS running Jetpack 4 - L4T 32.7 - to a balenaOS running Jetpack 5 - L4T 35.X. This is because there is a partition layout change between the two Jetpack versions.

Thanks for the explanation. I was using Xavier NX so probably that's why I didn't need it. I flashed it directly to the device.

Collaborator Author

acostach commented Oct 9, 2024

This integration has been moved to https://github.com/balena-os/balena-jetson-jp5

@acostach acostach closed this Oct 9, 2024
auto-merge was automatically disabled October 9, 2024 11:47

Pull request was closed
