
Rootless Podman container cannot resolve DNS after enabling systemd-resolved #282361

Closed
thefossguy opened this issue Jan 20, 2024 · 9 comments
Labels
0.kind: bug Something is broken 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS

Comments

@thefossguy
Member

Describe the bug

As the title mentions, a rootless podman container is unable to resolve DNS after enabling systemd-resolved via the services.resolved module.

Since home-manager does not have a module for rootless podman containers, I am manually creating a systemd service to start a Gitea container at boot (source). This container runs my personal Gitea instance, which has a push mirror to my GitHub repositories: I push to Gitea, which in turn pushes to GitHub.

With services.resolved.enable = true; in my configuration.nix file, I get the following error in the container's logs:

2024/01/20 19:04:25 ...irror/mirror_push.go:164:func1() [E] Error pushing /data/git/repositories/thefossguy/dotfiles.git mirror[4] remote remote_mirror_XXXXXXXXXX: push failed: exit status 128 - fatal: unable to access 'https://github.com/thefossguy/dotfiles/': Could not resolve host: github.com
 - fatal: unable to access 'https://github.com/thefossguy/dotfiles/': Could not resolve host: github.com
 - fatal: unable to access 'https://github.com/thefossguy/dotfiles/': Could not resolve host: github.com

2024/01/20 19:04:25 ...irror/mirror_push.go:112:SyncPushMirror() [E] SyncPushMirror [mirror: 4][repo: <Repository 5:thefossguy/dotfiles>]: push failed: exit status 128 - fatal: unable to access 'https://github.com/thefossguy/dotfiles/': Could not resolve host: github.com
 - fatal: unable to access 'https://github.com/thefossguy/dotfiles/': Could not resolve host: github.com
 - fatal: unable to access 'https://github.com/thefossguy/dotfiles/': Could not resolve host: github.com

Since aardvark-dns is the DNS server that the upstream (Podman) documentation recommends, I also tried putting aardvark-dns in the PATH environment variable of the systemd user service, like so:

-          Environment = [ "PODMAN_SYSTEMD_UNIT=%n" ];
+          Environment = [
+            "PODMAN_SYSTEMD_UNIT=%n"
+            "PATH=\"${pkgs.aardvark-dns}/bin\""
+          ];

But this doesn't seem to work either.

Steps To Reproduce

  1. Ensure that services.resolved is disabled/unset.
  2. Use my nix file to create a systemd service for the container. (Or just run apache/httpd instead of copying unnecessary lines of code.)
  3. Enter container using podman exec -it gitea-govinda sh.
  4. From the container's shell, running nslookup github.com should succeed and return the expected records.
  5. Enable services.resolved in the NixOS configuration and reboot (haven't tried with a nixos-rebuild switch).
  6. Enter container using podman exec -it gitea-govinda sh.
  7. From the container's shell, running nslookup github.com fails; on my end it reports ;; connection timed out; no servers could be reached.

Expected behavior

I expect the DNS lookup to work, since the distros where Podman is a first-class citizen (RHEL and Fedora) also use systemd-resolved, and DNS lookups work there.

Screenshots

N/A

Additional context

N/A

Notify maintainers

Pinging the last three people that modified this module: @benaryorg @RaitoBezarius @roberth

Metadata

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"aarch64-linux"`
 - host os: `Linux 6.1.73, NixOS, 23.11 (Tapir), 23.11.3326.d2003f2223cb`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.1`
 - channels(root): `"nixos-23.11"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

Add a 👍 reaction to issues you find important.

@thefossguy thefossguy added the 0.kind: bug Something is broken label Jan 20, 2024
@RaitoBezarius
Member

Hi there, I don't do containers, alas; I'm afraid I cannot help.

@thefossguy
Member Author

That's not a problem. Do you have any idea how to "debug" this? I assumed adding aardvark-dns to the PATH might fix it, but as I mentioned, it doesn't. Comparing two NixOS generations, one with systemd-resolved enabled and one with it disabled (using nvd), shows me that openresolv gets added when systemd-resolved is disabled.

$ nvd diff /nix/var/nix/profiles/system-3{4,5}-link
<<< /nix/var/nix/profiles/system-34-link
>>> /nix/var/nix/profiles/system-35-link
Selection state changes:
[C+]  #1  openresolv  3.13.2
Added packages:
[A.]  #1  X-Restart-Triggers-resolvconf    <none>
[A.]  #2  unit-network-setup.service       <none>
[A.]  #3  unit-resolvconf.service          <none>
[A.]  #4  unit-script-network-setup-start  <none>
Removed packages:
[R.]  #1  X-Restart-Triggers-systemd-resolved  <none>
[R.]  #2  etc-systemd-resolved.conf            <none>
[R.]  #3  unit-systemd-resolved.service        <none>
Closure size: 1382 -> 1383 (25 paths added, 24 paths removed, delta +1, disk usage +3.0KiB).

@dotlambda dotlambda added the 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS label Jan 21, 2024
@benaryorg
Contributor

benaryorg commented Jan 21, 2024

@thefossguy what does /etc/resolv.conf point to inside the container?
Without further testing, I'd assume that in that specific setting (or maybe just in the general default) the host's resolv.conf gets copied into the container verbatim. That would break DNS resolution whenever the container has its own network namespace (i.e. anything but --net=host), since the container cannot reach the host's localhost (it has its own).

Looking at podman-run(1) it says this:

Several files will be automatically created within the container. These include /etc/hosts, /etc/hostname, and /etc/resolv.conf to manage networking. These will be based on the host’s version of the files, though they can be customized with options (for example, --dns will override the host’s DNS servers in the created resolv.conf).

Which would point in the same direction.

Both DNS and networking in general are very complex topics in any kind of infrastructure that is not link-layer bound, which is the case for qemu using slirp and for containers of any variety (LXC, OCI, nspawn).
As a rule of thumb, all virtualization software has knobs to turn for these cases. Finding a solution that fits your use case usually hinges more on the virtualization software than on the host system, so it's often more productive to check the network-related requirements on the virtualization side before looking at the host.

For instance, in this case, if you're using something on top of podman (such as a form of *-compose), that solution probably has orchestration abilities to help with this; otherwise podman itself has the above-mentioned options to tweak things manually on invocation. Any host-side solution offering the host-local caching and/or proxying desired in mobile scenarios will likely require rebinding (ideally with appropriate ACLs or firewalling, since DNS is known to be used for amplification attacks) so as not to bind to a loopback-scoped address (e.g. switching from ::1/127.0.0.1 to ULA/RFC 1918 addresses), plus deploying a resolv.conf which remains valid even when copied into the container. Or you could run your container in the host network namespace if you don't require separation on that level.
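For a concrete sketch of those knobs (the image and resolver address below are placeholders I picked, not something from this setup):

# force a specific upstream resolver for a one-off test, bypassing the copied host resolv.conf
$ podman run --rm --dns 9.9.9.9 docker.io/library/alpine nslookup github.com
# or skip the separate network namespace entirely and share the host's network
$ podman run --rm --network host docker.io/library/alpine nslookup github.com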

@benaryorg
Contributor

Use my nix file to create a systemd service for the container. (Or just run apache/httpd instead of copying unnecessary lines of code.)

I see you are specifying a network in your nix configuration; however, as far as I can see, that network is not managed with Nix (or am I mistaken?).
My quick tests on a Nix machine with resolved yield proper DNS resolution using a podman-provided RFC 1918 address, both when running without arguments and when creating a network with default parameters (which sets the DNSEnabled flag to true by default; see also the above-mentioned man page, which lists this command to check: podman network inspect -f {{.DNSEnabled}} <name>).
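As a rough sketch of that test (network name and image here are placeholders of mine, not from this issue):

# create a network with default parameters and confirm DNS is enabled for it
$ podman network create testnet
$ podman network inspect -f {{.DNSEnabled}} testnet
true
$ podman run --rm --network testnet docker.io/library/alpine nslookup github.com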

@thefossguy
Member Author

thefossguy commented Jan 21, 2024

@thefossguy what does /etc/resolv.conf point to inside the container?

$ cat /etc/resolv.conf
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.53
options edns0 trust-ad
search .

$ podman exec -it gitea-govinda sh
/ # nslookup github.com
;; connection timed out; no servers could be reached

/ # nslookup google.com
;; connection timed out; no servers could be reached

/ # ls -l /etc/resolv.conf
-rw-r--r--    1 root     root            39 Jan 21 11:33 /etc/resolv.conf
/ # cat /etc/resolv.conf
search dns.podman
nameserver 10.89.0.1

EDIT 1: Update after rebooting into a generation which has systemd-resolved disabled; here is the output:

$ podman exec -it gitea-govinda sh
/ # ls -l /etc/resolv.conf
-rw-r--r--    1 root     root            53 Jan 21 12:28 /etc/resolv.conf
/ # cat /etc/resolv.conf
search dns.podman
nameserver 10.89.0.1
options edns0

Use my nix file to create a systemd service for the container. (Or just run apache/httpd instead of copying unnecessary lines of code.)

I see you are specifying a network in your nix configuration however as far as I can see that network is not managed with Nix (or am I mistaken?).

Yes, the networks get set up by an "init" service which calls this shell script. This "init" script creates a network if one doesn't already exist. (The network name, and honestly the entire systemd user service, is influenced purely by whatever podman-compose systemd create-unit -f docker-compose.yaml produced when I was on RHEL 9.)

My quick tests on a Nix machine with resolved yield proper DNS resolution using a podman-provided RFC 1918 address, both when running without arguments and when creating a network with default parameters (which sets the DNSEnabled flag to true by default; see also the above-mentioned man page, which lists this command to check: podman network inspect -f {{.DNSEnabled}} <name>).

$ podman network inspect -f {{.DNSEnabled}} containers_default
true

I created a test VM with the same config (and enabled systemd-resolved in it) and it works there. But it still fails on the physical hardware (Raspberry Pi 4B with PFTF's EFI).

EDIT 2: I forgot to enable systemd-resolved in the VM. Sorry, my bad! After enabling it, the same DNS issue persists in the VM too. @benaryorg, are you interested in replicating my config in a VM? If yes, I'll add a comment (shortly) with the steps. I see you have already put a lot of time into researching this and I feel guilty about that (but I'm testing my patches as we speak).

@benaryorg
Contributor

@thefossguy okay, I stand corrected; this is an actual issue within podman then, since with this configuration it should work as far as I can tell.
As you're on the regular release channel you should be getting virtually identical packages to mine (I'm on a flake-based setup), though verification would be nice: could you send the output of readlink /nix/var/nix/profiles/per-user/root/channels/nixos as well as the path of podman (e.g. mine is /nix/store/mbmy1201f8lnsai9fj0mw0jijfi0zs44-podman-4.7.2), just so we're sure we're using the same version?
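Concretely, something along these lines (the second command is just my suggestion for locating the store path of your podman binary):

$ readlink /nix/var/nix/profiles/per-user/root/channels/nixos
$ readlink -f "$(command -v podman)"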

If we are on the same version, then adding aardvark-dns to your PATH won't be necessary, since it's already pulled in indirectly via podman (nix run nixpkgs#nix-tree -- /run/current-system should show […] ⇒ podman-4.7.2 ⇒ podman-helper-binary-wrapper ⇒ { netavark-1.7.0, aardvark-dns-1.8.0 } somewhere in there).

You can check whether aardvark-dns is being used by looking at your host-side process list while a container is running; when a container is run with --network, there should be an aardvark-dns process.
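For example (assuming a container attached to a podman network is currently running):

# list any running aardvark-dns processes with their full command lines
$ pgrep -af aardvark-dns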

I tried stracing this on my end; here are some snippets:

# getting the initial request (this was `apt update` running in a debian container)
[pid 16373] 15:29:28.456799 [  45] recvfrom(9, "\270Y\1\0\0\1\0\0\0\0\0\0\3deb\6debian\3org\0\0\34\0\1", 4096, 0, {sa_family=AF_INET, sin_port=htons(58803), sin_addr=inet_addr("10.89.0.3")}, [128 => 16]) = 32
# creating an outbound socket
[pid 16382] 15:29:28.457109 [  41] socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 10
# and binding to all IPv4s
[pid 16382] 15:29:28.457199 [  49] bind(10, {sa_family=AF_INET, sin_port=htons(53033), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
# sending the request onwards
sendto(10, "\224\204\1\0\0\1\0\0\0\0\0\0\3deb\6debian\3org\0\0\1\0\1", 32, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.2.3")}, 16) = 32

The 10.0.2.3 is not actually in use on my machine (the machine's got two interfaces, one purely IPv6 sitting in a /64, one a single routed /32 IPv4).
10.0.2.3 is what slirp usually uses, which you might also know from running qemu with user networking.
When stracing the slirp4netns process, I also see the request showing up as a read(), followed by this:

openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY) = -1 ENOENT (No such file or directory)

On a hunch I decided to remove the /etc/resolv.conf symlink and replace it with a file with the exact same content (note: do not use in production, this is just for testing: (cat /etc/resolv.conf && sudo rm /etc/resolv.conf) | sudo sponge /etc/resolv.conf).

The openat() now succeeds and is followed by this:

sendto(3, "\377r\1\0\0\1\0\0\0\0\0\0\3deb\6debian\3org\0\0\34\0\1", 32, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.53")}, 16)
34625

It seems that the culprit here is that the symlink is copied rather than the underlying file, which causes the slirp side of things to break.
However, this goes one step further.
With resolved on NixOS, the /etc/resolv.conf symlink doesn't point directly at /run/systemd/resolve/stub-resolv.conf; instead it points at /etc/static/resolv.conf, which in turn points to the one in /run.
Since /etc/static is itself a symlink into the nix store, this is where things probably break.
If /etc/resolv.conf is a regular file, it works; if /etc/resolv.conf is a symlink directly to /run, it also works; yet as soon as /etc/static, and thereby /nix, is involved, it no longer works.
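If you want to see the whole chain on your own machine, namei (from util-linux) walks every path component, and readlink -f shows the final target:

$ namei -l /etc/resolv.conf
$ readlink -f /etc/resolv.conf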
Now, if you look at the slirp documentation I linked above, you'll find an interesting --enable-sandbox flag, one which the slirp4netns process is actually using on my machine:

(since v0.4.0) enter the user namespace and create a new mount namespace where only /etc and /run are mounted from the host.

This sandbox seems to be 100% hardcoded to those directories.

I do not see a lot of solutions here, to be honest.
The obvious one is disabling the sandbox (which should be possible for a user in the podman configs; I haven't checked), but that means disabling a meaningful sandbox.
Another would be to literally patch /nix/store into those mounts on the NixOS side of things.
We could also report this to slirp4netns upstream and see if anything can be done about it there, ideally adding a flag for additional directories in the sandbox.
And then there's the last one, which is definitely actionable on the NixOS side of things, though I'm not sure it's feasible: make the symlink at /etc/resolv.conf point directly at /run instead of going via the static symlink; however, that mechanism exists for a reason.
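For illustration only, that last idea roughly amounts to the following; note that NixOS manages this symlink, so a rebuild/activation would put the /etc/static indirection back:

# replace the managed symlink with a direct link to the resolved stub file (testing only)
$ sudo ln -sfn /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf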

@benaryorg
Contributor

benaryorg commented Jan 21, 2024

Wow, okay, I should've just checked the issues in slirp4netns for a split second there. An upstream bug report exists (https://github.com/rootless-containers/slirp4netns/issues/333) and this is most likely a duplicate of #231191, which contains a workaround.

@thefossguy
Member Author

thefossguy commented Jan 21, 2024

@benaryorg, please use my NixOS config + installer script to create a VM of your own and test it (if you are willing to do so). Please follow these steps to install NixOS with my config in a VM:

  1. Create the VM's disk with a minimum size of 64 GB.
  2. From the live ISO, do a git clone https://github.com/thefossguy/prathams-nixos && cd prathams-nixos && git checkout de8bae6faab3e4cba523f8eb41656bcb12f9fc8a.
  3. Apply this patch to my NixOS config repo, so that unnecessary containers aren't created for testing (Gitea depends on Caddy, and Caddy will fail if it can't find valid certificates; this patch removes those containers) and so that the pratham user has the password asdf.
  4. Start the installer script with this command: sudo ./install.sh <drive> reddish virt 2>&1 | tee install.log. (The "syntax" for the installer script is in the README. Please make sure that you use the hostname reddish or the podman services will not be enabled.)
  5. Once the installer finishes, reboot and login as pratham with the password asdf. (Enjoy sudo without password, disabled with the aforementioned patch.)
  6. The Gitea server will start after podman-init.service exits cleanly. If podman-init.service hasn't automatically started on the first boot, just reboot. This is a minor issue with user services managed by home-manager.

(Warning: This is a very opinionated NixOS config, so there will be some weird things, like the first launch of Neovim installing plugins and the like. Sorry about that.)

With the checked-out commit, systemd-resolved is disabled. So, with the first generation, DNS resolution will work from the container(s). After adding the line services.resolved.enable = true; to the configuration.nix file and rebooting into the new generation, DNS resolution fails. I tested the patches on my end and hopefully the results replicate on your side too. :)

EDIT: Please go through the edits on my previous comment. :)

@thefossguy
Member Author

Wow, okay, I should've just checked the issues in slirp4netns for a split second there. An upstream bug report exists (https://github.com/rootless-containers/slirp4netns/issues/333) and this is most likely a duplicate of #231191, which contains a workaround.

Okay this did end up working! 🎉

So yeah, for whatever reason, /etc/resolv.conf is not a symlink to anything with systemd-resolved disabled (openresolv), but it is a symlink to /run/systemd/resolve/stub-resolv.conf (via /etc/static) with systemd-resolved enabled.

No need to replicate my NixOS config. This is resolved. Thank you @benaryorg
