Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dns/rfc2136: delete cache files if nsupdate failed #4057

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

perryflynn
Copy link

See #4055.

Similar to #2752, updating my DynDNS Domain via the rfc2136 plugin does not work. I added some log lines to the plugin code and it looks like the update fails at rc.bootup because the internet connection is not established yet. Later on rc.newwanip the plugin reports the IP was not changed and the nsupdate call is skipped.

My workaround is to delete the cache files when the exit code of the nsupdate command is not zero.

@perryflynn
Copy link
Author

@AdSchellevis already mentioned in the issue that here a background process should be used. How can I use a background process here and get the exit code to decide if the cache file should be deleted?

Does mwexec_bg support shell pipes? Then I could do something like this:

$cmd .= " /var/etc/nsupdatecmds{$i}";
$cmd .= " || rm -f ".rfc2136_cache_file($dnsupdate, 4)." ".rfc2136_cache_file($dnsupdate, 6);

/* invoke nsupdate */
$cmd = "/usr/local/bin/nsupdate -k {$keyfile}";
if (isset($dnsupdate['usetcp'])) {
$cmd .= " -v";
}

$cmd .= " /var/etc/nsupdatecmds{$i}";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code, this stuff could really use a rewrite. The current code actually has a race condition if you ask me, the cache files(s) should only be written after an successful update, which currently isn't the case.

A short term solution could be to keep the mwexec_bg($cmd); and wrap the update action in a new shell script (src/bin/nsupdate.sh), although Ideally one should then also change the rfc2136_cache_file function to something easily predictable using the same key. If these cache files are not used anywhere else, it might be an option to calculate one key which is then used for all files (e.g. {$dnsupdate['interface']}_{$dnsupdate['host']}_{$dnsupdate['server']}{$ipver}).

The files being /var/etc/nsupdatecmds{$key}, /var/etc/nsupdatekey{$key}, /var/cache/rfc2136_{$key}_4.cache and /var/cache/rfc2136_{$key}_6.cache.

I'm not sure how to feed the complete action to /usr/sbin/daemon, but if we push it in a simple script, we only need to pass the new key and the value of usetcp, after which we could do something like:

/usr/local/bin/nsupdate -k ... || rm {both cache files}

I'm not using this myself (also don't have a test setup), but this is a direction that could work in my opinion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, creating the cache files after the nsupdate was successful would be a better way. I try to implement that in the next week.

@perryflynn
Copy link
Author

I added now a wrapper for handling the cache files in background, but now it is not updating on bootup or newwanip hook. I added in addition to that a cron:

image

The cron was executed 13 minutes after newwanip and then the update ran through.

OPNSense "General" Log:

2024-06-29T13:15:00	Error	opnsense	/usr/local/etc/rc.rfc2136: Dynamic DNS ...
2024-06-29T13:02:22	Error	opnsense	/usr/local/etc/rc.newwanip: Dynamic DNS ...
2024-06-29T13:01:44	Error	opnsense	/usr/local/etc/rc.bootup: Dynamic DNS ...

Output of nsupdate (I added a 2>&1 >> /var/log/nsupdate.log to the wrapper script):

Sat Jun 29 13:01:44 CEST 2024
; Communication with 94.247.xx.xx#53 failed: operation canceled
could not reach any name server
Sat Jun 29 13:02:22 CEST 2024
couldn't get address for 'ns1.example.com': not found
syntax error
Sat Jun 29 13:15:00 CEST 2024
Result: 0

First I thought it is a DNS resolution issue and I added my DNS server as override to unbound. But it's still not working when the hooks are triggering the nsupdate.

For me it looks like as the hooks are having race conditions.

  • On bootup it resolves the IP with unbound but cannot connect to the DNS server
  • On newwanip it cannot resolve the DNS server anymore
  • ~15 minutes later the cron triggers and all is working fine.

DNS setup on my OPNSense box:

  • 8.8.8.8 / 8.8.4.4 as hardcoded dns servers
  • Prefer IPv4 over IPv6
  • Do not allow overriding DNS server list by PPPoE

What could be the issue here? What could I do here?

@AdSchellevis
Copy link
Member

At first glance I'm afraid your overcomplicating the code now, when the hook fires multiple times, the command executed maybe different know due the conditionals.

I would really opt for a simpler setup, less code, less chance of regressions. Maybe keeping it as simple as removing the cache files when execution failed is an easier approach (when restructuring the filenames, the script is almost a one-liner).
Trying ddclient as alternative might also be worth the effort.

@perryflynn
Copy link
Author

No, it triggers extactly as before. See #4055. I think the problem was from the beginning, that the logic does not wait for a working internet connection.

The cron is just a workaround for that problem.

I would really opt for a simpler setup, less code, less chance of regressions.

Definitively. But not sure how to archieve that, as the nsupdate should not block the bootup / hook execution.

Trying ddclient as alternative might also be worth the effort.

I will try that, but I wanted to finish here first. Is there a function in the opnsense code where I can check if there is a working internet connection? Should this maybe the case when the newwanip hook is triggered and maybe there is a problem somwhere else?

@AdSchellevis
Copy link
Member

Definitively. But not sure how to archieve that, as the nsupdate should not block the bootup / hook execution.

I see. Maybe it's better to use a local workaround for your use-case, maybe a late syshook start script (https://docs.opnsense.org/development/backend/autorun.html) removing the cache files and manually restarting the process.

To guide this into a mergable state will likely cost more time than I can offer (maybe even more than engineering a solution myself, in which case adding the functionality to our native backend in the ddclient plugin likely would make more sense).

I will try that, but I wanted to finish here first. Is there a function in the opnsense code where I can check if there is a working internet connection?

Not really, working is a fluid concept, when nsupdate returns an error code, I would use that, if it doesn't you might try to resolve the remote host, but this might complicate the situation even further.

@Monviech
Copy link
Member

I just want to throw this into the mix as alternative:
https://docs.opnsense.org/manual/how-tos/caddy.html#use-dynamic-dns-in-client-mode-only
It has this compiled in ready to use:
https://github.com/caddy-dns/rfc2136

@perryflynn
Copy link
Author

@Monviech

Interesting, will try that.

@AdSchellevis

What about the potential race condition in the newwanip hook? That hook is used by many components as it looks like. Should a working internet connection be available when the hook is triggered or has every software take care about that itself?

If there is no easy fix, can we at least merge one of the two workarounds I provided for deleting / not creating the cache files?

The plugin would be still dirty, but at least it can work in combination with the cron created in the OPNSense settings. Right now the plugin is broken for me.

@AdSchellevis
Copy link
Member

What about the potential race condition in the newwanip hook? That hook is used by many components as it looks like. Should a working internet connection be available when the hook is triggered or has every software take care about that itself?

newwanip is merely an event you can subscribe to when a network device receives an address, it doesn't guarantee internet connectivity in any way (https://docs.opnsense.org/development/backend/legacy.html#configure)

If there is no easy fix, can we at least merge one of the two workarounds I provided for deleting / not creating the cache files?

I'm afraid not as it moves the action in the foreground, peoples setups might freeze during boot as unintended side affect.

The plugin would be still dirty, but at least it can work in combination with the cron created in the OPNSense settings. Right now the plugin is broken for me.

I don't mind the plugin being bad code, as we don't support it officially anyway, but when we merge changes which breaks it for others, it will cost time which would be better spend elsewhere. For us this currently just isn't a priority, if we could push it forward with some tips and review work, that's ok, but only within reasonable limits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants