-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
corndog: use systemd-sysctl for sysctl options #333
base: develop
Are you sure you want to change the base?
Conversation
I don't speak Rust natively so if there's anything in here that's not idiomatic, feel free to let me know. |
// We don't fail because sysctl keys can vary between kernel versions and depend on | ||
// loaded modules. It wouldn't be possible to deploy settings to a mixed-kernel fleet | ||
// if newer sysctl values failed on your older kernels, for example, and we believe | ||
// it's too cumbersome to have to specify in settings which keys are allowed to fail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation of this PR doesn't maintain the functionality mentioned in this comment, but we could pretty easily achieve this by prepending -
to all the sysctls passed in by default (which would then remove the need for the backup-modify-restore shuffle, which might make things simpler overall), or by documenting the ability of users to do so. Open to feedback/direction here.
sources/api/corndog/src/main.rs
Outdated
config.push_str(&format!("{}={}\n", key.as_ref(), value)); | ||
} | ||
|
||
fs::write(SYSCTL_CONFIG, config).context(error::WriteSysctlConfigSnafu)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to do some double-checking around:
- Does this actually append to the file as written, or does it clobber it?
- If we append and we write duplicate sysctls to the same config file, how does systemd-sysctl handle that? (Does the most recent sysctl win?)
I haven't yet been able to get a local build working (turns out my hard drive is smaller than I thought!) but I'll investigate these shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answers:
- This clobbers the file (must fix)
- systemd-sysctl will use the last value in a file for the same sysctl. For example, a config file with the contents:
dev.raid.speed_limit_max = 199999
dev.raid.speed_limit_max = 199998
garbage
dev.raid.speed_limit_max = 199997
and running systemctl restart systemd-sysctl
will set dev.raid.speed_limit_max
to 199997
and report an error when attempting to parse the "garbage" line - the return code of the systemd unit will be 1 but all of the correctly specified sysctl k/v pairs will have been applied.
systemd-sysctl also will deduplicate multiple lines and only write when there are changes to be made, so in theory endlessly appending to a file should be a reasonable way to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Output from running systemd-sysctl with the above file in support of the above conclusions:
bash-5.1# SYSTEMD_LOG_LEVEL=debug /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/systemd-sysctl
Parsing /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/sysctl.d/50-default.conf
Parsing /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/sysctl.d/50-pid-max.conf
Parsing /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/sysctl.d/80-release.conf
Parsing /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/sysctl.d/90-kubelet.conf
Parsing /etc/sysctl.d/90-primary_interface.conf
Parsing /etc/sysctl.d/99-zzz-override_cilium.conf
Overwriting earlier assignment of net/ipv4/conf/all/rp_filter at '/etc/sysctl.d/99-zzz-override_cilium.conf:7'.
Parsing /etc/sysctl.d/99-zzz-zzz-testing.conf
Overwriting earlier assignment of dev/raid/speed_limit_max at '/etc/sysctl.d/99-zzz-zzz-testing.conf:2'.
/etc/sysctl.d/99-zzz-zzz-testing.conf:3: Line is not an assignment, ignoring: garbage
Overwriting earlier assignment of dev/raid/speed_limit_max at '/etc/sysctl.d/99-zzz-zzz-testing.conf:4'.
...
Setting '/proc/sys/dev/raid/speed_limit_max' to '199997'
No change in value '199997', suppressing write
Sorry for the delay, @aetimmes. And thanks so much for opening the PR for this. I will take a look at this PR and get back to you soon. I will also help verify the behavior. |
fn validate_sysctl_entry<K>(key: K, value: &str) -> Result<()> | ||
where | ||
K: AsRef<str>, | ||
{ | ||
let key = key.as_ref(); | ||
|
||
// Basic validation of key format | ||
if !key.chars().all(|c| c.is_ascii_alphanumeric() || c == '.' || c == '_' || c == '-' || c == '*') { | ||
return error::InvalidSysctlKeySnafu { key: key.to_string() }.fail(); | ||
} | ||
|
||
// Ensure value doesn't contain newlines which could break the config file format | ||
if value.contains('\n') { | ||
return error::InvalidSysctlValueSnafu { | ||
key: key.to_string(), | ||
value: value.to_string() | ||
}.fail(); | ||
} | ||
|
||
Ok(()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We perform validation when parsing the key from userdata already. See https://github.com/bottlerocket-os/bottlerocket-settings-sdk/blob/0cf857357bc799bf036b5f5bf865d3c4af4e9256/bottlerocket-settings-models/modeled-types/src/shared.rs#L703-L733.
We may need to account for a more strict validation there though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need to adjust in a couple directions - systemd-sysctl also allows for wildcards in the keys, which it looks like the current format doesn't allow for. I can open a separate PR for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of the additional validation here? Is something being allowed in as a setting that would be rejected by systemd-sysctl
?
The settings API is intended to be the first and best line of defense - the only time checks should be needed at apply-time is if there's invalid input that the API isn't currently capable of detecting, like two mutually exclusive or dependent settings.
If we do need some extra validation for compatibility, then it'd be better to warn and remove the setting from the output, so that it's not a hard error. But even that is a potential compatibility break on upgrade that's best avoided, if there were sysctls that would have been applied before but that can't be applied now because of more restrictive rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of the additional validation here
No purpose - I was unaware of the bottlerocket-settings-sdk repo when I took the first pass at this PR; I agree that the validation in this MR should be removed.
I would still like to update the validation in the bottlerocket-settings-sdk repo to allow for wildcards where possible since they're supported by systemd-sysctl in general, but given that that PR would be allowing for extra functionality, I don't consider it a blocker for this change.
where | ||
K: AsRef<str>, | ||
{ | ||
// TODO: make the priority configurable? | ||
const SYSCTL_CONFIG: &str = "/etc/sysctl.d/95-corndog.conf"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this constant up (somewhere in L26-L28) so that we don't recreate the constant every time we invoke this function.
Also maybe add _PATH
as the suffix suggesting it is the path of the config not the actual content of the config?
for (key, value) in &sysctls { | ||
validate_sysctl_entry(key, value)?; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again validation already happened when parsing the keys.
} | ||
|
||
// Create config string | ||
let mut config = String::new(); | ||
for (key, value) in sysctls { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had conversation that iterating through the map entry is not ideal since it does not guarantee an ordering here.
One way to address that is to sort the sysctl entries first by the keys and then iterate based on the sorted keys. You can do something like
let mut sysctl_vec: Vec<_> = sysctls
.iter()
.map(|(k,v)| (k.as_ref(), v))
.collect();
sysctl_vec.sort();
for (key, value) in sysctl_vec {
...
}
On another note, we may also want to preserve the order the entries are specified in the userdata. But that would require a different struct for sysctls
that can preserve the order. For example BTreeMap, but it has performance trade-offs since the insertions, lookups, and deletions would not be as fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - we definitely want to maintain ordering becuase of the interaction between wildcards and overrides. I'm not overly worried about performance here since the upper bound of the size of this structure is probably on the order of thousands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it looks like using a BTreeMap would complicate the serde bits a fair amount since BTreeMap doesn't implement several of the interfaces needed by serde - how do we feel about a Vec
here instead?
let mut file = fs::OpenOptions::new() | ||
.create(true) | ||
.append(true) | ||
.open(SYSCTL_CONFIG) | ||
.context(error::OpenSysctlConfigSnafu)?; | ||
|
||
file.write_all(config.as_bytes()) | ||
.context(error::WriteSysctlConfigSnafu)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fs::write("path/to/file.txt", config).context(error::WriteSysctlConfigSnafu)?;
This would try to create the file if it does not exist.
let status = Command::new("systemctl") | ||
.args(["restart", "systemd-sysctl"]) | ||
.status() | ||
.context(error::SystemctlSnafu)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't really need to restart the whole sysctld service.
/usr/lib/systemd/systemd-sysctl /etc/sysctl.d/99-corndog.conf
Would apply this specific file to sysctld.
And could you wrap the binary path and args into constants? Something like below and put it along with the definition of the SYSCTL_CONFIG_PATH
.
const SYSTEMD_SYSCTL_BIN: &str = "/usr/lib/systemd/systemd-sysctl";
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll need to confirm this, but I believe only applying the specific file may not necessarily respect the ordering of the rest of the config files, so sysctl restart systemd-sysctl
is correct IMO.
For example, the Cilium sysctl.conf file has a prefix of 99-zzz
, so applying the 95
priority corndog file by itself may clobber any entries that are also set in the Cilium conf file.
If we kept sysctl restart systemd-ctl
, would we still want these wrapped into constants?
if !status.success() { | ||
return error::SystemctlFailedSnafu.fail(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When hitting this case, it would be also good to print the errors from the output. So I would get the output of the command and check the status and the stderr.
let output = Command::new(SYSTEMD_SYSCTL_BIN)
.arg(SYSCTL_CONFIG_PATH)
.output()
.context(ExecutionSnafu{command: format!("{} {}", SYSTEMD_SYSCTL_BIN, SYSCTL_CONFIG_PATH)})?;
if !output.status.success() {
warn!(
"Failed to apply sysctl config: {}",
String::from_utf8_lossy(&output.stderr) # account for invalid utf8 encoding
);
}
Left some comments, the change looks good overall. I will run some test with the change to verify the behavior. I will get back to you on that. |
fn validate_sysctl_entry<K>(key: K, value: &str) -> Result<()> | ||
where | ||
K: AsRef<str>, | ||
{ | ||
let key = key.as_ref(); | ||
|
||
// Basic validation of key format | ||
if !key.chars().all(|c| c.is_ascii_alphanumeric() || c == '.' || c == '_' || c == '-' || c == '*') { | ||
return error::InvalidSysctlKeySnafu { key: key.to_string() }.fail(); | ||
} | ||
|
||
// Ensure value doesn't contain newlines which could break the config file format | ||
if value.contains('\n') { | ||
return error::InvalidSysctlValueSnafu { | ||
key: key.to_string(), | ||
value: value.to_string() | ||
}.fail(); | ||
} | ||
|
||
Ok(()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of the additional validation here? Is something being allowed in as a setting that would be rejected by systemd-sysctl
?
The settings API is intended to be the first and best line of defense - the only time checks should be needed at apply-time is if there's invalid input that the API isn't currently capable of detecting, like two mutually exclusive or dependent settings.
If we do need some extra validation for compatibility, then it'd be better to warn and remove the setting from the output, so that it's not a hard error. But even that is a potential compatibility break on upgrade that's best avoided, if there were sysctls that would have been applied before but that can't be applied now because of more restrictive rules.
/// Restarts systemd-sysctl service | ||
fn restart_sysctl() -> Result<()> { | ||
let status = Command::new("systemctl") | ||
.args(["restart", "systemd-sysctl"]) | ||
.status() | ||
.context(error::SystemctlSnafu)?; | ||
|
||
if !status.success() { | ||
return error::SystemctlFailedSnafu.fail(); | ||
} | ||
|
||
Ok(()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restarting a service can lead to unexpected behavior if units depend on that service, or if for whatever reason it hasn't started yet or failed to start successfully. This is also using the blocking form which can lead to trouble in early boot.
It'd be better to avoid all of those issues and just invoke systemd-sysctl
directly. I read your other comment and understand that we can't pass just the one config file, which makes sense, but the unit just runs /usr/lib/systemd/systemd-sysctl
with no arguments which we can do here also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me; I'll adjust accordingly.
@@ -18,6 +18,11 @@ use std::path::{Path, PathBuf}; | |||
use std::str::FromStr; | |||
use std::string::String; | |||
use std::{env, process}; | |||
use std::process::Command; | |||
use tempfile::NamedTempFile; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't appear to be used, but it would be good to use it and follow the usual pattern:
- create a tempfile in the same directory as the target file
- write all the content to it
- persist the tempfile to the final target once it's ready
(We want to avoid changing the original file if some other process might have it open, and we also want to avoid repeatedly appending to the same file on subsequent runs, which could lead to unexpected behavior or resource exhaustion.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
want to avoid repeatedly appending to the same file on subsequent runs
To be explicitly clear - if I make a new api client request to set sysctl userdata on a host that already has an existing /etc/sysctl.d/95-corndog.conf
file, which of these is the desired behavior?
- append the new sysctls to the existing list of sysctls (allowing for duplicates; gets expensive if many calls are made but systemd-sysctl handles deduping for us)
- upsert the new sysctls into the existing list of sysctls (if a previously detected entry exists, remove the existing key and append the new k/v - more expensive but no duplicates)
- replace the old sysctls with the new ones (seems like a breaking change)
In the case of the first two options, is the suggestion above to make a copy of the existing sysctl conf file as the tempfile (as opposed to operating on the original)?
(Apologies for being pedantic; I just want to make sure I understand the requirements here)
Thanks for the reviews! The feedback all makes sense (modulo the couple of open questions I have) - I'm weirdly struggling to get my local Rust install to play nice with the pre-existing tests so my follow-up will be slightly delayed while I debug my machine. |
Issue number:
Closes bottlerocket-os/bottlerocket#4314
Description of changes:
Change corndog's sysctl function to utilize systemd-sysctl instead of writing values ephemerally to tmpfs, so that other system functions (eg Ciliums sysctl-fixup) don't completely clobber the sysctls set by corndog.
Testing done:
Unit tests added inline
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.