-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added config.ssh.retries and config.ssh.retry_interval to allow for SSH retries #13047
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution!
It looks like the winrm
communicator plugin has similar functionality with max_tries and retry-delay. It would be nice to have consistency between the names of these options, how they are implemented, and their default values.
This also needs unit tests. They are in the tests subdir.
@@ -192,6 +192,8 @@ | |||
optional :connect_timeout, :int64, 15 | |||
proto3_optional :ssh_command, :string, 16 | |||
proto3_optional :proxy_command, :string, 17 | |||
proto3_optional :retries, :int64, 18 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a generated file, it should not be edited
@@ -400,7 +400,16 @@ def connect(**opts) | |||
raise Vagrant::Errors::SSHNotReady if ssh_info.nil? | |||
|
|||
# Default some options | |||
opts[:retries] = 5 if !opts.key?(:retries) | |||
opts[:retries] = if ssh_info.key?(:retries) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the ssh_info
should probably already have the default value for the retries (coming from the config). So, there is no need to default this again. Instead, this can be opts[:retries] = ssh_info[:retries] if !opts.key?(:retries)
(same for the retry_interval below). This also implies that the internal caller of this method can override the user defined config.
This also means that the ssh_info
method in the Machine
class needs to be updated to account for this new config.
@@ -134,6 +142,26 @@ def validate(machine) | |||
given: @connect_timeout.to_s) | |||
end | |||
|
|||
if !@retries.is_a?(Integer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at the validate
method for the winrm communicator. It has some better validations steps. For example, it's ok to not have any retries or to have a retry interval between 0 and 1.
@@ -2,6 +2,8 @@ module VagrantPlugins | |||
module Kernel_V2 | |||
class SSHConnectConfig < Vagrant.plugin("2", :config) | |||
DEFAULT_SSH_CONNECT_TIMEOUT = 15 | |||
DEFAULT_SSH_RETRIES = 5 | |||
DEFAULT_SSH_RETRY_INTERVAL = 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This number can probably come down some. The winrm communicator uses 2.
Worth noting that the current implementation has no retries at all due to this line (it is always 1 and counter isn't reset), probably you'll be keen to address this quite old observation in your work. Cheers! https://github.com/hashicorp/vagrant/blob/v2.4.1/plugins/communicators/ssh/communicator.rb#L90 |
Hi @nmaludy! I've just merged in #13048, which modified 120 MDX files and added a new workflow that would be good to have run on this PR. Since this is from a fork, I'm not sure I have permissions to rebase on |
hello all, could you please suggest what can we do to finally close this issue? |
Closes #10776
When provisioning RHEL VMs, i noticed that the hosts become pingable before the SSH daemon is available. Watching the vagrant logs with
--debug
i noticed that the SSH connection was not being retried properly. Digging into the code, i also discovered that even with a properly setup retry counter, the retries were happening as fast as possible, not allowing time for the SSH daemon to become available.To mitigate these problems, i added
config.ssh.retries
to allow for multiple SSH connection retry attempts before failing (default=5). I also addedconfig.ssh.retry_interval
to insert a sleep time (seconds) between each connection attempt, allowing time for the SSH daemon to come up.This is my first PR to Vagrant. I tried to do the right things, but i'm sure i'm missing some stuff. If you all could help guide me through the contribution process, i would very much appreciate it!