[ubuntu_bootstrap] Sometimes discovered slave is unavailable via SSH
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Committed | Critical | Albert Syriy |
8.0.x | Confirmed | Critical | Albert Syriy |
Mitaka | Fix Committed | Critical | Albert Syriy |
Bug Description
Sometimes discovered slave nodes are unavailable via SSH because the daemon is stopped (see attached screenshot). It happens because SSH is restarted when the network interface state changes. Currently we have a hack in our Ubuntu bootstrap for enabling predictable interface naming (which is not fully supported by Ubuntu 14.04): we perform `ifdown eth0` on startup and re-trigger udev events:
https:/
According to the logs it causes SSH daemon failure:
syslog:
2015-12-
2015-12-
2015-12-
...
2015-12-
2015-12-
auth.log:
2015-12-
2015-12-
2015-12-
I tried to reproduce it manually on an already bootstrapped node (restarted the enp0s3 interface many times), but with no luck: the SSH daemon didn't die. So I believe we hit some kind of race condition while disabling the network interface on startup. IMHO we could try to backport 'udev' from Ubuntu vivid (which seems to support predictable interface naming) and configure upstart to run 'networking' after 'udev' in order to get rid of 'let-rename'. If that doesn't work, then one more hack is needed in rc.local: check that SSH is running and start it if it is not.
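For illustration, the rc.local fallback could look roughly like the sketch below. This is not the committed fix, only one way such a check might be written. The helper names `pid_alive` and `ensure_running` are made up for this example, and the check assumes the daemon writes a pidfile and that rc.local runs as root (so `kill -0` is permitted to probe the process).

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for an rc.local "is sshd alive?" check.

# Return 0 if the PID stored in the given pidfile belongs to a live process.
pid_alive() {
    pidfile="$1"
    [ -r "$pidfile" ] || return 1      # no pidfile: treat as not running
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1          # empty pidfile: treat as not running
    kill -0 "$pid" 2>/dev/null         # signal 0 only probes, sends nothing
}

# Run the start command (all remaining arguments) only when the process
# recorded in the pidfile is not alive.
ensure_running() {
    pidfile="$1"
    shift
    if pid_alive "$pidfile"; then
        return 0
    fi
    "$@"
}
```

In rc.local this might be called as `ensure_running /var/run/sshd.pid service ssh start`. Both the pidfile path and the service name are assumptions about the bootstrap image and should be checked against its actual sshd configuration.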
Changed in fuel: | |
status: | New → Incomplete |
Changed in fuel: | |
assignee: | nobody → asyriy (asyriy) |
tags: | added: area-linux |
Changed in fuel: | |
status: | New → Confirmed |
tags: | added: same-as-1529631 |
tags: | added: tech-debt |
Changed in fuel: | |
status: | In Progress → Fix Committed |
Actually, the init config of sshd contains the "respawn" option.
I believe your issue is related to something other than the network restart.
Please try to reproduce the issue (maybe some traces/dumps were also generated?).
(Also, I didn't find any auth logs in the snapshot.)