Fuel-nailgun-agent execution expired
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Fuel for OpenStack | Confirmed | Medium | Fuel Sustaining | |
Mitaka | Fix Committed | High | Oleksiy Molchanov | |
Newton | Confirmed | Medium | Fuel Sustaining | |
Bug Description
MOS 9.2 with controller nodes running on VMs.
After a successful deployment, the Fuel UI shows that some nodes have gone away and come back online many times.
Today
09:19:29 Node 'controller' is back online
09:19:13 Node 'controller' has gone away
09:14:09 Node 'controller3' is back online
09:13:13 Node 'controller' is back online
09:13:13 Node 'controller3' has gone away
09:12:43 Node 'controller' has gone away
09:11:32 Node 'controller2' is back online
09:10:13 Node 'controller2' has gone away
09:09:13 Node 'controller' is back online
09:09:12 Node 'controller' has gone away
08:58:30 Node 'controller3' is back online
08:58:24 Node 'controller2' is back online
08:58:12 Node 'controller3' has gone away
08:58:12 Node 'controller2' has gone away
08:57:27 Node 'controller' is back online
08:57:11 Node 'controller' has gone away
08:49:13 Node 'controller3' is back online
08:49:11 Node 'controller3' has gone away
08:43:23 Node 'controller3' is back online
08:43:17 Node 'controller2' is back online
08:43:10 Node 'controller3' has gone away
08:42:40 Node 'controller2' has gone away
08:42:18 Node 'controller' is back online
08:41:40 Node 'controller' has gone away
08:33:48 Node 'controller2' is back online
08:33:40 Node 'controller2' has gone away
08:29:45 Node 'controller' is back online
.....
in /var/log/
E, [2017-02-
I, [2017-02-
at depth 0 - 18: self signed certificate
I, [2017-02-
at depth 0 - 18: self signed certificate
E, [2017-02-
I, [2017-02-
at depth 0 - 18: self signed certificate
I, [2017-02-
at depth 0 - 18: self signed certificate
..........
affects: | designate → fuel |
Changed in fuel: | |
milestone: | none → 10.1 |
status: | New → Confirmed |
importance: | Undecided → Medium |
assignee: | nobody → Fuel Sustaining (fuel-sustaining-team) |
tags: | added: customer-found support |
Changed in fuel: | |
milestone: | 10.1 → 11.0 |
tags: | added: area-python |
Changed in fuel: | |
assignee: | Fuel Sustaining (fuel-sustaining-team) → Alexey Stupnikov (astupnikov) |
I used strace to find the cause of the 'udevadm settle' timeouts. It turns out to be a known udev issue described in [1], and the kernel developers are fine with it. Moreover, even if they do fix it, the patch would not be backportable for us, so there is no way to fix this issue directly. I think the best workaround is to call 'udevadm settle' with a reasonable timeout, say 15 seconds. That still fixes the original bug, where mpath devices were not ready at agent startup, and lets us keep the agent's current timeouts without the nodes flapping.
[1] https://lists.gt.net/linux/kernel/1524376
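
To illustrate the proposed workaround, here is a minimal Python sketch of a bounded 'udevadm settle' call. It is not the agent's actual code, and the real patch may look quite different. The names `build_settle_command` and `settle_udev`, the injectable `run` parameter, and the extra 5-second outer guard are assumptions made for this example; only the `udevadm settle --timeout=SECONDS` option and the 15-second value come from the discussion above.

```python
import subprocess


def build_settle_command(timeout_s=15):
    """Return the argv for a 'udevadm settle' call bounded by timeout_s seconds."""
    timeout_s = int(timeout_s)
    if timeout_s < 1:
        raise ValueError("timeout_s must be at least 1 second")
    return ["udevadm", "settle", "--timeout={}".format(timeout_s)]


def settle_udev(timeout_s=15, run=subprocess.run):
    """Wait for the udev event queue to drain, but give up after timeout_s.

    Returns True if udevadm reported that the queue settled, and False if
    the wait timed out or udevadm could not be executed. The caller can
    then carry on with hardware detection either way instead of blocking.
    """
    cmd = build_settle_command(timeout_s)
    try:
        # Outer guard (assumed 5 s margin) in case udevadm itself hangs.
        result = run(cmd, timeout=int(timeout_s) + 5)
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0
```

With a bound like this, a slow or stuck udev queue costs the agent at most the configured wait per run rather than an open-ended delay, which is the property the workaround relies on.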