ironic jobs sometimes fail with node callback timeout
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ironic | Invalid | High | Unassigned |
Bug Description
This was on stable/juno in the gate:
Details: (ServerAddresse
The last time that instance uuid shows up in the nova-compute logs is here:
And there are a ton of these it seems:
2015-01-06 18:54:36.732 21016 DEBUG nova.virt.
Looking at the ironic conductor logs, the only time I see the instance_uuid is here:
Shortly after that the node is powered on:
There are a ton of RPC timeout debug messages in the ironic conductor log; I don't know whether that's normal.
I'm not familiar with the ironic code, but it looks like something is hung here. It seems we need some warning/info logging to trace when we think we're hung or hitting a timeout: right now there are no errors in the nova or ironic logs, so I have nothing to fingerprint in logstash to track this race.
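For illustration, the kind of "still waiting" logging asked for above could look roughly like the sketch below. This is not Ironic's actual code: `wait_with_progress_logging` and its parameters are hypothetical. It assumes a wait is implemented as a polling loop, logs a WARNING at a fixed interval while the wait is still pending, and logs one ERROR with a fixed message on timeout, so that logstash has a stable string to fingerprint.

```python
import logging
import time

LOG = logging.getLogger(__name__)


def wait_with_progress_logging(check_done, timeout, warn_every,
                               poll_interval=1.0, what="node callback",
                               clock=time.monotonic, sleep=time.sleep):
    """Poll check_done() until it returns True or `timeout` seconds pass.

    Hypothetical helper (not part of Ironic). Emits a WARNING every
    `warn_every` seconds while still waiting and a single ERROR on timeout.
    Returns True if the condition was met, False on timeout.
    `clock` and `sleep` are injectable so the loop can be tested quickly.
    """
    start = clock()
    next_warn = start + warn_every
    while True:
        if check_done():
            return True
        now = clock()
        elapsed = now - start
        if elapsed >= timeout:
            # Fixed message text gives logstash something to fingerprint.
            LOG.error("Timed out waiting for %s after %.0f seconds",
                      what, elapsed)
            return False
        if now >= next_warn:
            LOG.warning("Still waiting for %s after %.0f seconds "
                        "(timeout %.0f)", what, elapsed, timeout)
            next_warn += warn_every
        sleep(poll_interval)
```

The point of the periodic WARNING is that a hang shows up in the logs well before the overall job timeout fires, instead of the run failing with no errors in either the nova or ironic logs.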
Changed in ironic:
importance: Undecided → Medium
status: New → Triaged
summary: gate-grenade-dsvm-ironic-sideways fails with instance build timeout → ironic jobs sometimes fail with instance build timeout
summary: ironic jobs sometimes fail with instance build timeout → ironic jobs sometimes fail with node callback timeout
Ah, there is a kernel panic in the baremetal logs:
http://logs.openstack.org/60/144760/1/gate/gate-grenade-dsvm-ironic-sideways/5d08c02/logs/ironic-bm-logs/baremetalbrbm_2_console.txt.gz
[ 422.148000] Kernel panic - not syncing: Fatal exception in interrupt