Nova API call to get instance details returned <class 'oslo_db.exception.DBConnectionError'> (HTTP 500)
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Fuel for OpenStack | Invalid | High | Bogdan Dobrelya | |
| Mitaka | Invalid | High | Bogdan Dobrelya | |
Bug Description
fuel version - 9.0-mos-416
Steps:
1. Deploy the following cluster: Neutron VLAN, Ceph for volumes and images, Ceph replication factor 2, 2 controllers, 1 controller+ceph, compute+ceph, compute
2. Run the HA and sanity OSTF suites
3. Create an instance
4. Get instance details from nova
Expected result - instance details are returned
Actual result - ClientException: Unexpected API Error. Please report this at http://
<class 'oslo_db.exception.DBConnectionError'> (HTTP 500)
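For reference, step 4 can be reproduced with python-novaclient; a minimal sketch, assuming keystoneauth1 sessions, with placeholder auth URL, credentials, and instance UUID (none of these values come from this report):

```python
# Minimal reproduction sketch for step 4; the auth_url, credentials
# and instance UUID below are placeholders, not values from the report.
from keystoneauth1 import loading, session
from novaclient import client

loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(auth_url='http://<controller>:5000/v3',
                                username='admin',
                                password='admin',
                                project_name='admin',
                                user_domain_id='default',
                                project_domain_id='default')
nova = client.Client('2', session=session.Session(auth=auth))

# Step 4: fetch instance details. In this bug the call fails with
# ClientException (HTTP 500) wrapping oslo_db's DBConnectionError.
server = nova.servers.get('<instance-uuid>')
print(server.status)
```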
rabbitmq.log shows:
2016-05-
2016-05-
Changed in fuel:
status: New → Confirmed
tags: added: area-library
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Georgy Kibardin (gkibardin)
Changed in fuel:
assignee: Georgy Kibardin (gkibardin) → Bogdan Dobrelya (bogdando)
tags: added: swarm-fail
There are multiple "Read error" messages from all nodes right around the "Get instance details from nova" operation; see the sshd logs. This points to networking issues specific to the environment. They caused a temporary split-brain of the management VIP, which later healed:
2016-05-29T23:50:28.148935+00:00 node-1 pengine err: error: Resource vip__management (ocf::ns_IPaddr2) is active on 2 nodes attempting recovery
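A quick way to surface both symptoms from a diagnostic snapshot is to scan the collected logs for these messages; a rough sketch (the snapshot path and exact patterns are assumptions, not from the report):

```python
# Rough sketch: scan collected node logs for the two symptoms above,
# sshd "Read error" drops and the pengine split-brain message.
import re
from pathlib import Path

PATTERNS = [
    re.compile(r"Read error"),            # sshd connection drops
    re.compile(r"is active on 2 nodes"),  # VIP split-brain (pengine)
]

def scan(log_dir):
    for path in Path(log_dir).rglob("*.log"):
        for lineno, line in enumerate(
                path.read_text(errors="replace").splitlines(), 1):
            if any(p.search(line) for p in PATTERNS):
                print(f"{path}:{lineno}: {line.strip()}")

scan("./snapshot/var/log")  # assumed snapshot layout
```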
This makes me think the bug is invalid. Here is the event flow right after that anyway: http://pastebin.com/MPvgG85x. The network disruption disturbed the Galera nodes (line 30), some DB operations failed for neutron and nova (as reported in the bug), and then a reply_q was lost after the MQ cluster failover, which made neutron-server retry forever (why not recreate it?)... A typical failover, recovered later. I see no other issues.
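For context, transient DBConnectionError failures of the kind seen here are normally absorbed by oslo.db's bounded-retry decorator rather than surfacing as an HTTP 500 on the first hit; a minimal sketch of that pattern (flaky_db_read is illustrative, not Nova or Neutron code):

```python
# Minimal sketch of oslo.db's bounded-retry pattern around a DB read.
from oslo_db import api as oslo_db_api
from oslo_db import exception as db_exc

attempts = {"n": 0}

@oslo_db_api.wrap_db_retry(max_retries=5, retry_interval=1,
                           inc_retry_interval=True,
                           retry_on_disconnect=True)
def flaky_db_read():
    # Simulate two transient connection failures during a Galera
    # failover; wrap_db_retry re-invokes on DBConnectionError with
    # capped backoff instead of failing the request immediately.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise db_exc.DBConnectionError("MySQL server has gone away")
    return {"status": "ACTIVE"}

print(flaky_db_read())  # succeeds on the third attempt
```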