neutron-openvswitch-agent failed after cluster cold shutdown
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Fuel for OpenStack | Invalid | Medium | Fuel Sustaining | |
Mitaka | Invalid | Medium | Fuel Sustaining | |
Newton | Invalid | Medium | Fuel Sustaining | |
Bug Description
Detailed bug description:
The HA suite cannot be run on a cluster after a cold restart. All tests fail with "Can not set proxy for Health Check.Make sure that network configuration for controllers is correct"
Steps to reproduce:
1. Pre-condition - do steps from 'deploy_ha_cinder' test
2. Create 2 instances
3. Create 2 volumes
4. Attach volumes to instances
5. Fill cinder storage up to 30%
6. Cold shutdown of all nodes
7. Wait 5 min
8. Start all nodes
9. Wait for HA services to be ready <== FAIL
10. Verify networks
11. Run OSTF tests
Expected results:
HA suite PASS
Actual result:
The cluster does not recover even after a long wait (more than 0.5 to 1 hour)
'fuel node' shows all nodes are online
[root@nailgun ~]# fuel health --env 1 --check ha
[ 1 of 7] [failure] 'Check state of haproxy backends on controllers' (0.0 s) Can not set proxy for Health Check.Make sure that network configuration for controllers is correct
[ 2 of 7] [failure] 'Check data replication over mysql' (0.0 s) Can not set proxy for Health Check.Make sure that network configuration for controllers is correct
[ 3 of 7] [failure] 'Check if amount of tables in databases is the same on each node' (0.0 s) Can not set proxy for Health Check.Make sure that network configuration for controllers is correct
[ 4 of 7] [failure] 'Check galera environment state' (0.0 s) Can not set proxy for Health Check.Make sure that network configuration for controllers is correct
[ 5 of 7] [failure] 'Check pacemaker status' (0.0 s) Can not set proxy for Health Check.Make sure that network configuration for controllers is correct
[ 6 of 7] [failure] 'RabbitMQ availability' (0.0 s) Can not set proxy for Health Check.Make sure that network configuration for controllers is correct
[ 7 of 7] [failure] 'RabbitMQ replication' (0.0 s) Can not set proxy for Health Check.Make sure that network configuration for controllers is correct
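For triage, output like the above can be summarized to confirm that every check fails with the same message, which points at one shared cause rather than seven separate ones. The following is only a sketch: `summarize_health` and `LINE_RE` are hypothetical names, and the regular expression assumes the line layout seen in the `fuel health` output above, not a documented format.

```python
import re
from collections import Counter

# Assumed layout, taken from the output above (not a documented format):
# [ 1 of 7] [failure] 'Test name' (0.0 s) message
LINE_RE = re.compile(
    r"^\[\s*(\d+) of (\d+)\]\s+\[(\w+)\]\s+'(.+?)'\s+\(([\d.]+) s\)\s*(.*)$"
)


def summarize_health(text):
    """Return (count per status, count per distinct failure message)."""
    statuses = Counter()
    messages = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            # Skip prompts and any other lines that are not test results.
            continue
        _, _, status, _name, _duration, message = match.groups()
        statuses[status] += 1
        if status == "failure":
            messages[message] += 1
    return statuses, messages
```

If `messages` contains a single key, all failed checks share one error, as they do in this report.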
root@node-5:~# crm status
Last updated: Wed May 25 12:43:53 2016 Last change: Wed May 25 06:57:55 2016 by root via cibadmin on node-1.
Stack: corosync
Current DC: node-2.
3 nodes and 46 resources configured
Online: [ node-1.
Clone Set: clone_p_vrouter [p_vrouter]
Started: [ node-1.
vip__management (ocf::fuel:
vip__vrouter_pub (ocf::fuel:
vip__vrouter (ocf::fuel:
Clone Set: clone_p_haproxy [p_haproxy]
Started: [ node-1.
Clone Set: clone_p_mysqld [p_mysqld]
Started: [ node-2.
sysinfo_
Master/Slave Set: master_p_conntrackd [p_conntrackd]
Masters: [ node-2.
Master/Slave Set: master_
Slaves: [ node-2.
Clone Set: clone_p_dns [p_dns]
Started: [ node-2.
Clone Set: clone_neutron-
neutron-
Failed Actions:
* neutron-
last-
* sysinfo_
last-
Description of the environment:
[root@nailgun ~]# shotgun2 short-report
cat /etc/fuel_build_id:
376
cat /etc/fuel_
376
cat /etc/fuel_release:
9.0
cat /etc/fuel_
mitaka-9.0
rpm -qa | egrep 'fuel|astute|
fuel-release-
fuel-bootstrap
fuel-migrate-
rubygem-
fuel-misc-
network-
fuel-mirror-
fuel-openstack
fuel-notify-
nailgun-
fuel-provision
python-
fuelmenu-
fuel-9.
fuel-utils-
fuel-setup-
fuel-library9.
shotgun-
fuel-agent-
fuel-ui-
fuel-ostf-
python-
fuel-nailgun-
logs: https:/
Changed in fuel: | |
milestone: | none → 9.0 |
assignee: | nobody → Fuel Sustaining (fuel-sustaining-team) |
importance: | Undecided → High |
status: | New → Confirmed |
tags: | added: area-library |
Changed in fuel: | |
assignee: | Fuel Sustaining (fuel-sustaining-team) → Kyrylo Galanov (kgalanov) |
status: | Confirmed → In Progress |
Changed in fuel: | |
assignee: | Kyrylo Galanov (kgalanov) → MOS Neutron (mos-neutron) |
tags: | removed: area-library |
Changed in fuel: | |
status: | Incomplete → Confirmed |
Changed in fuel: | |
status: | Incomplete → Invalid |
Last logs from ovs agent on node-2:
2016-05-25 09:17:02.039 17506 ERROR neutron.agent.ovsdb.impl_vsctl [req-4c1e47fe-17bc-4cb5-86d3-3963a626701c - - - - -] Unable to execute ['ovs-vsctl', '--timeout=10', '--oneline', '--format=json', '--', 'list-br']. Exception: Exit code: 142; Stdin: ; Stdout: ; Stderr: 2016-05-25T09:16:57Z|00001|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
2016-05-25 09:17:02.310 17506 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-4c1e47fe-17bc-4cb5-86d3-3963a626701c - - - - -] Exit code: 142; Stdin: ; Stdout: ; Stderr: 2016-05-25T09:16:57Z|00001|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
Agent terminated!
After that, there were no attempts to start the agent.
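The exit code 142 in the log is consistent with the "signal 14 (Alarm clock)" message: by the common shell convention, a process killed by signal N is reported with exit code 128 + N, and 142 - 128 = 14. This suggests that `ovs-vsctl` hit its `--timeout=10` limit rather than failing with an error of its own. A minimal sketch of that arithmetic follows; `signal_from_exit_code` is a hypothetical helper, and the signal numbering is assumed to be that of Linux.

```python
import signal


def signal_from_exit_code(code):
    """Return the signal name if code follows the 128 + N convention, else None."""
    if code <= 128:
        return None
    try:
        # Signal numbers are platform-dependent; on Linux, 14 is SIGALRM.
        return signal.Signals(code - 128).name
    except ValueError:
        # No signal with that number on this platform.
        return None
```

On Linux, `signal_from_exit_code(142)` gives `'SIGALRM'`, matching the "Alarm clock" text in the stderr line.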