Some OSTF HA tests failed after restart of neutron agents

Bug #1546927 reported by Artem Hrechanychenko
This bug affects 1 person
Affects              Status    Importance  Assigned to     Milestone
Fuel for OpenStack   Invalid   High        Maksim Malchuk
  8.0.x              Invalid   High        Maksim Malchuk

Bug Description

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "570"
  build_id: "570"
  fuel-nailgun_sha: "558ca91a854cf29e395940c232911ffb851899c1"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "658be72c4b42d3e1436b86ac4567ab914bfb451b"
  fuel-nailgun-agent_sha: "b2bb466fd5bd92da614cdbd819d6999c510ebfb1"
  astute_sha: "b81577a5b7857c4be8748492bae1dec2fa89b446"
  fuel-library_sha: "c2a335b5b725f1b994f78d4c78723d29fa44685a"
  fuel-ostf_sha: "3bc76a63a9e7d195ff34eadc29552f4235fa6c52"
  fuel-mirror_sha: "fb45b80d7bee5899d931f926e5c9512e2b442749"
  fuelmenu_sha: "78ffc73065a9674b707c081d128cb7eea611474f"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "a43cf96cd9532f10794dce736350bf5bed350e9d"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "d605bcbabf315382d56d0ce8143458be67c53434"

Steps to reproduce:

1. Deploy an environment with 3 controllers (NeutronTUN or NeutronVLAN, all default storage options), 2 compute nodes, and 1 cinder node
2. Kill all neutron agents on one of the controllers; Pacemaker should restart them (see the sketch after these steps)
3. Verify networks
4. Run OSTF tests
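
A minimal sketch of step 2, assuming the neutron agent process names used on Fuel 8.0 controllers (openvswitch, dhcp, l3, metadata agents); the exact pacemaker resource names may differ per deployment:

root@node-1:~# pkill -9 -f 'neutron-(openvswitch|dhcp|l3|metadata)-agent'   # kill every neutron agent process
root@node-1:~# crm_mon -1 | grep -i neutron                                 # watch pacemaker bring the agents back before verifying networks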

Actual Results:

The following OSTF HA tests fail, even when re-run manually one-by-one several times:

OSTF test #1
"Check state of haproxy backends on controllers
Some haproxy backend has down state.. Please refer to OpenStack logs for more details."

Checking the haproxy backends manually on a controller:

root@node-1:~# haproxy-status.sh | egrep -v "BACKEND|FRONTEND"
swift node-2 Status: DOWN/L7TOUT Sessions: 0 Rate: 0
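
The same backend states can also be read straight from the haproxy stats socket; the socket path below is an assumption (Fuel controllers usually expose it at /var/lib/haproxy/stats):

root@node-1:~# echo "show stat" | socat unix-connect:/var/lib/haproxy/stats stdio | \
    awk -F, 'NR>1 && $2 !~ /FRONTEND|BACKEND/ && $18 != "UP" {print $1, $2, $18}'   # list only non-UP backends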

OSTF test #2
"RabbitMQ availability
Number of RabbitMQ nodes is not equal to number of cluster nodes."

Manually checking the RabbitMQ cluster status shows that RabbitMQ looks dead on node-2:

root@node-1:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@messaging-node-1' ...
[{nodes,[{disc,['rabbit@messaging-node-1','rabbit@messaging-node-3']}]},
 {running_nodes,['rabbit@messaging-node-3','rabbit@messaging-node-1']},
 {cluster_name,<<"<email address hidden>">>},
 {partitions,[]}]

root@node-2:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@messaging-node-2' ...
[{nodes,[{disc,['rabbit@messaging-node-1','rabbit@messaging-node-2',
                'rabbit@messaging-node-3']}]}]
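
On node-2 only the disc node list is reported and there are no running_nodes, so the rabbit app is apparently not running there. A hedged sketch of how this could be inspected and nudged back under pacemaker control; the resource name p_rabbitmq-server is an assumption based on typical Fuel deployments:

root@node-2:~# rabbitmqctl status                        # errors out if the Erlang node is down, otherwise shows whether the rabbit app is running
root@node-2:~# crm_mon -1 | grep -i rabbit               # how pacemaker sees the rabbitmq resource on this node
root@node-2:~# crm resource cleanup p_rabbitmq-server    # clear fail counts so pacemaker retries the resource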

OSTF test #3
"RabbitMQ replication
Failed to establish AMQP connection to 5673/tcp port on 10.109.11.5 from controller node! Please refer to OpenStack logs for more details."

Revision history for this message
Artem Hrechanychenko (agrechanichenko) wrote :

A snapshot from the env will be added a little later.

Revision history for this message
Artem Hrechanychenko (agrechanichenko) wrote :
Changed in fuel:
status: New → Confirmed
assignee: nobody → Fuel Library Team (fuel-library)
tags: added: area-library
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 8.0 → 9.0
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Maksim Malchuk (mmalchuk)
Revision history for this message
Maksim Malchuk (mmalchuk) wrote :

1. The bug does not reproduce in the same environment.
2. The scenario should be fixed: the pacemaker resources need to be checked before verifying networks and running OSTF tests (see the sketch below).
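
A minimal sketch of such a pre-check, assuming pacemaker 1.1.11+ on the controllers (crm_resource --wait blocks until the cluster has no pending actions):

root@node-1:~# crm_resource --wait                       # wait until pacemaker has settled
root@node-1:~# crm_mon -1 | egrep -i 'stopped|failed'    # should print nothing before verifying networks and running OSTF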

Revision history for this message
Artem Hrechanychenko (agrechanichenko) wrote :

Agreed. Maksim and I tried to reproduce this bug on the same env where it was found.
The test scenario was fixed: additional steps that check pacemaker resources were added.

Changed in fuel:
status: Confirmed → Invalid
tags: added: area-ostf
removed: ostf