Activity log for bug #1455761

Date Who What changed Old value New value Message
2015-05-16 13:03:36 Nastya Urlapova bug added bug
2015-05-16 13:03:36 Nastya Urlapova attachment added fail_error_deploy_bonding_ha_active_backup-2015_05_16__02_41_50.tar.xz https://bugs.launchpad.net/bugs/1455761/+attachment/4398533/+files/fail_error_deploy_bonding_ha_active_backup-2015_05_16__02_41_50.tar.xz
2015-05-16 13:04:05 Nastya Urlapova summary Deployment with bonds failed on second controller Deployment with active backup bonds failed on second controller
2015-05-16 14:41:43 Stanislaw Bogatkin fuel: status New Confirmed
2015-05-16 14:46:47 Oleksiy Molchanov fuel: assignee Fuel Library Team (fuel-library) Oleksiy Molchanov (omolchanov)
2015-05-16 14:46:56 Oleksiy Molchanov fuel: status Confirmed In Progress
2015-05-18 12:04:03 Bogdan Dobrelya bug added subscriber Stanislav Makar
2015-05-18 12:04:12 Bogdan Dobrelya bug added subscriber Sergey Vasilenko
2015-05-18 12:04:17 Bogdan Dobrelya tags l23network
2015-05-18 12:04:24 Bogdan Dobrelya nominated for series fuel/6.0.x
2015-05-18 12:04:24 Bogdan Dobrelya bug task added fuel/6.0.x
2015-05-18 12:04:31 Bogdan Dobrelya fuel/6.0.x: status New Won't Fix
2015-05-18 12:12:46 Bogdan Dobrelya summary Deployment with active backup bonds failed on second controller Deployment with active backup bonds failed on second controller as rabbit failed to start and node-2
2015-05-18 12:12:54 Bogdan Dobrelya tags l23network ha rabbitmq
2015-05-18 12:12:59 Bogdan Dobrelya fuel: assignee Oleksiy Molchanov (omolchanov) Bogdan Dobrelya (bogdando)
2015-05-18 12:13:02 Bogdan Dobrelya fuel: status In Progress Confirmed
2015-05-18 12:55:40 Bogdan Dobrelya description VERSION: feature_groups: - mirantis production: "docker" release: "6.1" openstack_version: "2014.2.2-6.1" api: "1.0" build_number: "421" build_id: "2015-05-15_20-55-26" nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858" python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673" astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601" fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033" fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33" fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup) Scenario: 1. Create cluster 2. Add 3 nodes with controller role 3. Add 2 node with compute role 4. Setup bonding for all interfaces 4. Deploy the cluster 5. Run network verification 6. Run OSTF Deployment failed with err: (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in Centos HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0]
2015-05-18 13:26:48 Bogdan Dobrelya description VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in Centos HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] as rabbit@node-2 had started but never tried to join the elected master, which was rabbit@node-1, see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/
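For context on the error quoted in the description: curl exit code 7 means "failed to connect to host", which is consistent with the rabbitmqadmin download failing because nothing was listening on the management port, since the rabbit app (and hence its management plugin) never came up. A quick manual spot-check on the affected controller might look like the sketch below (the output path is a placeholder, and the credentials from the original command are omitted):

    # Exit code 7 from curl = could not connect at all; a listening endpoint
    # would instead return 0 or an HTTP-level error (e.g. 22 with -f).
    curl -f -L -o /tmp/rabbitmqadmin http://localhost:15672/cli/rabbitmqadmin
    echo "curl exit code: $?"   # 7 here means the management plugin is not listening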
2015-05-18 14:05:08 OpenStack Infra fuel: status Confirmed In Progress
2015-05-18 14:06:01 Bogdan Dobrelya fuel/6.0.x: status Won't Fix Triaged
2015-05-18 14:06:08 Bogdan Dobrelya nominated for series fuel/5.1.x
2015-05-18 14:06:08 Bogdan Dobrelya bug task added fuel/5.1.x
2015-05-18 14:06:44 Bogdan Dobrelya summary Deployment with active backup bonds failed on second controller as rabbit failed to start and node-2 Rabbit failed to start and join cluster at the second controller node
2015-05-18 14:06:52 Bogdan Dobrelya fuel/5.1.x: status New Triaged
2015-05-18 14:06:56 Bogdan Dobrelya fuel/5.1.x: importance Undecided High
2015-05-18 14:06:59 Bogdan Dobrelya fuel/6.0.x: importance Undecided High
2015-05-18 14:07:06 Bogdan Dobrelya fuel/5.1.x: milestone 5.1.2
2015-05-18 14:07:11 Bogdan Dobrelya fuel/6.0.x: milestone 6.0.2
2015-05-18 14:07:59 Bogdan Dobrelya fuel/5.1.x: assignee Fuel Library Team (fuel-library)
2015-05-18 14:08:05 Bogdan Dobrelya fuel/6.0.x: assignee Fuel Library Team (fuel-library)
2015-05-18 15:57:04 Bogdan Dobrelya summary Rabbit failed to start and join cluster at the second controller node Rabbit app failed to start and join cluster at the second controller node but cannot be noticed by OCF logic
2015-05-18 16:03:15 Bogdan Dobrelya description VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] as rabbit@node-2 had started but never tried to join the elected master, which was rabbit@node-1, see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] as rabbit@node-2 app had never started and never tried to join the elected master, which was rabbit@node-1, see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, then both beam and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2)"rabbit app is running. checking if we are the part of healthy cluster" but the logs show the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK.
2015-05-18 16:03:22 Bogdan Dobrelya description VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] as rabbit@node-2 app had never started and never tried to join the elected master, which was rabbit@node-1, see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, then both beam and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2)"rabbit app is running. checking if we are the part of healthy cluster" but the logs show the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK.
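The two log records quoted in that description come from the Pacemaker OCF logic referenced throughout this bug, so the missing-record diagnosis can be repeated by grepping the agent's output on the failed node. A sketch, assuming the agent logs to the system daemon log (the actual file path varies by distribution and Fuel release):

    # If only the first record appears for a given run, beam.smp was up but the
    # rabbit app inside it was not:
    grep -e 'checking if rabbit app is running' \
         -e 'rabbit app is running. checking if we are the part of healthy cluster' \
         /var/log/daemon.log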
2015-05-18 21:19:52 OpenStack Infra fuel: status In Progress Fix Committed
2015-05-19 08:02:45 Bogdan Dobrelya fuel: status Fix Committed In Progress
2015-05-19 12:50:41 Bogdan Dobrelya fuel: importance High Critical
2015-05-19 12:56:00 Bogdan Dobrelya description VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. But the issue is that by design, the start action will start beam process and stop the rabbit app, unless master promoted or slave joined cluster. This means that in order to fix this issue, the start action must be redesigned as the following: 1) Do not remove the iptables block rule on the action start() exit. 2) Leave the rabbit app started on the action start() exit. 3) remove the iptables block rule either on the post-promote notify, when master is elected and ready to join other nodes; or on the post-start notify, when slave is ready to join the cluster.
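The redesign in points 1-3 of that description can be read as the following control flow for the resource agent. This is a simplified sketch, not the agent's real code: the function names, the helper start_beam_and_rabbit_app, and the exact iptables rule are all illustrative placeholders; only the notify environment variables are standard Pacemaker ones.

    # Placeholder block/unblock of AMQP client traffic while the node is not yet clustered:
    block_client_traffic()   { iptables -I INPUT -p tcp --dport 5672 -j REJECT; }
    unblock_client_traffic() { iptables -D INPUT -p tcp --dport 5672 -j REJECT; }

    action_start() {
        block_client_traffic          # 1) the block rule now survives start() exit
        start_beam_and_rabbit_app     # 2) the rabbit app is left running on start() exit
    }

    action_notify() {
        # 3) the rule is removed only once the node is actually ready to serve:
        case "${OCF_RESKEY_CRM_meta_notify_type}-${OCF_RESKEY_CRM_meta_notify_operation}" in
            post-promote) unblock_client_traffic ;;   # master elected, ready to join others
            post-start)   unblock_client_traffic ;;   # slave ready to join the cluster
        esac
    }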
2015-05-19 12:56:12 Bogdan Dobrelya nominated for series fuel/7.0.x
2015-05-19 12:56:12 Bogdan Dobrelya bug task added fuel/7.0.x
2015-05-19 12:56:20 Bogdan Dobrelya fuel/7.0.x: milestone 7.0
2015-05-19 12:56:24 Bogdan Dobrelya fuel/7.0.x: assignee Bogdan Dobrelya (bogdando)
2015-05-19 12:56:27 Bogdan Dobrelya fuel/7.0.x: importance Undecided High
2015-05-19 12:56:31 Bogdan Dobrelya fuel/7.0.x: status New Triaged
2015-05-19 13:46:06 Bogdan Dobrelya description VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. But the issue is that by design, the start action will start beam process and stop the rabbit app, unless master promoted or slave joined cluster. This means that in order to fix this issue, the start action must be redesigned as the following: 1) Do not remove the iptables block rule on the action start() exit. 2) Leave the rabbit app started on the action start() exit. 3) remove the iptables block rule either on the post-promote notify, when master is elected and ready to join other nodes; or on the post-start notify, when slave is ready to join the cluster. VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. But the issue is that by design, the start action will start beam process and stop the rabbit app, unless master promoted or slave joined cluster. This means that in order to fix this issue, the start action must be redesigned as the following: 1) Do not remove the iptables block rule on the action start() exit. 2) Leave the rabbit app started on the action start() exit. 3) remove the iptables block rule either on the post-promote notify, when master is elected and ready to join other nodes; or on the post-start notify, when slave is ready to join the cluster. 4) make action monitor to report "Not running" if rabbit app is not running
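Point 4 of the updated description amounts to making the monitor action distinguish a live Erlang VM from a live rabbit app. A minimal sketch of such a check, assuming this is not the actual get_monitor() patch but an illustration (ocf_log, OCF_NOT_RUNNING and OCF_SUCCESS are the standard OCF shell bindings; rabbit:is_running/0 is RabbitMQ's own per-node app-state probe):

    get_monitor() {
        # rabbitmqctl talks to beam.smp even when the rabbit app inside it is
        # stopped, so ask the broker explicitly whether the app itself is up:
        if ! rabbitmqctl eval 'rabbit:is_running().' 2>/dev/null | grep -q true; then
            ocf_log err "beam is up but the rabbit app is not running"
            return "$OCF_NOT_RUNNING"   # report "Not running" instead of OK
        fi
        return "$OCF_SUCCESS"
    }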
2015-05-19 18:27:00 Bogdan Dobrelya fuel: importance Critical High
2015-05-19 18:27:06 Bogdan Dobrelya fuel: status In Progress Won't Fix
2015-05-20 14:05:32 Bogdan Dobrelya fuel: status Won't Fix Confirmed
2015-05-20 14:33:47 OpenStack Infra fuel: status Confirmed In Progress
2015-05-20 14:33:47 OpenStack Infra fuel: assignee Bogdan Dobrelya (bogdando) Bartlomiej Piotrowski (bpiotrowski)
2015-05-21 01:31:54 OpenStack Infra fuel: assignee Bartlomiej Piotrowski (bpiotrowski) Vladimir Kuklin (vkuklin)
2015-05-21 09:02:30 OpenStack Infra fuel: assignee Vladimir Kuklin (vkuklin) Bartlomiej Piotrowski (bpiotrowski)
2015-05-21 23:49:52 OpenStack Infra fuel: assignee Bartlomiej Piotrowski (bpiotrowski) Vladimir Kuklin (vkuklin)
2015-05-22 07:44:56 OpenStack Infra fuel: status In Progress Fix Committed
2015-05-22 09:43:45 Bogdan Dobrelya bug task deleted fuel/7.0.x
2015-05-22 09:55:10 Bogdan Dobrelya fuel: status Fix Committed In Progress
2015-05-22 09:55:13 Bogdan Dobrelya fuel: assignee Vladimir Kuklin (vkuklin) Bogdan Dobrelya (bogdando)
2015-05-23 01:44:26 OpenStack Infra fuel: assignee Bogdan Dobrelya (bogdando) Vladimir Kuklin (vkuklin)
2015-05-25 07:41:41 Bogdan Dobrelya fuel: assignee Vladimir Kuklin (vkuklin) Bogdan Dobrelya (bogdando)
2015-05-25 07:45:23 Bogdan Dobrelya fuel: importance High Critical
2015-05-25 07:45:26 Bogdan Dobrelya fuel/5.1.x: importance High Critical
2015-05-25 07:45:30 Bogdan Dobrelya fuel/6.0.x: importance High Critical
2015-05-25 07:53:22 Bogdan Dobrelya description VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. But the issue is that by design, the start action will start beam process and stop the rabbit app, unless master promoted or slave joined cluster. This means that in order to fix this issue, the start action must be redesigned as the following: 1) Do not remove the iptables block rule on the action start() exit. 2) Leave the rabbit app started on the action start() exit. 3) remove the iptables block rule either on the post-promote notify, when master is elected and ready to join other nodes; or on the post-start notify, when slave is ready to join the cluster. 4) make action monitor to report "Not running" if rabbit app is not running VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. But the issue is that by design, the start action will start beam process and stop the rabbit app, unless master promoted or slave joined cluster. This means that in order to fix this issue, the start action must be redesigned as the following: 1) Do not remove the iptables block rule on the action start() exit. 2) Leave the rabbit app started on the action start() exit. 3) remove the iptables block rule either on the post-promote notify, when master is elected and ready to join other nodes; or on the post-start notify, when slave is ready to join the cluster. 4) make action monitor to report "Not running" if rabbit app is not running The test case which reproduces this issue after some number of iterations: 0) Given env 1, nodes 1,2,3. Assume node-2 is always a master under kill test and its virsh domname is env61_2_slave-03. 1) Move master to node-2 2) wait for ostf ha passed 3) kill the node-2 and start count for failover time in a wait loop (wait for ostf ha passed) a. also get pcs status for cluster nodes b. and get the rabbit pacemaker resource status, c. and get the rabbitmqctl cluster_status output 4) report estimated failover time 5) power on node-2 and wait for it joined the rabbit cluster 6) repeat 1-5 See the test example with bash http://pastebin.com/eANHxrHV
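A condensed bash rendering of test steps 0-6 from that description (the full example lives at the pastebin link in the entry; the wait/move helpers below are stubs standing in for env-specific OSTF and pcs calls, and in the full version the status checks run repeatedly inside the wait loop):

    DOM=env61_2_slave-03                           # virsh domain of node-2, the master under test
    move_rabbit_master_to()               { :; }   # stub: e.g. pcs resource move + clear
    wait_for_ostf_ha_passed()             { :; }   # stub: poll the OSTF HA test suite
    wait_for_node_joined_rabbit_cluster() { :; }   # stub: poll rabbitmqctl cluster_status

    for i in $(seq 1 "${ITERATIONS:-10}"); do
        move_rabbit_master_to node-2               # 1)
        wait_for_ostf_ha_passed                    # 2)
        virsh destroy "$DOM"                       # 3) kill node-2, then time the failover
        start=$(date +%s)
        pcs status nodes                           # 3a) cluster node view
        pcs resource                               # 3b) rabbit pacemaker resource state
        rabbitmqctl cluster_status || true         # 3c) rabbit's own cluster view
        wait_for_ostf_ha_passed
        echo "failover took $(( $(date +%s) - start ))s"   # 4) report estimated failover time
        virsh start "$DOM"                         # 5) power node-2 back on
        wait_for_node_joined_rabbit_cluster node-2
    done                                           # 6) repeat 1-5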
2015-05-25 11:08:52 Bogdan Dobrelya fuel: importance Critical High
2015-05-25 11:08:57 Bogdan Dobrelya fuel/5.1.x: importance Critical High
2015-05-25 11:09:00 Bogdan Dobrelya fuel/6.0.x: importance Critical High
2015-05-25 12:11:09 OpenStack Infra fuel: status In Progress Fix Committed
2015-05-25 12:11:59 Vladimir Kuklin fuel: status Fix Committed In Progress
2015-05-25 15:03:07 OpenStack Infra fuel: status In Progress Fix Committed
2015-05-26 07:35:20 Bogdan Dobrelya fuel: status Fix Committed In Progress
2015-05-26 10:40:45 Bogdan Dobrelya fuel: status In Progress Fix Committed
2015-05-26 11:08:19 Bogdan Dobrelya description VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. But the issue is that by design, the start action will start beam process and stop the rabbit app, unless master promoted or slave joined cluster. This means that in order to fix this issue, the start action must be redesigned as the following: 1) Do not remove the iptables block rule on the action start() exit. 2) Leave the rabbit app started on the action start() exit. 3) remove the iptables block rule either on the post-promote notify, when master is elected and ready to join other nodes; or on the post-start notify, when slave is ready to join the cluster. 4) make action monitor to report "Not running" if rabbit app is not running The test case which reproduces this issue after some number of iterations: 0) Given env 1, nodes 1,2,3. Assume node-2 is always a master under kill test and its virsh domname is env61_2_slave-03. 1) Move master to node-2 2) wait for ostf ha passed 3) kill the node-2 and start count for failover time in a wait loop (wait for ostf ha passed) a. also get pcs status for cluster nodes b. and get the rabbit pacemaker resource status, c. and get the rabbitmqctl cluster_status output 4) report estimated failover time 5) power on node-2 and wait for it joined the rabbit cluster 6) repeat 1-5 See the test example with bash http://pastebin.com/eANHxrHV VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. The test case which reproduces this issue after some number of iterations: is described here https://bugs.launchpad.net/fuel/+bug/1458830
2015-07-13 10:12:53 Bogdan Dobrelya fuel/5.1.x: assignee Fuel Library Team (fuel-library) MOS Sustaining (mos-sustaining)
2015-07-13 10:12:58 Bogdan Dobrelya fuel/6.0.x: assignee Fuel Library Team (fuel-library) MOS Sustaining (mos-sustaining)
2015-09-26 11:07:42 Vitaly Sedelnik fuel/6.0.x: milestone 6.0.2 6.0.1
2015-10-26 12:56:29 Vitaly Sedelnik fuel/5.1.x: milestone 5.1.1-updates 5.1.1-mu-2
2015-10-26 12:56:31 Vitaly Sedelnik fuel/6.0.x: milestone 6.0-updates 6.0-mu-7
2015-10-26 12:56:37 Vitaly Sedelnik fuel/5.1.x: assignee MOS Maintenance (mos-maintenance) Denis Meltsaykin (dmeltsaykin)
2015-10-26 12:56:42 Vitaly Sedelnik fuel/6.0.x: assignee MOS Maintenance (mos-maintenance) Denis Meltsaykin (dmeltsaykin)
2015-10-26 13:44:04 Denis Meltsaykin fuel/5.1.x: status Triaged Won't Fix
2015-10-26 13:44:06 Denis Meltsaykin fuel/6.0.x: status Triaged Won't Fix
2015-10-26 15:08:17 Vitaly Sedelnik fuel/5.1.x: milestone 5.1.1-mu-2 5.1.1-updates
2015-10-26 15:08:20 Vitaly Sedelnik fuel/6.0.x: milestone 6.0-mu-7 6.0-updates