Activity log for bug #1455761

Date Who What changed Old value New value Message
2015-05-16 13:03:36 Nastya Urlapova bug added bug
2015-05-16 13:03:36 Nastya Urlapova attachment added fail_error_deploy_bonding_ha_active_backup-2015_05_16__02_41_50.tar.xz https://bugs.launchpad.net/bugs/1455761/+attachment/4398533/+files/fail_error_deploy_bonding_ha_active_backup-2015_05_16__02_41_50.tar.xz
2015-05-16 13:04:05 Nastya Urlapova summary Deployment with bonds failed on second controller Deployment with active backup bonds failed on second controller
2015-05-16 14:41:43 Stanislaw Bogatkin fuel: status New Confirmed
2015-05-16 14:46:47 Oleksiy Molchanov fuel: assignee Fuel Library Team (fuel-library) Oleksiy Molchanov (omolchanov)
2015-05-16 14:46:56 Oleksiy Molchanov fuel: status Confirmed In Progress
2015-05-18 12:04:03 Bogdan Dobrelya bug added subscriber Stanislav Makar
2015-05-18 12:04:12 Bogdan Dobrelya bug added subscriber Sergey Vasilenko
2015-05-18 12:04:17 Bogdan Dobrelya tags l23network
2015-05-18 12:04:24 Bogdan Dobrelya nominated for series fuel/6.0.x
2015-05-18 12:04:24 Bogdan Dobrelya bug task added fuel/6.0.x
2015-05-18 12:04:31 Bogdan Dobrelya fuel/6.0.x: status New Won't Fix
2015-05-18 12:12:46 Bogdan Dobrelya summary Deployment with active backup bonds failed on second controller Deployment with active backup bonds failed on second controller as rabbit failed to start and node-2
2015-05-18 12:12:54 Bogdan Dobrelya tags l23network ha rabbitmq
2015-05-18 12:12:59 Bogdan Dobrelya fuel: assignee Oleksiy Molchanov (omolchanov) Bogdan Dobrelya (bogdando)
2015-05-18 12:13:02 Bogdan Dobrelya fuel: status In Progress Confirmed
2015-05-18 12:55:40 Bogdan Dobrelya description VERSION: feature_groups: - mirantis production: "docker" release: "6.1" openstack_version: "2014.2.2-6.1" api: "1.0" build_number: "421" build_id: "2015-05-15_20-55-26" nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858" python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673" astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601" fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033" fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33" fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup) Scenario: 1. Create cluster 2. Add 3 nodes with controller role 3. Add 2 node with compute role 4. Setup bonding for all interfaces 4. Deploy the cluster 5. Run network verification 6. Run OSTF Deployment failed with err: (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in Centos HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0]
2015-05-18 13:26:48 Bogdan Dobrelya description VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in Centos HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] as rabbit@node-2 had started but never tried to join the elected master, which was rabbit@node-1, see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/
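For context on the error quoted in the description: curl exit code 7 means "failed to connect to host", which is consistent with the rabbitmqadmin download failing because nothing was listening on the management port, since the rabbit app (and hence its management plugin) never came up. A quick manual spot-check on the affected controller might look like the sketch below (the output path is a placeholder, and the credentials from the original command are omitted):

    # Exit code 7 from curl = could not connect at all; a listening endpoint
    # would instead return 0 or an HTTP-level error (e.g. 22 with -f).
    curl -f -L -o /tmp/rabbitmqadmin http://localhost:15672/cli/rabbitmqadmin
    echo "curl exit code: $?"   # 7 here means the management plugin is not listening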
2015-05-18 14:05:08 OpenStack Infra fuel: status Confirmed In Progress
2015-05-18 14:06:01 Bogdan Dobrelya fuel/6.0.x: status Won't Fix Triaged
2015-05-18 14:06:08 Bogdan Dobrelya nominated for series fuel/5.1.x
2015-05-18 14:06:08 Bogdan Dobrelya bug task added fuel/5.1.x
2015-05-18 14:06:44 Bogdan Dobrelya summary Deployment with active backup bonds failed on second controller as rabbit failed to start and node-2 Rabbit failed to start and join cluster at the second controller node
2015-05-18 14:06:52 Bogdan Dobrelya fuel/5.1.x: status New Triaged
2015-05-18 14:06:56 Bogdan Dobrelya fuel/5.1.x: importance Undecided High
2015-05-18 14:06:59 Bogdan Dobrelya fuel/6.0.x: importance Undecided High
2015-05-18 14:07:06 Bogdan Dobrelya fuel/5.1.x: milestone 5.1.2
2015-05-18 14:07:11 Bogdan Dobrelya fuel/6.0.x: milestone 6.0.2
2015-05-18 14:07:59 Bogdan Dobrelya fuel/5.1.x: assignee Fuel Library Team (fuel-library)
2015-05-18 14:08:05 Bogdan Dobrelya fuel/6.0.x: assignee Fuel Library Team (fuel-library)
2015-05-18 15:57:04 Bogdan Dobrelya summary Rabbit failed to start and join cluster at the second controller node Rabbit app failed to start and join cluster at the second controller node but cannot be noticed by OCF logic
2015-05-18 16:03:15 Bogdan Dobrelya description VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] as rabbit@node-2 had started but never tried to join the elected master, which was rabbit@node-1, see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] as rabbit@node-2 app had never started and never tried to join the elected master, which was rabbit@node-1, see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, then both beam and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2)"rabbit app is running. checking if we are the part of healthy cluster" but the logs show the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK.
2015-05-18 16:03:22 Bogdan Dobrelya description VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] as rabbit@node-2 app had never started and never tried to join the elected master, which was rabbit@node-1, see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, then both beam and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2)"rabbit app is running. checking if we are the part of healthy cluster" but the logs show the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK.
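The two log records quoted in that description come from the Pacemaker OCF logic referenced throughout this bug, so the missing-record diagnosis can be repeated by grepping the agent's output on the failed node. A sketch, assuming the agent logs to the system daemon log (the actual file path varies by distribution and Fuel release):

    # If only the first record appears for a given run, beam.smp was up but the
    # rabbit app inside it was not:
    grep -e 'checking if rabbit app is running' \
         -e 'rabbit app is running. checking if we are the part of healthy cluster' \
         /var/log/daemon.log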
2015-05-18 21:19:52 OpenStack Infra fuel: status In Progress Fix Committed
2015-05-19 08:02:45 Bogdan Dobrelya fuel: status Fix Committed In Progress
2015-05-19 12:50:41 Bogdan Dobrelya fuel: importance High Critical
2015-05-19 12:56:00 Bogdan Dobrelya description VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. But the issue is that by design, the start action will start beam process and stop the rabbit app, unless master promoted or slave joined cluster. This means that in order to fix this issue, the start action must be redesigned as the following: 1) Do not remove the iptables block rule on the action start() exit. 2) Leave the rabbit app started on the action start() exit. 3) remove the iptables block rule either on the post-promote notify, when master is elected and ready to join other nodes; or on the post-start notify, when slave is ready to join the cluster.
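The redesign in points 1-3 of that description can be read as the following control flow for the resource agent. This is a simplified sketch, not the agent's real code: the function names, the helper start_beam_and_rabbit_app, and the exact iptables rule are all illustrative placeholders; only the notify environment variables are standard Pacemaker ones.

    # Placeholder block/unblock of AMQP client traffic while the node is not yet clustered:
    block_client_traffic()   { iptables -I INPUT -p tcp --dport 5672 -j REJECT; }
    unblock_client_traffic() { iptables -D INPUT -p tcp --dport 5672 -j REJECT; }

    action_start() {
        block_client_traffic          # 1) the block rule now survives start() exit
        start_beam_and_rabbit_app     # 2) the rabbit app is left running on start() exit
    }

    action_notify() {
        # 3) the rule is removed only once the node is actually ready to serve:
        case "${OCF_RESKEY_CRM_meta_notify_type}-${OCF_RESKEY_CRM_meta_notify_operation}" in
            post-promote) unblock_client_traffic ;;   # master elected, ready to join others
            post-start)   unblock_client_traffic ;;   # slave ready to join the cluster
        esac
    }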
2015-05-19 12:56:12 Bogdan Dobrelya nominated for series fuel/7.0.x
2015-05-19 12:56:12 Bogdan Dobrelya bug task added fuel/7.0.x
2015-05-19 12:56:20 Bogdan Dobrelya fuel/7.0.x: milestone 7.0
2015-05-19 12:56:24 Bogdan Dobrelya fuel/7.0.x: assignee Bogdan Dobrelya (bogdando)
2015-05-19 12:56:27 Bogdan Dobrelya fuel/7.0.x: importance Undecided High
2015-05-19 12:56:31 Bogdan Dobrelya fuel/7.0.x: status New Triaged
2015-05-19 13:46:06 Bogdan Dobrelya description VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. But the issue is that by design, the start action will start beam process and stop the rabbit app, unless master promoted or slave joined cluster. This means that in order to fix this issue, the start action must be redesigned as the following: 1) Do not remove the iptables block rule on the action start() exit. 2) Leave the rabbit app started on the action start() exit. 3) remove the iptables block rule either on the post-promote notify, when master is elected and ready to join other nodes; or on the post-start notify, when slave is ready to join the cluster. VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. But the issue is that by design, the start action will start beam process and stop the rabbit app, unless master promoted or slave joined cluster. This means that in order to fix this issue, the start action must be redesigned as the following: 1) Do not remove the iptables block rule on the action start() exit. 2) Leave the rabbit app started on the action start() exit. 3) remove the iptables block rule either on the post-promote notify, when master is elected and ready to join other nodes; or on the post-start notify, when slave is ready to join the cluster. 4) make action monitor to report "Not running" if rabbit app is not running
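Point 4 of the updated description amounts to making the monitor action distinguish a live Erlang VM from a live rabbit app. A minimal sketch of such a check, assuming this is not the actual get_monitor() patch but an illustration (ocf_log, OCF_NOT_RUNNING and OCF_SUCCESS are the standard OCF shell bindings; rabbit:is_running/0 is RabbitMQ's own per-node app-state probe):

    get_monitor() {
        # rabbitmqctl talks to beam.smp even when the rabbit app inside it is
        # stopped, so ask the broker explicitly whether the app itself is up:
        if ! rabbitmqctl eval 'rabbit:is_running().' 2>/dev/null | grep -q true; then
            ocf_log err "beam is up but the rabbit app is not running"
            return "$OCF_NOT_RUNNING"   # report "Not running" instead of OK
        fi
        return "$OCF_SUCCESS"
    }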
2015-05-19 18:27:00 Bogdan Dobrelya fuel: importance Critical High
2015-05-19 18:27:06 Bogdan Dobrelya fuel: status In Progress Won't Fix
2015-05-20 14:05:32 Bogdan Dobrelya fuel: status Won't Fix Confirmed
2015-05-20 14:33:47 OpenStack Infra fuel: status Confirmed In Progress
2015-05-20 14:33:47 OpenStack Infra fuel: assignee Bogdan Dobrelya (bogdando) Bartlomiej Piotrowski (bpiotrowski)
2015-05-21 01:31:54 OpenStack Infra fuel: assignee Bartlomiej Piotrowski (bpiotrowski) Vladimir Kuklin (vkuklin)
2015-05-21 09:02:30 OpenStack Infra fuel: assignee Vladimir Kuklin (vkuklin) Bartlomiej Piotrowski (bpiotrowski)
2015-05-21 23:49:52 OpenStack Infra fuel: assignee Bartlomiej Piotrowski (bpiotrowski) Vladimir Kuklin (vkuklin)
2015-05-22 07:44:56 OpenStack Infra fuel: status In Progress Fix Committed
2015-05-22 09:43:45 Bogdan Dobrelya bug task deleted fuel/7.0.x
2015-05-22 09:55:10 Bogdan Dobrelya fuel: status Fix Committed In Progress
2015-05-22 09:55:13 Bogdan Dobrelya fuel: assignee Vladimir Kuklin (vkuklin) Bogdan Dobrelya (bogdando)
2015-05-23 01:44:26 OpenStack Infra fuel: assignee Bogdan Dobrelya (bogdando) Vladimir Kuklin (vkuklin)
2015-05-25 07:41:41 Bogdan Dobrelya fuel: assignee Vladimir Kuklin (vkuklin) Bogdan Dobrelya (bogdando)
2015-05-25 07:45:23 Bogdan Dobrelya fuel: importance High Critical
2015-05-25 07:45:26 Bogdan Dobrelya fuel/5.1.x: importance High Critical
2015-05-25 07:45:30 Bogdan Dobrelya fuel/6.0.x: importance High Critical
2015-05-25 07:53:22 Bogdan Dobrelya description VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. But the issue is that by design, the start action will start beam process and stop the rabbit app, unless master promoted or slave joined cluster. This means that in order to fix this issue, the start action must be redesigned as the following: 1) Do not remove the iptables block rule on the action start() exit. 2) Leave the rabbit app started on the action start() exit. 3) remove the iptables block rule either on the post-promote notify, when master is elected and ready to join other nodes; or on the post-start notify, when slave is ready to join the cluster. 4) make action monitor to report "Not running" if rabbit app is not running VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. But the issue is that by design, the start action will start beam process and stop the rabbit app, unless master promoted or slave joined cluster. This means that in order to fix this issue, the start action must be redesigned as the following: 1) Do not remove the iptables block rule on the action start() exit. 2) Leave the rabbit app started on the action start() exit. 3) remove the iptables block rule either on the post-promote notify, when master is elected and ready to join other nodes; or on the post-start notify, when slave is ready to join the cluster. 4) make action monitor to report "Not running" if rabbit app is not running The test case which reproduces this issue after some number of iterations: 0) Given env 1, nodes 1,2,3. Assume node-2 is always a master under kill test and its virsh domname is env61_2_slave-03. 1) Move master to node-2 2) wait for ostf ha passed 3) kill the node-2 and start count for failover time in a wait loop (wait for ostf ha passed) a. also get pcs status for cluster nodes b. and get the rabbit pacemaker resource status, c. and get the rabbitmqctl cluster_status output 4) report estimated failover time 5) power on node-2 and wait for it joined the rabbit cluster 6) repeat 1-5 See the test example with bash http://pastebin.com/eANHxrHV
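A condensed bash rendering of test steps 0-6 from that description (the full example lives at the pastebin link in the entry; the wait/move helpers below are stubs standing in for env-specific OSTF and pcs calls, and in the full version the status checks run repeatedly inside the wait loop):

    DOM=env61_2_slave-03                           # virsh domain of node-2, the master under test
    move_rabbit_master_to()               { :; }   # stub: e.g. pcs resource move + clear
    wait_for_ostf_ha_passed()             { :; }   # stub: poll the OSTF HA test suite
    wait_for_node_joined_rabbit_cluster() { :; }   # stub: poll rabbitmqctl cluster_status

    for i in $(seq 1 "${ITERATIONS:-10}"); do
        move_rabbit_master_to node-2               # 1)
        wait_for_ostf_ha_passed                    # 2)
        virsh destroy "$DOM"                       # 3) kill node-2, then time the failover
        start=$(date +%s)
        pcs status nodes                           # 3a) cluster node view
        pcs resource                               # 3b) rabbit pacemaker resource state
        rabbitmqctl cluster_status || true         # 3c) rabbit's own cluster view
        wait_for_ostf_ha_passed
        echo "failover took $(( $(date +%s) - start ))s"   # 4) report estimated failover time
        virsh start "$DOM"                         # 5) power node-2 back on
        wait_for_node_joined_rabbit_cluster node-2
    done                                           # 6) repeat 1-5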
2015-05-25 11:08:52 Bogdan Dobrelya fuel: importance Critical High
2015-05-25 11:08:57 Bogdan Dobrelya fuel/5.1.x: importance Critical High
2015-05-25 11:09:00 Bogdan Dobrelya fuel/6.0.x: importance Critical High
2015-05-25 12:11:09 OpenStack Infra fuel: status In Progress Fix Committed
2015-05-25 12:11:59 Vladimir Kuklin fuel: status Fix Committed In Progress
2015-05-25 15:03:07 OpenStack Infra fuel: status In Progress Fix Committed
2015-05-26 07:35:20 Bogdan Dobrelya fuel: status Fix Committed In Progress
2015-05-26 10:40:45 Bogdan Dobrelya fuel: status In Progress Fix Committed
2015-05-26 11:08:19 Bogdan Dobrelya description VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. But the issue is that by design, the start action will start beam process and stop the rabbit app, unless master promoted or slave joined cluster. This means that in order to fix this issue, the start action must be redesigned as the following: 1) Do not remove the iptables block rule on the action start() exit. 2) Leave the rabbit app started on the action start() exit. 3) remove the iptables block rule either on the post-promote notify, when master is elected and ready to join other nodes; or on the post-start notify, when slave is ready to join the cluster. 4) make action monitor to report "Not running" if rabbit app is not running The test case which reproduces this issue after some number of iterations: 0) Given env 1, nodes 1,2,3. Assume node-2 is always a master under kill test and its virsh domname is env61_2_slave-03. 1) Move master to node-2 2) wait for ostf ha passed 3) kill the node-2 and start count for failover time in a wait loop (wait for ostf ha passed) a. also get pcs status for cluster nodes b. and get the rabbit pacemaker resource status, c. and get the rabbitmqctl cluster_status output 4) report estimated failover time 5) power on node-2 and wait for it joined the rabbit cluster 6) repeat 1-5 See the test example with bash http://pastebin.com/eANHxrHV VERSION:   feature_groups:     - mirantis   production: "docker"   release: "6.1"   openstack_version: "2014.2.2-6.1"   api: "1.0"   build_number: "421"   build_id: "2015-05-15_20-55-26"   nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"   python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"   astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"   fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"   fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"   fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e" Deploy cluster in HA mode with bonding (active backup)         Scenario:             1. Create cluster             2. Add 3 nodes with controller role             3. Add 2 node with compute role             4. Setup bonding for all interfaces             4. Deploy the cluster             5. Run network verification             6. Run OSTF Deployment failed with err:  (/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ Normally, when both beam.smp and rabbit app have started, there should be two log records: 1) "checking if rabbit app is running" 2) "rabbit app is running. checking if we are the part of healthy cluster" But the logs shown the second record is missing, hence rabbit app was not started. And get_monitor() was not able to detect this and reported OK. The test case which reproduces this issue after some number of iterations: is described here https://bugs.launchpad.net/fuel/+bug/1458830
2015-07-13 10:12:53 Bogdan Dobrelya fuel/5.1.x: assignee Fuel Library Team (fuel-library) MOS Sustaining (mos-sustaining)
2015-07-13 10:12:58 Bogdan Dobrelya fuel/6.0.x: assignee Fuel Library Team (fuel-library) MOS Sustaining (mos-sustaining)
2015-09-26 11:07:42 Vitaly Sedelnik fuel/6.0.x: milestone 6.0.2 6.0.1
2015-10-26 12:56:29 Vitaly Sedelnik fuel/5.1.x: milestone 5.1.1-updates 5.1.1-mu-2
2015-10-26 12:56:31 Vitaly Sedelnik fuel/6.0.x: milestone 6.0-updates 6.0-mu-7
2015-10-26 12:56:37 Vitaly Sedelnik fuel/5.1.x: assignee MOS Maintenance (mos-maintenance) Denis Meltsaykin (dmeltsaykin)
2015-10-26 12:56:42 Vitaly Sedelnik fuel/6.0.x: assignee MOS Maintenance (mos-maintenance) Denis Meltsaykin (dmeltsaykin)
2015-10-26 13:44:04 Denis Meltsaykin fuel/5.1.x: status Triaged Won't Fix
2015-10-26 13:44:06 Denis Meltsaykin fuel/6.0.x: status Triaged Won't Fix
2015-10-26 15:08:17 Vitaly Sedelnik fuel/5.1.x: milestone 5.1.1-mu-2 5.1.1-updates
2015-10-26 15:08:20 Vitaly Sedelnik fuel/6.0.x: milestone 6.0-mu-7 6.0-updates