2015-05-16 13:03:36 |
Nastya Urlapova |
bug |
|
|
added bug |
2015-05-16 13:03:36 |
Nastya Urlapova |
attachment added |
|
fail_error_deploy_bonding_ha_active_backup-2015_05_16__02_41_50.tar.xz https://bugs.launchpad.net/bugs/1455761/+attachment/4398533/+files/fail_error_deploy_bonding_ha_active_backup-2015_05_16__02_41_50.tar.xz |
|
2015-05-16 13:04:05 |
Nastya Urlapova |
summary |
Deployment with bonds failed on second controller |
Deployment with active backup bonds failed on second controller |
|
2015-05-16 14:41:43 |
Stanislaw Bogatkin |
fuel: status |
New |
Confirmed |
|
2015-05-16 14:46:47 |
Oleksiy Molchanov |
fuel: assignee |
Fuel Library Team (fuel-library) |
Oleksiy Molchanov (omolchanov) |
|
2015-05-16 14:46:56 |
Oleksiy Molchanov |
fuel: status |
Confirmed |
In Progress |
|
2015-05-18 12:04:03 |
Bogdan Dobrelya |
bug |
|
|
added subscriber Stanislav Makar |
2015-05-18 12:04:12 |
Bogdan Dobrelya |
bug |
|
|
added subscriber Sergey Vasilenko |
2015-05-18 12:04:17 |
Bogdan Dobrelya |
tags |
|
l23network |
|
2015-05-18 12:04:24 |
Bogdan Dobrelya |
nominated for series |
|
fuel/6.0.x |
|
2015-05-18 12:04:24 |
Bogdan Dobrelya |
bug task added |
|
fuel/6.0.x |
|
2015-05-18 12:04:31 |
Bogdan Dobrelya |
fuel/6.0.x: status |
New |
Won't Fix |
|
2015-05-18 12:12:46 |
Bogdan Dobrelya |
summary |
Deployment with active backup bonds failed on second controller |
Deployment with active backup bonds failed on second controller as rabbit failed to start and node-2 |
|
2015-05-18 12:12:54 |
Bogdan Dobrelya |
tags |
l23network |
ha rabbitmq |
|
2015-05-18 12:12:59 |
Bogdan Dobrelya |
fuel: assignee |
Oleksiy Molchanov (omolchanov) |
Bogdan Dobrelya (bogdando) |
|
2015-05-18 12:13:02 |
Bogdan Dobrelya |
fuel: status |
In Progress |
Confirmed |
|
2015-05-18 12:55:40 |
Bogdan Dobrelya |
description |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
openstack_version: "2014.2.2-6.1"
api: "1.0"
build_number: "421"
build_id: "2015-05-15_20-55-26"
nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"
python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"
astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"
fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"
fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e"
Deploy cluster in HA mode with bonding (active backup)
Scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 nodes with compute role
4. Set up bonding for all interfaces
5. Deploy the cluster
6. Run network verification
7. Run OSTF
Deployment failed with err:
(/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
openstack_version: "2014.2.2-6.1"
api: "1.0"
build_number: "421"
build_id: "2015-05-15_20-55-26"
nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"
python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"
astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"
fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"
fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e"
Deploy cluster in Centos HA mode with bonding (active backup)
Scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 nodes with compute role
4. Set up bonding for all interfaces
5. Deploy the cluster
6. Run network verification
7. Run OSTF
Deployment failed with err:
(/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] |
|
2015-05-18 13:26:48 |
Bogdan Dobrelya |
description |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
openstack_version: "2014.2.2-6.1"
api: "1.0"
build_number: "421"
build_id: "2015-05-15_20-55-26"
nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"
python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"
astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"
fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"
fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e"
Deploy cluster in Centos HA mode with bonding (active backup)
Scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 nodes with compute role
4. Set up bonding for all interfaces
5. Deploy the cluster
6. Run network verification
7. Run OSTF
Deployment failed with err:
(/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0] |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
openstack_version: "2014.2.2-6.1"
api: "1.0"
build_number: "421"
build_id: "2015-05-15_20-55-26"
nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"
python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"
astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"
fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"
fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e"
Deploy cluster in HA mode with bonding (active backup)
Scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 nodes with compute role
4. Set up bonding for all interfaces
5. Deploy the cluster
6. Run network verification
7. Run OSTF
Deployment failed with err:
(/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0]
as rabbit@node-2 had started but never tried to join the elected master, which was rabbit@node-1, see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ |
|
2015-05-18 14:05:08 |
OpenStack Infra |
fuel: status |
Confirmed |
In Progress |
|
2015-05-18 14:06:01 |
Bogdan Dobrelya |
fuel/6.0.x: status |
Won't Fix |
Triaged |
|
2015-05-18 14:06:08 |
Bogdan Dobrelya |
nominated for series |
|
fuel/5.1.x |
|
2015-05-18 14:06:08 |
Bogdan Dobrelya |
bug task added |
|
fuel/5.1.x |
|
2015-05-18 14:06:44 |
Bogdan Dobrelya |
summary |
Deployment with active backup bonds failed on second controller as rabbit failed to start and node-2 |
Rabbit failed to start and join cluster at the second controller node |
|
2015-05-18 14:06:52 |
Bogdan Dobrelya |
fuel/5.1.x: status |
New |
Triaged |
|
2015-05-18 14:06:56 |
Bogdan Dobrelya |
fuel/5.1.x: importance |
Undecided |
High |
|
2015-05-18 14:06:59 |
Bogdan Dobrelya |
fuel/6.0.x: importance |
Undecided |
High |
|
2015-05-18 14:07:06 |
Bogdan Dobrelya |
fuel/5.1.x: milestone |
|
5.1.2 |
|
2015-05-18 14:07:11 |
Bogdan Dobrelya |
fuel/6.0.x: milestone |
|
6.0.2 |
|
2015-05-18 14:07:59 |
Bogdan Dobrelya |
fuel/5.1.x: assignee |
|
Fuel Library Team (fuel-library) |
|
2015-05-18 14:08:05 |
Bogdan Dobrelya |
fuel/6.0.x: assignee |
|
Fuel Library Team (fuel-library) |
|
2015-05-18 15:57:04 |
Bogdan Dobrelya |
summary |
Rabbit failed to start and join cluster at the second controller node |
Rabbit app failed to start and join cluster at the second controller node but cannot be noticed by OCF logic |
|
2015-05-18 16:03:15 |
Bogdan Dobrelya |
description |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
openstack_version: "2014.2.2-6.1"
api: "1.0"
build_number: "421"
build_id: "2015-05-15_20-55-26"
nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"
python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"
astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"
fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"
fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e"
Deploy cluster in HA mode with bonding (active backup)
Scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 nodes with compute role
4. Set up bonding for all interfaces
5. Deploy the cluster
6. Run network verification
7. Run OSTF
Deployment failed with err:
(/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0]
as rabbit@node-2 had started but never tried to join the elected master, which was rabbit@node-1, see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/ |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
openstack_version: "2014.2.2-6.1"
api: "1.0"
build_number: "421"
build_id: "2015-05-15_20-55-26"
nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"
python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"
astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"
fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"
fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e"
Deploy cluster in HA mode with bonding (active backup)
Scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 nodes with compute role
4. Set up bonding for all interfaces
5. Deploy the cluster
6. Run network verification
7. Run OSTF
Deployment failed with err:
(/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0]
as rabbit@node-2 app had never started and never tried to join the elected master, which was rabbit@node-1, see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/
Normally, when both beam and the rabbit app have started, there should be two log records:
1) "checking if rabbit app is running"
2) "rabbit app is running. checking if we are the part of healthy cluster"
But the logs show the second record is missing, hence the rabbit app was not
started, and get_monitor() was not able to detect this and reported OK. |
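As an aside, the two-record check described above is easy to automate; a minimal sketch, assuming the agent messages land in a plain-text log file (the function name and log handling are hypothetical; the two grep patterns are the literal messages quoted above):

```shell
# Hypothetical helper: given an OCF agent log file, report whether the
# rabbit app ever reached the "running" stage. The two patterns are the
# literal messages quoted in the bug description.
rabbit_app_reached_running() {
  log="$1"
  grep -qF 'checking if rabbit app is running' "$log" &&
    grep -qF 'rabbit app is running. checking if we are the part of healthy cluster' "$log"
}
```

On a node hitting this bug, the first grep matches and the second does not, so the function returns non-zero.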
|
2015-05-18 16:03:22 |
Bogdan Dobrelya |
description |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
openstack_version: "2014.2.2-6.1"
api: "1.0"
build_number: "421"
build_id: "2015-05-15_20-55-26"
nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"
python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"
astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"
fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"
fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e"
Deploy cluster in HA mode with bonding (active backup)
Scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 nodes with compute role
4. Set up bonding for all interfaces
5. Deploy the cluster
6. Run network verification
7. Run OSTF
Deployment failed with err:
(/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0]
as rabbit@node-2 app had never started and never tried to join the elected master, which was rabbit@node-1, see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/
Normally, when both beam and the rabbit app have started, there should be two log records:
1) "checking if rabbit app is running"
2) "rabbit app is running. checking if we are the part of healthy cluster"
But the logs show the second record is missing, hence the rabbit app was not
started, and get_monitor() was not able to detect this and reported OK. |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
openstack_version: "2014.2.2-6.1"
api: "1.0"
build_number: "421"
build_id: "2015-05-15_20-55-26"
nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"
python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"
astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"
fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"
fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e"
Deploy cluster in HA mode with bonding (active backup)
Scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 nodes with compute role
4. Set up bonding for all interfaces
5. Deploy the cluster
6. Run network verification
7. Run OSTF
Deployment failed with err:
(/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0]
because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/
Normally, when both beam.smp and rabbit app have started, there should be two log records:
1) "checking if rabbit app is running"
2) "rabbit app is running. checking if we are the part of healthy cluster"
But the logs show the second record is missing, hence the rabbit app was not
started, and get_monitor() was not able to detect this and reported OK. |
|
2015-05-18 21:19:52 |
OpenStack Infra |
fuel: status |
In Progress |
Fix Committed |
|
2015-05-19 08:02:45 |
Bogdan Dobrelya |
fuel: status |
Fix Committed |
In Progress |
|
2015-05-19 12:50:41 |
Bogdan Dobrelya |
fuel: importance |
High |
Critical |
|
2015-05-19 12:56:00 |
Bogdan Dobrelya |
description |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
openstack_version: "2014.2.2-6.1"
api: "1.0"
build_number: "421"
build_id: "2015-05-15_20-55-26"
nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"
python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"
astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"
fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"
fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e"
Deploy cluster in HA mode with bonding (active backup)
Scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 nodes with compute role
4. Set up bonding for all interfaces
5. Deploy the cluster
6. Run network verification
7. Run OSTF
Deployment failed with err:
(/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0]
because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/
Normally, when both beam.smp and rabbit app have started, there should be two log records:
1) "checking if rabbit app is running"
2) "rabbit app is running. checking if we are the part of healthy cluster"
But the logs show the second record is missing, hence the rabbit app was not
started, and get_monitor() was not able to detect this and reported OK. |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
openstack_version: "2014.2.2-6.1"
api: "1.0"
build_number: "421"
build_id: "2015-05-15_20-55-26"
nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"
python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"
astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"
fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"
fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e"
Deploy cluster in HA mode with bonding (active backup)
Scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 nodes with compute role
4. Set up bonding for all interfaces
5. Deploy the cluster
6. Run network verification
7. Run OSTF
Deployment failed with err:
(/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0]
because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/
Normally, when both beam.smp and rabbit app have started, there should be two log records:
1) "checking if rabbit app is running"
2) "rabbit app is running. checking if we are the part of healthy cluster"
But the logs show the second record is missing, hence the rabbit app was not
started, and get_monitor() was not able to detect this and reported OK.
But the issue is that, by design, the start action starts the beam process and stops the rabbit app unless the master has been promoted or the slave has joined the cluster.
This means that in order to fix this issue, the start action must be redesigned as follows:
1) Do not remove the iptables block rule on the start() action exit.
2) Leave the rabbit app started on the start() action exit.
3) Remove the iptables block rule either on the post-promote notify, when the master is elected and ready to join other nodes, or on the post-start notify, when the slave is ready to join the cluster. |
|
2015-05-19 12:56:12 |
Bogdan Dobrelya |
nominated for series |
|
fuel/7.0.x |
|
2015-05-19 12:56:12 |
Bogdan Dobrelya |
bug task added |
|
fuel/7.0.x |
|
2015-05-19 12:56:20 |
Bogdan Dobrelya |
fuel/7.0.x: milestone |
|
7.0 |
|
2015-05-19 12:56:24 |
Bogdan Dobrelya |
fuel/7.0.x: assignee |
|
Bogdan Dobrelya (bogdando) |
|
2015-05-19 12:56:27 |
Bogdan Dobrelya |
fuel/7.0.x: importance |
Undecided |
High |
|
2015-05-19 12:56:31 |
Bogdan Dobrelya |
fuel/7.0.x: status |
New |
Triaged |
|
2015-05-19 13:46:06 |
Bogdan Dobrelya |
description |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
openstack_version: "2014.2.2-6.1"
api: "1.0"
build_number: "421"
build_id: "2015-05-15_20-55-26"
nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"
python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"
astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"
fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"
fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e"
Deploy cluster in HA mode with bonding (active backup)
Scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 nodes with compute role
4. Set up bonding for all interfaces
5. Deploy the cluster
6. Run network verification
7. Run OSTF
Deployment failed with err:
(/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0]
because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/
Normally, when both beam.smp and rabbit app have started, there should be two log records:
1) "checking if rabbit app is running"
2) "rabbit app is running. checking if we are the part of healthy cluster"
But the logs show the second record is missing, hence the rabbit app was not
started, and get_monitor() was not able to detect this and reported OK.
But the issue is that, by design, the start action starts the beam process and stops the rabbit app unless the master has been promoted or the slave has joined the cluster.
This means that in order to fix this issue, the start action must be redesigned as follows:
1) Do not remove the iptables block rule on the start() action exit.
2) Leave the rabbit app started on the start() action exit.
3) Remove the iptables block rule either on the post-promote notify, when the master is elected and ready to join other nodes, or on the post-start notify, when the slave is ready to join the cluster. |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
openstack_version: "2014.2.2-6.1"
api: "1.0"
build_number: "421"
build_id: "2015-05-15_20-55-26"
nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"
python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"
astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"
fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"
fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e"
Deploy cluster in HA mode with bonding (active backup)
Scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 nodes with compute role
4. Set up bonding for all interfaces
5. Deploy the cluster
6. Run network verification
7. Run OSTF
Deployment failed with err:
(/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0]
because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/
Normally, when both beam.smp and rabbit app have started, there should be two log records:
1) "checking if rabbit app is running"
2) "rabbit app is running. checking if we are the part of healthy cluster"
But the logs show the second record is missing, hence the rabbit app was not
started, and get_monitor() was not able to detect this and reported OK.
But the issue is that, by design, the start action starts the beam process and stops the rabbit app unless the master has been promoted or the slave has joined the cluster.
This means that in order to fix this issue, the start action must be redesigned as follows:
1) Do not remove the iptables block rule on the start() action exit.
2) Leave the rabbit app started on the start() action exit.
3) Remove the iptables block rule either on the post-promote notify, when the master is elected and ready to join other nodes, or on the post-start notify, when the slave is ready to join the cluster.
4) Make the monitor action report "Not running" if the rabbit app is not running |
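A minimal sketch of what item 4 would look like inside a shell OCF agent; `rabbitmqctl eval 'rabbit:is_running().'` is a real rabbitmqctl invocation, but the function below is an illustration rather than the actual Fuel agent code:

```shell
# Standard OCF return codes
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Illustrative monitor action: even when the beam.smp process is alive,
# report "not running" unless the rabbit app itself is running, which is
# exactly the case get_monitor() missed in this bug.
rabbit_monitor() {
  pgrep -f beam.smp >/dev/null || return $OCF_NOT_RUNNING
  if rabbitmqctl eval 'rabbit:is_running().' 2>/dev/null | grep -q '^true'; then
    return $OCF_SUCCESS
  fi
  return $OCF_NOT_RUNNING  # beam is up, but the rabbit app is stopped
}
```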
|
2015-05-19 18:27:00 |
Bogdan Dobrelya |
fuel: importance |
Critical |
High |
|
2015-05-19 18:27:06 |
Bogdan Dobrelya |
fuel: status |
In Progress |
Won't Fix |
|
2015-05-20 14:05:32 |
Bogdan Dobrelya |
fuel: status |
Won't Fix |
Confirmed |
|
2015-05-20 14:33:47 |
OpenStack Infra |
fuel: status |
Confirmed |
In Progress |
|
2015-05-20 14:33:47 |
OpenStack Infra |
fuel: assignee |
Bogdan Dobrelya (bogdando) |
Bartlomiej Piotrowski (bpiotrowski) |
|
2015-05-21 01:31:54 |
OpenStack Infra |
fuel: assignee |
Bartlomiej Piotrowski (bpiotrowski) |
Vladimir Kuklin (vkuklin) |
|
2015-05-21 09:02:30 |
OpenStack Infra |
fuel: assignee |
Vladimir Kuklin (vkuklin) |
Bartlomiej Piotrowski (bpiotrowski) |
|
2015-05-21 23:49:52 |
OpenStack Infra |
fuel: assignee |
Bartlomiej Piotrowski (bpiotrowski) |
Vladimir Kuklin (vkuklin) |
|
2015-05-22 07:44:56 |
OpenStack Infra |
fuel: status |
In Progress |
Fix Committed |
|
2015-05-22 09:43:45 |
Bogdan Dobrelya |
bug task deleted |
fuel/7.0.x |
|
|
2015-05-22 09:55:10 |
Bogdan Dobrelya |
fuel: status |
Fix Committed |
In Progress |
|
2015-05-22 09:55:13 |
Bogdan Dobrelya |
fuel: assignee |
Vladimir Kuklin (vkuklin) |
Bogdan Dobrelya (bogdando) |
|
2015-05-23 01:44:26 |
OpenStack Infra |
fuel: assignee |
Bogdan Dobrelya (bogdando) |
Vladimir Kuklin (vkuklin) |
|
2015-05-25 07:41:41 |
Bogdan Dobrelya |
fuel: assignee |
Vladimir Kuklin (vkuklin) |
Bogdan Dobrelya (bogdando) |
|
2015-05-25 07:45:23 |
Bogdan Dobrelya |
fuel: importance |
High |
Critical |
|
2015-05-25 07:45:26 |
Bogdan Dobrelya |
fuel/5.1.x: importance |
High |
Critical |
|
2015-05-25 07:45:30 |
Bogdan Dobrelya |
fuel/6.0.x: importance |
High |
Critical |
|
2015-05-25 07:53:22 |
Bogdan Dobrelya |
description |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
openstack_version: "2014.2.2-6.1"
api: "1.0"
build_number: "421"
build_id: "2015-05-15_20-55-26"
nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"
python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"
astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"
fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"
fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e"
Deploy cluster in HA mode with bonding (active backup)
Scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 nodes with compute role
4. Set up bonding for all interfaces
5. Deploy the cluster
6. Run network verification
7. Run OSTF
Deployment failed with err:
(/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0]
because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/
Normally, when both beam.smp and rabbit app have started, there should be two log records:
1) "checking if rabbit app is running"
2) "rabbit app is running. checking if we are the part of healthy cluster"
But the logs show the second record is missing, hence the rabbit app was not
started, and get_monitor() was not able to detect this and reported OK.
But the issue is that, by design, the start action starts the beam process and stops the rabbit app unless the master has been promoted or the slave has joined the cluster.
This means that in order to fix this issue, the start action must be redesigned as follows:
1) Do not remove the iptables block rule on the start() action exit.
2) Leave the rabbit app started on the start() action exit.
3) Remove the iptables block rule either on the post-promote notify, when the master is elected and ready to join other nodes, or on the post-start notify, when the slave is ready to join the cluster.
4) Make the monitor action report "Not running" if the rabbit app is not running |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
openstack_version: "2014.2.2-6.1"
api: "1.0"
build_number: "421"
build_id: "2015-05-15_20-55-26"
nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"
python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"
astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"
fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"
fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e"
Deploy cluster in HA mode with bonding (active backup)
Scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 nodes with compute role
4. Set up bonding for all interfaces
5. Deploy the cluster
6. Run network verification
7. Run OSTF
Deployment failed with err:
(/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0]
because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/
Normally, when both beam.smp and rabbit app have started, there should be two log records:
1) "checking if rabbit app is running"
2) "rabbit app is running. checking if we are the part of healthy cluster"
But the logs show the second record is missing, hence the rabbit app was not
started, and get_monitor() was not able to detect this and reported OK.
But the issue is that, by design, the start action starts the beam process and stops the rabbit app unless the master has been promoted or the slave has joined the cluster.
This means that in order to fix this issue, the start action must be redesigned as follows:
1) Do not remove the iptables block rule on the start() action exit.
2) Leave the rabbit app started on the start() action exit.
3) Remove the iptables block rule either on the post-promote notify, when the master is elected and ready to join other nodes, or on the post-start notify, when the slave is ready to join the cluster.
4) Make the monitor action report "Not running" if the rabbit app is not running
The test case which reproduces this issue after some number of iterations:
0) Given env 1 with nodes 1, 2, 3; assume node-2 is always the master under the kill test and its virsh domname is env61_2_slave-03.
1) Move the master to node-2.
2) Wait for the OSTF HA tests to pass.
3) Kill node-2 and start counting the failover time in a wait loop (wait for the OSTF HA tests to pass);
a. also get the pcs status for the cluster nodes,
b. and get the rabbit pacemaker resource status,
c. and get the rabbitmqctl cluster_status output.
4) Report the estimated failover time.
5) Power on node-2 and wait for it to join the rabbit cluster.
6) Repeat steps 1-5.
See the bash test example at http://pastebin.com/eANHxrHV |
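The numbered loop above can be sketched in shell roughly as follows; the domain name env61_2_slave-03 and the three status commands come from the description, while wait_ostf_ha and the master_p_rabbitmq-server resource name are placeholders/assumptions:

```shell
# Sketch of the kill/failover test loop. Call run_failover_test to run
# ITERATIONS rounds (default 10) against a deployed 3-controller env.
wait_ostf_ha() { :; }  # placeholder: poll the OSTF HA tests until they pass

run_failover_test() {
  dom=env61_2_slave-03                          # virsh domain of node-2
  i=1
  while [ "$i" -le "${ITERATIONS:-10}" ]; do
    # 1-2) move the master to node-2 and wait for a healthy cluster
    pcs resource move master_p_rabbitmq-server node-2
    wait_ostf_ha
    # 3) kill node-2 and time the failover, polling cluster state
    start=$(date +%s)
    virsh destroy "$dom"
    wait_ostf_ha
    pcs status                                  # 3a) cluster node state
    pcs resource show master_p_rabbitmq-server  # 3b) rabbit resource state
    rabbitmqctl cluster_status                  # 3c) rabbit's own view
    # 4) report the estimated failover time
    echo "iteration $i: failover took $(( $(date +%s) - start ))s"
    # 5) power node-2 back on and wait for it to rejoin
    virsh start "$dom"
    wait_ostf_ha
    i=$((i + 1))
  done
}
```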
|
2015-05-25 11:08:52 |
Bogdan Dobrelya |
fuel: importance |
Critical |
High |
|
2015-05-25 11:08:57 |
Bogdan Dobrelya |
fuel/5.1.x: importance |
Critical |
High |
|
2015-05-25 11:09:00 |
Bogdan Dobrelya |
fuel/6.0.x: importance |
Critical |
High |
|
2015-05-25 12:11:09 |
OpenStack Infra |
fuel: status |
In Progress |
Fix Committed |
|
2015-05-25 12:11:59 |
Vladimir Kuklin |
fuel: status |
Fix Committed |
In Progress |
|
2015-05-25 15:03:07 |
OpenStack Infra |
fuel: status |
In Progress |
Fix Committed |
|
2015-05-26 07:35:20 |
Bogdan Dobrelya |
fuel: status |
Fix Committed |
In Progress |
|
2015-05-26 10:40:45 |
Bogdan Dobrelya |
fuel: status |
In Progress |
Fix Committed |
|
2015-05-26 11:08:19 |
Bogdan Dobrelya |
description |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
openstack_version: "2014.2.2-6.1"
api: "1.0"
build_number: "421"
build_id: "2015-05-15_20-55-26"
nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"
python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"
astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"
fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"
fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e"
Deploy cluster in HA mode with bonding (active backup)
Scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 nodes with compute role
4. Set up bonding for all interfaces
5. Deploy the cluster
6. Run network verification
7. Run OSTF
Deployment failed with the error:
(/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0]
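curl's exit code 7 means "failed to connect to host", i.e. nothing was listening on the management port because the rabbit app never came up. As a quick illustration (not part of the deployment), the same code appears for any port with no listener:

```shell
# curl exit code 7 == "couldn't connect"; port 15672 mirrors the
# failing rabbitmqadmin download in the error above.
rc=0
curl -sf -o /dev/null --max-time 5 http://localhost:15672/cli/rabbitmqadmin || rc=$?
echo "curl exit code: $rc"   # 7 when nothing listens on 15672
```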
because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/
Normally, when both beam.smp and rabbit app have started, there should be two log records:
1) "checking if rabbit app is running"
2) "rabbit app is running. checking if we are the part of healthy cluster"
But the logs show that the second record is missing, hence the rabbit app was not started, and get_monitor() was not able to detect this and reported OK.
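The missing second record can be confirmed by counting both messages in the node's logs; the default log path below is an assumption, so point it at wherever the OCF agent logs on the affected node.

```shell
#!/bin/sh
# Count occurrences of the two agent log records quoted above.
LOG=${LOG:-/var/log/daemon.log}   # assumed path; adjust per node

count_record() {
    # Missing/unreadable files count as zero; grep -c exits 1 on
    # zero matches, so swallow that status.
    [ -r "$2" ] || { echo 0; return; }
    grep -c "$1" "$2" || true
}

count_record 'checking if rabbit app is running' "$LOG"
count_record 'rabbit app is running. checking if we are the part of healthy cluster' "$LOG"
# On a healthy node both counts grow in step; on the failed node-2
# the second record never appears.
```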
The issue is that, by design, the start action starts the beam process but stops the rabbit app until the master is promoted or the slave has joined the cluster.
This means that in order to fix this issue, the start action must be redesigned as follows:
1) Do not remove the iptables block rule on the start() action exit.
2) Leave the rabbit app started on the start() action exit.
3) Remove the iptables block rule either on the post-promote notify, when the master is elected and ready to join other nodes, or on the post-start notify, when the slave is ready to join the cluster.
4) Make the monitor action report "not running" if the rabbit app is not running.
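The four redesign steps above can be sketched as shell fragments of the agent. The helper names (block_client_access, start_beam_process, start_rabbit_app, unblock_client_access) are illustrative stand-ins, not the agent's real functions; the notify variables are standard Pacemaker OCF environment variables.

```shell
#!/bin/sh
# Sketch of the redesigned start/notify/monitor actions.

OCF_SUCCESS=0
OCF_NOT_RUNNING=7

rabbit_app_running() {
    rabbitmqctl eval 'rabbit:is_running().' 2>/dev/null | grep -q true
}

action_start() {
    block_client_access     # 1) keep the iptables block rule in place
    start_beam_process
    start_rabbit_app        # 2) leave the rabbit app started on exit
    return $OCF_SUCCESS
}

action_notify() {
    # 3) drop the block rule only on post-promote (master elected) or
    #    post-start (slave ready to join the cluster)
    case "${OCF_RESKEY_CRM_meta_notify_type}-${OCF_RESKEY_CRM_meta_notify_operation}" in
        post-promote|post-start) unblock_client_access ;;
    esac
}

action_monitor() {
    # 4) report "not running" whenever the rabbit app is down, even
    #    if the beam process is still alive
    rabbit_app_running || return $OCF_NOT_RUNNING
    return $OCF_SUCCESS
}
```

With this shape, get_monitor() can no longer report OK for a node whose beam process is up but whose rabbit app never started, which is exactly the failure mode seen on node-2.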
The test case which reproduces this issue after some number of iterations:
0) Given env 1 with nodes 1, 2, 3. Assume node-2 is always the master under the kill test and its virsh domname is env61_2_slave-03.
1) Move the master to node-2.
2) Wait for the OSTF HA tests to pass.
3) Kill node-2 and start counting the failover time in a wait loop (wait for the OSTF HA tests to pass):
a. also get pcs status for the cluster nodes,
b. and get the rabbit pacemaker resource status,
c. and get the rabbitmqctl cluster_status output.
4) Report the estimated failover time.
5) Power node-2 back on and wait for it to join the rabbit cluster.
6) Repeat steps 1-5.
See the bash test example at http://pastebin.com/eANHxrHV |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
openstack_version: "2014.2.2-6.1"
api: "1.0"
build_number: "421"
build_id: "2015-05-15_20-55-26"
nailgun_sha: "eca3532abfcc15dc6c55f682dd3f037235c4e858"
python-fuelclient_sha: "38765563e1a7f14f45201fd47cf507393ff5d673"
astute_sha: "7e3e81f2e3d4557d5d1fd61a424df95c4d265601"
fuel-library_sha: "1645fe45f226cdd6d2829bea9912d0baa3be5033"
fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
fuelmain_sha: "d249d74f9beb5935c31b8ee674eb1ed696672f6e"
Deploy cluster in HA mode with bonding (active backup)
Scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 nodes with compute role
4. Set up bonding for all interfaces
5. Deploy the cluster
6. Run network verification
7. Run OSTF
Deployment failed with the error:
(/Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns) change from notrun to 0 failed: curl -k --noproxy localhost --retry 30 --retry-delay 6 -f -L -o /var/lib/rabbitmq/rabbitmqadmin http://nova:XT9PMcfX@localhost:15672/cli/rabbitmqadmin returned 7 instead of one of [0]
because the rabbit@node-2 app had never started and never tried to join the elected master (rabbit@node-1), see http://paste.openstack.org/show/lgkWy7A1EcFhCLoH6vdw/
Normally, when both beam.smp and rabbit app have started, there should be two log records:
1) "checking if rabbit app is running"
2) "rabbit app is running. checking if we are the part of healthy cluster"
But the logs show that the second record is missing, hence the rabbit app was not started, and get_monitor() was not able to detect this and reported OK.
The test case which reproduces this issue after some number of iterations is described in https://bugs.launchpad.net/fuel/+bug/1458830 |
|
2015-07-13 10:12:53 |
Bogdan Dobrelya |
fuel/5.1.x: assignee |
Fuel Library Team (fuel-library) |
MOS Sustaining (mos-sustaining) |
|
2015-07-13 10:12:58 |
Bogdan Dobrelya |
fuel/6.0.x: assignee |
Fuel Library Team (fuel-library) |
MOS Sustaining (mos-sustaining) |
|
2015-09-26 11:07:42 |
Vitaly Sedelnik |
fuel/6.0.x: milestone |
6.0.2 |
6.0.1 |
|
2015-10-26 12:56:29 |
Vitaly Sedelnik |
fuel/5.1.x: milestone |
5.1.1-updates |
5.1.1-mu-2 |
|
2015-10-26 12:56:31 |
Vitaly Sedelnik |
fuel/6.0.x: milestone |
6.0-updates |
6.0-mu-7 |
|
2015-10-26 12:56:37 |
Vitaly Sedelnik |
fuel/5.1.x: assignee |
MOS Maintenance (mos-maintenance) |
Denis Meltsaykin (dmeltsaykin) |
|
2015-10-26 12:56:42 |
Vitaly Sedelnik |
fuel/6.0.x: assignee |
MOS Maintenance (mos-maintenance) |
Denis Meltsaykin (dmeltsaykin) |
|
2015-10-26 13:44:04 |
Denis Meltsaykin |
fuel/5.1.x: status |
Triaged |
Won't Fix |
|
2015-10-26 13:44:06 |
Denis Meltsaykin |
fuel/6.0.x: status |
Triaged |
Won't Fix |
|
2015-10-26 15:08:17 |
Vitaly Sedelnik |
fuel/5.1.x: milestone |
5.1.1-mu-2 |
5.1.1-updates |
|
2015-10-26 15:08:20 |
Vitaly Sedelnik |
fuel/6.0.x: milestone |
6.0-mu-7 |
6.0-updates |
|