Fuel for OpenStack

Deployment fails with timeout on Cluster::Vrouter_ocf task

Bug #1582599 reported by Volodymyr Shypyguzov on 2016-05-17

This bug report is a duplicate of: Bug #1581015: Get last successful transaction per task does not evaluate status of deployment tasks per node.

This bug affects 2 people

Affects               Status      Importance   Assigned to         Milestone
Fuel for OpenStack    Confirmed   High         Vladimir Sharshov   Fuel for OpenStack 9.0

Bug Description

Steps to reproduce:
        1. Create a cluster in HA mode with 1 controller
        2. Add 1 node with the controller role
        3. Add 1 node with the compute role
        4. Add 1 node with the cinder role
        5. Verify network
        6. Provision nodes
        7. Create a test file on every node
        8. Deploy nodes
        9. Stop deployment
        10. Verify that nodes are not reset to the bootstrap image
        11. Re-deploy the cluster (fails here)
        12. Verify network
        13. Run OSTF
Expected result:
The cluster is successfully redeployed.
Actual result:
Deployment fails with the following error: "Deployment has failed. All nodes are finished. Failed tasks: Task[primary-rabbitmq/3], Task[cluster-vrouter/3]"

In the Puppet logs:
node-3.test.domain.local 2016-05-17T00:58:09.110032 err: (/Stage[main]/Cluster::Vrouter_ocf/Service[p_vrouter]/ensure) change from stopped to running failed: Execution timeout after 1800 seconds!



Volodymyr Shypyguzov (vshypyguzov) wrote on 2016-05-17:

fail_error_deploy_stop_on_deploying_ubuntu_bootstrap-fuel-snapshot-2016-05-17_02-33-52.tar.xz (46.4 MiB, application/octet-stream)


Volodymyr Shypyguzov (vshypyguzov) wrote on 2016-05-17:

shotgun2_report.txt (61.6 KiB, text/plain)

Sergey Shevorakov (sshevorakov) on 2016-05-18

tags:

added: swarm-blocker


Vladimir Sharshov (vsharshov) wrote on 2016-05-19:

After investigation, I do not see any problem that could have been caused by stopping the deployment. It was stopped right after it started, it stopped cleanly, and the deployment was then run again and proceeded without problems from 00:24:42 until 01:27:51, when cluster-vrouter failed.

This test did not fail on ISO #368: https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.bvt_ubuntu_bootstrap/111/testReport/(root)/deploy_stop_on_deploying_ubuntu_bootstrap/

So I marked this bug as Incomplete.

Just in case, I am trying to reproduce this problem locally: a first run of such a cluster without stopping it succeeded, and now I am running it as described in the test.

Changed in fuel:
status:	New → Incomplete
importance:	Undecided → High
assignee:	nobody → Vladimir Sharshov (vsharshov)
milestone:	none → 9.0

Nastya Urlapova (aurlapova) on 2016-05-19

Changed in fuel:
assignee:	Vladimir Sharshov (vsharshov) → Fuel QA telco (fuel-qa-telco)


Nastya Urlapova (aurlapova) wrote on 2016-05-19:

Reassigned to the telco folks because they are responsible for this case; however, on 9.0-mos-372 it works: https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.bvt_ubuntu_bootstrap/112/testReport/(root)/deploy_stop_on_deploying_ubuntu_bootstrap/


Dmitry Kalashnik (dkalashnik) wrote on 2016-05-20:

The next run fails with the same error. The deploy-stop case itself is valid, so there is nothing to do on the QA side.

Artem Hrechanychenko (agrechanichenko) on 2016-05-20

Changed in fuel:
status:	Incomplete → Confirmed

Dmitry Kalashnik (dkalashnik) on 2016-05-20

Changed in fuel:
assignee:	Fuel QA telco (fuel-qa-telco) → nobody


Artem Hrechanychenko (agrechanichenko) wrote on 2016-05-20:

https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.bvt_ubuntu_bootstrap/113/testReport/%28root%29/deploy_stop_on_deploying_ubuntu_bootstrap/deploy_stop_on_deploying_ubuntu_bootstrap/

Changed in fuel:
assignee:	nobody → Vladimir Sharshov (vsharshov)


Aleksandr Didenko (adidenko) wrote on 2016-05-23:

The root cause is the missing fuel_pkgs/fuel_pkgs.pp task.

On the primary controller, the test stopped the deployment at the fuel_pkgs/setup_repositories.pp task, and the deployment then continued from the roles/allocate_hugepages.pp task, so the fuel_pkgs/fuel_pkgs.pp task was skipped. This is why the Corosync/Pacemaker resources could not start:

2016-05-17T00:27:54.732628+00:00 warning: warning: Cannot execute '/usr/lib/ocf/resource.d/fuel/ns_vrouter': No such file or directory (2)
2016-05-17T00:27:54.732628+00:00 err: error: Failed to retrieve meta-data for ocf:fuel:ns_vrouter
2016-05-17T00:27:54.732628+00:00 warning: warning: No metadata found for ns_vrouter::ocf:fuel: Input/output error (-5)
2016-05-17T00:27:54.732628+00:00 err: error: No metadata for fuel::ocf:ns_vrouter
2016-05-17T00:27:54.732628+00:00 err: error: Operation p_vrouter_monitor_0 (node=node-3.test.domain.local, call=6, status=7, cib-update=39, confirmed=true) Not installed


Vladimir Sharshov (vsharshov) wrote on 2016-05-23:

After investigation, it looks like the core reason for the missing fuel_pkgs/fuel_pkgs.pp task is that the "last successful transaction per task" lookup does not evaluate the status of deployment tasks per node.

So I am marking it as a duplicate of https://bugs.launchpad.net/fuel/+bug/1581015.
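A minimal Python sketch of why a lookup keyed only by task skips fuel_pkgs on the primary controller, while one keyed by (node, task) does not. Everything in it is hypothetical: the TaskRecord type, the function names, the "ready" status value, and the node history are illustrations under stated assumptions, not Nailgun's actual data model or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TaskRecord:
    # Hypothetical record of one task run on one node in one transaction.
    transaction_id: int
    node: str
    task: str
    status: str  # hypothetical status value; "ready" means the task succeeded


def last_success_per_task(history: List[TaskRecord]) -> Dict[str, int]:
    """Flawed lookup: keyed by task only, so success on ANY node counts."""
    last: Dict[str, int] = {}
    for rec in history:
        if rec.status == "ready":
            last[rec.task] = max(last.get(rec.task, 0), rec.transaction_id)
    return last


def last_success_per_node_task(history: List[TaskRecord]) -> Dict[Tuple[str, str], int]:
    """Per-node lookup: keyed by (node, task)."""
    last: Dict[Tuple[str, str], int] = {}
    for rec in history:
        if rec.status == "ready":
            key = (rec.node, rec.task)
            last[key] = max(last.get(key, 0), rec.transaction_id)
    return last


def tasks_to_run(node: str, tasks: List[str], history: List[TaskRecord], per_node: bool) -> List[str]:
    """Return the tasks that would be (re)run on `node` during redeployment."""
    if per_node:
        done = last_success_per_node_task(history)
        return [t for t in tasks if (node, t) not in done]
    done_any = last_success_per_task(history)
    return [t for t in tasks if t not in done_any]


# Hypothetical history of the stopped first transaction (id 1): node-3 was
# stopped right after setup_repositories, before fuel_pkgs ran, while
# node-1 got further and completed fuel_pkgs.
history = [
    TaskRecord(1, "node-3", "setup_repositories", "ready"),
    TaskRecord(1, "node-1", "setup_repositories", "ready"),
    TaskRecord(1, "node-1", "fuel_pkgs", "ready"),
]
tasks = ["setup_repositories", "fuel_pkgs", "allocate_hugepages"]

print(tasks_to_run("node-3", tasks, history, per_node=False))
# ['allocate_hugepages']  (fuel_pkgs wrongly skipped on node-3)
print(tasks_to_run("node-3", tasks, history, per_node=True))
# ['fuel_pkgs', 'allocate_hugepages']
```

With the per-task key, node-1's success hides the fact that fuel_pkgs never ran on node-3, which matches the symptom described above: the redeployment resumed at allocate_hugepages and the ns_vrouter OCF script was never installed.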
