Cluster provisioning failed with <RuntimeError: Missing a required parameter uids> after stopping deployment and re-deploying

Bug #1540360 reported by Andrey Sledzinskiy
This bug affects 5 people
Affects              Status        Importance  Assigned to               Milestone
Fuel for OpenStack   Fix Released  Medium      Vladimir Sharshov
8.0.x                Won't Fix     Medium      Fuel Python (Deprecated)
Mitaka               Fix Released  Medium      Vladimir Sharshov

Bug Description

Steps:
1. Create a new cluster: Neutron VLAN, default storage, 3 controllers
2. Start deployment
3. Stop deployment when the progress of the 'deployment' task is above 10%
4. Wait for the nodes to come back online
5. Add 2 compute nodes
6. Click "Deploy changes"

Fuel version: 8.0-506 ISO

Actual result: provisioning failed with the following messages in astute.log:

2016-02-01 04:26:28 DEBUG [771] c8690652-a4a1-4985-9758-924ca5e24805: MC agent 'execute_shell_command', method 'execute', results:
{:sender=>"5",
 :statuscode=>0,
 :statusmsg=>"OK",
 :data=>
  {:stdout=>"",
   :stderr=>
    "Unexpected error\nActual checksum 46e645ebc81cc20bedf495953b93580d mismatches with expected 21a814807db5edd7e6cbcf2b9b52b74b for file /dev/vda3\n",
   :exit_code=>255}}

2016-02-01 04:26:28 ERROR [771] c8690652-a4a1-4985-9758-924ca5e24805: Provision command returned non zero exit code on node: 5
2016-02-01 04:26:28 ERROR [771] c8690652-a4a1-4985-9758-924ca5e24805: Provision command returned non zero exit code on node: 4
2016-02-01 04:26:28 ERROR [771] c8690652-a4a1-4985-9758-924ca5e24805: Provision command returned non zero exit code on node: 1
2016-02-01 04:26:28 ERROR [771] c8690652-a4a1-4985-9758-924ca5e24805: Provision command returned non zero exit code on node: 3
2016-02-01 04:26:28 ERROR [771] c8690652-a4a1-4985-9758-924ca5e24805: Provision command returned non zero exit code on node: 2

2016-02-01 04:26:29 ERROR [771] Error occured while provisioning:
#<RuntimeError: Missing a required parameter uids>

2016-02-01 04:26:34 ERROR [771] Retrying RPC client instantiation after exception:
#<RuntimeError: Could not find any hosts in discovery data provided>

2016-02-01 04:26:39 ERROR [771] No more retries for MCollective client instantiation after exception:
["/usr/share/gems/gems/mcollective-client-2.8.4/lib/mcollective/rpc/client.rb:507:in `discover'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/mclient.rb:155:in `initialize_mclient'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/mclient.rb:41:in `initialize'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/provision.rb:410:in `new'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/provision.rb:410:in `unlock_nodes_discovery'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/provision.rb:58:in `rescue in provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/provision.rb:44:in `provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/orchestrator.rb:123:in `provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/dispatcher.rb:51:in `provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/dispatcher.rb:37:in `image_provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/server.rb:189:in `dispatch_message'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/server.rb:146:in `block in dispatch'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/server.rb:144:in `each_with_index'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/server.rb:144:in `dispatch'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/server.rb:123:in `block in perform_main_job'"]

2016-02-01 04:26:39 ERROR [771] Error running provisioning: #<RuntimeError: Could not find any hosts in discovery data provided>
, trace:
["/usr/share/gems/gems/astute-8.0.0/lib/astute/mclient.rb:166:in `rescue in initialize_mclient'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/mclient.rb:149:in `initialize_mclient'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/mclient.rb:41:in `initialize'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/provision.rb:410:in `new'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/provision.rb:410:in `unlock_nodes_discovery'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/provision.rb:58:in `rescue in provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/provision.rb:44:in `provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/orchestrator.rb:123:in `provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/dispatcher.rb:51:in `provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/dispatcher.rb:37:in `image_provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/server.rb:189:in `dispatch_message'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/server.rb:146:in `block in dispatch'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/server.rb:144:in `each_with_index'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/server.rb:144:in `dispatch'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/server.rb:123:in `block in perform_main_job'"]

Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :
tags: added: area-python
removed: area-qa
Changed in fuel:
status: New → Confirmed
tags: added: team-bugfix
Revision history for this message
Alexander Gordeev (a-gordeev) wrote :

> "Unexpected error\nActual checksum 46e645ebc81cc20bedf495953b93580d mismatches with expected 21a814807db5edd7e6cbcf2b9b52b74b for file /dev/vda3\n",

This means that the image was somehow corrupted while it was being written to the disk.

Is it reproducible? Are you sure the environment is healthy?
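
For reference, a hedged sketch of the kind of check that error message implies: compare the source image's checksum against a checksum of the same number of bytes read back from the target device. This is illustrative only, not fuel_agent's actual code; the paths and the exact comparison are assumptions.

require 'digest'

# Illustrative only: re-check whether the bytes on the target device still
# match the image that was written to it. A mismatch like the one in the log
# means the on-disk data differs from the source image.
def image_matches_device?(image_path, device_path)
  expected = Digest::MD5.file(image_path).hexdigest

  actual = Digest::MD5.new
  remaining = File.size(image_path)            # only the written length matters
  File.open(device_path, 'rb') do |dev|
    while remaining > 0
      chunk = dev.read([remaining, 1 << 20].min)
      break if chunk.nil?
      actual << chunk
      remaining -= chunk.bytesize
    end
  end

  actual.hexdigest == expected
end

# Both paths below are placeholders:
#   image_matches_device?('/tmp/ubuntu-image.raw', '/dev/vda3')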

Revision history for this message
Alexander Gordeev (a-gordeev) wrote :

2016-02-01 04:25:35.713 31121 DEBUG fuel_agent.manager [-] Trying to re-enable journaling for ext4
2016-02-01 04:25:35.713 31121 DEBUG fuel_agent.utils.utils [-] Trying to execute command: tune2fs -O has_journal /dev/loop1
2016-02-01 04:25:35.716 31121 WARNING fuel_agent.utils.utils [-] Failed to execute command: Unexpected error while running command.
Command: tune2fs -O has_journal /dev/loop1
Exit code: 1
Stdout: 'tune2fs 1.42.9 (28-Dec-2013)\n'
Stderr: "tune2fs: Invalid argument while trying to open /dev/loop1\nCouldn't find valid filesystem superblock.\n"
2016-02-01 04:25:35.717 31121 ERROR fuel_agent.manager [-] Failed to build image: Unexpected error while running command.
Command: tune2fs -O has_journal /dev/loop1
Exit code: 1
Stdout: 'tune2fs 1.42.9 (28-Dec-2013)\n'
Stderr: "tune2fs: Invalid argument while trying to open /dev/loop1\nCouldn't find valid filesystem superblock.\n"

The IBP (image-based provisioning) images weren't built properly for some reason.

Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :

That was the first occurrence; tomorrow I'll check whether it reproduces a second time.

Revision history for this message
Alexander Kislitsky (akislitsky) wrote :

@Andrey, if the bug is reproduced, please provide us with a snapshot and reopen the bug. Moving status to Incomplete.

Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :

OK, it hasn't reproduced in today's run.

Revision history for this message
Alexander Gordeev (a-gordeev) wrote :

@Alexander, the snapshot is already available here: https://bugs.launchpad.net/fuel/+bug/1540360/comments/1

Apart from the obvious provisioning issues, there are strange run-time errors in Astute. I think that is a separate issue.

http://paste.openstack.org/show/485714/

I think the defined flow didn't expect all nodes to fail provisioning. Right after that, the 'reboot_provisioned_nodes' task failed with a RuntimeError: there were no uids because not a single node had been provisioned.

2016-02-01 04:26:28 ERROR [771] c8690652-a4a1-4985-9758-924ca5e24805: Provision command returned non zero exit code on node: 2
2016-02-01 04:26:28 ERROR [771] c8690652-a4a1-4985-9758-924ca5e24805: Provision command returned non zero exit code on node: 3
2016-02-01 04:26:28 ERROR [771] c8690652-a4a1-4985-9758-924ca5e24805: Provision command returned non zero exit code on node: 1
2016-02-01 04:26:28 ERROR [771] c8690652-a4a1-4985-9758-924ca5e24805: Provision command returned non zero exit code on node: 4
2016-02-01 04:26:28 ERROR [771] c8690652-a4a1-4985-9758-924ca5e24805: Provision command returned non zero exit code on node: 5
2016-02-01 04:26:28 DEBUG [771] Cobbler syncing
2016-02-01 04:26:29 INFO [771] Run hook ---
priority: 100
type: reboot
fail_on_error: false
id: reboot_provisioned_nodes
uids: []
parameters:
  timeout: 240

2016-02-01 04:26:29 ERROR [771] Error occured while provisioning:
#<RuntimeError: Missing a required parameter uids>

We might add a check to Astute that skips this hook if all nodes failed provisioning.
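
A minimal sketch of such a guard (illustrative only, not the actual fuel-astute code; the class and method names are hypothetical, while the hook id and parameters mirror the hook shown in the log above):

require 'logger'

# Illustrative guard: skip the reboot hook entirely when no node was
# provisioned, instead of raising "Missing a required parameter uids"
# on an empty uids list.
class RebootProvisionedNodesHook
  def initialize(logger = Logger.new($stdout))
    @logger = logger
  end

  def run(provisioned_uids)
    if provisioned_uids.empty?
      @logger.info('Skipping reboot_provisioned_nodes: no nodes were provisioned')
      return
    end

    hook = {
      'id'            => 'reboot_provisioned_nodes',
      'type'          => 'reboot',
      'fail_on_error' => false,
      'uids'          => provisioned_uids,
      'parameters'    => { 'timeout' => 240 }
    }
    @logger.info("Running hook #{hook['id']} for nodes #{provisioned_uids.join(', ')}")
    # ... dispatch the reboot to the target nodes via MCollective here ...
  end
end

# RebootProvisionedNodesHook.new.run([])         # skipped, no exception raised
# RebootProvisionedNodesHook.new.run(%w[1 2 3])  # proceeds with the reboot hook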

Revision history for this message
Vladimir Sharshov (vsharshov) wrote :

This error is just a consequence of the previous error: "Unexpected error\nActual checksum 46e645ebc81cc20bedf495953b93580d mismatches with expected 21a814807db5edd7e6cbcf2b9b52b74b for file /dev/vda3\n"

So it only affects UX. It is easy to fix and I will prepare a fix for 9.0.

tags: added: move-to-9.0
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

We don't backport medium-importance bug fixes.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/276377

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Vladimir Sharshov (vsharshov)
status: Incomplete → In Progress
Revision history for this message
Vasily Gorin (vgorin) wrote :

I suppose I've hit the same issue with the following steps:

Scenario:
1. Create a new environment
2. Choose Neutron with VLAN segmentation
3. Choose Cinder for volumes and Ceph for images/ephemeral/objects
4. Add 3 controllers
5. Add 3 cinder+ceph nodes
6. Add 1 compute node
7. Verify networks
8. Start deployment
9. Stop deployment during provisioning
10. Wait for the nodes to come back online
11. Click "Deploy changes"

Revision history for this message
Vasily Gorin (vgorin) wrote :

Log for the comment above:

2016-02-08 18:21:05 ERROR [887] Error running provisioning: #<RuntimeError: Could not find any hosts in discovery data provided>
, trace:
["/usr/share/gems/gems/astute-8.0.0/lib/astute/mclient.rb:166:in `rescue in initialize_mclient'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/mclient.rb:149:in `initialize_mclient'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/mclient.rb:41:in `initialize'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/provision.rb:410:in `new'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/provision.rb:410:in `unlock_nodes_discovery'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/provision.rb:58:in `rescue in provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/provision.rb:44:in `provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/orchestrator.rb:123:in `provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/dispatcher.rb:51:in `provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/dispatcher.rb:37:in `image_provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/server.rb:189:in `dispatch_message'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/server.rb:146:in `block in dispatch'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/server.rb:144:in `each_with_index'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/server.rb:144:in `dispatch'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/server.rb:123:in `block in perform_main_job'"]
2016-02-08 18:21:04 ERROR [887] No more retries for MCollective client instantiation after exception:
["/usr/share/gems/gems/mcollective-client-2.8.4/lib/mcollective/rpc/client.rb:507:in `discover'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/mclient.rb:155:in `initialize_mclient'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/mclient.rb:41:in `initialize'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/provision.rb:410:in `new'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/provision.rb:410:in `unlock_nodes_discovery'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/provision.rb:58:in `rescue in provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/provision.rb:44:in `provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/orchestrator.rb:123:in `provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/dispatcher.rb:51:in `provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/dispatcher.rb:37:in `image_provision'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/server.rb:189:in `dispatch_message'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/server.rb:146:in `block in dispatch'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/share/gems/gems/astute-8.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/share/gems/gems/astut...


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/276377
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=e07e74eb5980421b47fbc64b6d6f50a955e7cad1
Submitter: Jenkins
Branch: master

commit e07e74eb5980421b47fbc64b6d6f50a955e7cad1
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Thu Feb 4 20:03:44 2016 +0300

    Prevent unexpected exception if provision fail

    Two problem fixed:

    - do not fail if no nodes were sent to reboot;
    - in case of exception unlock nodes by uids instead of names

    Change-Id: Iafe9c78e7a92c9f410c9452748b242a6fe8140e4
    Closes-Bug: #1540360
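
A rough sketch of the second half of this fix, assuming the rescue path previously passed node names (e.g. "node-5") to MCollective discovery while the agents are identified by uid. The method name matches the one in the traceback above, but the client API and the flag-file path are placeholders, not the real fuel-astute implementation:

# Illustrative sketch: on a provisioning exception, unlock discovery on the
# failed nodes by uid rather than by node name.
def unlock_nodes_discovery(shell_client, failed_nodes)
  # Addressing the nodes by name left MCollective discovery with nothing to
  # match, hence "Could not find any hosts in discovery data provided";
  # uids are what the agents actually answer to.
  uids = failed_nodes.map { |node| node['uid'].to_s }.reject(&:empty?)
  return if uids.empty?

  # 'shell_client' stands in for an MClient-like object targeted at 'uids';
  # the flag-file path below is a placeholder, not the real one.
  shell_client.run_on(uids, 'rm -f /var/run/nodiscover-flag')
end

# Usage with any object responding to #run_on, for example a test double:
#   unlock_nodes_discovery(fake_client, [{ 'uid' => '5', 'name' => 'node-5' }])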

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Noam Angel (noama) wrote :

I get the same issue with 8.0 after provisioning. The nodes are up and can be pinged and reached over SSH, but Fuel reports them as offline. This critical issue has blocked our progress with a plugin.

Is there a workaround for it?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-astute (stable/8.0)

Related fix proposed to branch: stable/8.0
Review: https://review.openstack.org/322770

Revision history for this message
Nastya Urlapova (aurlapova) wrote :

I've moved the issue to "Fix Released" for 9.0, because SWARM on ISO #495 shows 97% and there was no provisioning issue like this one.

Changed in fuel:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-astute (stable/8.0)

Reviewed: https://review.openstack.org/322770
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=17ddf0ecac92475287266179828a6cc03967c876
Submitter: Jenkins
Branch: stable/8.0

commit 17ddf0ecac92475287266179828a6cc03967c876
Author: Michael Polenchuk <email address hidden>
Date: Mon May 30 13:58:40 2016 +0300

    Prevent unexpected exception if provision fail

    Squashed commits from the 9.0:
    - 79f99adf48de37d33b5e089472f91b2f7e614e55
      - fault tolerance for uploading errors
      - use upload file task instead of magnet directly
    - e07e74eb5980421b47fbc64b6d6f50a955e7cad1
      - do not fail if no nodes were sent to reboot

    Change-Id: I5b806f3d1411c4445a58b899b73eca035f5931b9
    Closes-Bug: #1546604
    Related-Bug: #1540360
