[10.0.main.ubuntu.smoke_neutron][1047] Provisioning failed on node-2

Bug #1652002 reported by Ivan Udovichenko on 2016-12-22
This bug affects 6 people
Affects             Status  Importance  Assigned to      Milestone
Fuel for OpenStack          Undecided   Georgy Kibardin
Newton                      Undecided   Georgy Kibardin
Ocata                       Undecided   Georgy Kibardin

Bug Description

Smoke neutron test failed [1] with error:
"AssertionError: Task 'deploy' has incorrect status. error != ready, 'Provision has failed. Too many nodes failed to provision'"

Because the env is no longer available, there is no obvious way to check why provisioning failed. The astute logs are in [2].

According to the logs from node-3 (node-3-10.109.15.6/var/log/cloud-init.log) [3], the mcollective service failed to restart.

[1] https://product-ci.infra.mirantis.net/job/10.0.main.ubuntu.smoke_neutron/1047/console
[2] http://paste.openstack.org/show/593142/
[3] http://paste.openstack.org/show/593120/

description: updated
summary: - [10.0.main.ubuntu.smoke_neutron][1047] Memcached service failed to
+ [10.0.main.ubuntu.smoke_neutron][1047] mcollective service failed to
restart
Changed in fuel:
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
importance: Undecided → Medium
status: New → Confirmed
description: updated
description: updated
summary: - [10.0.main.ubuntu.smoke_neutron][1047] mcollective service failed to
- restart
+ [10.0.main.ubuntu.smoke_neutron][1047] Provisioning failed on node-2
description: updated
description: updated
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Georgy Kibardin (gkibardin)
Changed in fuel:
importance: Medium → Critical
Changed in fuel:
status: Confirmed → In Progress
Georgy Kibardin (gkibardin) wrote :

Something was wrong with one of the nodes from the very beginning. There are no logs from it and, in the end, it failed to restart after provisioning.
I need an env to revert to; waiting for a reproduction.

Changed in fuel:
status: In Progress → Incomplete
Changed in fuel:
status: Confirmed → In Progress
Roman Rufanov (rrufanov) on 2017-02-02
Changed in fuel:
milestone: 10.x-updates → 10.1
Georgy Kibardin (gkibardin) wrote :

The reason is that cloudinit sometimes fails completely on a node. It happens as follows:
1. Cloudinit creates a folder in tmp (using mkdtemp)
2. Cloudinit mounts the config drive image into it
3. Cloudinit reads the configuration
4. Cloudinit unmounts the folder
5. Cloudinit fails to delete the folder because the folder no longer exists!
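
The sequence above can be illustrated with a minimal Python sketch. This is not cloudinit's actual code: the function names, the `read_config` callback, and the `interfere` hook (standing in for whatever process removes the folder) are hypothetical, and the mount/unmount steps are only marked by comments so the example runs without root privileges.

```python
import os
import tempfile


def read_config_via_tmpdir(read_config, interfere=None):
    """Mimic the mkdtemp -> read -> cleanup sequence (hypothetical helper)."""
    tmpdir = tempfile.mkdtemp()      # step 1: create a folder in tmp
    try:
        # Steps 2-4 would mount the config drive at tmpdir, read the
        # configuration, and unmount; here we only call the reader.
        config = read_config(tmpdir)
        if interfere is not None:
            interfere(tmpdir)        # assumption: another process deletes the folder
    finally:
        os.rmdir(tmpdir)             # step 5: raises if the folder is already gone
    return config


def demo():
    """Run the sequence with os.rmdir playing the interfering process."""
    try:
        read_config_via_tmpdir(lambda path: {"mounted_at": path},
                               interfere=os.rmdir)
    except FileNotFoundError:
        return "cleanup failed: folder already gone"
    return "ok"
```

Calling `demo()` takes the failure path: the cleanup in step 5 raises `FileNotFoundError` even though the configuration was read, so the whole operation is reported as failed.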

Michael Dovgal (mdovgal) wrote :

It looks like one more job failed due to this problem. The logs are still available:
https://product-ci.infra.mirantis.net/view/10.0/job/10.0.main.ubuntu.smoke_neutron/1280/

Fix proposed to branch: master
Review: https://review.openstack.org/435901

Change abandoned by Georgy Kibardin (<email address hidden>) on branch: master
Review: https://review.openstack.org/435035
Reason: We've decided this functionality must stay. Use of a separate configuration partition will be controlled by a new flag that we will introduce later.

Change abandoned by Georgy Kibardin (<email address hidden>) on branch: master
Review: https://review.openstack.org/435910
Reason: Wrong place to pass the option.

Reviewed: https://review.openstack.org/435901
Committed: https://git.openstack.org/cgit/openstack/fuel-agent/commit/?id=b9842ce714f9369f4881fcefd1e96f0e458d3644
Submitter: Jenkins
Branch: master

commit b9842ce714f9369f4881fcefd1e96f0e458d3644
Author: Georgy Kibardin <email address hidden>
Date: Mon Feb 20 12:29:05 2017 +0300

    Do not use separate partition for cloudinit configuration

    In our usecases the separate partition is not needed. It is enough just
    to put cloudinit configuration into the root filesystem.
    This also allows to avoid a race condition which sometimes happens: some
    process deletes the folder in tmp where the configuration partition is
    mounted resulting in cloudinit failure to read its configuration.

    Change-Id: Ib3efb4f517a5cf86dbf91ee53ac00108d4624dcd
    Closes-Bug: #1652002

Changed in fuel:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/436416
Committed: https://git.openstack.org/cgit/openstack/fuel-agent/commit/?id=739326df02e0fae2fd17fe59890fc381d5adf1a3
Submitter: Jenkins
Branch: stable/newton

commit 739326df02e0fae2fd17fe59890fc381d5adf1a3
Author: Georgy Kibardin <email address hidden>
Date: Mon Feb 20 12:29:05 2017 +0300

    Do not use separate partition for cloudinit configuration

    In our usecases the separate partition is not needed. It is enough just
    to put cloudinit configuration into the root filesystem.
    This also allows to avoid a race condition which sometimes happens: some
    process deletes the folder in tmp where the configuration partition is
    mounted resulting in cloudinit failure to read its configuration.

    Change-Id: Ib3efb4f517a5cf86dbf91ee53ac00108d4624dcd
    Closes-Bug: #1652002
    (cherry picked from commit b9842ce714f9369f4881fcefd1e96f0e458d3644)

tags: added: in-stable-newton

This issue was fixed in the openstack/fuel-agent 11.0.0.0rc1 release candidate.

Nastya Urlapova (aurlapova) wrote :

A new failure occurred on ISO 10.0 1455. Scenario:
1. Check mcollective version on bootstrap
2. Create cluster
3. Add one node to cluster
4. Provision nodes
5. Check mcollective version on node

Nastya Urlapova (aurlapova) wrote :
Changed in fuel:
status: Fix Committed → Confirmed
Georgy Kibardin (gkibardin) wrote :

Nastya, this failure is different; please create another bug.

2017-03-12T22:40:59.196984+00:00 info: 2017-03-12 21:50:02.119 4370 ERROR fuel_agent.cmd.agent [-] Unexpected error while running command.
2017-03-12T22:40:59.197190+00:00 info: Command: resize2fs /dev/vda3
2017-03-12T22:40:59.197408+00:00 info: Exit code: 1
2017-03-12T22:40:59.197637+00:00 info: Stdout: ''
2017-03-12T22:40:59.197854+00:00 info: Stderr: "resize2fs 1.42.13 (17-May-2015)\nPlease run 'e2fsck -f /dev/vda3' first.\n\n"
2017-03-12T22:40:59.198072+00:00 info: 2017-03-12 21:50:02.119 4370 ERROR fuel_agent.cmd.agent Traceback (most recent call last):
2017-03-12T22:40:59.198295+00:00 info: 2017-03-12 21:50:02.119 4370 ERROR fuel_agent.cmd.agent File "/usr/lib/python2.7/dist-packages/fuel_agent/cmd/agent.py", line 144, in main
2017-03-12T22:40:59.198518+00:00 info: 2017-03-12 21:50:02.119 4370 ERROR fuel_agent.cmd.agent getattr(mgr, action)()
2017-03-12T22:40:59.198763+00:00 info: 2017-03-12 21:50:02.119 4370 ERROR fuel_agent.cmd.agent File "/usr/lib/python2.7/dist-packages/fuel_agent/manager.py", line 1000, in do_provisioning
2017-03-12T22:40:59.198957+00:00 info: 2017-03-12 21:50:02.119 4370 ERROR fuel_agent.cmd.agent self.do_copyimage()
2017-03-12T22:40:59.199177+00:00 info: 2017-03-12 21:50:02.119 4370 ERROR fuel_agent.cmd.agent File "/usr/lib/python2.7/dist-packages/fuel_agent/manager.py", line 514, in do_copyimage
2017-03-12T22:40:59.199437+00:00 info: 2017-03-12 21:50:02.119 4370 ERROR fuel_agent.cmd.agent fu.extend_fs(image.format, image.target_device)
2017-03-12T22:40:59.199640+00:00 info: 2017-03-12 21:50:02.119 4370 ERROR fuel_agent.cmd.agent File "/usr/lib/python2.7/dist-packages/fuel_agent/utils/fs.py", line 83, in extend_fs
2017-03-12T22:40:59.199876+00:00 info: 2017-03-12 21:50:02.119 4370 ERROR fuel_agent.cmd.agent utils.execute('resize2fs', fs_dev, check_exit_code=[0])
2017-03-12T22:40:59.200106+00:00 info: 2017-03-12 21:50:02.119 4370 ERROR fuel_agent.cmd.agent File "/usr/lib/python2.7/dist-packages/fuel_agent/utils/utils.py", line 140, in execute
2017-03-12T22:40:59.200304+00:00 info: 2017-03-12 21:50:02.119 4370 ERROR fuel_agent.cmd.agent stderr=stderr, cmd=command)
2017-03-12T22:40:59.200535+00:00 info: 2017-03-12 21:50:02.119 4370 ERROR fuel_agent.cmd.agent ProcessExecutionError: Unexpected error while running command.
