Fuel for OpenStack

[Image Based] cloud-init resets all node files after deleting a cluster and deploying another one on the same nodes

Bug #1394599 reported by Andrey Sledzinskiy on 2014-11-20

This bug report is a duplicate of: Bug #1407634: [Ubuntu 14.04] astute fails to detect that the image based provisioning completed successfully.

This bug affects 1 person

Affects             Status     Importance  Assigned to        Milestone
Fuel for OpenStack  Confirmed  Medium      Alexander Gordeev  Fuel for OpenStack 6.1

Bug Description

{
    "build_id": "2014-11-17_17-53-34",
    "ostf_sha": "82465a94eed4eff1fc8d8e1f2fb7e9993c22f068",
    "build_number": "504",
    "auth_required": true,
    "api": "1.0",
    "nailgun_sha": "8d23d1b1bcd9213a70a40c38c3c1486d215d40b5",
    "production": "docker",
    "fuelmain_sha": "8d4943d5ead7a894d4af5e10172510fa60eeed84",
    "astute_sha": "65eb911c38afc0e23d187772f9a05f703c685896",
    "feature_groups": [
        "mirantis"
    ],
    "release": "6.0",
    "release_versions": {
        "2014.2-6.0": {
            "VERSION": {
                "build_id": "2014-11-17_17-53-34",
                "ostf_sha": "82465a94eed4eff1fc8d8e1f2fb7e9993c22f068",
                "build_number": "504",
                "api": "1.0",
                "nailgun_sha": "8d23d1b1bcd9213a70a40c38c3c1486d215d40b5",
                "production": "docker",
                "fuelmain_sha": "8d4943d5ead7a894d4af5e10172510fa60eeed84",
                "astute_sha": "65eb911c38afc0e23d187772f9a05f703c685896",
                "feature_groups": [
                    "mirantis"
                ],
                "release": "6.0",
                "fuellib_sha": "8a0ceff90777af75a3f9363a57185e608f3ee10d"
            }
        }
    },
    "fuellib_sha": "8a0ceff90777af75a3f9363a57185e608f3ee10d"
}

Steps:
1. Create and deploy a cluster: Ubuntu, HA, Neutron GRE, image-based provisioning, 3 controllers, 2 compute nodes, 1 Cinder node
2. After deployment, delete the cluster
3. Create a new cluster: CentOS, HA, Neutron VLAN, image-based provisioning, 3 controllers, 2 compute nodes
4. Provision the cluster

Expected: all nodes are provisioned successfully
Actual: about 1 time out of 4, one of the nodes is provisioned, but after the node restarts and cloud-init starts up, cloud-init destroys all of the node's files

Logs are attached

Revision history for this message

Andrey Sledzinskiy (asledzinskiy) wrote on 2014-11-20:

fuel-snapshot-2014-11-20_13-44-40.tgz (10.7 MiB, application/x-tar)

Alexander Gordeev (a-gordeev) on 2014-11-20

tags:	added: experimental
Changed in fuel:
status:	New → Confirmed

Alexander Gordeev (a-gordeev) on 2014-11-21

tags:	added: cloud-init

Dmitry Pyzhov (dpyzhov) on 2014-11-25

Changed in fuel:
milestone:	6.0 → 6.1

Alexander Gordeev (a-gordeev) wrote on 2014-12-04:

The root cause is still unknown. There is not even a stable, repeatable way to reproduce it.

At first I thought it was a failure inside the boothook script. It was not: the boothook script worked fine every time I tried it. https://review.openstack.org/#/c/138384/ <- patch for boothook scripts.

It might be an issue with cloud-init's semaphores. For an unknown reason, the cloud-init log was full of messages saying that all config_* modules had already been run. The executor simply checks for the semaphore and skips the module if the semaphore exists.

The semaphores are stored in /var/lib/cloud/instance/sem/config_*

I have only one strategy to follow. We need to disable the automatic start of cloud-init on boot (just removing the links from /etc/rc.d/* should help), then start cloud-init by hand under `strace` or another heavyweight debugging tool and watch what happens.

This sounds like a very time-consuming task.
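
For anyone inspecting an affected node, the semaphore check described above can be approximated with a short script. This is only a sketch: the directory comes from this comment, while the function names and the exact `config_<module>` file naming are assumptions made for illustration and are not taken from cloud-init's code.

```python
import glob
import os

# Directory named in the comment above; override it when testing elsewhere.
DEFAULT_SEM_DIR = "/var/lib/cloud/instance/sem"


def list_config_semaphores(sem_dir=DEFAULT_SEM_DIR):
    """Return the sorted names of the config_* semaphore files in sem_dir."""
    pattern = os.path.join(sem_dir, "config_*")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


def would_skip(module_name, sem_dir=DEFAULT_SEM_DIR):
    """Hypothetical model of the check: skip a module if its semaphore exists.

    Assumes the semaphore file is named exactly config_<module_name>.
    """
    return os.path.exists(os.path.join(sem_dir, "config_" + module_name))


if __name__ == "__main__":
    # Print every config_* semaphore present on this node.
    for name in list_config_semaphores():
        print(name)
```

If the list is already populated on a node that has just been reprovisioned, that would be consistent with the "already run" messages seen in the log.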

Andrey Sledzinskiy (asledzinskiy) wrote on 2014-12-04:

Also, this issue has lately been reproduced mainly on CentOS clusters, and only sporadically.
On our CI it fails about 1 time out of 4.

Anastasia Palkina (apalkina) wrote on 2014-12-12:

Reproduced on ISO #49 for 6.0

"build_id": "2014-12-09_22-41-06", "ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4", "build_number": "49", "auth_required": true, "api": "1.0", "nailgun_sha": "22bd43b89a17843f9199f92d61fc86cb0f8772f1", "production": "docker", "fuelmain_sha": "3aab16667f47dd8384904e27f70f7a87ba15f4ee", "astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91", "feature_groups": ["mirantis"], "release": "6.0", "release_versions": {"2014.2-6.0": {"VERSION": {"build_id": "2014-12-09_22-41-06", "ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4", "build_number": "49", "api": "1.0", "nailgun_sha": "22bd43b89a17843f9199f92d61fc86cb0f8772f1", "production": "docker", "fuelmain_sha": "3aab16667f47dd8384904e27f70f7a87ba15f4ee", "astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91", "feature_groups": ["mirantis"], "release": "6.0", "fuellib_sha": "2c99931072d951301d395ebd5bf45c8d401301bb"}}}, "fuellib_sha": "2c99931072d951301d395ebd5bf45c8d401301bb"}

1. Create a new environment (Ubuntu, HA mode)
2. Choose nova-network, flat
3. Choose both Ceph options
4. Add 3 controllers, 2 compute nodes, 2 Ceph nodes
5. Choose image-based provisioning
6. Start deployment
7. One of the nodes hangs during provisioning; the other nodes are provisioned successfully

Anastasia Palkina (apalkina) wrote on 2014-12-12:

fuel-snapshot-2014-12-12_09-05-04.tgz (45.1 MiB, application/x-tar)

Andrey Sledzinskiy (asledzinskiy) on 2014-12-17

tags:	added: release-notes

OpenStack Infra (hudson-openstack) wrote on 2015-01-13: Related fix proposed to fuel-astute (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/146776

OpenStack Infra (hudson-openstack) wrote on 2015-01-13: Related fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/146776
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=2e9d2733a2ddd1ea3ff583d0a9f81792e2569dba
Submitter: Jenkins
Branch: master

commit 2e9d2733a2ddd1ea3ff583d0a9f81792e2569dba
Author: Alexei Sheplyakov <email address hidden>
Date: Tue Jan 13 08:55:28 2015 +0300

Fix rebooting of the bootstrap nodes

    Skip the hard reboot for the image based provisioning since the reboot
    command might hit a node which has booted into the provisioned OS (which
    causes the filesystem corruption and interrupts the deployment).
    Fix the condition which selects the bootstrap nodes, that is, use
    SshHardReboot instead of SshRebootNotProvisioning (the latter reboots
    the locally booted nodes instead the bootstrap ones due to the inverted
    condition).

    Related-bug: #1394599
    Related-bug: #1407634
    Change-Id: Ie4af6904a8297d9acbc4e96425903e9e57450286
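
The selection logic described in this commit message can be illustrated roughly as follows. This is a sketch in Python, not the fuel-astute code: the node dictionaries, the `booted_into` field and both function names are hypothetical and exist only to show the corrected condition next to the inverted one.

```python
def nodes_to_hard_reboot(nodes, image_based):
    """Select the nodes that should receive a hard reboot.

    nodes: list of dicts with hypothetical keys "name" and "booted_into",
           where "booted_into" is either "bootstrap" or "local".
    image_based: True when image based provisioning is in use.
    """
    if image_based:
        # Skip the hard reboot: a node may already have booted into the
        # provisioned OS, and rebooting it could corrupt its filesystem.
        return []
    # Corrected condition: only nodes still in bootstrap are rebooted.
    return [n["name"] for n in nodes if n["booted_into"] == "bootstrap"]


def nodes_to_hard_reboot_inverted(nodes):
    """The inverted condition, shown for contrast: it picks local nodes."""
    return [n["name"] for n in nodes if n["booted_into"] != "bootstrap"]


if __name__ == "__main__":
    sample = [
        {"name": "node-1", "booted_into": "bootstrap"},
        {"name": "node-2", "booted_into": "local"},
    ]
    print(nodes_to_hard_reboot(sample, image_based=False))
    print(nodes_to_hard_reboot(sample, image_based=True))
    print(nodes_to_hard_reboot_inverted(sample))
```

With the inverted condition, the node that has already booted from its local disk is the one that gets rebooted, which matches the failure mode the commit message describes.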

OpenStack Infra (hudson-openstack) wrote on 2015-01-14: Related fix proposed to fuel-astute (stable/6.0)

Related fix proposed to branch: stable/6.0
Review: https://review.openstack.org/147223

OpenStack Infra (hudson-openstack) wrote on 2015-01-14: Related fix merged to fuel-astute (stable/6.0)

Reviewed: https://review.openstack.org/147223
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=f7cda2171b0b677dfaeb59693d980a2d3ee4c3e0
Submitter: Jenkins
Branch: stable/6.0

commit f7cda2171b0b677dfaeb59693d980a2d3ee4c3e0
Author: Alexei Sheplyakov <email address hidden>
Date: Tue Jan 13 08:55:28 2015 +0300

Fix rebooting of the bootstrap nodes

    Related-bug: #1394599
    Related-bug: #1407634
    Change-Id: Ie4af6904a8297d9acbc4e96425903e9e57450286
