Nodes with customized disk partitioning reset settings to default when rebooted

Bug #1432656 reported by Fabrizio Soppelsa on 2015-03-16
This bug affects 1 person
Affects: Fuel for OpenStack
Importance: High
Assigned to: Aleksey Kasatkin

Bug Description

Bug report from community (IRC)

Fuel 5.1.1

How to reproduce:
* Set up an HA cluster and add nodes
* Customize disk partitioning on a controller or on a storage node (first disk)
* Deploy
* Reboot the node with customized disk partitioning

Result: The node comes back as undeployed, with default disk partitioning.

Ryan Moe (rmoe) wrote :

Can we get a diagnostic snapshot for this? It could also be related in some way to this bug: https://bugs.launchpad.net/fuel/+bug/1423328

Changed in fuel:
status: New → Incomplete
devstok (cdevstok) wrote :

No, it's different. It happens when a controller or another machine reboots.

Dmitry Pyzhov (dpyzhov) on 2015-04-02
tags: added: module-volumes
tags: added: feature-hardware-change
Dmitry Pyzhov (dpyzhov) wrote :

Could you attach a diagnostic snapshot?

Changed in fuel:
milestone: 5.1.2 → 6.1
assignee: nobody → Fuel Python Team (fuel-python)
Aleksey Kasatkin (alekseyk-ru) wrote :

Is the node shown in the UI as undeployed, or did it become broken (or bootstrapped) so that the environment actually lost it?

Dmitry Pyzhov (dpyzhov) on 2015-04-02
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Aleksey Kasatkin (alekseyk-ru)
Dmitry Pyzhov (dpyzhov) on 2015-04-06
Changed in fuel:
importance: Undecided → High
Anastasia Palkina (apalkina) wrote :

Cannot reproduce on ISO #304

"build_id": "2015-04-10_22-54-31", "ostf_sha": "c2a76a60ec4ebbd78e508216c2e12787bf25e423", "build_number": "304", "release_versions": {"2014.2-6.1": {"VERSION": {"build_id": "2015-04-10_22-54-31", "ostf_sha": "c2a76a60ec4ebbd78e508216c2e12787bf25e423", "build_number": "304", "api": "1.0", "nailgun_sha": "69547a71abb4696df7e6f44b1f7864b0535f2df7", "openstack_version": "2014.2-6.1", "production": "docker", "python-fuelclient_sha": "9208ff4a08dcb674ce2df132399a5aa3ddfac21c", "astute_sha": "d96a80b63198a578b2c159edbd76048819039eb0", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "8daac234aea6ac0a98f27871deec039f74f6fdab", "fuellib_sha": "867028fe78837dc2e4635a2cbb976782856964d0"}}}, "auth_required": true, "api": "1.0", "nailgun_sha": "69547a71abb4696df7e6f44b1f7864b0535f2df7", "openstack_version": "2014.2-6.1", "production": "docker", "python-fuelclient_sha": "9208ff4a08dcb674ce2df132399a5aa3ddfac21c", "astute_sha": "d96a80b63198a578b2c159edbd76048819039eb0", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "8daac234aea6ac0a98f27871deec039f74f6fdab", "fuellib_sha": "867028fe78837dc2e4635a2cbb976782856964d0"

I had a successful deployment with 3 controllers, 1 compute, and 1 Cinder node.
I changed the Cinder volume on sda. After deployment I powered this node off and on again. Nothing changed: the node is ready and the disk configuration did not change.

Aleksey Kasatkin (alekseyk-ru) wrote :

Let's close it then. Nastya couldn't reproduce it because https://bugs.launchpad.net/fuel/+bug/1387028 was fixed. Because of that fix, the disk configuration should not change after the environment is deployed.

Changed in fuel:
status: Incomplete → Fix Committed
devstok (cdevstok) wrote :

It should not change after the environment is deployed ... but the web interface asks to deploy the disk changes on the controllers.

I have 3 controllers (with Ceph installed on them) and 2 computes. I customized the system disk quota of the controllers and rebooted the controller machines.
When I try to deploy a new compute, the web interface asks me to deploy changes to the controllers' disks.
I looked at the controller and saw that Fuel has not kept the customized disk configuration but shows the default quota instead.

The changes will probably not be deployed, but the Fuel web interface asks for them anyway.

Aleksey Kasatkin (alekseyk-ru) wrote :

Devstok, please provide your Fuel version info.

Could you also add a diagnostic snapshot, please?

Miroslav Anashkin (manashkin) wrote :

Some machines are equipped with several network adapters.
Such machines may be configured to boot in the following order:
1. A list of network adapters - not a single adapter but a list to try.
2. HDD.
So it is possible to configure a node to boot from a list of NICs.

On first boot the machine PXE boots from the first NIC in the list and gets registered on the master node.
At deployment, Cobbler orders this machine to boot into the OS distro - by PXE one more time.
It then provisions the new system and deploys OpenStack.
Then it reboots the machine. I am not sure how it handles booting from the second NIC this time; probably the system is locked during deployment.
Then, after the node is deployed, Cobbler or Nailgun locks this system in another way.

So the next time the node is restarted, it requests PXE boot with the first NIC and gets NO from the master node. Then it requests PXE boot with the second NIC, and somehow the master node does not recognize this node as already deployed and PXE boots it via the other NIC.

We need to check this scenario.

Miroslav Anashkin (manashkin) wrote :

devstok,

The web UI does not ask in the best way - it mixes all changes into a single list.

Actually, when it asks to make changes to disks, Fuel may only mean to change /etc/hosts on the already deployed nodes in the cluster in order to add a reference to the new node.

Aleksey Kasatkin (alekseyk-ru) wrote :

Miroslav, Fuel reports disk changes when some HW info has changed, e.g. a drive is added, removed, or reconfigured, or the drive order is changed. There were false reports of disk changes, but after the fix mentioned in https://bugs.launchpad.net/fuel/+bug/1432656/comments/6 they should disappear.

I'm not sure what the real issue behind this ticket was. The question about that has not been answered yet (https://bugs.launchpad.net/fuel/+bug/1432656/comments/4).
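
For illustration only, the kind of hardware comparison described in this comment (a drive added, removed, reconfigured, or reordered) could be sketched as below. This is not Nailgun's actual code: the function name `describe_disk_changes` and the disk dictionary layout (`name`, `size`) are assumptions made for this example.

```python
def describe_disk_changes(stored, reported):
    """Compare two ordered lists of disk dicts and list the differences.

    Hypothetical helper: each disk is assumed to look like
    {"name": "sda", "size": 100}; real node metadata is richer.
    """
    old = {d["name"]: d["size"] for d in stored}
    new = {d["name"]: d["size"] for d in reported}
    changes = []

    # Drives that appear only in the new hardware report.
    for name in sorted(new.keys() - old.keys()):
        changes.append(f"added {name}")
    # Drives that were stored but are no longer reported.
    for name in sorted(old.keys() - new.keys()):
        changes.append(f"removed {name}")
    # Drives present in both reports but with a different size.
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            changes.append(f"reconfigured {name}")

    # Same drives with the same sizes, but listed in a different order.
    old_order = [d["name"] for d in stored]
    new_order = [d["name"] for d in reported]
    if not changes and old_order != new_order:
        changes.append("order changed")
    return changes
```

An empty result would correspond to "no disk changes to deploy"; any other result is what would surface in the UI as a pending disk change.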

Aleksey Kasatkin (alekseyk-ru) wrote :

Miroslav, in your scenario, did you mean swapping NICs or reconnecting them to make the slave boot from another NIC?

Miroslav Anashkin (manashkin) wrote :

No, I meant the default Cobbler and PXE behavior.
Actually, it may be not a bug but a node NIC misconfiguration.

If Cobbler has no netboot flag for the system, as is the case for all already deployed nodes, it simply rejects this node and does not boot it via PXE.
Normally, such a node should then boot from the next boot device, an HDD.
If more than one NIC is configured in the BIOS as a boot device and the additional NIC is also connected to the Admin network (for instance, by mistake), then the node tries to boot from the second NIC before the HDD.
Since this NIC has a different MAC address, Cobbler boots this node into bootstrap as a new node.
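
The boot sequence described in this comment can be modelled with a small simulation. It is a simplified sketch of the reasoning, not Cobbler's implementation: `SystemRecord`, `pxe_response`, `boot`, and the MAC addresses are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class SystemRecord:
    """Hypothetical stand-in for a system entry known to the PXE server."""
    name: str
    netboot_enabled: bool


def pxe_response(registry, mac):
    """Model the PXE server's answer to a boot request from one NIC."""
    record = registry.get(mac.lower())
    if record is None:
        # Unknown MAC address: the node looks new and gets the bootstrap image.
        return "bootstrap"
    if record.netboot_enabled:
        # Known system that is scheduled for (re)provisioning.
        return "provision"
    # Known, already deployed system: PXE boot is refused.
    return "reject"


def boot(registry, boot_order):
    """Walk the BIOS boot order until some device actually boots."""
    for device in boot_order:
        if device == "hdd":
            return "local-disk"
        answer = pxe_response(registry, device)
        if answer != "reject":
            return answer
    return "no-boot-device"


# One deployed node, registered under the MAC of its first NIC only.
registry = {"aa:bb:cc:00:00:01": SystemRecord("node-1", netboot_enabled=False)}

# Correct setup: a single NIC in the boot list, then the HDD.
good = boot(registry, ["aa:bb:cc:00:00:01", "hdd"])

# Misconfigured setup: a second NIC on the Admin network precedes the HDD.
bad = boot(registry, ["aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02", "hdd"])
```

In this model `good` ends at the local disk, while `bad` ends in bootstrap because the second MAC address is unknown to the registry, which matches the misconfiguration scenario described above.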
