Baremetal server provisioning fails: 'Failed to remove physical volume "/dev/sda4" from volume group "docker"' (will be fixed in Ubuntu 16.04)

Bug #1557972 reported by Artem Panchenko
Affects              Status      Importance  Assigned to        Milestone
Fuel for OpenStack   Confirmed   Medium      Fuel Sustaining
Mitaka               Won't Fix   Medium      Dmitry Guryanov

Bug Description

Fuel version info (9.0 liberty): http://paste.openstack.org/show/490653/

Environment deployment fails on baremetal servers, because one of the slave nodes can't be provisioned:

2016-03-16 09:08:06 DEBUG [972] 4f67c00a-a262-43f7-a9e6-2d5b62e59f34: MC agent 'execute_shell_command', method 'execute', results:
{:sender=>"3",
 :statuscode=>0,
 :statusmsg=>"OK",
 :data=>
  {:stdout=>"",
   :stderr=>
     "Unexpected error\nUnexpected error while running command.\nCommand: vgremove -f docker\nExit code: 5\nStdout: ''\nStderr: 'File descriptor 3 (/run/lock/provision.lock) leaked on vgremove invocation. Parent PID 13459: /usr/bin/python2.7\\nFile descriptor 4 (/var/log/fuel-agent.log) leaked on vgremove invocation. Parent PID 13459: /usr/bin/python2.7\\n Assertion failed: can\\'t _pv_write non-orphan PV (in VG #orphans_lvm2)\\n Failed to remove physical volume \"/dev/sda4\" from volume group \"docker\"\\n Volume group \"docker\" not properly removed\\n'\n",
   :exit_code=>255}}
...
2016-03-16 09:08:06 ERROR [972] 4f67c00a-a262-43f7-a9e6-2d5b62e59f34: Provision command returned non zero exit code on node: 3
...
2016-03-16 09:11:55 DEBUG [972] Aborting provision. To many nodes failed: ["3"]

Steps to reproduce:

1. Create cluster
2. Add baremetal nodes. Use servers that previously had a Fuel master node (8.0 with Docker) installed and whose drives were not wiped.
3. Deploy changes.

Expected result: cluster is deployed

Actual result: deployment fails

Revision history for this message
Artem Panchenko (apanchenko-8) wrote :
Dmitry Pyzhov (dpyzhov)
tags: added: feature-image-based
Changed in fuel:
status: New → Confirmed
Revision history for this message
Alexander Gordeev (a-gordeev) wrote :

Perhaps it's a bug in lvm2.

According to the logs, the order of removal is fine and proper. I have no idea how it could be 'not properly removed'.

There were a few logical volumes:

Found logical volumes: [{'path': '/dev/docker/docker-pool', 'uuid': 'bWtwfM-G6cH-YiVl-J2sB-UkxO-csZD-FfRaGR', 'vg': 'docker', 'name': 'docker-pool', 'size': 9588}, {'path': '/dev/os/root', 'uuid': 'Il94nB-3dZB-uGqa-64cG-2Xw0-QQ2r-Cy1tVV', 'vg': 'os', 'name': 'root', 'size': 10000}, {'path': '/dev/os/swap', 'uuid': 'juhGUi-jg27-oqZw-mO8n-ueaz-j2zY-LcahMp', 'vg': 'os', 'name': 'swap', 'size': 16064}, {'path': '/dev/os/var', 'uuid': 'GeoTnJ-afXF-00ZR-flJC-3d5c-uxSB-usUgWZ', 'vg': 'os', 'name': 'var', 'size': 362920}, {'path': '/dev/os/varlog', 'uuid': 'PvnaeU-Gq5o-8cKP-4zSh-Kett-2Ezb-dksDoG', 'vg': 'os', 'name': 'varlog', 'size': 544376}]

They were removed one by one.

The last volume removed was os/varlog:

  Found logical volumes: [{'path': '/dev/os/varlog', 'uuid': 'PvnaeU-Gq5o-8cKP-4zSh-Kett-2Ezb-dksDoG', 'vg': 'os', 'name': 'varlog', 'size': 544376}]
  [-] Trying to execute command: lvremove -f /dev/os/varlog

lvremove returned 0.

Once all logical volumes were removed, fuel-agent started to remove the volume groups.

  Found volume groups: [{'size': 19996, 'name': 'docker', 'free': 19976, 'uuid': 'bjKXRo-dy2D-Skgv-Wrlx-bfEt-M2jX-mNldQ6'}, {'size': 933444, 'name': 'os', 'free': 933444, 'uuid': 'hq0V19-RtYh-UkjR-C2e5-gGXx-Lb43-cJft1p'}]

  Trying to execute command: vgremove -f docker

But vgremove threw an error:

2016-03-16T09:07:10.214044+00:00 info: Command: vgremove -f docker
2016-03-16T09:07:10.214044+00:00 info: Exit code: 5
2016-03-16T09:07:10.214055+00:00 info: Stdout: ''
2016-03-16T09:07:10.214055+00:00 info: Stderr: 'File descriptor 3 (/run/lock/provision.lock) leaked on vgremove invocation. Parent PID 13459: /usr/bin/python2.7\nFile descriptor 4 (/var/log/fuel-agent.log) leaked on vgremove invocation. Parent PID 13459: /usr/bin/python2.7\n Assertion failed: can\'t _pv_write non-orphan PV (in VG #orphans_lvm2)\n Failed to remove physical volume "/dev/sda4" from volume group "docker"\n Volume group "docker" not properly removed\n

So, the bug doesn't seem to be related to the Python code itself.

It looks like fuel-agent hit an issue in lvm2.
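The removal order visible in the logs (lvremove each logical volume, then vgremove each group) can be sketched as follows. This is an illustrative sketch only, not fuel-agent's actual code: the `run` callable, the `default_run` helper, and the dict shapes are assumptions chosen so the flow can be exercised without real block devices.

```python
import subprocess

def default_run(cmd):
    """Execute a command and return its exit code (hypothetical helper)."""
    return subprocess.call(cmd)

def wipe_lvm(lvs, vgs, run=default_run):
    """Remove logical volumes first, then volume groups, mirroring
    the order seen in the fuel-agent logs above."""
    for lv in lvs:
        run(["lvremove", "-f", lv["path"]])       # e.g. /dev/os/varlog
    failed = []
    for vg in vgs:
        rc = run(["vgremove", "-f", vg["name"]])  # e.g. docker
        if rc != 0:
            failed.append((vg["name"], rc))       # exit code 5 in this bug
    return failed
```

With a fake `run` that returns 5 for `vgremove -f docker`, `wipe_lvm` reports `[("docker", 5)]`, which is exactly the failure path this bug hits after all lvremove calls succeed.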

Revision history for this message
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check performed automatically)
Please make sure that the bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Revision history for this message
Anil Shashikumar Belur (askb23) wrote : Re: Baremetal server provisioning fails: 'Failed to remove physical volume "/dev/sda4" from volume group "docker"'

In my understanding, the error message "leaked on vgremove invocation" is seen when the physical volume (and its corresponding device nodes) has already been removed, and vgdisplay/vgremove then fails to read the VG info because the physical device is unavailable.

One possible workaround is to remove the VG with `dmsetup remove /dev/<vg>` or `echo 1 > /sys/block/sda/device/delete`; in some cases a reboot is required.

Revision history for this message
Anil Shashikumar Belur (askb23) wrote :

Also, regarding the "leaked on invocation" warning messages: these appear whenever file descriptors other than 0, 1 and 2 are open on invocation. From `man lvm`:

"On invocation, lvm requires that only the standard file descriptors stdin, stdout and stderr are available. If others are found, they get closed and messages are issued warning about the leak."
I would recommend trying to set `export LVM_SUPPRESS_FD_WARNINGS=1` as a workaround to suppress these warnings.
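For context on why descriptors 3 and 4 leaked at all: fuel-agent runs under Python 2.7, where child processes inherit every open descriptor by default, while Python 3 marks descriptors non-inheritable unless they are passed explicitly. A minimal Python 3 sketch of the mechanism (the temp file is just a stand-in for /run/lock/provision.lock; nothing here touches LVM):

```python
import subprocess
import sys
import tempfile

# Child script: report whether a given fd number is open in the child.
CHILD = (
    "import os, sys\n"
    "fd = int(sys.argv[1])\n"
    "try:\n"
    "    os.fstat(fd)\n"
    "    print('leaked')\n"
    "except OSError:\n"
    "    print('closed')\n"
)

def probe(fd, leak):
    """Spawn a child and ask whether `fd` survived into it."""
    kwargs = {"pass_fds": (fd,)} if leak else {}
    out = subprocess.run([sys.executable, "-c", CHILD, str(fd)],
                         capture_output=True, text=True, **kwargs)
    return out.stdout.strip()

lock = tempfile.TemporaryFile()          # stand-in for the provision lock
print(probe(lock.fileno(), leak=True))   # descriptor passed on: 'leaked'
print(probe(lock.fileno(), leak=False))  # default close-on-exec: 'closed'
```

Note that `LVM_SUPPRESS_FD_WARNINGS=1` only silences the warning; not leaking the descriptors in the first place (close-on-exec flags, or `close_fds=True` in Python 2's subprocess) removes its cause.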

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Dmitry Guryanov (dguryanov)
Changed in fuel:
milestone: 9.0 → 10.0
Revision history for this message
Dmitry Guryanov (dguryanov) wrote :

It's a bug in lvm, which is fixed in newer versions (at least lvm2 in Ubuntu 16.04, and in the Fuel master node's CentOS, works fine), so I'd suggest closing this bug.

Workaround: redeploy after the failure, because the volume group is actually removed even though vgremove returns a non-zero exit code.
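Since the group is gone despite the error, a code-level mitigation would be to treat a non-zero vgremove exit as fatal only if the group still exists afterwards. This is a hedged sketch, not fuel-agent's actual code; `run` is a hypothetical executor returning an `(exit_code, stdout)` pair so the logic can be tested without LVM:

```python
def vg_exists(name, run):
    """Check via `vgs` whether a volume group is still present."""
    rc, out = run(["vgs", "--noheadings", "-o", "vg_name"])
    return rc == 0 and name in out.split()

def vgremove_tolerant(name, run):
    """Remove a VG, tolerating the lvm2 bug where vgremove returns a
    non-zero exit code even though the group was actually removed."""
    rc, _ = run(["vgremove", "-f", name])
    if rc == 0:
        return True
    # Re-check: if the VG is gone despite the error, count it as success.
    return not vg_exists(name, run)
```

`vgs --noheadings -o vg_name` is the standard way to list only VG names, which makes the post-failure re-check cheap compared to restarting the whole deployment.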

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Dmitry Guryanov (dguryanov) → MOS Linux (mos-linux)
tags: added: area-mos
removed: area-python feature-image-based
Revision history for this message
Dmitry Teselkin (teselkin-d) wrote :

It looks like rebuilding lvm2 (and supporting it during the 9.0 lifecycle) costs too much compared with applying a workaround. Anyway, we'll file a bug against Ubuntu, just in case.

Changed in fuel:
assignee: MOS Linux (mos-linux) → Dmitry Guryanov (dguryanov)
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

Setting medium priority because this is a rare case and it has a simple manual workaround (restarting the deployment). This bug will disappear with the upgrade to Ubuntu 16.04 in the 10.0 release.

Changed in fuel:
importance: High → Medium
assignee: Dmitry Guryanov (dguryanov) → Fuel Sustaining (fuel-sustaining-team)
summary: Baremetal server provisioning fails: 'Failed to remove physical volume
- "/dev/sda4" from volume group "docker"'
+ "/dev/sda4" from volume group "docker"' (will be fixed in Ubuntu 16.04)
tags: added: area-python feature-image-based
removed: area-mos