massive qemu-nbd instances in Ubuntu after instances' spawning

Bug #1250231 reported by Vladimir Kuklin
Affects: Fuel for OpenStack
Status: Fix Released
Importance: Medium
Assigned to: Tatyanka

Bug Description

Deploy an Ubuntu controller+cinder and a compute+cinder node, then run several OSTF tests (some of them fail due to a wrong libvirt_type configuration).

Spawn an instance and observe that the load average is ~28.

Run ps aux and see:

  443 root 20 0 288m 2024 916 R 3.8 0.2 2:02.31 /usr/bin/qemu-nbd -c /dev/nbd13 /var/lib/nova/instances/ac8e1f1e-d7d6-4ec3-aeb8-1140c43e4800/disk
 1809 root 20 0 288m 2024 916 R 3.8 0.2 1:34.46 /usr/bin/qemu-nbd -c /dev/nbd0 /var/lib/nova/instances/c7ae23bf-e09d-47bc-924f-ca12f90b78e0/disk
 1953 root 20 0 288m 2028 916 R 3.8 0.2 1:32.33 /usr/bin/qemu-nbd -c /dev/nbd2 /var/lib/nova/instances/bb431949-c928-4c9e-a0a5-c668e9e2de8c/disk
 3017 root 20 0 288m 2028 916 R 3.8 0.2 1:16.36 /usr/bin/qemu-nbd -c /dev/nbd9 /var/lib/nova/instances/be259c39-80ac-48f9-93ec-208a85741752/disk
 3745 root 20 0 288m 2024 916 R 3.8 0.2 1:05.76 /usr/bin/qemu-nbd -c /dev/nbd2 /var/lib/nova/instances/7eb02eaf-5308-4ebf-be1b-f514a44aef22/disk
 3969 root 20 0 288m 2028 916 R 3.8 0.2 1:03.25 /usr/bin/qemu-nbd -c /dev/nbd8 /var/lib/nova/instances/6f94e106-6e05-42aa-827b-21b8bf918e55/disk
 4095 root 20 0 288m 2024 916 R 3.8 0.2 1:01.97 /usr/bin/qemu-nbd -c /dev/nbd14 /var/lib/nova/instances/80af15d2-6a02-4fed-a363-8d3779c4a1b7/disk
 4344 root 20 0 288m 2024 916 R 3.8 0.2 1:00.79 /usr/bin/qemu-nbd -c /dev/nbd7 /var/lib/nova/instances/1621e0ad-8824-47cb-b7fd-a8652a2af5ac/disk
 4618 root 20 0 288m 2024 916 R 3.8 0.2 0:59.10 /usr/bin/qemu-nbd -c /dev/nbd8 /var/lib/nova/instances/78560a59-ceb3-4ee9-bf71-86ba5d98396d/disk
 6034 root 20 0 288m 2028 916 R 3.8 0.2 0:49.57 /usr/bin/qemu-nbd -c /dev/nbd9 /var/lib/nova/instances/a83ddcd6-fc0c-4509-8e2e-08b30eee7b5c/disk
 6861 root 20 0 288m 2028 916 R 3.8 0.2 0:46.27 /usr/bin/qemu-nbd -c /dev/nbd8 /var/lib/nova/instances/588768aa-1a59-447f-9061-b6cbbf98c4fd/disk
 7216 root 20 0 288m 2028 916 R 3.8 0.2 0:44.91 /usr/bin/qemu-nbd -c /dev/nbd9 /var/lib/nova/instances/7aed9122-974f-40f7-bce1-dfb2fcfad798/disk
 7726 root 20 0 288m 2028 916 R 3.8 0.2 0:43.25 /usr/bin/qemu-nbd -c /dev/nbd6 /var/lib/nova/instances/3f476c3a-a738-422b-b720-4636cb951e8f/disk
 8258 root 20 0 288m 2032 916 R 3.8 0.2 0:40.67 /usr/bin/qemu-nbd -c /dev/nbd11 /var/lib/nova/instances/054ebc16-8f97-4d43-9521-1901666b5da8/disk
 9344 root 20 0 288m 2024 916 R 3.8 0.2 0:35.82 /usr/bin/qemu-nbd -c /dev/nbd8 /var/lib/nova/instances/8e7aa37b-1773-42dd-a514-2d287a2b85e8/disk
12398 libvirt- 20 0 1666m 239m 10m R 3.8 24.1 0:12.05 /usr/bin/qemu-system-x86_64 -name instance-0000001b -S -M pc-i440fx-1.5 -cpu kvm64,+lahf_lm,+hypervisor,+popcnt
32115 root 20 0 288m 2028 916 R 3.8 0.2 2:25.34 /usr/bin/qemu-nbd -c /dev/nbd4 /var/lib/nova/instances/fafaa0d6-92b7-4d9b-8b90-029a71d596f7/disk
32183 root 20 0 288m 2024 916 R 3.8 0.2 2:20.59 /usr/bin/qemu-nbd -c /dev/nbd5 /var/lib/nova/instances/8400aefa-77b4-4648-aa2d-16f62873a44b/disk
32497 root 20 0 288m 2028 916 R 3.8 0.2 2:15.06 /usr/bin/qemu-nbd -c /dev/nbd12 /var/lib/nova/instances/16b668c2-d615-443e-a08e-fc9d10fd3b0e/disk
32754 root 20 0 288m 2032 916 R 3.8 0.2 2:10.32 /usr/bin/qemu-nbd -c /dev/nbd8 /var/lib/nova/instances/f6794712-669a-495a-af18-cbe597e9901c/disk
 1483 root 20 0 288m 2016 916 R 3.1 0.2 1:36.99 /usr/bin/qemu-nbd -c /dev/nbd7 /var/lib/nova/instances/e2a06a7a-51d1-47e0-a8d2-dd210d37a8ea/disk
 2033 root 20 0 288m 2028 916 R 3.1 0.2 1:30.39 /usr/bin/qemu-nbd -c /dev/nbd6 /var/lib/nova/instances/05a8f909-9167-4015-9a26-7a5c3b7cc56a/disk
 3273 root 20 0 288m 2024 916 R 3.1 0.2 1:12.89 /usr/bin/qemu-nbd -c /dev/nbd10 /var/lib/nova/instances/3dc5f5d3-ecb2-4cdf-8572-cb8bd155730e/disk
 4810 root 20 0 288m 2024 916 R 3.1 0.2 0:57.83 /usr/bin/qemu-nbd -c /dev/nbd8 /var/lib/nova/instances/12305e77-afea-411f-ade6-8616933ba471/disk
12002 root 20 0 288m 2024 916 R 3.1 0.2 0:12.68 /usr/bin/qemu-nbd -c /dev/nbd12 /var/lib/nova/instances/b827ad4b-f6be-4b46-a175-2a2a913c286d/disk
31912 root 20 0 288m 2012 908 R 3.1 0.2 2:38.13 /usr/bin/qemu-nbd -c /dev/nbd11 /var/lib/nova/instances/47a5eb02-d64f-4c77-80c1-7d2d2416319b/disk
32373 root 20 0 288m 2024 916 R 3.1 0.2 2:17.45 /usr/bin/qemu-nbd -c /dev/nbd4 /var/lib/nova/instances/d39d71a0-1bd2-4358-b41f-b7f1e78f7f51/disk

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

{u'release': u'4.0', u'nailgun_sha': u'cacc3c1db665db4c9b02b19479678dba2299a71d', u'ostf_sha': u'0c72c99fadf6408312bea0385e5a2042b0fcc751', u'astute_sha': u'df6ddea3abc93fbe1cab9b4534d4d5e9508c95d6', u'fuellib_sha': u'f77672d70516795995522d5689ec67546615c4a2'}

Changed in fuel:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Tatyana (tatyana-leontovich)
milestone: none → 3.2.1
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

1. Could you provide logs?
2. Are the instances deleted in the DB? I am not sure whether OSTF can handle the libvirt_type configuration. Also remember that we use soft deletion only.

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

Yes, the nodes get deleted. It looks like these volumes also get created for VMs in the error state.
Logs are absent, sorry.

Changed in fuel:
assignee: Tatyana (tatyana-leontovich) → nobody
Mike Scherbakov (mihgen)
Changed in fuel:
milestone: 3.2.1 → 4.0
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

No solution here either, so it is not Triaged.

Changed in fuel:
status: Triaged → Confirmed
status: Confirmed → Incomplete
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Please reproduce it and provide logs.

Revision history for this message
Mike Scherbakov (mihgen) wrote :

I was able to reproduce on Ubuntu Neutron + VLAN - see details in another ticket: https://bugs.launchpad.net/fuel/+bug/1261832.

Steps to reproduce - simply run Health Check (OSTF).

Changed in fuel:
status: Incomplete → Confirmed
importance: Medium → High
assignee: nobody → Andrey Korolyov (xdeller)
Revision history for this message
Mike Scherbakov (mihgen) wrote :

In my setup, qemu-nbd processes eat all available CPU.

Revision history for this message
Yuriy Yekovenko (yyekovenko) wrote :

To get the qemu-nbd processes that overload the system started on a compute node, you can run the "Check stack autoscaling" test on the Health Check tab. This test creates a stack with autoscaling: at first, one Fedora-17 instance is launched, then a second one is expected to be launched as a result of scaling up. The test fails on Ubuntu because the qemu-nbd processes overload the CPU.
Just in case, I have attached the template that is used in the test.
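
A small diagnostic sketch (illustrative only, not part of OSTF) that polls a compute node for qemu-nbd processes and their combined CPU usage, assuming standard procps ps output:

    #!/usr/bin/env python
    # Diagnostic sketch only: count qemu-nbd processes on a compute node and
    # report their combined CPU usage while the Health Check tests run.
    import subprocess
    import time

    def qemu_nbd_processes():
        # "ps -eo pid,pcpu,args" prints PID, %CPU and the full command line.
        out = subprocess.check_output(["ps", "-eo", "pid,pcpu,args"]).decode()
        procs = []
        for line in out.splitlines()[1:]:
            fields = line.split(None, 2)
            if len(fields) == 3 and fields[2].startswith("/usr/bin/qemu-nbd"):
                procs.append((fields[0], float(fields[1]), fields[2]))
        return procs

    if __name__ == "__main__":
        while True:
            procs = qemu_nbd_processes()
            total = sum(cpu for _, cpu, _ in procs)
            print("%d qemu-nbd processes, ~%.1f%% CPU in total" % (len(procs), total))
            time.sleep(10)

Running it while the autoscaling test executes should show the process count and CPU climbing if the bug is hit.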

summary: - massive qemu-nbd instances after OSTF tests
+ massive qemu-nbd instances in Ubuntu after instances' spawning
Revision history for this message
Andrey Korolyov (xdeller) wrote :

This is clearly a race; I cannot reproduce it.

1. Please reproduce on CentOS.
2. Reproduce in a CPU-unsaturated environment.

Since we're using different qemu versions in Ubuntu and CentOS due to upstream decisions (1.2 in CentOS is close to the generic package, much as 1.5 is in Ubuntu), and we're not able to use upstream qemu in Ubuntu due to a known and not-yet-fixed librbd bug, I would suggest dropping to 1.2 and pinning the version if this issue takes any extraordinary effort to fix.

Revision history for this message
Andrey Korolyov (xdeller) wrote :

Steps to reproduce (we're using direct calls via nova rootwrap; a minimal sketch of this sequence is given below):

- qemu-nbd -c /dev/nbdX /disk/..
- rm /disk/..
- qemu-nbd -d /dev/nbdX <- the problem begins here

Passing to the OSTF team; please fix.
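
For reference, a minimal Python sketch of the sequence above; the disk path and nbd device are placeholders, it must run as root with the nbd kernel module loaded, and it destroys the image it is pointed at:

    #!/usr/bin/env python
    # Sketch of the racy sequence from the steps above.
    import subprocess

    DISK = "/var/lib/nova/instances/<uuid>/disk"  # placeholder path
    NBD = "/dev/nbd0"                             # placeholder device

    subprocess.check_call(["qemu-nbd", "-c", NBD, DISK])  # attach the image
    subprocess.check_call(["rm", "-f", DISK])             # remove the backing file
    subprocess.check_call(["qemu-nbd", "-d", NBD])        # detach; the problem begins here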

Changed in fuel:
assignee: Andrey Korolyov (xdeller) → Tatyana (tatyana-leontovich)
importance: High → Critical
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-ostf (master)

Fix proposed to branch: master
Review: https://review.openstack.org/63110

Changed in fuel:
assignee: Tatyana (tatyana-leontovich) → Andrey Korolyov (xdeller)
status: Confirmed → In Progress
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-ostf (master)

Reviewed: https://review.openstack.org/63110
Committed: https://git.openstack.org/cgit/stackforge/fuel-ostf/commit/?id=02325904f6db3dba53d463336a2842429719a4cb
Submitter: Jenkins
Branch: master

commit 02325904f6db3dba53d463336a2842429719a4cb
Author: Andrey Korolyov <email address hidden>
Date: Thu Dec 19 17:28:19 2013 +0400

    Add sleep between image detach

    Image detach via qemu-nbd and disk removal
    has a racy behaviour for locally placed URIs,
    but not for RBD disks for example.

    Closes-Bug: #1250231

    Change-Id: I2a3c8f5b8db44ed831916164815c60114cf22906

Changed in fuel:
status: Fix Committed → In Progress
Revision history for this message
Andrey Korolyov (xdeller) wrote :

Progress is tracked in OSCI-980.

I've added a timeout just after the nbd detach and it works for the series of tests. Adding timeouts to the OSTF code resulted in a lower hit rate for the issue(?)
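
A sketch of how such a timeout could look (an assumption about the shape of the change, not the actual OSCI-980 code); the settle time is illustrative:

    import subprocess
    import time

    NBD_SETTLE_SECONDS = 3  # illustrative value; the real timeout may differ

    def detach_nbd(device):
        # Detach the image, then give qemu-nbd and the kernel a moment to
        # settle before the backing file is removed or reused.
        subprocess.check_call(["qemu-nbd", "-d", device])
        time.sleep(NBD_SETTLE_SECONDS)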

Revision history for this message
Andrey Korolyov (xdeller) wrote :

1.2 has no such regression. Will we use it in 4.0, then?

Revision history for this message
Andrey Korolyov (xdeller) wrote :

Following the mail conversation, the switch to 1.2 will happen regardless of any fixes in the orchestrator or the OSTF suite (of course, those are welcome).

Revision history for this message
Mike Scherbakov (mihgen) wrote :

Did we switch to 1.2 in the latest ISOs? What is the status of this issue?

Revision history for this message
Andrey Korolyov (xdeller) wrote :

Yes, we did. Right now we are waiting for feedback from Tanya to pin down the issue.

Changed in fuel:
assignee: Andrey Korolyov (xdeller) → Tatyana (tatyana-leontovich)
importance: Critical → Medium
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Update:
there are no qemu-nbd processes, but the instance is in an error state after deletion:
2013-12-23 17:13:57.812+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name instance-00000013 -S -M pc-1.2 -m 64 -smp 1,sockets=1,cores=1,threads=1 -uuid fa1eea70-fd02-45fe-ab57-13d37c1d2e29 -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=2013.2.1,serial=0d919a82-21f2-569e-3177-8ea153aa23e0,uuid=fa1eea70-fd02-45fe-ab57-13d37c1d2e29 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00000013.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/fa1eea70-fd02-45fe-ab57-13d37c1d2e29/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=writethrough -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=28,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:06:d2:c6,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/fa1eea70-fd02-45fe-ab57-13d37c1d2e29/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
Domain id=22 is tainted: high-privileges
char device redirected to /dev/pts/3
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
qemu-system-x86_64: -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:06:d2:c6,bus=pci.0,addr=0x3: pci_add_option_rom: failed to find romfile "pxe-virtio.rom"
qemu: terminating on signal 15 from pid 27091
2013-12-23 17:16:20.048+0000: shutting down

root@node-3:/var/log/libvirt/qemu# virsh list --all
 Id Name State
----------------------------------------------------
 - instance-00000013 shut off

192 iso

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Fix proposed to master:
https://review.openstack.org/#/c/63951/
It includes: use the nova client to detach volumes (previously we used the cinder client), and before instance deletion verify that the volume has been successfully detached and deleted.
Tested on Ubuntu simple mode (1 controller/cinder + 2 compute/cinder, Nova network) and also on Ubuntu HA with Neutron VLAN.
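
A rough sketch of that approach (illustrative only, not the actual OSTF change; it assumes 2013-era python-novaclient and python-cinderclient APIs and hypothetical credentials):

    import time

    from cinderclient import exceptions as cinder_exc
    from cinderclient.v1 import client as cinder_client
    from novaclient.v1_1 import client as nova_client

    # Hypothetical credentials/endpoint, for illustration only.
    AUTH = ("admin", "secret", "admin", "http://192.168.0.1:5000/v2.0/")
    nova = nova_client.Client(*AUTH)
    cinder = cinder_client.Client(*AUTH)

    def cleanup(server, volume, timeout=300):
        # Detach through the nova client (previously the cinder client was used).
        nova.volumes.delete_server_volume(server.id, volume.id)

        deadline = time.time() + timeout
        # Wait until the volume is actually detached ...
        while time.time() < deadline:
            if cinder.volumes.get(volume.id).status == "available":
                break
            time.sleep(5)
        # ... then delete it and wait until it is really gone ...
        cinder.volumes.delete(volume.id)
        while time.time() < deadline:
            try:
                cinder.volumes.get(volume.id)
            except cinder_exc.NotFound:
                break
            time.sleep(5)
        # ... and only then delete the instance.
        nova.servers.delete(server.id)

The important ordering is that the instance is deleted only after the volume is confirmed detached and removed, which is what closes the race.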

Changed in fuel:
status: In Progress → Fix Committed
status: Fix Committed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/63951
Committed: https://git.openstack.org/cgit/stackforge/fuel-ostf/commit/?id=32c179b19ae16016cf9154c4b62d4ff7a11519bb
Submitter: Jenkins
Branch: master

commit 32c179b19ae16016cf9154c4b62d4ff7a11519bb
Author: Tatyana Leontovich <email address hidden>
Date: Tue Dec 24 19:11:08 2013 +0200

    Add timeout between volume and instance deletion

    Make next changes in smoke.test_volumes.py:
    * Make volume detach trough nova client
    * Add verification that volume really deleted
      before deletion of instance

    Closes-Bug: #1250231

    Change-Id: If1dd2408cdf14f7a562ea8d5a447681d2c42df95

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-ostf (stable/4.0)

Reviewed: https://review.openstack.org/63968
Committed: https://git.openstack.org/cgit/stackforge/fuel-ostf/commit/?id=1f3e2716343886f8330f5a028b4c7990aadbd54f
Submitter: Jenkins
Branch: stable/4.0

commit 1f3e2716343886f8330f5a028b4c7990aadbd54f
Author: Tatyana Leontovich <email address hidden>
Date: Tue Dec 24 19:11:08 2013 +0200

    4.0 fixes: Add timeout between volume and instance deletion

    Make next changes in smoke.test_volumes.py:
    * Make volume detach trough nova client
    * Add verification that volume really deleted
      before deletion of instance

    Closes-Bug: #1250231

    Change-Id: If1dd2408cdf14f7a562ea8d5a447681d2c42df95

Changed in fuel:
status: Fix Committed → Fix Released
Revision history for this message
Benoit Tremblay (benoit-c-tremblay) wrote :

This problem does not seem to be specific to Fuel, as I have the same issue using devstack; I am still investigating to find the reason.
