QEMU-KVM / detect_zeroes causes KVM to start an unlimited number of threads on guest-side high I/O with large block sizes

Bug #1687653 reported by Florian Strankowski
This bug affects 2 people
Affects   Status      Importance   Assigned to   Milestone
QEMU      Invalid     Undecided    Unassigned
Ubuntu    Confirmed   Undecided    Unassigned

Bug Description

QEMU-KVM in combination with "detect_zeroes=on" allows a guest to DoS the host. This is possible if the host has "detect_zeroes" enabled for the guest's drive and the guest writes a large chunk of data with a huge block size onto that drive.

E.g.: dd if=/dev/zero of=/tmp/DoS bs=1G count=1 oflag=direct

All QEMU versions since the introduction of detect_zeroes are affected; earlier versions are not. This is absolutely critical, please fix it ASAP!
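
For context, detect_zeroes is enabled per drive on the host side. A minimal sketch of such a host invocation (the LVM device path and memory size are placeholders, not taken from the reporter's setup):

  # hypothetical host command line; /dev/vg0/guestlv is a placeholder LVM volume
  qemu-system-x86_64 -enable-kvm -m 2048 \
      -drive file=/dev/vg0/guestlv,if=virtio,format=raw,cache=none,aio=native,detect-zeroes=on

The guest then only needs to issue the dd command above against that drive.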

#####

Provided by Dominik Csapak:

source , bs , count , O_DIRECT, behaviour

urandom , bs 1M, count 1024, O_DIRECT: OK
file , bs 1M, count 1024, O_DIRECT: OK
/dev/zero , bs 1M, count 1024, O_DIRECT: OK
zero file , bs 1M, count 1024, O_DIRECT: OK
/dev/zero , bs 1G, count 1, O_DIRECT: NOT OK
zero file , bs 1G, count 1, O_DIRECT: NOT OK
zero file , bs 1G, count 1, no O_DIRECT: NOT OK
rand file , bs 1G, count 1, O_DIRECT: OK
rand file , bs 1G, count 1, no O_DIRECT: OK

discard on:

urandom , bs 1M, count 1024, O_DIRECT: OK
rand file , bs 1M, count 1024, O_DIRECT: OK
/dev/zero , bs 1M, count 1024, O_DIRECT: OK
zero file , bs 1M, count 1024, O_DIRECT: OK
/dev/zero , bs 1G, count 1, O_DIRECT: NOT OK
zero file , bs 1G, count 1, O_DIRECT: NOT OK
zero file , bs 1G, count 1, no O_DIRECT: NOT OK
rand file , bs 1G, count 1, O_DIRECT: OK
rand file , bs 1G, count 1, no O_DIRECT: OK

detect_zeroes off:

urandom , bs 1M, count 1024, O_DIRECT: OK
rand file , bs 1M, count 1024, O_DIRECT: OK
/dev/zero , bs 1M, count 1024, O_DIRECT: OK
zero file , bs 1M, count 1024, O_DIRECT: OK
/dev/zero , bs 1G, count 1, O_DIRECT: OK
zero file , bs 1G, count 1, O_DIRECT: OK
zero file , bs 1G, count 1, no O_DIRECT: OK
rand file , bs 1G, count 1, O_DIRECT: OK
rand file , bs 1G, count 1, no O_DIRECT: OK
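
The matrix above can be reproduced from inside the guest roughly as follows (a sketch; the target device /dev/sdb and the source file paths are assumptions, and the 1 GiB source files are prepared first):

  # prepare 1 GiB random and zero-filled source files (assumed paths)
  dd if=/dev/urandom of=/root/rand.src bs=1M count=1024
  dd if=/dev/zero    of=/root/zero.src bs=1M count=1024

  # small block size: behaves normally in the reported setup
  dd if=/root/zero.src of=/dev/sdb bs=1M count=1024 oflag=direct

  # large block size with an all-zero source: the "NOT OK" cases above
  dd if=/root/zero.src of=/dev/sdb bs=1G count=1 oflag=direct

  # large block size with random data: reported as OK
  dd if=/root/rand.src of=/dev/sdb bs=1G count=1 oflag=direct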

#####

Provided by Florian Strankowski:

bs - count - io-threads

512K - 2048 - 2
1M - 1024 - 2
2M - 512 - 4
4M - 256 - 6
8M - 128 - 10
16M - 64 - 18
32M - 32 - uncountable

Please refer to further information here:

https://bugzilla.proxmox.com/show_bug.cgi?id=1368
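
The io-thread counts above can be observed on the host roughly like this (a sketch; the pgrep pattern assumes a single qemu-system-x86_64 process):

  # print the QEMU thread count once per second while dd runs in the guest
  QEMU_PID=$(pgrep -f qemu-system-x86_64 | head -n1)
  while true; do
      ps -o nlwp= -p "$QEMU_PID"
      sleep 1
  done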

information type: Private Security → Public
information type: Public → Public Security
information type: Public Security → Private Security
information type: Private Security → Public Security
Changed in qemu:
status: New → Confirmed
Revision history for this message
Florian Strankowski (fstrankowski) wrote :

Sorry about the visibility settings, this bug tracker drives me nuts.

information type: Public Security → Private Security
information type: Private Security → Public Security
Revision history for this message
Florian Strankowski (fstrankowski) wrote :

Just to make this clear: this bug only affects LVM-backed storage (both LVM-thin and LVM-thick); file-based storage is not affected.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu:
status: New → Confirmed
Revision history for this message
Stefan Hajnoczi (stefanha) wrote :

I'm unable to reproduce this issue. The host stays responsive and the dd command completes in a reasonable amount of time. QEMU does not exceed the 64-thread pool size.

Please post steps to reproduce the issue using a minimal command-line without libvirt.

Here is information on my attempt to reproduce the problem:

Guest: Kernel 4.10.8-200.fc25.x86_64
Host: 4.10.11-200.fc25.x86_64
QEMU: qemu.git/master (e619b14746e5d8c0e53061661fd0e1da01fd4d60)

The LV is 1 GB on top of LUKS on a Samsung MZNLN256HCHP SATA SSD drive.

mpstat -P ALL 5 output:
11:02:02 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
11:02:07 AM all 3.36 0.00 6.22 34.54 0.25 0.50 0.00 3.11 0.00 52.03
11:02:07 AM 0 2.82 0.00 5.63 32.39 0.80 1.21 0.00 3.22 0.00 53.92
11:02:07 AM 1 3.02 0.00 6.04 28.77 0.20 0.20 0.00 3.02 0.00 58.75
11:02:07 AM 2 3.56 0.00 7.71 44.27 0.20 0.40 0.00 2.37 0.00 41.50
11:02:07 AM 3 3.81 0.00 5.61 32.46 0.00 0.40 0.00 4.01 0.00 53.71

vmstat 5 output:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 0 0 0 1617404 6484 3541468 0 0 2145 84794 1976 8814 8 8 64 20 0
 0 0 0 1619492 6484 3538592 0 0 613 69340 1518 7430 6 7 70 17 0
 0 0 0 1618920 6484 3538680 0 0 280 75199 1421 6811 6 7 52 35 0

pidstat -v -p $PID_OF_QEMU 5 output:
11:01:08 AM UID PID threads fd-nr Command
11:02:03 AM 0 13043 67 37 qemu-system-x86
11:02:08 AM 0 13043 67 37 qemu-system-x86
11:02:13 AM 0 13043 67 37 qemu-system-x86

$ sudo x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 1024 -cpu host \
        -device virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5 \
        -drive file=test.img,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on \
        -device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100 \
        -drive file=/dev/path/to/testlv,if=none,id=drive-scsi1,format=raw,cache=none,aio=native,detect-zeroes=on \
        -device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1,bootindex=101 \
        -nographic

guest# dd if=/dev/zero of=/dev/sdb bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 15.0681 s, 71.3 MB/s

Revision history for this message
Florian Strankowski (fstrankowski) wrote :

Please be so kind as to use a 6G LVM volume and run "dd if=/dev/zero of=/dev/sdb bs=3G count=2 oflag=direct". Keep an eye on your processor usage in relation to the number of threads created. It's harder to knock down an SSD-backed system than one with spinning disks.
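
For reference, such a test volume could be created on the host roughly as follows (a sketch; the volume group name vg0 and the LV name testlv are assumptions):

  # create a 6G logical volume and attach it as the second -drive
  # (detect-zeroes=on), as in the command line above
  lvcreate -L 6G -n testlv vg0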

Revision history for this message
Stefan Hajnoczi (stefanha) wrote :

After further investigation on IRC the following points were raised:

1. Non-vcpu threads in QEMU weren't being isolated. Libvirt can do this
   using the <cputune> domain XML element (see the sketch after this
   list). The guest can create a high load if some QEMU threads are
   unconstrained.

2. The wait% CPU stat was causing confusion. It's the idle time during
   which synchronous I/O is pending. High wait% does not mean that the
   system is under high CPU load. detect-zeroes=on can take a
   synchronous I/O path even when aio=native is used, and this results
   in wait% instead of idle%.
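
Regarding point 1, a minimal pinning sketch via libvirt (the domain name "mydomain" and the CPU lists are placeholders; this corresponds to the <cputune> vcpupin/emulatorpin elements in the domain XML):

  # pin vCPUs to dedicated host CPUs and confine the emulator/IO threads
  # to the remaining ones, so guest I/O load cannot starve the host
  virsh vcpupin     mydomain 0 2 --live
  virsh vcpupin     mydomain 1 3 --live
  virsh emulatorpin mydomain 0-1 --live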

I'm closing the bug.

Changed in qemu:
status: Confirmed → Invalid