qemu is very slow when adding 16,384 virtio-scsi drives

Bug #1686980 reported by Richard Jones
Affects: QEMU
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

qemu runs very slowly when adding many virtio-scsi drives. I have attached a small reproducer shell script which demonstrates this.

Using perf shows the following stack trace taking all the time:

    72.42% 71.15% qemu-system-x86 qemu-system-x86_64 [.] drive_get
            |
             --72.32%--drive_get
                       |
                        --1.24%--__irqentry_text_start
                                  |
                                   --1.22%--smp_apic_timer_interrupt
                                             |
                                              --1.00%--local_apic_timer_interrupt
                                                        |
                                                         --1.00%--hrtimer_interrupt
                                                                   |
                                                                    --0.83%--__hrtimer_run_queues
                                                                              |
                                                                               --0.64%--tick_sched_timer

    21.70% 21.34% qemu-system-x86 qemu-system-x86_64 [.] blk_legacy_dinfo
            |
            ---blk_legacy_dinfo

     3.65% 3.59% qemu-system-x86 qemu-system-x86_64 [.] blk_next
            |
            ---blk_next

Daniel Berrange (berrange) wrote :

The first place where it takes an insane amount of time is simply processing the -drive options. The stack trace I see is this:

(gdb) bt
#0 0x00005583b596719a in drive_get (type=type@entry=IF_NONE, bus=bus@entry=0, unit=unit@entry=2313) at blockdev.c:223
#1 0x00005583b59679bd in drive_new (all_opts=0x5583b890e080, block_default_type=<optimized out>) at blockdev.c:996
#2 0x00005583b5971641 in drive_init_func (opaque=<optimized out>, opts=<optimized out>, errp=<optimized out>)
    at vl.c:1154
#3 0x00005583b5c1149a in qemu_opts_foreach (list=<optimized out>, func=0x5583b5971630 <drive_init_func>, opaque=0x5583b9980030, errp=0x0) at util/qemu-option.c:1114
#4 0x00005583b5830d30 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4499

We're iterating over every -drive option. Because we're using if=none, and thus unit==0, line 996 of blockdev.c loops calling drive_get() in order to work out which unit number to use. So we have a loop over every drive, calling drive_new(), which loops over every drive calling drive_get(), which in turn scans every drive - so roughly O(N*N*N).
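
For illustration, here is a minimal sketch of the pattern described above. The names and structures are made up for this sketch and are not the actual blockdev.c code; the point is only the shape of the work: the option-parsing loop visits every -drive, the unit-number probing loop tries successive units, and each probe scans every drive created so far.

    #include <stddef.h>

    /* Illustrative sketch only -- not the real QEMU code. */
    typedef struct Drive {
        int bus;
        int unit;
        struct Drive *next;
    } Drive;

    static Drive *all_drives;   /* every drive created so far */

    /* Stands in for drive_get(): a linear scan over all existing drives. */
    static Drive *lookup_drive(int bus, int unit)
    {
        for (Drive *d = all_drives; d != NULL; d = d->next) {
            if (d->bus == bus && d->unit == unit) {
                return d;
            }
        }
        return NULL;
    }

    /* Stands in for the unit-number probing near line 996 of blockdev.c:
     * keep probing until a free unit number is found. */
    static int find_free_unit(int bus)
    {
        int unit = 0;
        while (lookup_drive(bus, unit) != NULL) {  /* O(drives) per probe */
            unit++;                                /* up to O(drives) probes */
        }
        return unit;
    }

    /* The outer qemu_opts_foreach() loop runs something like this once per
     * -drive option, hence roughly O(N * N * N) in total. */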

Daniel Berrange (berrange) wrote :

I instrumented drive_new to time how long 1000 creations took with current code:

1000 drive_new() in 0 secs
1000 drive_new() in 2 secs
1000 drive_new() in 18 secs
1000 drive_new() in 61 secs

As a quick hack you can just disable the drive_get() calls when if=none. They're mostly just used to fill in a default unit_id, which isn't really required for if=none. That said, if no id= parameter is set, then the code does expect unit_id to be valid, so I'm not sure how to fully fix that.
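
Continuing the simplified sketch from above (illustrative names, not an actual patch), the hack amounts to skipping the probing loop when the interface is if=none, since such drives are referenced by their id= rather than by bus/unit. The caveat Daniel mentions is visible here too: something still has to decide what an unassigned unit_id means when no id= is given.

    /* Hack, sketched against the illustrative model above: skip the
     * unit-number probing entirely for if=none drives. */
    static int assign_unit(int if_none, int bus)
    {
        if (if_none) {
            return -1;               /* unit left unassigned for if=none */
        }
        return find_free_unit(bus);  /* unchanged path for other interfaces */
    }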

Anyway, with this hack applied it is much faster, but there is still some kind of O(N*N) complexity going on, because drive_new() gets slower and slower as each drive is created - just not nearly as badly as before.

1000 drive_new() in 0 secs
1000 drive_new() in 0 secs
1000 drive_new() in 0 secs
1000 drive_new() in 1 secs
1000 drive_new() in 1 secs
1000 drive_new() in 1 secs
1000 drive_new() in 2 secs
1000 drive_new() in 2 secs
1000 drive_new() in 2 secs
1000 drive_new() in 4 secs
1000 drive_new() in 4 secs
1000 drive_new() in 6 secs
1000 drive_new() in 8 secs
1000 drive_new() in 8 secs

Daniel Berrange (berrange) wrote :

I added further instrumentation and got this profile of where the remaining time goes

1000x drive_new 18.347secs
-> 1000x blockdev_init 18.328secs
   -> 1000x monitor_add_blk 4.515secs
      -> 1000x blk_by_name 1.545secs
      -> 1000x bdrv_find_node 2.968secs
   -> 1000x blk_new_open 13.786secs
      -> 1000x bdrv_open 13.783secs

These numbers all keep increasing as we process more and more -drive args, so there's some O(N) factor in blk_by_name, bdrv_find_node and bdrv_open.
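
That pattern is consistent with each of these helpers walking a global list of the block backends or nodes created so far, so that the duplicate-name check for the i-th drive costs O(i) and creating N drives costs O(N^2) overall. A hedged illustration of that shape (made-up names, not the QEMU sources):

    #include <stddef.h>
    #include <string.h>

    typedef struct NamedNode {
        const char *name;
        struct NamedNode *next;
    } NamedNode;

    static NamedNode *all_nodes;   /* every node registered so far */

    /* Stands in for a lookup like blk_by_name()/bdrv_find_node(): a linear
     * scan, so every new drive pays for all the drives created before it. */
    static NamedNode *find_by_name(const char *name)
    {
        for (NamedNode *n = all_nodes; n != NULL; n = n->next) {
            if (strcmp(n->name, name) == 0) {
                return n;
            }
        }
        return NULL;
    }

A hash table keyed on the name would make each lookup roughly O(1), but that would be a change to QEMU's internal data structures rather than anything proposed in this report.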

Thomas Huth (th-huth) wrote :

Is this faster nowadays if you use the new -blockdev parameter instead of -drive?

Changed in qemu:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired