Block device scheduler should be multiqueue for spinning disks

Bug #1903543 reported by David Krauser
Affects             Status      Importance  Assigned to       Milestone
linux-gcp (Ubuntu)  Incomplete  High        Khaled El Mously

Bug Description

On a GCE e2-medium instance running Groovy with a standard persistent disk, we see:

$ cat /sys/block/sda/queue/rotational
1
$ cat /sys/block/sda/queue/scheduler
[none] mq-deadline

I'd expect the contents of /sys/block/sda/queue/scheduler to be:
[mq-deadline] none
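
For reference, the name in square brackets is the scheduler currently in use; the other names are available but not selected. A minimal sketch of extracting the active entry follows. The function name activeScheduler is hypothetical and not taken from any tool mentioned in this report.

```go
package main

import (
	"fmt"
	"strings"
)

// activeScheduler returns the name enclosed in square brackets in the
// contents of a queue/scheduler file, e.g. "none" for "[none] mq-deadline".
func activeScheduler(contents string) (string, error) {
	for _, f := range strings.Fields(contents) {
		if len(f) > 2 && strings.HasPrefix(f, "[") && strings.HasSuffix(f, "]") {
			return f[1 : len(f)-1], nil
		}
	}
	return "", fmt.Errorf("no active scheduler marked in %q", contents)
}

func main() {
	// The observed and the expected file contents from this report.
	for _, s := range []string{"[none] mq-deadline", "[mq-deadline] none"} {
		name, err := activeScheduler(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> active scheduler: %s\n", s, name)
	}
}
```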

---

$ lsb_release -rd
Description: Ubuntu 20.10
Release: 20.10

$ apt-cache policy linux-gcp
linux-gcp:
  Installed: 5.8.0.1008.8
  Candidate: 5.8.0.1008.8
  Version table:
 *** 5.8.0.1008.8 500
        500 http://us-central1.gce.archive.ubuntu.com/ubuntu groovy/main amd64 Packages
        100 /var/lib/dpkg/status

Colin Ian King (colin-king) wrote :

Can you provide the output from dmesg so we can get an idea of what is happening during boot?

Changed in linux-gcp (Ubuntu):
status: New → Triaged
status: Triaged → Incomplete
importance: Undecided → High
David Krauser (davidkrauser) wrote :

@colin-king dmesg output:

[ 0.000000] Linux version 5.8.0-1008-gcp (buildd@lgw01-amd64-043) (gcc (Ubuntu 10.2.0-13ubuntu1) 10.2.0, GNU ld (GNU Binutils for Ubuntu) 2.35.1) #8-Ubuntu SMP Thu Oct 15 12:48:27 UTC 2020 (Ubuntu 5.8.0-1008.8-gcp 5.8.14)
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.8.0-1008-gcp root=PARTUUID=506f6274-5923-4632-96d5-b96445cc673c ro console=ttyS0 panic=-1
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Hygon HygonGenuine
[ 0.000000] Centaur CentaurHauls
[ 0.000000] zhaoxin Shanghai
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000001000-0x0000000000054fff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000055000-0x000000000005ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000060000-0x0000000000097fff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000098000-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000be11dfff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000be11e000-0x00000000be120fff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x00000000be121000-0x00000000be121fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000be122000-0x00000000be2d1fff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000be2d2000-0x00000000be2d9fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000be2da000-0x00000000be31afff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000be31b000-0x00000000bf39afff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000bf39b000-0x00000000bf3f2fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000bf3f3000-0x00000000bf3fafff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x00000000bf3fb000-0x00000000bf3fefff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000bf3ff000-0x00000000bffdffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] efi: EFI v2.70 by EDK II
[ 0.000000] efi: TPMFinalLog=0xbe2d2000 ACPI=0xbf3fa000 ACPI 2.0=0xbf3fa014 SMBIOS=0xbf3cd000 MEMATTR=0xbe6d3698 RNG=0xbf3cec98 TPMEventLog=0xbd8b6018
[ 0.000000] efi: seeding entropy pool
[ 0.000000] random: fast init done
[ 0.000000] secureboot: Secure boot disabled
[ 0.000000] SMBIOS 2.4 present.
[ 0.000000] DMI: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 0.000000] Hypervisor detected: KVM
[ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[ 0.000000] kvm-clock: cpu 0, msr 45401001, primary cpu clock
[ 0.000000] kvm-clock: using sched offset of ...

Khaled El Mously (kmously) wrote :

Strangely:

[ 0.743029] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 244)
[ 0.744256] io scheduler mq-deadline registered

Khaled El Mously (kmously) wrote :

sda is definitely being registered with mq-deadline, and it appears to stay that way until quite late in the boot. Something then changes the sda scheduler at a later point. No luck so far determining what is making that later change.

Khaled El Mously (kmously) wrote :

Still trying to hunt down who is changing that attribute. It is certainly not a kernel change that is causing this new behaviour. Something in userspace is doing it, very late in the boot.

In the attached systemd time graph, at the time that both rc-local.service and khaled.service are run, the scheduler was still mq-deadline:

from rc.local Tue Nov 10 03:22:15 UTC 2020 [mq-deadline] none
from myscript.sh Tue Nov 10 03:22:21 UTC 2020 [mq-deadline] none

My guess is that something in cloud-config.service or cloud-final.service is making this change.

Changed in linux-gcp (Ubuntu):
assignee: nobody → Khaled El Mously (kmously)
Khaled El Mously (kmously) wrote :

In this new systemd time graph (systemd-time-graph-2.svg), at the time khaled.service is run, the scheduler had changed to [none]:

from rc.local Tue Nov 10 03:39:00 UTC 2020 [mq-deadline] none
from myscript.sh Tue Nov 10 03:39:08 UTC 2020 [none] mq-deadline

The difference between the first time graph and the second is that khaled.service was moved to run after cloud-config.service has *finished* starting up. I think this basically confirms that something in cloud-config.service is making (or somehow triggering) this change.
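
The probe approach above (record the scheduler at a known point in the boot, then move that point around) can be sketched as a small program run from a systemd unit. This is only an illustration: the scripts actually used are not shown in this report, probeLine is a hypothetical name, and the sda path is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"time"
)

// probeLine formats one log line in the style used above:
// "from <source> <date> <scheduler file contents>".
// unixSec is seconds since the epoch; the date is rendered in UTC.
func probeLine(source string, unixSec int64, contents string) string {
	stamp := time.Unix(unixSec, 0).UTC().Format(time.UnixDate)
	return fmt.Sprintf("from %s %s %s", source, stamp, strings.TrimSpace(contents))
}

func main() {
	// Assumed device path; adjust for the disk under test.
	const path = "/sys/block/sda/queue/scheduler"
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintf(os.Stderr, "cannot read %s: %v\n", path, err)
		return
	}
	fmt.Println(probeLine("probe", time.Now().Unix(), string(data)))
}
```

Ordering the unit before or after another service (for example with After= in the unit file) narrows down which stage of the boot makes the change.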

Khaled El Mously (kmously) wrote :

With a modified kernel, I was able to determine the name of the process making the change: google_guest_agent

Though I'm not really sure why it is making that change.

Khaled El Mously (kmously) wrote :

From the google_guest_agent code:

google_guest_agent/instance_setup.go:

func setIOScheduler() error {
        dir, err := os.Open("/sys/block")
        if err != nil {
                return err
        }
        defer dir.Close()

        devs, err := dir.Readdirnames(0)
        if err != nil {
                return err
        }

        for _, dev := range devs {
                // Detect if device is using MQ subsystem.
                stat, err := os.Stat("/sys/block/" + dev + "/mq")
                if err == nil && stat.IsDir() {
                        f, err := os.OpenFile("/sys/block/"+dev+"/queue/scheduler", os.O_WRONLY|os.O_TRUNC, 0700)
                        if err != nil {
                                return err
                        }
                        _, err = f.Write([]byte("none"))
                        if err != nil {
                                return err
                        }
                }
        }
        return nil
}

Seems to be intentionally doing that.
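
For discussion, here is a rough sketch of what a rotational-aware variant could look like. This is not the guest agent's code or a proposed upstream patch. The policy (keep mq-deadline on devices that report rotational=1, use none otherwise) is an assumption drawn from the expectation in the bug description, and the names chooseScheduler, parseAvailable and setIOSchedulers are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// chooseScheduler returns the scheduler to request for a device.
// Assumed policy: rotational devices get mq-deadline when the kernel
// offers it; everything else gets "none".
func chooseScheduler(rotational bool, available []string) string {
	if rotational {
		for _, s := range available {
			if s == "mq-deadline" {
				return s
			}
		}
	}
	return "none"
}

// parseAvailable lists the scheduler names in a queue/scheduler file,
// stripping the brackets that mark the active one.
func parseAvailable(contents string) []string {
	fields := strings.Fields(contents)
	for i, f := range fields {
		fields[i] = strings.Trim(f, "[]")
	}
	return fields
}

// setIOSchedulers applies the policy to every multiqueue device under
// sysBlock (normally "/sys/block"). It is not called from main because
// writing to sysfs requires root.
func setIOSchedulers(sysBlock string) error {
	entries, err := os.ReadDir(sysBlock)
	if err != nil {
		return err
	}
	for _, e := range entries {
		dev := filepath.Join(sysBlock, e.Name())
		// Only devices using the MQ subsystem expose an mq directory.
		if st, err := os.Stat(filepath.Join(dev, "mq")); err != nil || !st.IsDir() {
			continue
		}
		rot, err := os.ReadFile(filepath.Join(dev, "queue", "rotational"))
		if err != nil {
			return err
		}
		schedPath := filepath.Join(dev, "queue", "scheduler")
		cur, err := os.ReadFile(schedPath)
		if err != nil {
			return err
		}
		want := chooseScheduler(strings.TrimSpace(string(rot)) == "1", parseAvailable(string(cur)))
		if err := os.WriteFile(schedPath, []byte(want), 0o644); err != nil {
			return fmt.Errorf("set scheduler for %s: %w", e.Name(), err)
		}
	}
	return nil
}

func main() {
	// Dry run with the values reported for sda in this bug.
	avail := parseAvailable("[none] mq-deadline")
	fmt.Println("rotational disk:", chooseScheduler(true, avail))
	fmt.Println("non-rotational disk:", chooseScheduler(false, avail))
}
```

Keeping the decision in a pure function separates the policy question (which scheduler for which device) from the sysfs plumbing, so the policy can be discussed and tested without root access.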

Khaled El Mously (kmously) wrote :

@David Maybe this should be brought up in the next meeting. I don't think there's more that can be done about this from our side.
