Comment 15 for bug 1817738

Mauricio Faria de Oliveira (mfo) wrote :

The root cause of this problem is plymouth, indeed.

The 'chvt N' command blocks if the foreground VT/tty is in the VT_AUTO + KD_GRAPHICS state.

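In this state, the kernel bails out early in the ioctl(VT_ACTIVATE) syscall
and never posts the VT_EVENT_SWITCH event that the ioctl(VT_WAITACTIVE) syscall
waits for. Since vt_ioctl() ignores the return value of set_console(),
VT_ACTIVATE still reports success, so chvt goes on to VT_WAITACTIVE and blocks.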
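To check whether a given VT is currently in that state, a small helper can read both modes with the VT_GETMODE and KDGETMODE ioctls. This is a sketch, not part of the original analysis: the helper names are hypothetical, it assumes the constants from <linux/vt.h> and <linux/kd.h>, and opening the VT device typically needs root.

```c
#include <fcntl.h>
#include <linux/kd.h>
#include <linux/vt.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* The state in which VT_ACTIVATE never posts VT_EVENT_SWITCH. */
static bool vt_state_blocks_chvt(int vt_mode, int kd_mode)
{
    return vt_mode == VT_AUTO && kd_mode == KD_GRAPHICS;
}

/* Returns 1 if the VT at `path` is in VT_AUTO + KD_GRAPHICS,
 * 0 if it is not, and -1 if it cannot be opened or queried. */
static int check_vt(const char *path)
{
    struct vt_mode vtm;
    int kd_mode = 0;
    int fd = open(path, O_RDONLY | O_NOCTTY);

    if (fd < 0)
        return -1;
    /* VT_GETMODE fills struct vt_mode; KDGETMODE writes an int. */
    if (ioctl(fd, VT_GETMODE, &vtm) < 0 || ioctl(fd, KDGETMODE, &kd_mode) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return vt_state_blocks_chvt(vtm.mode, kd_mode) ? 1 : 0;
}
```

For example, check_vt("/dev/tty1") returning 1 while that VT is in the foreground would match the hang described here.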

The function path is:

  vt_ioctl(tty0, VT_ACTIVATE, ...)
  -> set_console()
     -> if (... VT_AUTO && KD_GRAPHICS ...) return -EINVAL; <<-- bails out.
     -> schedule_console_callback(); return 0; <<-- continue to send event.
        -> console_callback()
           -> change_console()
              -> complete_change_console()
                 -> vt_event_post(VT_EVENT_SWITCH, ...)

  vt_ioctl(tty0, VT_WAITACTIVE, ...)
  -> vt_waitactive()
     -> __vt_event_wait(VT_EVENT_SWITCH) <<-- blocks/wait to receive event.
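The decision in set_console() can be modeled in a few lines to show why chvt ends up waiting forever. This is a simplified sketch: all names are hypothetical, the deferred console_callback() chain (and any VT_PROCESS release handshake) is collapsed into an immediate switch, and it assumes, from reading vt_ioctl(), that VT_ACTIVATE ignores set_console()'s return value.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model constants; the values mirror VT_AUTO/VT_PROCESS
 * in <linux/vt.h> and KD_TEXT/KD_GRAPHICS in <linux/kd.h>. */
enum { MODEL_VT_AUTO = 0, MODEL_VT_PROCESS = 1 };
enum { MODEL_KD_TEXT = 0, MODEL_KD_GRAPHICS = 1 };

struct vt_model {
    int vt_mode;        /* vt_mode.mode of the foreground VT */
    int kd_mode;        /* KD mode of the foreground VT */
    int fg_console;     /* 0-based index of the foreground VT */
    int switch_events;  /* VT_EVENT_SWITCH events posted so far */
};

/* set_console(): refuse, without posting anything, when the foreground
 * VT is in VT_AUTO + KD_GRAPHICS. Otherwise the switch (and its event)
 * happens immediately in this simplified model. */
static int model_set_console(struct vt_model *vt, int nr)
{
    if (vt->vt_mode == MODEL_VT_AUTO && vt->kd_mode == MODEL_KD_GRAPHICS)
        return -EINVAL;
    vt->fg_console = nr;
    vt->switch_events++;
    return 0;
}

/* VT_ACTIVATE (1-based n): set_console()'s result is ignored, so the
 * ioctl reports success even when nothing was scheduled. */
static int model_vt_activate(struct vt_model *vt, int n)
{
    (void)model_set_console(vt, n - 1);
    return 0;
}

/* VT_WAITACTIVE (1-based n): returns at once if VT n is already in the
 * foreground; otherwise it sleeps until a matching switch event, which
 * this model never posts after the call. */
static bool model_waitactive_would_block(const struct vt_model *vt, int n)
{
    return vt->fg_console != n - 1;
}
```

With the foreground VT in VT_AUTO + KD_GRAPHICS, model_vt_activate() still returns 0, no event is counted, and the subsequent wait for VT 4 would block; with VT_PROCESS the same calls complete.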

gdm properly sets the VT out of VT_AUTO mode (which causes chvt not to block)
after it tells plymouth to deactivate.

BUT plymouth can set it back to VT_AUTO mode afterward anyway: handling the
udev event for the DRM/DRI graphics card addition reopens and reconfigures the
VT/tty, which re-arms the VT_AUTO reset that happens when plymouth later quits.
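Taking a VT out of VT_AUTO, and putting it back, is a single VT_SETMODE ioctl. VT_SETMODE only accepts VT_AUTO or VT_PROCESS, so "out of VT_AUTO" means VT_PROCESS. The sketch below shows both directions; the function names and the choice of SIGUSR1 for relsig/acqsig are assumptions, not gdm's or plymouth's actual code.

```c
#include <linux/vt.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>

/* VT_PROCESS: the kernel signals this process (relsig/acqsig) and lets
 * it acknowledge VT switches. SIGUSR1 for both is an assumption. */
static int vt_set_process_mode(int fd)
{
    struct vt_mode mode;

    memset(&mode, 0, sizeof(mode));
    mode.mode = VT_PROCESS;
    mode.relsig = SIGUSR1;
    mode.acqsig = SIGUSR1;
    return ioctl(fd, VT_SETMODE, &mode);
}

/* VT_AUTO: hand switching back to the kernel. Done while the VT is
 * still in KD_GRAPHICS, this recreates the state that hangs chvt. */
static int vt_set_auto_mode(int fd)
{
    struct vt_mode mode;

    memset(&mode, 0, sizeof(mode));
    mode.mode = VT_AUTO;
    return ioctl(fd, VT_SETMODE, &mode);
}
```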

This can be verified with plymouth debugging (e.g., the kernel boot option
'plymouth.debug=file:/run/plymouth.debug') plus source code inspection:

1) gdm calls 'plymouth deactivate', which calls
   ply_terminal_close()
    -> ply_terminal_stop_watching_for_vt_changes()
       -> if (terminal->is_watching_for_vt_changes == true) ioctl(VT_SETMODE, VT_AUTO)
       -> terminal->is_watching_for_vt_changes = false

 [ply-boot-server.c:LINE] print_connection_process_identity:connection is from pid PID (/bin/plymouth deactivate) with parent pid PID (/usr/sbin/gdm)
...
 [ply-terminal.c:LINE] ply_terminal_close:restoring color palette
 [ply-terminal.c:LINE] ply_terminal_close:stop watching tty fd
...

2) plymouth's udev event timeout expires; it notices the DRM/DRI devices
   and re-enables VT watching while processing them, in the calls:

   ply_terminal_open()
   -> ply_terminal_watch_for_vt_changes()
      -> terminal->is_watching_for_vt_changes = true;

 [ply-device-manager.c:LINE] create_devices_from_udev:Timeout elapsed, looking for devices from udev
 ...
 [ply-device-manager.c:LINE] create_devices_for_terminal_and_renderer_type:creating devices for /dev/dri/card0 (renderer type: 1) (terminal: /dev/tty1)
 ...
 [./plugin.c:LINE] load_driver:Opening '/dev/dri/card0'
 [ply-terminal.c:LINE] ply_terminal_open:trying to open terminal '/dev/tty1'

3) init calls 'plymouth quit --retain-splash', which goes into
   ply_terminal_close() again; since watching is true, it sets the VT
   to VT_AUTO again (see the calls in #1 above), *after* gdm had already
   taken the VT out of VT_AUTO.

 [ply-boot-server.c:LINE] print_connection_process_identity:connection is from pid PID (/bin/plymouth quit --retain-splash) with parent pid PID (/sbin/init splash)
...
 [ply-terminal.c:LINE] ply_terminal_close:restoring color palette
 [ply-terminal.c:LINE] ply_terminal_close:stop watching tty fd
...
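The three steps can be replayed as a small state machine. This is a simplified model under stated assumptions: plymouth starts out watching the VT, only the VT mode and plymouth's watch flag are tracked (not which process owns the VT), and gdm's part is reduced to one mode change. All names are hypothetical. It also shows the timing dependency: without step 2, the close in step 3 is a no-op.

```c
#include <stdbool.h>

/* Hypothetical values; they mirror VT_AUTO/VT_PROCESS in <linux/vt.h>. */
enum { SIM_VT_AUTO = 0, SIM_VT_PROCESS = 1 };

struct boot_sim {
    int vt_mode;              /* VT mode of the VT gdm uses */
    bool plymouth_watching;   /* terminal->is_watching_for_vt_changes */
};

/* Step 2: ply_terminal_open() -> ply_terminal_watch_for_vt_changes(). */
static void sim_plymouth_open(struct boot_sim *s)
{
    if (!s->plymouth_watching) {
        s->vt_mode = SIM_VT_PROCESS;
        s->plymouth_watching = true;
    }
}

/* Steps 1 and 3: ply_terminal_close() resets VT_AUTO only if watching. */
static void sim_plymouth_close(struct boot_sim *s)
{
    if (s->plymouth_watching) {
        s->vt_mode = SIM_VT_AUTO;
        s->plymouth_watching = false;
    }
}

/* gdm takes the VT out of VT_AUTO after 'plymouth deactivate'. */
static void sim_gdm_take_vt(struct boot_sim *s)
{
    s->vt_mode = SIM_VT_PROCESS;
}

/* Replays the sequence; `udev_timeout_fires` says whether step 2
 * happens between deactivate and quit. Returns the final VT mode. */
static int sim_boot(bool udev_timeout_fires)
{
    struct boot_sim s = { SIM_VT_PROCESS, true };  /* assumed start */

    sim_plymouth_close(&s);        /* 1) plymouth deactivate */
    sim_gdm_take_vt(&s);
    if (udev_timeout_fires)
        sim_plymouth_open(&s);     /* 2) DRM device found, tty reopened */
    sim_plymouth_close(&s);        /* 3) plymouth quit --retain-splash */
    return s.vt_mode;
}
```

sim_boot(true) ends in SIM_VT_AUTO (the hang), while sim_boot(false) leaves gdm's SIM_VT_PROCESS in place.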

This depends on timing (plymouth's udev event watch timeout plus device
detection), which probably explains why the problem does not happen every
single time (though it does happen often: the problem reproduces most of the
time in this KVM guest with Ubuntu 18.04.2 Desktop).

After understanding that this behavior/code path is responsible for the
problem, I found there is an upstream fix for this in plymouth: it recognizes
that udev events should no longer be acted on after 'plymouth deactivate',
which prevents the VT_AUTO mode from being set again.
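The effect of that fix can be sketched as a guard in the udev event handler. The structure, field and function names here are hypothetical and do not reflect plymouth's actual ply-device-manager code; the point is only that dropping udev events after 'plymouth deactivate' keeps the terminal's watch flag false, so the later 'plymouth quit' has nothing to reset.

```c
#include <stdbool.h>

/* Hypothetical model, not plymouth's real structures. */
struct ply_model {
    bool deactivated;        /* set by 'plymouth deactivate' */
    bool terminal_watching;  /* terminal->is_watching_for_vt_changes */
    int devices_created;     /* devices created from udev events */
};

/* 'plymouth deactivate': ply_terminal_close() clears the watch flag. */
static void model_deactivate(struct ply_model *p)
{
    p->deactivated = true;
    p->terminal_watching = false;
}

/* udev "add" event for a DRM device. With the fix, it is ignored after
 * deactivate; otherwise the terminal is reopened and watching re-armed.
 * Returns true if the event was acted on. */
static bool model_on_drm_add(struct ply_model *p, bool with_fix)
{
    if (with_fix && p->deactivated)
        return false;
    p->devices_created++;
    p->terminal_watching = true;   /* ply_terminal_open() */
    return true;
}

/* 'plymouth quit': ply_terminal_close() sets VT_AUTO only if watching.
 * Returns true if VT_AUTO would be set. */
static bool model_quit_sets_vt_auto(struct ply_model *p)
{
    bool was_watching = p->terminal_watching;

    p->terminal_watching = false;
    return was_watching;
}
```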

Interestingly, this fix is already applied in Ubuntu Cosmic and later, for
LP: #1795637, due to a different problem (which causes wayland/xorg to fail).

The patch needed just a small refresh to apply to Bionic, and a test kernel
with it applied successfully passed all 'chvt' tests, multiple times:
1) While gdm is at the login screen, run 'ssh <guest> -- sudo chvt 4'.
2) With gdm autologin, try the same.
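When running these tests by hand, a chvt variant that gives up instead of hanging can be convenient. This is a hypothetical helper, not part of kbd's chvt: it issues VT_ACTIVATE and then VT_WAITACTIVE, as in the call path traced earlier, but arms alarm() so that a stuck VT_WAITACTIVE is interrupted and fails with EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <linux/vt.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* The handler does nothing; its only job is to interrupt the wait. */
static void on_alarm(int sig)
{
    (void)sig;
}

/* Like 'chvt n' (VT_ACTIVATE, then VT_WAITACTIVE), but gives up after
 * `seconds`. Returns 0 if VT n became active, 1 on timeout, and -1 on
 * any other error (errno set by the failing call). */
static int chvt_with_timeout(int fd, int n, unsigned int seconds)
{
    struct sigaction sa;
    int rc, saved_errno;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;

    if (ioctl(fd, VT_ACTIVATE, n) < 0)
        return -1;

    alarm(seconds);
    rc = ioctl(fd, VT_WAITACTIVE, n);
    saved_errno = errno;
    alarm(0);

    if (rc == 0)
        return 0;
    errno = saved_errno;
    return saved_errno == EINTR ? 1 : -1;
}
```

For example, with fd opened as root on /dev/tty0, chvt_with_timeout(fd, 4, 5) returning 1 would indicate the hang.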

Possible workarounds are disabling plymouth (remove the 'splash' option
from the kernel/grub boot options) OR setting the kernel console to a device
other than tty0/tty1 (check it with 'dmesg | grep console'), for example
console=ttyS0 or console=ttyS1 (serial, non-graphical consoles). The latter
keeps plymouth from messing with the VT used by gdm (a graphical one).
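Besides 'dmesg | grep console', the kernel lists its configured consoles in /sys/class/tty/console/active (space-separated, e.g. "tty0" or "ttyS0"). A sketch that checks whether tty0 or tty1 is among them; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if a space-separated console list names tty0 or tty1. */
static bool console_on_tty0_or_tty1(const char *active)
{
    char buf[256];

    snprintf(buf, sizeof(buf), "%s", active);
    for (char *tok = strtok(buf, " \n"); tok != NULL; tok = strtok(NULL, " \n")) {
        if (strcmp(tok, "tty0") == 0 || strcmp(tok, "tty1") == 0)
            return true;
    }
    return false;
}

/* Returns 1 if the kernel console includes tty0/tty1, 0 if not,
 * and -1 if /sys/class/tty/console/active cannot be read. */
static int check_active_console(void)
{
    char line[256];
    FILE *f = fopen("/sys/class/tty/console/active", "r");

    if (f == NULL)
        return -1;
    if (fgets(line, sizeof(line), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return console_on_tty0_or_tty1(line) ? 1 : 0;
}
```

With console=ttyS0 alone, check_active_console() would be expected to return 0.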