VT switching API is inherently racy

Bug #290197 reported by Pete Graner
8
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Won't Fix
Medium
Canonical Kernel Team

Bug Description

The requisite calling sequence for switching to a different VT is:

1. ioctl(VT_ACTIVATE, N) // request an asynchronous switch to VT #N
2. ioctl(VT_WAITACTIVE, N) // wait for VT #N to become active

If, between these two steps, another process calls VT_ACTIVATE, call #2 will block indefinitely waiting for the VT to become active.

Tags: kj-omit
Pete Graner (pgraner)
Changed in linux:
assignee: nobody → lieb
importance: Undecided → High
milestone: none → intrepid-updates
status: New → Triaged
Revision history for this message
Matt Zimmerman (mdz) wrote :

Some examples of problems related to this API include bug 243047 and bug 268644.

There was also a bug way back in Ubuntu 6.10 or so which was fixed in usplash 0.4-18, though I can't find the bug number at the moment.

description: updated
Revision history for this message
Matt Zimmerman (mdz) wrote :

Here's an exchange between Keith Packard and Linus Torvalds on the subject of race conditions in using this API:

http://lists.freedesktop.org/archives/xorg/2007-October/028837.html

Revision history for this message
Jim Lieb (lieb) wrote :
Download full text (4.1 KiB)

I have tried a couple of versions of a patch to make the activate/
waitactive more atomic but each has had problems. This note is
to document the issues I have found so far. I'm looking for some
feedback since this problem spans the kernel, X and a small set of
apps.

I have cut the code sequence from Matt's attachment to doc the
issue. I have verified the X code (actually in two hw spec places).

This is the key bit of uglyness. Note that Ajax comments on this in
the IRC log attached... What follows is an analysis with results of some attempted patches.

 console_fd = open ("/dev/tty7", O_RDWR|O_NDELAY);
 chown ("/dev/tty7", 0, 0);
 chown ("/dev/tty0", 0, 0);
 ioctl (console_fd, VT_GETSTATE, &vts);
 ioctl (console_fd, VT_ACTIVATE, 7);

This is the first problem. We *really* need the ACTIVATE/WAITACTIVE
to be atomic. My latest change was to turn "want_console" into
a mutex with a cmpxchg, setting it to -1 after the wait. This would serialize the activates so that the waits return. The problem with
this is twofold. First, the clearing depends on the other side completing the transaction. Second, there is not always a waiter.
Therefore, it either gets reset too son or the whole of console switching locks up. We don't have waiters for the ctrl+meta+fN, output char sequence, or for chvt and friends, the "async" case...
There is also the problem that an ACTIVATE could silently fail to
do anything in the current code (doesn't check return status).

 ioctl (console_fd, VT_WAITACTIVE, 7);

This is the second nasty part of this exchange. The only way to guarantee it happens is to delay the reset until the wait. If we keep the reset on the "complete_change_console" side, there is a
window for the next activate to slip through. Where we want to reset is here but we can't. The fundamental problem is that we cannot assert that the waitactive will ever return. We can't even test "want_console" reliably.

 ioctl (console_fd, VT_GETMODE, &VT);
 VT.mode = VT_PROCESS;
 VT.relsig = SIGUSR1;
 VT.acqsig = SIGUSR1;
 ioctl (console_fd, VT_SETMODE, &VT);

This is also a race. Somebody else can come by after the wait and
steal the console before the setmode takes place. This call completely changes the interface to no longer do activate/waits but instead, exchange signals at the process level.

 ioctl (console_fd, KDSETMODE, KD_GRAPHICS);

This adds some more fun. It disables switching.

---------
The problem comes down to our using an outmoded design for
things it was never intended for. It harkens back to single cpu,
non-preemptive kernels with simple VT and X sessions. Using
this for suspend signaling is bizarre.

There seem to be three choices:

1. Try more hacks. As ajax notes, he handles this in userspace and
simply crashes X, not an option with suspend/resume. One
problem with the existing api and its use is the lack of returning or
handling error returns. X could be a bit more graceful and
careful use could be able to re-start failed switches. This would increase the ubuntu maintenance load for an interface that not
many maintainers are enthusiastic about...

2. Try a new api to existing code. The big requirement would be to
have ...

Read more...

Matt Zimmerman (mdz)
Changed in linux (Ubuntu):
importance: High → Medium
tags: added: kj-omit
Revision history for this message
Jim Lieb (lieb) wrote :

This report, written when the bug was first assigned to me is attached to preserve the info. The introduction to KMS to the kernel and Xorg changes much of the code described in the report, making it out of date starting with Karmic. Once mode setting stablizes, this bug should be closed. If the hangs and suspend/resume problems persist, the info in this doc may have further use in chasing down the remaining issues .

Pete Graner (pgraner)
Changed in linux (Ubuntu):
status: Triaged → Won't Fix
assignee: Jim Lieb (lieb) → Canonical Kernel Team (canonical-kernel-team)
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.