Comment 3 for bug 290197

Revision history for this message
Jim Lieb (lieb) wrote :

I have tried a couple of versions of a patch to make the activate/
waitactive more atomic but each has had problems. This note is
to document the issues I have found so far. I'm looking for some
feedback since this problem spans the kernel, X and a small set of
apps.

I have cut the code sequence from Matt's attachment to doc the
issue. I have verified the X code (actually in two hw spec places).

This is the key bit of uglyness. Note that Ajax comments on this in
the IRC log attached... What follows is an analysis with results of some attempted patches.

 console_fd = open ("/dev/tty7", O_RDWR|O_NDELAY);
 chown ("/dev/tty7", 0, 0);
 chown ("/dev/tty0", 0, 0);
 ioctl (console_fd, VT_GETSTATE, &vts);
 ioctl (console_fd, VT_ACTIVATE, 7);

This is the first problem. We *really* need the ACTIVATE/WAITACTIVE
to be atomic. My latest change was to turn "want_console" into
a mutex with a cmpxchg, setting it to -1 after the wait. This would serialize the activates so that the waits return. The problem with
this is twofold. First, the clearing depends on the other side completing the transaction. Second, there is not always a waiter.
Therefore, it either gets reset too son or the whole of console switching locks up. We don't have waiters for the ctrl+meta+fN, output char sequence, or for chvt and friends, the "async" case...
There is also the problem that an ACTIVATE could silently fail to
do anything in the current code (doesn't check return status).

 ioctl (console_fd, VT_WAITACTIVE, 7);

This is the second nasty part of this exchange. The only way to guarantee it happens is to delay the reset until the wait. If we keep the reset on the "complete_change_console" side, there is a
window for the next activate to slip through. Where we want to reset is here but we can't. The fundamental problem is that we cannot assert that the waitactive will ever return. We can't even test "want_console" reliably.

 ioctl (console_fd, VT_GETMODE, &VT);
 VT.mode = VT_PROCESS;
 VT.relsig = SIGUSR1;
 VT.acqsig = SIGUSR1;
 ioctl (console_fd, VT_SETMODE, &VT);

This is also a race. Somebody else can come by after the wait and
steal the console before the setmode takes place. This call completely changes the interface to no longer do activate/waits but instead, exchange signals at the process level.

 ioctl (console_fd, KDSETMODE, KD_GRAPHICS);

This adds some more fun. It disables switching.

---------
The problem comes down to our using an outmoded design for
things it was never intended for. It harkens back to single cpu,
non-preemptive kernels with simple VT and X sessions. Using
this for suspend signaling is bizarre.

There seem to be three choices:

1. Try more hacks. As ajax notes, he handles this in userspace and
simply crashes X, not an option with suspend/resume. One
problem with the existing api and its use is the lack of returning or
handling error returns. X could be a bit more graceful and
careful use could be able to re-start failed switches. This would increase the ubuntu maintenance load for an interface that not
many maintainers are enthusiastic about...

2. Try a new api to existing code. The big requirement would be to
have all the syscalls atomic wrt console management. The
downside is it can't mix with the old api and breaks chvt and friends.
This also breaks a kernel api but, fortunately, it is of limited scope, i.e. chvt and friends, X, etc. Ubuntu maintenance issues apply here too.

3. Step back and try again with a new console subsystem.
There are some good ideas in http://homepages.tesco.net/J.deBoynePollard/Proposals/linux-console-daemon.html.
It appears that this has gotten no further than this white paper but
there are some good ideas here. It *should* be possible to even preserve the old API as a compatibility mode for apps outside X and
other ubunto apps. This is not even a series of bug fixes, it is a project. The details would be more appropriate for a whitepaper
or design proposal.

I'll next publish a provisional patch here. I want some feedback
first so I can better understand requirements on the lpia+ubuntu
side and confirm in my own mind whether such a patch would
really help.