until_no_eintr used on unrestartable IO, and cannot address all cases of EINTR.

Bug #496813 reported by Martin Pool
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Bazaar
Fix Released
High
Andrew Bennetts
2.0
Won't Fix
Undecided
Unassigned
2.1
Fix Released
High
Andrew Bennetts
2.2
Fix Released
High
Andrew Bennetts

Bug Description

From thread 'until_no_eintr correct?'

> until_no_eintr is in a sense a dangerous function because it encourages you to solve just half the problem. Perhaps we should deprecate it in favour of things that really do read or write all the requested data.

Tags: osutils

Related branches

Revision history for this message
Andrew Bennetts (spiv) wrote :

Specifically, the problem is that we often catch-and-retry on EINTR with operations that aren't safe to retry. Thanks very much to Martin <gzlist>, who has careful analysed the scope of the problem on the mailing list, and started writing patches towards it.

The current plan is to try use signal.siginterrupt on Python 2.6 to avoid this, at least for signal handlers that bzrlib registers. We probably should write a simple C extension for earlier Python versions to provide the same functionality.

It's possible that plugins and non-bzr programs that use bzrlib might register their own signal handlers; for them we should probably document that bzrlib doesn't expect EINTR in general so that 3rd-party code using our API ought to take care to also use siginterrupt or similar when registering signal handlers.

FWIW, the only signal handlers registered by bzrlib itself are:

 * SIGWINCH (in bzrlib/ui/text.py)
 * SIGQUIT (in bzrlib/breakin.py) [we can ignore SIGBREAK, which can't cause EINTR as it is windows-only]
 * SIGINT (in bzrlib/transport/ssh.py) — but only to install the SIG_IGN handler, and only in a child process just before exec of an SSH subprocess, so no problem there.

I'm not sure about the signal handlers that CPython installs automatically on VM startup; there's not many, but there is at least the SIGINT handler (to turn Ctrl-C into KeyboardInterrupt). So in addition to running siginterrupt for bzrlib's signal handlers for SIGWINCH and SIGQUIT, we may also want to do siginterrupt(SIGINT, False) in bzrlib.initialize(). I'll investigate further before submitting my final patch.

If using siginterrupt is a sufficient fix for this issue, as I think and hope it will be, then we can also remove until_no_eintr (after deprecating), which will make everyone happy :)

Changed in bzr:
assignee: nobody → Andrew Bennetts (spiv)
status: Confirmed → In Progress
Revision history for this message
Andrew Bennetts (spiv) wrote :

I'll briefly mention that while currently until_no_eintr is in places used badly, causing bugs, it's also used correctly and usefully for pipe IO bzrlib/smart/medium.py, so we can't simply disable it as an incremental step without causing a regression.

Revision history for this message
Andrew Bennetts (spiv) wrote :

My plan to fix this will also fix #173007.

Andrew Bennetts (spiv)
summary: - until_no_eintr ignores the possibility of partial success
+ until_no_eintr used on unrestartable IO, and cannot address all cases of
+ EINTR.
Revision history for this message
Andrew Bennetts (spiv) wrote :

Btw, here are the default signal handlers registered by CPython 2.6:

 * SIGINT; but it just causes KeyboardInterrupt to be raised. Even if there are multiple threads I'm fairly sure this will always cause KeyboardInterrupt to be raised in whichever thread received the signal, so IO in other threads should be safe?
 * SIGPIPE, SIGXFSZ, SIGXFZ: set to SIG_IGN, so is harmless. (Even though Python calls siginterrupt(SIGPIPE, 1) after setting it to SIG_IGN, no EINTR occurs during a blocked os.read when a SIGPIPE arrives. Maybe POSIX platforms other than Ubuntu behave differently in this case, but hopefully not.)

So I think we don't need to worry about these. Phew!

Revision history for this message
Andrew Bennetts (spiv) wrote :

I've submitted merge proposals for 2.0 and 2.1 to address this for Python 2.6, which has signal.siginterrupt available. The C code to call siginterrupt is pretty simple, so next I'll write a simple C module to provide the same functionality when bzr is run on Python < 2.6. (Another possibility is to use ctypes, but that's only in the standard library in 2.5 and ctypes is a bit icky anyway.)

Revision history for this message
Andrew Bennetts (spiv) wrote :

I think this is fixed, although not in the way orriginally planned, by the fix for bug 583941: we simply don't register any signal handlers any more, so we don't encounter EINTR any more. We also no longer use until_no_eintr anywhere in appropriate, and provide a helper in osutils to use siginterrupt for any code (e.g. a plugin, or maybe a future version of bzrlib...) that does want to register a signal without causing EINTR. And the docstring of until_no_eintr is now clearer about its limitations, too.

Changed in bzr:
status: In Progress → Fix Released
Revision history for this message
Andrew Bennetts (spiv) wrote :

Marking Won't Fix for 2.0 because bzr 2.0 does not register any problematic signal handlers except SIGQUIT, which is a developer debugging tool. Also, I don't know of any plugins or other clients of bzrlib that register signal handlers either, so the concerns for 2.0 are just academic. So I'm not planning on making any changes to 2.0 for this, and wouldn't recommend anyone else spends time on doing so either.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.