tiny_socket does not handle EINTR and may fail if a SIGCHLD signal is received

Bug #501617 reported by Borja López Soilán (NeoPolus)
This bug affects 1 person
Affects: Odoo Server (MOVED TO GITHUB)
Status: Confirmed
Importance: Wishlist
Assigned to: OpenERP's Framework R&D
Milestone: (none listed)

Bug Description

The basic network functions used by OpenERP for receiving and sending data (tiny_socket.py) don't handle EINTR ("interrupted system call") errors, and that may cause weird race conditions.

EINTR errors happen when the process receives a signal while doing low-level I/O (such as receiving or sending data over a socket): the kernel interrupts the I/O operation (it is simply not performed*) so that the process can handle the signal, and the operation should be retried afterwards (the I/O didn't really fail; it was only interrupted to wake up the process).

   (*) EINTR for socket functions means "The recv() function was interrupted by a signal that was caught, before any data was available." / "A signal interrupted send() before any data was transmitted." (http://www.opengroup.org/onlinepubs/000095399/functions/recv.html), so the calls can be safely retried (http://www.wlug.org.nz/EINTR).

Python does not handle EINTR errors by itself (this has been discussed at http://bugs.python.org/issue1628205), so it is the Python programmer doing the I/O who must take care of them (and retry the operation).

***

This bug was first detected in the Koo client, which uses a copy of the tiny_socket.py file for NetRPC communication, but it may affect all the code that depends on tiny_socket.py (such as the server itself, the GTK client and the Web client). The bug showed up on computers running Linux Mint 7 (kernel 2.6.28-16, 32-bit) and Ubuntu Linux 9.10 64-bit (kernel 2.6.31-14); see https://bugs.launchpad.net/openobject-client-kde/+bug/484651.

In mysocket.myreceive (tiny_socket.py), some data may already have been received (in earlier calls to recv) when the EINTR error happens; as EINTR is not handled, mysocket.myreceive simply propagates the exception, so the current operation fails. That means OpenERP is susceptible to weird race conditions (it fails only when SIGCHLD, or another non-ignored signal, arrives while performing I/O) and to denial-of-service attacks (sending lots of signals to OpenERP).

For example, on the OpenERP server, some addons use spawnlp or similar functions to create sub-processes. Some of them, like jasper_reports, need to run those sub-processes without waiting for the spawned process to end (os.P_NOWAIT). In that context, OpenERP receives a SIGCHLD signal whenever a spawned sub-process ends. If OpenERP receives one of those signals while it is performing a socket I/O operation (mainly the socket.recv or socket.send calls in tiny_socket.py), the call may fail with an EINTR error (errno 4) and data may be lost.
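To illustrate the scenario described above, here is a minimal sketch (POSIX only) that spawns a short-lived child with os.P_NOWAIT and records the SIGCHLD the parent receives when the child exits. The names spawn_and_observe and on_sigchld are made up for this example and are not taken from any addon; the child command is just the Python interpreter running an empty script. On the Python 2 code base discussed here, a recv or send in progress at the moment the handler fires is what would fail with EINTR.

```python
import os
import signal
import sys
import time

received = []  # signal numbers seen by the handler


def on_sigchld(signum, frame):
    # Runs in the parent when a child process terminates.
    received.append(signum)


def spawn_and_observe(timeout=5.0):
    """Spawn a child without waiting for it; report whether SIGCHLD arrived."""
    # signal.signal() must be called from the main thread.
    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        # P_NOWAIT returns immediately with the child's pid.
        pid = os.spawnl(os.P_NOWAIT, sys.executable, sys.executable, "-c", "pass")
        deadline = time.time() + timeout
        while not received and time.time() < deadline:
            time.sleep(0.05)
        os.waitpid(pid, 0)  # reap the child so it does not stay a zombie
        return bool(received)
    finally:
        # Restore whatever handler was installed before (default if unknown).
        if previous is None:
            previous = signal.SIG_DFL
        signal.signal(signal.SIGCHLD, previous)
```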

***

A possible fix is to patch tiny_socket.py so it handles EINTR errors, retrying the recv/send operations. This would make sure that no signal breaks mysocket.mysend or mysocket.myreceive.
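As a rough sketch of what such EINTR handling could look like (this is not the attached patch; retry_on_eintr, receive_exactly and send_all are hypothetical names, not functions from tiny_socket.py): wrap each recv/send call so that it is retried when it fails with EINTR, and keep the chunks already received so that an interruption in the middle of a message loses nothing. The sketch is written for Python 3, where socket.error is an alias of OSError; Python 3.5 and later already retry interrupted system calls in the interpreter (PEP 475), so the wrapper matters mainly for older versions such as the Python 2 used by OpenERP 5.0.

```python
import errno
import socket


def retry_on_eintr(func, *args):
    """Call func(*args), retrying for as long as it fails with EINTR."""
    while True:
        try:
            return func(*args)
        except OSError as exc:  # socket.error is OSError on Python 3
            if exc.errno != errno.EINTR:
                raise  # a real failure: let the caller see it
            # EINTR: nothing was transferred by this call, so just retry.


def receive_exactly(sock, size):
    """Read exactly `size` bytes, keeping received chunks across retries."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = retry_on_eintr(sock.recv, remaining)
        if not chunk:
            raise RuntimeError("socket connection broken")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_all(sock, data):
    """Send all of `data`, resuming after partial sends and EINTR."""
    sent = 0
    while sent < len(data):
        sent += retry_on_eintr(sock.send, data[sent:])
```

Retrying is only done for EINTR; any other error is re-raised unchanged, so genuine connection failures are still reported to the caller.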

As an alternative workaround, if no fix is applied to tiny_socket.py, SIGCHLD signals could be ignored ("signal.signal(signal.SIGCHLD, signal.SIG_IGN)"), so that no EINTR error is raised when a sub-process ends. This would avoid the problem with spawn* and os.P_NOWAIT.
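A minimal sketch of that workaround, assuming a POSIX system and that it runs once at start-up from the main thread (ignore_sigchld is a hypothetical helper name). One side effect to keep in mind: on Linux, when SIGCHLD is ignored the kernel reaps terminated children automatically, so code that later calls os.waitpid() on a child to read its exit status may get an ECHILD error instead.

```python
import signal


def ignore_sigchld():
    """Ignore SIGCHLD process-wide and return the previous handler."""
    # signal.signal() may only be called from the main thread.
    return signal.signal(signal.SIGCHLD, signal.SIG_IGN)
```

The returned handler can be passed back to signal.signal() to undo the change, for example around code that does need to wait for its children.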

Borja López Soilán (NeoPolus) (borjals) wrote :

I'm including a possible patch for tiny_socket.py (server 5.0, but it may be applied on trunk too) that adds EINTR error handling.

tags: added: call eintr interrupted system
Olivier Dony (Odoo) (odo-openerp) wrote :

Borja, thanks a lot for the report and for the patch.

Changed in openobject-server:
assignee: nobody → OpenERP's Framework R&D (openerp-dev-framework)
importance: Undecided → Wishlist
status: New → Confirmed