Comment 4 for bug 1726372

Stéphane Graber (stgraber) wrote:

Attaching what should be a fix for the container handling code.

This does two things:

 1) It stops assuming that argv[1] != argv[4] means that the crash occurred in a container. That check was getting false positives from host software using a pidns (chrome), as well as false negatives from a container process whose pid inside the container happens to equal its pid on the host.

Instead, we're now checking whether we got a separate global pid. If we did, we check whether the crashed process shares mntns and pidns with the apport process (usually hostns). If the crashed process shares either namespace with the host, we consider it as not being a container crash and will process it as a host process crash, replacing the local pid with the global pid before continuing.

This ensures that a crash coming from an application that uses a pidns only will still get handled by apport as normal (pidns would differ, mntns wouldn't).
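As a rough illustration of the check described above (not the code from the attached patch), the namespace comparison could be sketched like this. The helper names ns_id and is_container_crash are hypothetical, and comparing the /proc/<pid>/ns/* link targets is an assumption about how the namespaces get compared:

```python
import os


def ns_id(pid, ns):
    """Return the namespace identifier for pid, e.g. 'pid:[4026531836]'.

    pid may be a number or the string 'self'.
    """
    return os.readlink("/proc/%s/ns/%s" % (pid, ns))


def is_container_crash(global_pid, read_ns=ns_id):
    """Hypothetical helper: True only if the crashed process shares
    neither its pidns nor its mntns with the current (apport) process."""
    same_pidns = read_ns(global_pid, "pid") == read_ns("self", "pid")
    same_mntns = read_ns(global_pid, "mnt") == read_ns("self", "mnt")
    # Sharing either namespace means it gets handled as a host crash.
    return not (same_pidns or same_mntns)
```

With this logic, a process that only uses a pidns (different pidns, same mntns) yields False and is processed as a host crash under its global pid.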

 2) When sending the crash to apport inside the container, we're now doing the following:
   1) Send a ucred as out-of-band data
   2) Send a fd as out-of-band data
   3) Send the arguments as normal data

   Apport in the container will then look for the ucred and if found, will override the pid in the arguments string with the pid found in the ucred.

   This allows for backward compatibility both ways: a container running a newer apport will just move on if the ucred is missing, and when the host runs a newer apport, an older container will still get what it expects.

   When both are up to date, the message being sent now will use the ucred's pid field to get kernel-level translation of the crashed process' pid, so apport in the container will be getting a pid that makes sense in its namespace as derived from the global pid, rather than using the crashed process' view of the pid.
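For readers unfamiliar with the mechanism, here is a minimal sketch of passing a ucred and a fd as ancillary data over a unix socket. It is not the attached patch: the function names send_crash and recv_crash are hypothetical, and it assumes the pid is the first whitespace-separated field of the arguments string.

```python
import array
import os
import socket
import struct

# struct ucred layout on Linux: pid_t pid, uid_t uid, gid_t gid
UCRED = struct.Struct("iII")


def send_crash(sock, pid, fd, args):
    """Send args as normal data, with a ucred and a fd as ancillary data."""
    ancillary = [
        (socket.SOL_SOCKET, socket.SCM_CREDENTIALS,
         UCRED.pack(pid, os.getuid(), os.getgid())),
        (socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd])),
    ]
    sock.sendmsg([args.encode()], ancillary)


def recv_crash(sock):
    """Return (args, fd). If a ucred arrived, its pid replaces args[0]."""
    space = (socket.CMSG_SPACE(UCRED.size)
             + socket.CMSG_SPACE(struct.calcsize("i")))
    data, ancdata, _flags, _addr = sock.recvmsg(4096, space)
    args = data.decode().split()
    fd = None
    for level, ctype, payload in ancdata:
        if level != socket.SOL_SOCKET:
            continue
        if ctype == socket.SCM_CREDENTIALS:
            # The kernel has already translated this pid for the receiver.
            args[0] = str(UCRED.unpack(payload[:UCRED.size])[0])
        elif ctype == socket.SCM_RIGHTS:
            fd = struct.unpack("i", payload[:struct.calcsize("i")])[0]
    # Without a ucred (older sender), args is returned unchanged.
    return args, fd
```

The receiving socket needs SO_PASSCRED enabled (setsockopt(SOL_SOCKET, SO_PASSCRED, 1)) before the message is sent, otherwise no ucred is delivered. Sending a pid other than the sender's own requires CAP_SYS_ADMIN, which apport on the host would normally have when invoked as the core pattern handler.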

The attached patch is against Ubuntu 16.04's version of apport and was tested on a 16.04 system.