Regression: Stable kernel update to 3.13.0-66 breaks UDP sockets

Bug #1510213 reported by Sebastian Marsching on 2015-10-26
This bug affects 1 person
Affects: linux (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

I am running the 3.13 series kernel on Ubuntu 14.04 LTS (Trusty Tahr).

A change introduced in version 3.13.0-66.108 of this kernel breaks UDP sockets under certain circumstances. The effect is that recvfrom fails and sets errno to EFAULT, even though the pointers passed to recvfrom are valid.

Using bisection, I tracked this problem down to a single change:

2dde51aa53393a531b493e3a8194e4d467e194a3 is the first bad commit
commit 2dde51aa53393a531b493e3a8194e4d467e194a3
Author: Herbert Xu <email address hidden>
Date: Mon Jul 13 20:01:42 2015 +0800

    net: Fix skb csum races when peeking

    BugLink: http://bugs.launchpad.net/bugs/1500810

    [ Upstream commit 89c22d8c3b278212eef6a8cc66b570bc840a6f5a ]

    When we calculate the checksum on the recv path, we store the
    result in the skb as an optimisation in case we need the checksum
    again down the line.

    This is in fact bogus for the MSG_PEEK case as this is done without
    any locking. So multiple threads can peek and then store the result
    to the same skb, potentially resulting in bogus skb states.

    This patch fixes this by only storing the result if the skb is not
    shared. This preserves the optimisations for the few cases where
    it can be done safely due to locking or other reasons, e.g., SIOCINQ.

    Signed-off-by: Herbert Xu <email address hidden>
    Acked-by: Eric Dumazet <email address hidden>
    Signed-off-by: David S. Miller <email address hidden>
    Signed-off-by: Kamal Mostafa <email address hidden>
    Signed-off-by: Luis Henriques <email address hidden>

:040000 040000 423debc59ddbc7424283e647e609289fd40dc494 2511e80df4c30a7309737f6b3cee0260269a0ef7 M net

Steps to reproduce the problem: Install freeradius, and have a radius client connect to the RADIUS server. After a short amount of time, freeradius spins at 100% CPU, alternating between a select and recvfrom call. The recvfrom call fails every time with error EFAULT.
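A plausible reading of the 100% CPU spin is that a peek never removes a datagram from the receive queue, so as long as the peek keeps failing, select keeps reporting the socket as readable. The sketch below only demonstrates the underlying MSG_PEEK behaviour on a healthy kernel over loopback; it does not reproduce the EFAULT. The helper names is_readable and peek_leaves_datagram_queued are hypothetical and are not taken from freeradius.

```c
#include <string.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Returns 1 if sock is readable within timeout_ms, 0 on timeout, -1 on error. */
int is_readable(int sock, int timeout_ms) {
  fd_set fds;
  struct timeval tv;
  FD_ZERO(&fds);
  FD_SET(sock, &fds);
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;
  int rc = select(sock + 1, &fds, NULL, NULL, &tv);
  if (rc < 0) {
    return -1;
  }
  return rc > 0;
}

/* Sends one 8-byte datagram over loopback, peeks 4 bytes of it, then asks
   select() again. Returns 1 if the socket is still readable after the peek,
   0 if it is not, and -1 if the setup failed. */
int peek_leaves_datagram_queued(void) {
  int result = -1;
  int rx = socket(AF_INET, SOCK_DGRAM, 0);
  int tx = socket(AF_INET, SOCK_DGRAM, 0);
  struct sockaddr_in addr;
  socklen_t addr_len = sizeof(addr);
  unsigned char buffer[4];

  if (rx < 0 || tx < 0) goto out;
  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0; /* let the kernel pick a free port */
  if (bind(rx, (struct sockaddr *) &addr, sizeof(addr)) < 0) goto out;
  if (getsockname(rx, (struct sockaddr *) &addr, &addr_len) < 0) goto out;
  if (sendto(tx, "12345678", 8, 0, (struct sockaddr *) &addr, sizeof(addr)) != 8) goto out;
  if (is_readable(rx, 1000) != 1) goto out;
  /* Peek with a buffer smaller than the datagram, as the reproducer does. */
  if (recv(rx, buffer, sizeof(buffer), MSG_PEEK) != 4) goto out;
  /* The datagram was only peeked, so it should still be pending. */
  result = is_readable(rx, 0);
out:
  if (rx >= 0) close(rx);
  if (tx >= 0) close(tx);
  return result;
}
```

Calling peek_leaves_datagram_queued() from a small main should return 1 on an unaffected kernel. On an affected kernel the peek itself returns the error, and the datagram presumably stays queued in the same way, which would match the observed alternation between select and recvfrom.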

As an alternative to freeradius, you can use the following minimal program, which I wrote and which also exhibits this problem:

#include <stdio.h>

#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>

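/* Creates a UDP socket with IP_PKTINFO enabled and binds it to the given
   port on all interfaces. Returns the socket, or -1 on failure. */
int prepare_socket(int port) {
  int sock = socket(AF_INET, SOCK_DGRAM, 0);
  if (sock < 0) {
    perror("Could not create socket");
    return -1;
  }
  /* Ask the kernel to deliver per-packet information as ancillary data. */
  int opt = 1;
  if (setsockopt(sock, SOL_IP, IP_PKTINFO, &opt, sizeof(opt)) < 0) {
    perror("setsockopt failed");
    close(sock);
    return -1;
  }
  /* Zero-initialize so that the padding bytes are not left indeterminate. */
  struct sockaddr_in bind_addr = {0};
  bind_addr.sin_family = AF_INET;
  bind_addr.sin_port = htons(port);
  bind_addr.sin_addr.s_addr = htonl(INADDR_ANY);
  int rc = bind(sock, (struct sockaddr *) &bind_addr, sizeof(bind_addr));
  if (rc < 0) {
    perror("Could not bind socket");
    close(sock);
    return -1;
  }
  return sock;
}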

int main(int argc, char **argv) {
  int sock = prepare_socket(1812);
  if (sock < 0) {
    return 1;
  }
  for (;;) {
    unsigned char buffer[4];
    struct sockaddr src;
    socklen_t src_len = sizeof(src);
    ssize_t received_len = recvfrom(sock, buffer, sizeof(buffer), MSG_PEEK, &src, &src_len);
    if (received_len < 0) {
      if (errno == EAGAIN) {
        printf("EAGAIN\n");
        continue;
      }
      /* Report immediately: an intervening printf could overwrite errno. */
      perror("recvfrom failed");
      return 1;
    }
    if (received_len == 4) {
      src_len = sizeof(src);
      received_len = recvfrom(sock, buffer, sizeof(buffer), 0, &src, &src_len);
      if (received_len != 4) {
        printf("Strange received length.\n");
        return 1;
      }
    }
  }
  /* Never reached */
  return 0;
}

I did not find out how to craft traffic that triggers the bug. However, the traffic from a RADIUS client (a WiFi AP in my case) reliably triggers it after a few seconds.
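As a starting point for experimenting with crafted traffic, a minimal sender could look like the sketch below. It sends a single UDP datagram of filler bytes to a given address and port. It is only an assumption that a datagram longer than the 4-byte peek buffer is relevant, and it has not been verified that this traffic triggers the bug. The function name send_filler_datagram, the 0xAB filler and the 512-byte limit are hypothetical choices for illustration.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Sends one UDP datagram consisting of len filler bytes (at most 512) to
   the IPv4 address ip and the given port. Returns the number of bytes
   sent, or -1 on error. */
ssize_t send_filler_datagram(const char *ip, int port, size_t len) {
  unsigned char payload[512];
  struct sockaddr_in dst;

  if (len > sizeof(payload)) {
    return -1;
  }
  memset(payload, 0xAB, len); /* arbitrary filler, not a valid RADIUS packet */

  memset(&dst, 0, sizeof(dst));
  dst.sin_family = AF_INET;
  dst.sin_port = htons((unsigned short) port);
  if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1) {
    return -1; /* not a valid dotted-quad IPv4 address */
  }

  int sock = socket(AF_INET, SOCK_DGRAM, 0);
  if (sock < 0) {
    return -1;
  }
  ssize_t sent = sendto(sock, payload, len, 0,
                        (struct sockaddr *) &dst, sizeof(dst));
  close(sock);
  return sent;
}
```

A call such as send_filler_datagram("192.0.2.10", 1812, 64), with a placeholder address standing in for the server, would be run from a second machine. Traffic sent from the same host travels over loopback and may not behave like traffic from a real RADIUS client.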

As this is perfectly legal code and the problem only appears with the change identified above, I think that this is a regression and that the change in question should be reverted in the stable kernel tree.

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-66-generic 3.13.0-66.108
ProcVersionSignature: Ubuntu 3.13.0-66.108-generic 3.13.11-ckt27
Uname: Linux 3.13.0-66-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Oct 25 19:23 seq
 crw-rw---- 1 root audio 116, 33 Oct 25 19:23 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.14.1-0ubuntu3.16
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
Date: Mon Oct 26 18:58:42 2015
HibernationDevice: RESUME=/dev/mapper/vg0-swap
InstallationDate: Installed on 2015-01-02 (296 days ago)
InstallationMedia: Ubuntu-Server 14.04.1 LTS "Trusty Tahr" - Release amd64 (20140722.3)
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.

 virbr0 no wireless extensions.
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
PciMultimedia:

ProcFB: 0 cirrusdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.13.0-66-generic root=/dev/mapper/vg0-root ro
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-66-generic N/A
 linux-backports-modules-3.13.0-66-generic N/A
 linux-firmware 1.127.15
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 01/01/2011
dmi.bios.vendor: Bochs
dmi.bios.version: Bochs
dmi.chassis.type: 1
dmi.chassis.vendor: Bochs
dmi.modalias: dmi:bvnBochs:bvrBochs:bd01/01/2011:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-trusty:cvnBochs:ct1:cvr:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-trusty
dmi.sys.vendor: QEMU

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Luis Henriques (henrix) wrote:

Thank you. This seems to be a duplicate of bug #1508510, which has already been fixed in -proposed.

I can confirm that this bug is a duplicate of bug #1508510 and that the kernel from -proposed fixes the problem.
