shutdown(2) behavior changed in kernel

Bug #731878 reported by Florian Effenberger on 2011-03-09
This bug affects 2 people
Affects                    Status         Importance   Assigned to       Milestone
Linux                      Fix Released   Medium
linux (Ubuntu)                            High         Leann Ogasawara
  Natty                                   High         Leann Ogasawara
linux-ti-omap4 (Ubuntu)                   Undecided    Unassigned
  Natty                                   Undecided    Unassigned

Bug Description

shutdown(2) no longer shuts down the socket fully.

$ ./testcase
...
$ ./testcase
bind: Address already in use

This did not happen prior to Natty.

Florian Effenberger (floeff) wrote :

Seems to occur as well when "restart" is used.

==
Mar 11 08:35:19 myserver amavis[31199]: starting. /usr/sbin/amavisd-new at myserver.alkernetz amavisd-new-2.6.4 (20090625), Unicode aware
Mar 11 08:35:19 myserver amavis[31199]: Perl version 5.010001
Mar 11 08:35:19 myserver amavis[31203]: (!)Net::Server: 2011/03/11-08:35:19 Can't connect to TCP port 10024 on 127.0.0.1 [Address already in use]\n at line 88 in file /usr/share/perl5/Net/Server/Proto/TCP.pm
==

I can reproduce it only every once in a while, with no pattern so far. The machine itself is barely handling any e-mail right now, so it cannot be a load problem.

James Page (james-page) wrote :

Hi Florian

The 'restart' and 'force-reload' commands both stop and start the amavis daemon which would explain why you have seen similar behaviour from both commands.

I suspect that this is caused by the port not being fully released before the new instance of the daemon tries to bind to the port - this condition would result in this type of message.

I could reproduce the symptom by stopping amavis, using netcat to listen on port 10024, and then starting amavis.

Unfortunately that does not tell us what is causing this issue.
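A minimal C sketch of the netcat experiment, for readers who want to see the error without amavis. It assumes the daemon sets SO_REUSEADDR (an assumption; the thread does not say so) and uses a kernel-chosen loopback port instead of 10024. The function names are hypothetical. While one socket is in the LISTEN state, a second bind() to the same address and port is expected to fail with EADDRINUSE even when both sockets set SO_REUSEADDR, which is the same "Address already in use" error amavis logged.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket with SO_REUSEADDR set and bind
 * it to 127.0.0.1:port (port 0 = let the kernel pick a free port).
 * Returns the fd, or -errno on failure. */
static int bind_reuse(unsigned short port)
{
    struct sockaddr_in addr;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -errno;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }
    return fd;
}

/* Stand-in for "netcat is holding the port": one socket listens, a second
 * socket tries to bind the same address and port.
 * Returns 0 if the second bind succeeded, otherwise its errno. */
int bind_while_port_is_listening(void)
{
    struct sockaddr_in bound;
    socklen_t len = sizeof bound;
    int listener, second, err;

    listener = bind_reuse(0);
    if (listener < 0)
        return -listener;
    if (listen(listener, 16) < 0 ||
        getsockname(listener, (struct sockaddr *)&bound, &len) < 0) {
        err = errno;
        close(listener);
        return err;
    }
    second = bind_reuse(ntohs(bound.sin_port));
    close(listener);
    if (second < 0)
        return -second;          /* expected: EADDRINUSE */
    close(second);
    return 0;
}
```

Because a conflict with an active listener is normal behavior on any kernel, this experiment reproduces the symptom but, as noted above, says nothing about the cause.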

Perhaps you could capture the number of open connections to amavis prior to restarting or force reloading to see if we can spot a pattern:

   sudo netstat -a | grep 10024

It would also be helpful to know which version of Ubuntu and Amavis you are using - you can do this automatically using apport:

   apport-collect 731878

Thanks

Changed in amavisd-new (Ubuntu):
status: New → Incomplete
Florian Effenberger (floeff) wrote :

Seems it is indeed a bug, users on the mailing list could verify it. See this thread: http://lists.amavis.org/pipermail/amavis-users/2011-March/000069.html

It seems to be directly related to the maximum number of instances. netstat doesn't seem to give helpful information:

==
floeff@myserver:~$ sudo netstat -a | grep 10024
[sudo] password for floeff:
tcp 0 0 localhost:10024 *:* LISTEN
floeff@myserver:~$ sudo /etc/init.d/amavis force-reload
Stopping amavisd: amavisd-new.
Starting amavisd: amavisd-new.
floeff@myserver:~$ sudo netstat -a | grep 10024
tcp 0 0 localhost:10024 *:* LISTEN
==

Would love to use apport-collect, but I have only a console, and lynx does not seem to work: I always get thrown back to the same page with a "Continue" button :(

Kees Cook (kees) wrote :

This appears to be a behavioral change to the shutdown(2) function. The socket gets only partially shut down. It's as if "close()" had been called instead of "shutdown()", which is supposed to kill the socket everywhere.
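To illustrate the distinction between the two calls, here is a small C sketch (hypothetical helper names, loopback only). dup() stands in for the copy of the listening descriptor that a forked worker process inherits. Closing one descriptor only drops a reference, so the socket keeps accepting connections through the other one; shutdown() acts on the socket itself, so afterwards a connect() is expected to be refused even though a descriptor is still open.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: listen on 127.0.0.1 with a kernel-chosen port,
 * stored in *port. Returns the listening fd, or -1 on failure. */
static int listen_loopback(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Hypothetical helper: connect to 127.0.0.1:port.
 * Returns 0 on success, otherwise errno. */
static int try_connect(unsigned short port)
{
    struct sockaddr_in addr;
    int err = 0;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0)
        err = errno;
    close(fd);
    return err;
}

struct close_vs_shutdown { int after_close; int after_shutdown; };

/* Compare close() on one of two descriptors with shutdown() on the other. */
struct close_vs_shutdown compare_close_and_shutdown(void)
{
    struct close_vs_shutdown r = { -1, -1 };
    unsigned short port;
    int copy, fd = listen_loopback(&port);
    if (fd < 0)
        return r;
    copy = dup(fd);                    /* like a descriptor inherited via fork() */
    close(fd);                         /* drops only this descriptor */
    if (copy < 0)
        return r;
    r.after_close = try_connect(port); /* still listening through 'copy' */
    shutdown(copy, SHUT_RDWR);         /* stops the listening socket itself */
    r.after_shutdown = try_connect(port);
    close(copy);
    return r;
}
```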

summary: - amavis force-reload crashes amavis
+ shutdown(2) behavior changed in kernel
affects: amavisd-new (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
milestone: none → ubuntu-11.04-beta-2
status: Incomplete → Confirmed
Kees Cook (kees) wrote :

Running this testcase twice will show the problem on Natty but not earlier kernels.
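The testcase itself is an attachment that is not reproduced in this thread. The following C sketch is a hypothetical single-process approximation of the scenario, not the original program, and its helper names are our own. It binds with SO_REUSEADDR to an explicit port, calls listen() and then shutdown(SHUT_RDWR) while keeping the descriptor open, and finally binds a second socket to the same port. On kernels without commit c191a836 the second bind() is expected to succeed; on the affected Natty kernels it is expected to fail with EADDRINUSE, the "bind: Address already in use" error from the bug description.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: TCP socket with SO_REUSEADDR bound to
 * 127.0.0.1:port (0 = any free port). Returns the fd, or -errno. */
static int bind_reuse(unsigned short port)
{
    struct sockaddr_in addr;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -errno;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }
    return fd;
}

/* Hypothetical helper: ask the kernel for a free port, then release it.
 * Returns the port number, or -errno. */
static int pick_free_port(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int port, fd = bind_reuse(0);
    if (fd < 0)
        return fd;
    port = getsockname(fd, (struct sockaddr *)&addr, &len) < 0
               ? -errno : ntohs(addr.sin_port);
    close(fd);
    return port;
}

/* Returns 0 if the second bind succeeded, otherwise the errno of the
 * step that failed. */
int rebind_after_shutdown(void)
{
    int first, second, err;
    int port = pick_free_port();
    if (port < 0)
        return -port;
    first = bind_reuse((unsigned short)port);   /* explicit port */
    if (first < 0)
        return -first;
    if (listen(first, 16) < 0 || shutdown(first, SHUT_RDWR) < 0) {
        err = errno;
        close(first);
        return err;
    }
    /* 'first' is still open, as it would be in a process that inherited
     * the descriptor and has not exited yet. */
    second = bind_reuse((unsigned short)port);
    close(first);
    if (second < 0)
        return -second;
    close(second);
    return 0;
}
```

The port is bound explicitly rather than as port 0: as far as we understand the kernel code, an automatically assigned port is released as soon as the listener is shut down, which would hide the problem.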

tags: added: regression-release
description: updated
Scott Moser (smoser) wrote :

It's possible/likely that this is what I was running into in bug 712026.

Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Dave Walker (davewalker) on 2011-04-07
tags: added: server-nro
Paul Sladen (sladen) wrote :

From the concurrent thread (noting that 'haproxy' no longer works):

  http://marc.info/?l=linux-netdev&m=130176733401613&w=2 (2011-04-02, "tcp: disallow bind() to reuse addr/port regression in 2.6.38")

suggests a possible cause of:

  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c191a836a908d1dd6b40c503741f91b914de3348
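To make the suspected regression easier to follow, here is a deliberately simplified C model of the bind-conflict rule. It is our own reading of the commit title and the reported symptoms, not the kernel source: the type and function names are hypothetical, and the real check in inet_csk_bind_conflict() also considers the bound device and address. The idea is that before the commit, an existing socket on the same port blocked an SO_REUSEADDR bind() only while it was listening; with the commit, a socket in the CLOSE state (where, as we understand it, a listening socket ends up after shutdown()) blocked it as well.

```c
#include <stdbool.h>

/* Simplified, hypothetical model; not kernel source. */
enum model_state { MODEL_LISTEN, MODEL_CLOSE, MODEL_OTHER };

/* An existing socket already bound to the same address and port. */
struct model_sock {
    bool reuseaddr;          /* SO_REUSEADDR set on it */
    enum model_state state;  /* its TCP state */
};

/* Before c191a836 (as modelled here): the new bind() conflicts unless both
 * sockets set SO_REUSEADDR and the existing socket is not listening. */
bool conflicts_before_patch(bool new_reuseaddr, struct model_sock existing)
{
    return !new_reuseaddr || !existing.reuseaddr ||
           existing.state == MODEL_LISTEN;
}

/* With c191a836 (as modelled here): an existing socket in CLOSE state,
 * such as a listener after shutdown(), also counts as a conflict. */
bool conflicts_with_patch(bool new_reuseaddr, struct model_sock existing)
{
    return conflicts_before_patch(new_reuseaddr, existing) ||
           existing.state == MODEL_CLOSE;
}
```

Under this model, a daemon that shuts down its listening socket while another process still holds the descriptor would see its next bind() to the same port fail with EADDRINUSE.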

I'll work on building a test kernel with commit c191a836a908d1dd6b40c503741f91b914de3348 reverted so we can confirm it's the culprit. Will also keep an eye on the upstream discussion.

Changed in linux (Ubuntu Natty):
assignee: Canonical Kernel Team (canonical-kernel-team) → Leann Ogasawara (leannogasawara)
status: Confirmed → Triaged

I've tested and confirmed that commit c191a836a908d1dd6b40c503741f91b914de3348 is indeed the root cause of the issue. For those interested, a test kernel with the commit reverted can be found at the following:

http://people.canonical.com/~ogasawara/lp731878/amd64/

Tim Gardner (timg-tpi) wrote :

Leann - If upstream doesn't come up with a fix prior to kernel freeze, then I suggest that we revert this patch in advance of our initial release. The deficiency that this patch addresses has existed since 2.6.34. We can always cherry-pick it back if there is a dependent stable update.

Tim, we've got the same train of thought :) I'm planning to keep an eye on the upstream thread, and if no solution presents itself I'll revert this patch prior to our final upload before kernel freeze.

Changed in linux (Ubuntu Natty):
status: Triaged → In Progress
Brad Figg (brad-figg) on 2011-04-07
tags: added: natty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.38-8.42

---------------
linux (2.6.38-8.42) natty; urgency=low

  [ David Henningsson ]

  * SAUCE: (drop after 2.6.38) ALSA: HDA: Fix dock mic for Lenovo
    X220-tablet
    - LP: #751033

  [ Gustavo F. Padovan ]

  * SAUCE: Revert "Bluetooth: Add new PID for Atheros 3011"
    - LP: #720949

  [ Herton Ronaldo Krzesinski ]

  * SAUCE: (drop after 2.6.39) v4l: make sure drivers supply a zeroed
    struct v4l2_subdev
    - LP: #745213

  [ John Johansen ]

  * AppArmor: Fix masking of capabilities in complain mode
    - LP: #748656

  [ Leann Ogasawara ]

  * [Config] Disable CONFIG_RTS_PSTOR for armel, powerpc

  [ Manoj Iyer ]

  * SAUCE: (drop after 2.6.38) add support for Lenovo tablet ID (0xE6)
    - LP: #746652

  [ Steve Langasek ]

  * [Config] Make linux-libc-dev coinstallable under multiarch
    - LP: #750585

  [ Tim Gardner ]

  * [Config] CONFIG_RTS_PSTOR=m
    - LP: #698006

  [ Upstream Kernel Changes ]

  * Revert "tcp: disallow bind() to reuse addr/port"
    - LP: #731878
  * ALSA: HDA: Add dock mic quirk for Lenovo Thinkpad X220
    - LP: #746259
  * ALSA: HDA: New AD1984A model for Dell Precision R5500
    - LP: #741516
  * Input: sparse-keymap - report scancodes with key events
  * Input: sparse-keymap - report KEY_UNKNOWN for unknown scan codes
  * KVM: SVM: Load %gs earlier if CONFIG_X86_32_LAZY_GS=n
    - LP: #729085
  * watchdog: sp5100_tco.c: Check if firmware has set correct value in
    tcobase.
    - LP: #740011
  * staging: add rts_pstor for Realtek PCIE cardreader
    - LP: #698006
  * staging: fix rts_pstor build errors
    - LP: #698006
  * Staging: rts_pstor: fixed some brace code styling issues
    - LP: #698006
  * staging: rts_pstor: potential NULL dereference
    - LP: #698006
  * Staging: rts_pstor: fix read past end of buffer
    - LP: #698006
  * staging: rts_pstor: delete a function
    - LP: #698006
  * staging: rts_pstor: fix sparse warning
    - LP: #698006
  * staging: rts_pstor: fix a bug that a greenhouse sd card can't be
    recognized
    - LP: #698006
  * staging: rts_pstor: optimize kmalloc to kzalloc
    - LP: #698006
  * staging: rts_pstor: MSXC card power class
    - LP: #698006
  * staging: rts_pstor: modify initial card clock
    - LP: #698006
  * staging: rts_pstor: set lun_mode in a different place
    - LP: #698006
  * x86, hibernate: Initialize mmu_cr4_features during boot
    - LP: #752870
 -- Leann Ogasawara <email address hidden> Fri, 08 Apr 2011 09:24:59 -0700

Changed in linux (Ubuntu Natty):
status: In Progress → Fix Released
Tim Gardner (timg-tpi) on 2011-04-25
Changed in linux-ti-omap4 (Ubuntu Natty):
status: New → Fix Committed
Changed in linux:
status: Confirmed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-ti-omap4 - 2.6.38-1209.15

---------------
linux-ti-omap4 (2.6.38-1209.15) natty-proposed; urgency=low

  * Release tracking bug
    - LP: #837761

  [ Paolo Pisati ]

  * [Config] Turn on CONFIG_USER_NS and DEVPTS_MULTIPLE_INSTANCES.
    - LP: #787749

  [ Tim Gardner ]

  * [Config] Add enic/fnic to nic-modules udeb, CVE-2011-1020
    - LP: #801610

  [ Upstream Kernel Changes ]

  * mpt2sas: prevent heap overflows and unchecked reads
    - LP: #780546
  * agp: fix arbitrary kernel memory writes
    - LP: #775809
  * can: add missing socket check in can/raw release
    - LP: #780546
  * agp: fix OOM and buffer overflow
    - LP: #775809
  * bonding: Incorrect TX queue offset, CVE-2011-1581
    - LP: #792312
    - CVE-2011-1581
  * fs/partitions/efi.c: corrupted GUID partition tables can cause kernel
    oops
    - LP: #795418
    - CVE-2011-1577
  * can: Add missing socket check in can/bcm release.
    - LP: #796502
    - CVE-2011-1598
  * USB: ehci: remove structure packing from ehci_def
    - LP: #791552
  * taskstats: don't allow duplicate entries in listener mode,
    CVE-2011-2484
    - LP: #806390
    - CVE-2011-2484
  * ext4: init timer earlier to avoid a kernel panic in __save_error_info,
    CVE-2011-2493
    - LP: #806929
    - CVE-2011-2493
  * dccp: handle invalid feature options length, CVE-2011-1770
    - LP: #806375
    - CVE-2011-1770
  * pagemap: close races with suid execve, CVE-2011-1020
    - LP: #813026
    - CVE-2011-1020
  * report errors in /proc/*/*map* sanely, CVE-2011-1020
    - LP: #813026
    - CVE-2011-1020
  * close race in /proc/*/environ, CVE-2011-1020
    - LP: #813026
    - CVE-2011-1020
  * auxv: require the target to be tracable (or yourself), CVE-2011-1020
    - LP: #813026
    - CVE-2011-1020
  * deal with races in /proc/*/{syscall, stack, personality}, CVE-2011-1020
    - LP: #813026
    - CVE-2011-1020
  * rose: Add length checks to CALL_REQUEST parsing, CVE-2011-1493
    - LP: #816550
    - CVE-2011-1493
  * GFS2: make sure fallocate bytes is a multiple of blksize, CVE-2011-2689
    - LP: #819572
    - CVE-2011-2689
  * Bluetooth: l2cap and rfcomm: fix 1 byte infoleak to userspace.
    - LP: #819569
    - CVE-2011-2492
  * Add mount option to check uid of device being mounted = expect uid,
    CVE-2011-1833
    - LP: #732628
    - CVE-2011-1833
  * ipv6: make fragment identifications less predictable, CVE-2011-2699
    - LP: #827685
    - CVE-2011-2699
  * perf: Fix software event overflow, CVE-2011-2918
    - LP: #834121
    - CVE-2011-2918
  * proc: fix oops on invalid /proc/<pid>/maps access, CVE-2011-1020
    - LP: #813026
    - CVE-2011-1020

linux-ti-omap4 (2.6.38-1209.13) natty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #772381

  [ Brad Figg ]

  * Ubuntu-2.6.38-9.43

  [ Bryan Wu ]

  * merge Ubuntu-2.6.38-9.43
  * cherry-pick 6 patches from u2 of 'for-ubuntu' branch
  * [Config] Sync up configs for 2.6.38.4

  [ Herton Ronaldo Krzesinski ]

  * SAUCE: Revert "x86, hibernate: Initialize mmu_cr4_features during boot"
    - LP: #764758

  [ Leann Ogasawara ]

  * [Config] updateconfigs for 2.6.38.4

  [ Paolo Pisati ]

  * [Conf...

Changed in linux-ti-omap4 (Ubuntu Natty):
status: Fix Committed → Fix Released
Paolo Pisati (p-pisati) on 2012-01-30
Changed in linux-ti-omap4 (Ubuntu):
status: Fix Committed → Fix Released