net.ipv6.conf.default.use_tempaddr = 2 breaks TCP sessions (IPv6)

Bug #759337 reported by Michael Heimann
This bug affects 1 person
Affects          Status         Importance   Assigned to               Milestone
linux (Ubuntu)   Fix Released   Medium       Mathieu Trudel-Lapierre
  Oneiric        Invalid        Medium       Unassigned
  Precise        Fix Released   Medium       Mathieu Trudel-Lapierre

Bug Description

Binary package hint: procps

Activating the IPv6 privacy extensions removes the active IPv6 address once the router advertisement lifetime expires, which can be as short as 5 minutes. The lifetime of the address is not extended even though the RA is still valid.
---
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: STAC92xx Analog [STAC92xx Analog]
   Subdevices: 0/1
   Subdevice #0: subdevice #0
ApportVersion: 1.23-0ubuntu4
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: STAC92xx Analog [STAC92xx Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: mheimann 2660 F.... pulseaudio
 /dev/snd/pcmC0D0p: mheimann 2660 F...m pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf6fdc000 irq 48'
   Mixer name : 'IDT 92HD71B7X'
   Components : 'HDA:111d76b2,10280233,00100302'
   Controls : 15
   Simple ctrls : 10
DistroRelease: Ubuntu 11.10
HibernationDevice: RESUME=UUID=f83786bd-1825-4529-954a-94d56fa2d27d
InstallationMedia: Ubuntu 10.04 LTS "Lucid Lynx" - Release amd64 (20100429)
MachineType: Dell Inc. Latitude E6400
NonfreeKernelModules: nvidia
Package: linux (not installed)
ProcEnviron:
 PATH=(custom, no user)
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.0.0-15-generic root=UUID=dd6974e3-a4ea-411f-93ae-6ca1a8d1078c ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 3.0.0-15.25-generic 3.0.13
RelatedPackageVersions:
 linux-restricted-modules-3.0.0-15-generic N/A
 linux-backports-modules-3.0.0-15-generic N/A
 linux-firmware 1.60
StagingDrivers: mei
Tags: oneiric running-unity staging
Uname: Linux 3.0.0-15-generic x86_64
UpgradeStatus: Upgraded to oneiric on 2011-09-30 (100 days ago)
UserGroups: adm admin cdrom dialout disk libvirtd lpadmin mythtv plugdev sambashare vboxusers
dmi.bios.date: 06/04/2010
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A25
dmi.board.name: 0U695R
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA25:bd06/04/2010:svnDellInc.:pnLatitudeE6400:pvr:rvnDellInc.:rn0U695R:rvr:cvnDellInc.:ct8:cvr:
dmi.product.name: Latitude E6400
dmi.sys.vendor: Dell Inc.

Michael Heimann (michael-heimann) wrote :

This has also been discussed at

http://osdir.com/ml/linux.ipv6.usagi.users/2006-09/msg00042.html

Basically, the stateless address gets refreshed by the router's advertisements, but the random privacy-extension address does not; it gets regenerated instead.

This should not happen. The address should simply stay as it is and only change when the link goes down, or at least only once no active session relies on the old address.
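
For anyone comparing setups, the kernel settings involved can be read from /proc. What follows is only a minimal Python sketch and not part of the original report: the helper names `read_tempaddr_settings` and `describe_use_tempaddr` and the `root` parameter are made up for illustration. It assumes the standard Linux sysctl files `use_tempaddr`, `temp_valid_lft` and `temp_prefered_lft` (the kernel spells the last one with a single "r") under /proc/sys/net/ipv6/conf/<interface>/.

```python
from pathlib import Path

# Standard location of the per-interface IPv6 sysctls on Linux.
DEFAULT_ROOT = "/proc/sys/net/ipv6/conf"

# Privacy-extension settings of interest ("prefered" is the kernel's spelling).
KEYS = ("use_tempaddr", "temp_valid_lft", "temp_prefered_lft")


def read_tempaddr_settings(iface="default", root=DEFAULT_ROOT):
    """Return the privacy-extension sysctls for one interface as a dict of ints.

    Hypothetical helper for illustration; keys whose file is missing
    or unreadable are simply skipped.
    """
    settings = {}
    for key in KEYS:
        path = Path(root) / iface / key
        try:
            settings[key] = int(path.read_text().strip())
        except (OSError, ValueError):
            continue
    return settings


def describe_use_tempaddr(value):
    """Put use_tempaddr into words, following the kernel's ip-sysctl notes."""
    if value <= 0:
        return "privacy extensions disabled"
    if value == 1:
        return "temporary addresses enabled, public addresses preferred"
    return "temporary addresses enabled and preferred"


if __name__ == "__main__":
    for iface in ("default", "all"):
        cfg = read_tempaddr_settings(iface)
        print(iface, cfg, describe_use_tempaddr(cfg.get("use_tempaddr", 0)))
```

With `use_tempaddr = 2`, as in the bug title, the second helper reports that temporary addresses are both generated and preferred as the source address for outgoing connections.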

Mathieu Trudel-Lapierre (cyphermox) wrote :

Reassigning to 'linux'. If TCP sessions break because the privacy addresses are not being recalculated at the same time as we get a new autoconfigured IPv6 address, then something might be broken and need fixing in the kernel (since that is what handles the autoconfiguration).

affects: procps (Ubuntu) → linux (Ubuntu)
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 759337

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . If possible, please test the latest v3.2-rcN kernel (not a kernel in the daily directory). Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag (only that one tag; please leave the others). This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed by the mainline kernel, please add the following tag: 'kernel-fixed-upstream-KERNEL-VERSION'. For example, if kernel version 3.2-rc1 fixed an issue, the tag would be: 'kernel-fixed-upstream-v3.2-rc1'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'. If you believe this bug does not require upstream testing, please add the tag: 'kernel-upstream-testing-not-needed'.

Thanks in advance.

tags: added: needs-upstream-testing
Mathieu Trudel-Lapierre (cyphermox) wrote :

Actually, even now that I look at this on my system, I'm not sure if this hasn't already been fixed in the kernel:

3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2001:470:1d:356:6da0:defb:bc6f:ad2d/64 scope global temporary dynamic
       valid_lft 604737sec preferred_lft 85737sec
    inet6 2001:470:1d:356:ae72:89ff:fe85:3338/64 scope global dynamic
       valid_lft 2591937sec preferred_lft 604737sec

valid_lft on the temp address matches the preferred_lft time on the "real" address. That already looks pretty good. (and I'd need some further tweaking to be in a position to properly test this)...
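
To make this kind of comparison less error-prone, the lifetimes can be pulled out of the `ip -6 addr` output programmatically. The following is a rough Python sketch under the assumption that the output looks like the snippet above; `parse_ip6_addr` is a hypothetical helper name, and the regular expressions only cover the fields shown here.

```python
import re

# Matches an address line such as:
#   inet6 2001:470:1d:356:6da0:defb:bc6f:ad2d/64 scope global temporary dynamic
ADDR_RE = re.compile(r"^\s*inet6\s+(\S+)\s+scope\s+(\S+)(.*)$")
# Matches the lifetime line that follows an address line:
#   valid_lft 604737sec preferred_lft 85737sec
LFT_RE = re.compile(r"^\s*valid_lft\s+(\S+)\s+preferred_lft\s+(\S+)")

# Sample copied from the output quoted in this comment.
SAMPLE = """\
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2001:470:1d:356:6da0:defb:bc6f:ad2d/64 scope global temporary dynamic
       valid_lft 604737sec preferred_lft 85737sec
    inet6 2001:470:1d:356:ae72:89ff:fe85:3338/64 scope global dynamic
       valid_lft 2591937sec preferred_lft 604737sec
"""


def _seconds(token):
    """Convert '7164sec' to 7164; anything else (e.g. 'forever') becomes None."""
    return int(token[:-3]) if token.endswith("sec") else None


def parse_ip6_addr(text):
    """Return one dict per inet6 address found in `ip -6 addr` output."""
    entries = []
    for line in text.splitlines():
        m = ADDR_RE.match(line)
        if m:
            entries.append({
                "address": m.group(1),
                "scope": m.group(2),
                "temporary": "temporary" in m.group(3).split(),
                "valid_lft": None,
                "preferred_lft": None,
            })
            continue
        m = LFT_RE.match(line)
        if m and entries:
            # The lifetime line belongs to the most recent address line.
            entries[-1]["valid_lft"] = _seconds(m.group(1))
            entries[-1]["preferred_lft"] = _seconds(m.group(2))
    return entries


if __name__ == "__main__":
    for entry in parse_ip6_addr(SAMPLE):
        kind = "temporary" if entry["temporary"] else "public"
        print(kind, entry["address"], entry["valid_lft"], entry["preferred_lft"])
```

Taking two such snapshots a few minutes apart, ideally on either side of an RA, should show whether the temporary address's lifetimes are being refreshed or just counting down.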

Could you please re-test this on the latest release (Oneiric) or in the development release, and let us know whether the bug can still be reproduced? It would also help if you could tell us more about how exactly you tested this, what kind of timeouts for address validity were used, etc.

Michael Heimann (michael-heimann) wrote : apport information

Attached files: AcpiTables.txt, AlsaDevices.txt, BootDmesg.txt, Card0.Amixer.values.txt, Card0.Codecs.codec.0.txt, CurrentDmesg.txt, IwConfig.txt, Lspci.txt, Lsusb.txt, PciMultimedia.txt, ProcCpuinfo.txt, ProcInterrupts.txt, ProcModules.txt, PulseSinks.txt, PulseSources.txt, RfKill.txt, UdevDb.txt, UdevLog.txt, WifiSyslog.txt

tags: added: apport-collected oneiric running-unity staging
description: updated

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Michael Heimann (michael-heimann) wrote :

I've just enabled privacy extensions again and am still suffering from this bug.

It's not possible to reliably use privacy extensions on the latest and greatest Ubuntu - sad :(
I need to disable it again which tells everybody my hardware MAC :\

To reproduce:
- get an IPv6 network running (any VM environment will be OK if you have an incapable ISP)
- establish a TCP session (like ssh into another machine via IPv6)
- wait longer than the valid lifetime of the temporary IPv6 address.

What happens:
Your (ssh-) session is dead and must be killed to close.

What should happen:
The (ssh-) session should stay alive.
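
The same steps can be approximated without ssh. Below is a minimal Python sketch, not something used in the original test: `hold_connection` and `watch` are hypothetical helpers, and the peer is assumed to be a service that tolerates stray bytes (an echo or discard service, say). It connects over IPv6 only, prints the source address the kernel picked, and keeps sending one byte per interval until the socket reports an error. Note that a failure may only surface after the kernel's TCP retransmission timeout, not at the moment the address disappears.

```python
import socket
import time


def hold_connection(sock, interval=60.0, max_probes=None, sleep=time.sleep):
    """Send one byte every `interval` seconds until sending fails.

    Returns the number of probes that were sent successfully.
    `max_probes=None` means keep going until an error occurs.
    """
    sent = 0
    while max_probes is None or sent < max_probes:
        try:
            sock.sendall(b"\n")
        except OSError as exc:
            print(f"connection failed after {sent} probes: {exc}")
            break
        sent += 1
        sleep(interval)
    return sent


def watch(host, port, interval=60.0):
    """Open a TCP connection over IPv6 only and hold it open."""
    # Restricting getaddrinfo to AF_INET6 mirrors running ssh without -4.
    info = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)[0]
    family, socktype, proto, _, sockaddr = info
    with socket.socket(family, socktype, proto) as sock:
        sock.connect(sockaddr)
        # With use_tempaddr = 2 this is expected to be the temporary address.
        print("source address:", sock.getsockname()[0])
        return hold_connection(sock, interval)
```

Calling `watch()` with a host and port of your own and leaving it running past the temporary address's valid lifetime should show whether the session survives.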

Btw: Windows gets this done...

Mathieu Trudel-Lapierre (cyphermox) wrote :

How are you connecting to the ssh server? Is the SSH server using privacy extensions, and if so, are you using the privext address to connect to it via SSH?

I'm not in a position to effectively test this again this week. Regardless, we'll need more information on how to reproduce the problem; even providing packet captures catching IPv6 advertisements and a session would go a long way toward helping us figure it out.

In particular, this means it would be helpful to have a snapshot of the addresses used (especially to see the lifetime values), which you can get via 'ip -6 addr'. Getting to know the lifetime of RAs on your network would also help (how often they are sent and how long before they expire).

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Changed in linux (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Mathieu Trudel-Lapierre (mathieu-tl)
Mathieu Trudel-Lapierre (cyphermox) wrote :

For now, we've managed to ascertain that at least on Precise, temporary IP addresses hang around for the lifetime of the "static" autoconfigured one, and get their lifetime updated by RAs coming in from the router. I think this should be very much covering your use case, so I'll close this bug as Fix Released for Precise.

Would it be possible for you to test a kernel from Precise on Oneiric, or to execute the same steps you were taking in Oneiric using a Precise LiveCD?

You can get Oneiric builds of the kernels used on Precise from the kernel mainline PPA at http://kernel.ubuntu.com/~kernel-ppa/mainline/ . For example, http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-rc4-oneiric/ may be a good candidate to test.

Changed in linux (Ubuntu Oneiric):
status: New → Incomplete
importance: Undecided → Medium
Changed in linux (Ubuntu Precise):
status: Incomplete → Fix Released
Michael Heimann (michael-heimann) wrote :

I'm unable to boot Precise on my notebook. X does not come up, neither with the Precise kernel in Oneiric nor with the current live CD.

But to sum it up, I can ssh over IPv4 with "ssh -4" for hours, but ssh without -4 gives me stuck sessions. This is annoying. This is the current state with Oneiric, the current stable release.

The server part is not the problem: it happens when connecting to Cisco routers or other Linux boxes. Of course I'm not connecting to a temporary address.

Come on, this issue is easy to reproduce, and it kills IPv6 communication when someone enables privacy extensions.

Actually, privacy extensions should be enabled by default since that has been best practice for years (and other OSs do it).

Going to try out the current Fedora release now...

Stéphane Graber (stgraber) wrote :

I've been running on Precise with privacy extensions turned on for a month or so now and never got any disconnection.
I'm using IPv6 to connect to most of the services I use in my everyday work, including ssh sessions to a few servers.

I have ssh connections that have been established for over a week without getting disconnected and that's when using the temporary address as the source.

As Mathieu said, both the standard global address and the temporary global address lifetime are reset every time a router advertisement is received.
In my case, I have advertisements every 15s with the valid_ttl set to 7200s and preferred_ttl set to 3600s.

 root@castiana:~# ip -6 addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2001:470:c364:1000:8d2f:6f74:666:b97d/64 scope global temporary dynamic
       valid_lft 7164sec preferred_lft 3564sec
    inet6 2001:470:c364:1000:5eff:35ff:fe0f:d6e/64 scope global dynamic
       valid_lft 7164sec preferred_lft 3564sec
    inet6 fe80::5eff:35ff:fe0f:d6e/64 scope link
       valid_lft forever preferred_lft forever

In this example, the connection has been up for more than 3 hours now and, as you can see, all the lifetimes are close to their maximum values because they have just been reset by an RA.

The pcap of one of these RAs can be found here: http://www.stgraber.org/download/ipv6-ra.pcap

I'd appreciate it if anyone seeing this issue could paste the same "ip -6 addr show" (altering the IPs in a consistent way if you want) and a pcap of a router advertisement on your network to confirm it's not the router's configuration that's at fault.

Thanks

dino99 (9d9)
Changed in linux (Ubuntu Oneiric):
status: Incomplete → Invalid