Eucalyptus Cloud (aoe kernel module) crashes VPN tunnel (Juniper ncsvc)

Bug #677343 reported by Fred Zellinger
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

I am filing this bug in the hope that it helps someone else with the same problems.

Long description: I am running Ubuntu 10.10 Maverick Meerkat 64-bit. I VPN in to work using Juniper Networks Network Connect ncsvc (like http://mad-scientist.us/juniper.html, ncsvc version 6.5-0-Build15551). I needed to install ia32-sun-java6-bin and ia32-libs to get the GUI to work, and it worked well enough. This VPN uses IPsec and a tun0 device (may be important later).

Then I decided to investigate cloud computing and installed a bunch of Eucalyptus packages. After this, my VPN would consistently go dead within a minute of connecting. I fiddled with the VPN client logging and isolated a log entry that consistently appeared near the time of death and seemed relevant:

ncsvc[p15519.t15519] adapter.warn Bad ip packet len 32 - should be 65535 (adapter.cpp:98)
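
For anyone checking their own client logs for the same symptom, here is a minimal sketch that pulls the two lengths out of entries like the one above. The function name `bad_len_fields` is hypothetical, and it reads from standard input because this report does not say where ncsvc writes its log.

```shell
#!/usr/bin/env bash
# bad_len_fields (hypothetical helper): read log text on stdin and print
# "<actual> <expected>" for every "Bad ip packet len" warning found.
bad_len_fields() {
    sed -n 's/.*Bad ip packet len \([0-9][0-9]*\) - should be \([0-9][0-9]*\).*/\1 \2/p'
}

# Example using the exact entry quoted in this report.
echo 'ncsvc[p15519.t15519] adapter.warn Bad ip packet len 32 - should be 65535 (adapter.cpp:98)' \
    | bad_len_fields
# prints: 32 65535
```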

Note the magic number 65535. This is an unreasonable packet length, and seems like a buffer/pointer overflow/underflow issue, or perhaps a signed/unsigned issue which might arise when mixing 64-bit and 32-bit libraries.
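
To illustrate why 65535 looks like a signed/unsigned artifact: it is 0xFFFF, the largest value a 16-bit field such as the IPv4 Total Length can hold, and it is also what -1 becomes when truncated to 16 unsigned bits. The sketch below only demonstrates that arithmetic. Whether ncsvc actually computes its expected length this way is an assumption, not something this report confirms, and `as_u16` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# as_u16 (hypothetical helper): reinterpret an integer as an unsigned
# 16-bit value by keeping only its low 16 bits.
as_u16() {
    echo $(( $1 & 0xFFFF ))
}

as_u16 -1    # prints 65535: an "error" value of -1 seen as unsigned 16-bit
as_u16 32    # prints 32: a small, valid length is unchanged
```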

I scoured the internet for clues but did not find anything that matched well.

I totally un-installed Eucalyptus packages, rebooted and the problem went away. Then, I one-by-one installed all the package dependencies leading up to the eucalyptus-nc package, while also seeing which kernel modules were loaded, until I got this problem to repeat. I eventually isolated the problem down to the 'aoe' (ATA over Ethernet) module.

Now, if I have all the Eucalyptus packages installed, I can use lsmod/insmod/rmmod to load/unload the 'aoe' kernel module. If I have my VPN running and pinging one of my work computers, and 'insmod' the 'aoe' module, the VPN instantly locks up with the error message above.
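
A minimal sketch of the check used in that test, for anyone who wants to repeat it: it reports whether 'aoe' is currently loaded by reading /proc/modules, the same kernel list that lsmod displays. The helper name `module_loaded` and its optional file argument are hypothetical conveniences, not part of any existing tool.

```shell
#!/usr/bin/env bash
# module_loaded NAME [FILE] (hypothetical helper): succeed if NAME is the
# first field of some line in FILE, which defaults to /proc/modules.
module_loaded() {
    local name=$1 file=${2:-/proc/modules}
    [ -r "$file" ] || return 1
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}

if module_loaded aoe; then
    echo "aoe is loaded; unload it with: sudo rmmod aoe"
else
    echo "aoe is not loaded"
fi
```

Running this before starting the VPN, and again after loading or unloading the module, makes it easy to confirm which state the kernel is in when the tunnel dies.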

It seems to me that the 'aoe' module is injecting something into the kernel networking stack, causing some slight misalignment of network packets to which the 32-bit Juniper Network Connect VPN is very sensitive.

Given the complexities of all the systems involved, I don't know where to go next. Perhaps a more elegant and portable bug-test can be developed. Someone very knowledgeable in Linux kernel networking and in apps mixing 64-bit and 32-bit libs would probably have to think about this bug.

Even if this bug never gets fixed, I at least wanted to post my observations out on the internet in hopes that other Juniper Network Connect 32-bit ncsvc users (or developers) can save themselves several days of troubleshooting and try un-loading the 'aoe' kernel module.

While researching my problems, I saw numerous other bugs related to the 'aoe' module where users described segfaults, kernel crashes and memory allocation errors when pushing large quantities of network packets. Perhaps there is a leak somewhere in the 'aoe' module code which is creating many of these problems.

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: upstart 0.6.6-3
ProcVersionSignature: Ubuntu 2.6.35-22.35-generic 2.6.35.4
Uname: Linux 2.6.35-22-generic x86_64
NonfreeKernelModules: nvidia
Architecture: amd64
Date: Thu Nov 18 23:42:21 2010
ExecutablePath: /sbin/init
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release amd64 (20101007)
ProcEnviron: PATH=(custom, no user)
SourcePackage: upstart

Fred Zellinger (fzellinger) wrote :
tsg1zzn (tsg1zzn)
affects: ubuntu → linux (Ubuntu)
Jeremy Foshee (jeremyfoshee) wrote :

Hi Fred,

If you could also please test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Fred Zellinger (fzellinger) wrote :

Ok, I tried a newer mainline kernel:

# Installed:
cd /home/fred/Downloads
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.37-rc2-maverick/linux-headers-2.6.37-020637rc2-generic_2.6.37-020637rc2.201011160905_amd64.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.37-rc2-maverick/linux-headers-2.6.37-020637rc2_2.6.37-020637rc2.201011160905_all.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.37-rc2-maverick/linux-image-2.6.37-020637rc2-generic_2.6.37-020637rc2.201011160905_amd64.deb

sudo dpkg -i linux-headers-2.6.37-020637rc2_2.6.37-020637rc2.201011160905_all.deb
sudo dpkg -i linux-headers-2.6.37-020637rc2-generic_2.6.37-020637rc2.201011160905_amd64.deb
sudo dpkg -i linux-image-2.6.37-020637rc2-generic_2.6.37-020637rc2.201011160905_amd64.deb

#Rebooted and verified kernel:
fred@stitch:~$ uname -a
Linux stitch 2.6.37-020637rc2-generic #201011160905 SMP Tue Nov 16 09:08:47 UTC 2010 x86_64 GNU/Linux

I verified that the 'aoe' module was loaded, started up my VPN and it crashed with the same error message as originally observed.

I unloaded the 'aoe' module and started up my VPN, and it ran fine without crashing.

tags: removed: needs-upstream-testing
Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu development release http://cdimage.ubuntu.com/daily-live/current/ . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired