Eucalyptus Cloud (aoe kernel module) crashes VPN tunnel (Juniper ncsvc)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Expired | Undecided | Unassigned |
Bug Description
I am filing this bug in the hope that it helps someone else with the same problems.
Long description: I am running Ubuntu 10.10 Maverick Meerkat 64-bit. I VPN in to work using Juniper Networks Network Connect ncsvc (like http://
Then I decided to investigate cloud computing and installed a bunch of eucalyptus packages. After this, my VPN would consistently go dead within a minute of connecting. I fiddled with the VPN client logging and isolated a log entry consistently near the time of death which seemed relevant:
ncsvc[p15519.
Note the magic number 65535. This is an unreasonable packet length, and it looks like a buffer/pointer overflow or underflow, or perhaps a signed/unsigned issue of the kind that might arise when mixing 64-bit and 32-bit libraries.
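As a quick illustration of why 65535 looks suspicious: it is the largest value an unsigned 16-bit field can hold (0xFFFF), and it is also what -1 becomes when a signed 16-bit value is reinterpreted as unsigned. The sketch below only demonstrates that arithmetic. The function name is made up for illustration, and it makes no claim about how ncsvc actually stores packet lengths.

```python
import struct

# 65535 is the maximum value of an unsigned 16-bit integer.
MAX_U16 = 0xFFFF


def reinterpret_i16_as_u16(value: int) -> int:
    """Pack a signed 16-bit integer and read the same two bytes back as unsigned.

    Hypothetical helper for illustration only; not taken from ncsvc or the kernel.
    """
    return struct.unpack("<H", struct.pack("<h", value))[0]


print(MAX_U16)                     # 65535
print(reinterpret_i16_as_u16(-1))  # 65535
```

Either reading (a saturated 16-bit length, or a -1 error value treated as a length) would be consistent with the logged number, but the log alone does not say which.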
I scoured the internet for clues but did not find anything that matched well.
I completely uninstalled the Eucalyptus packages, rebooted, and the problem went away. Then I installed, one by one, all the package dependencies leading up to the eucalyptus-nc package, checking which kernel modules were loaded after each step, until the problem came back. I eventually narrowed the problem down to the 'aoe' (ATA over Ethernet) module.
Now, if I have all the Eucalyptus packages installed, I can use lsmod/insmod/rmmod to load/unload the 'aoe' kernel module. If I have my VPN running and pinging one of my work computers and then 'insmod' the 'aoe' module, the VPN instantly locks up with the error message above.
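For anyone who wants to script the check, here is a minimal sketch that reports whether a module such as 'aoe' is currently loaded by reading /proc/modules, the same list that lsmod prints. The function names and the sample text are illustrative assumptions, not part of any Eucalyptus or Juniper tooling.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set:
    """Return module names from text in /proc/modules format.

    Each line starts with the module name, followed by size, use count, etc.
    """
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def is_module_loaded(name: str, proc_modules_path: str = "/proc/modules") -> bool:
    """Check the live module list; returns False if the file does not exist."""
    path = Path(proc_modules_path)
    if not path.exists():
        return False
    return name in loaded_modules(path.read_text())


# Made-up sample in /proc/modules format, for demonstration only.
SAMPLE = "aoe 26960 0 - Live 0x0000000000000000\ntun 17095 2 - Live 0x0000000000000000\n"
print("aoe" in loaded_modules(SAMPLE))  # True
```

If `is_module_loaded("aoe")` returns True while the VPN is misbehaving, unloading the module with `rmmod aoe` (as root) is the step described above.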
It seems to me that the 'aoe' module is injecting something into the kernel networking stack, causing a slight misalignment of network packets to which the 32-bit Juniper Network Connect VPN is very sensitive.
Given the complexities of all the systems involved, I don't know where to go next. Perhaps a more elegant and portable test case can be developed. Someone very knowledgeable in Linux kernel networking and in applications that mix 64-bit and 32-bit libraries would probably have to think about this bug.
Even if this bug never gets fixed, I at least wanted to post my observations on the internet in the hope that other Juniper Network Connect 32-bit ncsvc users (or developers) can save themselves several days of troubleshooting by trying to unload the 'aoe' kernel module.
While researching my problems, I saw numerous other bugs related to the 'aoe' module where users described segfaults, kernel crashes and memory allocation errors when pushing large quantities of network packets. Perhaps there is a leak somewhere in the 'aoe' module code which is creating many of these problems.
ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: upstart 0.6.6-3
ProcVersionSign
Uname: Linux 2.6.35-22-generic x86_64
NonfreeKernelMo
Architecture: amd64
Date: Thu Nov 18 23:42:21 2010
ExecutablePath: /sbin/init
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release amd64 (20101007)
ProcEnviron: PATH=(custom, no user)
SourcePackage: upstart
affects: | ubuntu → linux (Ubuntu) |
Hi Fred,
If you could also please test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.
Thanks in advance.
[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]