xen guest kernel bug: 'kernel BUG at /build/buildd/linux-2.6.24/debian/build/custom-source-xen/drivers/xen/netfront/netfront.c:785'

Reported by mkl on 2008-04-16
Affects: linux (Ubuntu) | Importance: Medium | Assigned to: Unassigned
Affects: Hardy | Importance: Medium | Assigned to: Unassigned

Bug Description

[root@pps0355 ~]# xm create -c /etc/xen/pps-x12
Using config file "/etc/xen/pps-x12".
Started domain pps-x12
[ 0.000000] Linux version 2.6.24-16-xen (buildd@yellow) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)) #1 SMP Thu Apr 10 14:35:03 UTC 2008 (Ubuntu 2.6.24-4.6-generic)
[ 0.000000] Command line: root=/dev/sda1 ro console=/dev/xvc0
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] Xen: 0000000000000000 - 0000000010800000 (usable)
[ 0.000000] end_pfn_map = 67584
[2311650.315719] Zone PFN ranges:
[2311650.315721] DMA 0 -> 4096
[2311650.315724] DMA32 4096 -> 1048576
[2311650.315725] Normal 1048576 -> 1048576
[2311650.315727] Movable zone start PFN for each node
[2311650.315728] early_node_map[1] active PFN ranges
[2311650.315730] 0: 0 -> 67584
[2311650.326160] No mptable found.
[2311650.326947] PERCPU: Allocating 22368 bytes of per cpu data
[2311650.326972] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 66660
[2311650.326977] Kernel command line: root=/dev/sda1 ro console=/dev/xvc0
[2311650.327623] Initializing CPU#0
[2311650.327799] PID hash table entries: 2048 (order: 11, 16384 bytes)
[2311650.327835] Xen reported: 1992.446 MHz processor.
[ 0.026270] console [xvc-1] enabled
[ 0.026309] Console: colour dummy device 80x25
[ 0.026603] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.026842] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[ 0.026906] Software IO TLB disabled
[ 0.029575] Memory: 232180k/270336k available (2530k kernel code, 29616k reserved, 1329k data, 220k init)
[ 0.092349] Calibrating delay using timer specific routine.. 3988.62 BogoMIPS (lpj=7977259)
[ 0.092429] Security Framework initialized
[ 0.092441] SELinux: Disabled at boot.
[ 0.092451] AppArmor: AppArmor initialized
[ 0.092456] Failure registering capabilities with primary security module.
[ 0.092483] Mount-cache hash table entries: 256
[ 0.092653] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 0.092657] CPU: L2 Cache: 1024K (64 bytes/line)
[ 0.092677] SMP alternatives: switching to UP code
[ 0.093284] Freeing SMP alternatives: 23k freed
[ 0.093435] Early unpacking initramfs... done
[ 0.112929] Brought up 1 CPUs
[ 0.113630] net_namespace: 120 bytes
[ 0.113635] failed to set up cpufreq notifier
[ 0.132384] Time: 165:165:165 Date: 165/165/65
[ 0.132421] NET: Registered protocol family 16
[ 0.133845] Brought up 1 CPUs
[ 0.133863] PCI: Fatal: No config space access function found
[ 0.133866] PCI: setting up Xen PCI frontend stub
[ 0.134578] ACPI: Interpreter disabled.
[ 0.134584] Linux Plug and Play Support v0.97 (c) Adam Belay
[ 0.134622] pnp: PnP ACPI: disabled
[ 0.135046] xen_mem: Initialising balloon driver.
[ 0.136382] Setting mem allocation to 262144 kiB
[ 0.136682] PCI: System does not support PCI
[ 0.136687] PCI: System does not support PCI
[ 0.138848] NET: Registered protocol family 8
[ 0.138853] NET: Registered protocol family 20
[ 0.138936] AppArmor: AppArmor Filesystem Enabled
[ 0.139338] NET: Registered protocol family 2
[ 0.139351] Time: xen clocksource has been installed.
[ 0.170880] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.171086] TCP established hash table entries: 16384 (order: 6, 262144 bytes)
[ 0.171256] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[ 0.171426] TCP: Hash tables configured (established 16384 bind 16384)
[ 0.171432] TCP reno registered
[ 0.182943] checking if image is initramfs... it is
[ 0.203821] Freeing initrd memory: 18972k freed
[ 0.217541] audit: initializing netlink socket (disabled)
[ 0.217565] audit(1208340133.904:1): initialized
[ 0.217759] VFS: Disk quotas dquot_6.5.1
[ 0.217786] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 0.217873] io scheduler noop registered
[ 0.217877] io scheduler anticipatory registered
[ 0.217879] io scheduler deadline registered

[ 0.217887] io scheduler cfq registered (default)
[ 0.218105] Xen virtual console successfully installed as xvc0
[ 0.218158] Event-channel device installed.
[ 0.226820] Successfully initialized TPM backend driver.
[ 0.237186] netfront: Initialising virtual ethernet driver.
[ 0.238033] xen-vbd: registered block device major 8
[ 0.267152] rtc: IRQ 8 is not free.
[ 0.267256] Linux agpgart interface v0.102
[ 0.267852] RAMDISK driver initialized: 16 RAM disks of 65536K size 1024 blocksize
[ 0.267934] input: Macintosh mouse button emulation as /devices/virtual/input/input0
[ 0.268076] PNP: No PS/2 controller found. Probing ports directly.
[ 0.268920] i8042.c: No controller found.
[ 0.274923] mice: PS/2 mouse device common for all mice
[ 0.274961] cpuidle: using governor ladder
[ 0.275048] NET: Registered protocol family 1
[ 0.275120] registered taskstats version 1
[ 0.275143] Magic number: 1:252:3141
[ 0.275293] /build/buildd/linux-2.6.24/debian/build/custom-source-xen/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[ 0.275307] Freeing unused kernel memory: 220k freed
Loading, please wait...
Begin: Loading essential drivers... ...
[ 0.583333] thermal: Unknown symbol acpi_processor_set_thermal_limit
Done.
Begin: Running /scripts/init-premount ...
Done.
Begin: Mounting root file system... ...
Begin: Running /scripts/local-top ...
Done.
Begin: Waiting for root file system... ...
Done.
Begin: Running /scripts/local-premount ...
Done.
[ 1.119322] kjournald starting. Commit interval 5 seconds
[ 1.119355] EXT3-fs: mounted filesystem with ordered data mode.
Begin: Running /scripts/local-bottom ...
Done.
Done.
Begin: Running /scripts/init-bottom ...
Done.
 * Setting preliminary keymap... [ OK ]
 * Setting the system clock
Cannot access the Hardware Clock via any known method.
Use the --debug option to see the details of our search for an access method.
 * Unable to set System Clock to: Wed Apr 16 10:02:16 UTC 2008
 * Starting basic networking... [ OK ]
 * Starting kernel event manager... [ OK ]
 * Loading hardware drivers... [ 4.433282] ------------[ cut here ]------------
[ 4.433302] kernel BUG at /build/buildd/linux-2.6.24/debian/build/custom-source-xen/drivers/xen/netfront/netfront.c:785!
[ 4.433307] invalid opcode: 0000 [1] SMP
[ 4.433311] CPU 0
[ 4.433314] Modules linked in: evdev ext3 jbd mbcache
[ 4.433322] Pid: 2329, comm: ifconfig Not tainted 2.6.24-16-xen #1
[ 4.433325] RIP: e030:[<ffffffff8039aa14>] [<ffffffff8039aa14>] network_alloc_rx_buffers+0x534/0x5e0
[ 4.433339] RSP: e02b:ffff88000fea1d08 EFLAGS: 00010282
[ 4.433342] RAX: ffff880010700580 RBX: ffff88000e58acc0 RCX: 0000000000000000
[ 4.433345] RDX: ffff88000fc38000 RSI: 0000000000000000 RDI: 0000000000000240
[ 4.433348] RBP: 0000000000000000 R08: 0000000000000011 R09: 0000000000000000
[ 4.433351] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88000fc38700
[ 4.433354] R13: 0000000000000240 R14: 0000000000000000 R15: ffff88000fc38868
[ 4.433362] FS: 00007f281b6216e0(0000) GS:ffffffff805c6000(0000) knlGS:0000000000000000
[ 4.433365] CS: e033 DS: 0000 ES: 0000
[ 4.433367] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4.433371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
[ 4.433374] Process ifconfig (pid: 2329, threadinfo ffff88000fea0000, task ffff880010705800)
[ 4.433377] Stack: 0000000000000000 ffff88000fc39ce0 ffff88000fc38000 0000880000000100
[ 4.433384] 00000001ffffffff ffff88000fc38740 ffff88000ee9f000 0000000180282166
[ 4.433390] 00ff880010440cc0 0000000000000000 ffff880000000000 000000000000002d
[ 4.433395] Call Trace:
[ 4.433400] [<ffffffff8039db38>] network_open+0x78/0x130
[ 4.433406] [<ffffffff803f4fe3>] dev_open+0x53/0x90
[ 4.433409] [<ffffffff803f37e2>] dev_change_flags+0x92/0x1b0
[ 4.433415] [<ffffffff80444f00>] devinet_ioctl+0x5a0/0x750
[ 4.433420] [<ffffffff803e4fdf>] sock_ioctl+0xcf/0x260
[ 4.433425] [<ffffffff802ab90f>] do_ioctl+0x2f/0xa0
[ 4.433428] [<ffffffff802abba0>] vfs_ioctl+0x220/0x2c0
[ 4.433433] [<ffffffff8029b875>] fd_install+0x25/0x60
[ 4.433437] [<ffffffff802abcd1>] sys_ioctl+0x91/0xb0
[ 4.433442] [<ffffffff8020c698>] system_call+0x68/0x6d
[ 4.433445] [<ffffffff8020c630>] system_call+0x0/0x6d
[ 4.433448]
[ 4.433449]
[ 4.433449] Code: 0f 0b eb fe 0f 0b eb fe c7 44 24 3c 00 00 00 00 e9 58 ff ff
[ 4.433464] RIP [<ffffffff8039aa14>] network_alloc_rx_buffers+0x534/0x5e0
[ 4.433469] RSP <ffff88000fea1d08>
[ 4.433481] ---[ end trace 817b38e10754b0a4 ]---
[ 4.433487] Kernel panic - not syncing: Aiee, killing interrupt handler!

[root@pps0355 ~]# uname -a
Linux pps0355.gridpp.rl.ac.uk 2.6.18-53.1.14.el5xen #1 SMP Wed Mar 5 10:26:35 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

The host is Scientific Linux (I assume a CentOS host would hit the same bug).

[root@pps0355 ~]# cat /etc/redhat-release
Scientific Linux SL release 5.1 (Boron)
[root@pps0355 ~]# rpm -qa | grep kernel
kernel-xen-2.6.18-53.1.4.el5
kernel-xen-2.6.18-53.1.14.el5
[root@pps0355 ~]# rpm -qa | grep linux
util-linux-2.13-0.45.el5_1.1
libselinux-1.33.4-4.el5
[root@pps0355 ~]# rpm -qa | grep xen
kernel-xen-2.6.18-53.1.4.el5
kernel-xen-2.6.18-53.1.14.el5
xen-libs-3.0.3-41.el5
xen-3.0.3-41.el5

[root@pps0355 ~]# cat /etc/xen/pps-x12 | grep -v ^# | grep -v ^$
ramdisk = '/var/lib/xen-strap/pps-x12/initrd.img-2.6.24-16-xen'
kernel = '/var/lib/xen-strap/pps-x12/vmlinuz-2.6.24-16-xen'
memory = '256'
root = '/dev/sda1 ro console=/dev/xvc0 '
disk = [ 'phy:data/ub7t1,/dev/sda1,w' ,'phy:data/ub7t1-SWAP,/dev/sda2,w' ]
name = 'pps-x12'
vif = [ 'bridge=xenbr1' ]

[root@pps0355 ~]# xm info
host : pps0355.gridpp.rl.ac.uk
release : 2.6.18-53.1.14.el5xen
version : #1 SMP Wed Mar 5 10:26:35 EST 2008
machine : x86_64
nr_cpus : 2
nr_nodes : 1
sockets_per_node : 2
cores_per_socket : 1
threads_per_core : 1
cpu_mhz : 1992
hw_caps : 078bfbff:e1d3fbff:00000000:00000010
total_memory : 2047
free_memory : 1493
xen_major : 3
xen_minor : 1
xen_extra : .0-53.1.14.el5
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset : unavailable
cc_compiler : gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)
cc_compile_by : brewbuilder
cc_compile_domain : (none)
cc_compile_date : Wed Mar 5 10:05:28 EST 2008
xend_config_format : 2

HIRANO Takahito (hiranotaka) wrote :

This is a critical problem for Xen users.
The attached patch fixes Hardy's broken Xen netfront driver.
It fixes the duplicated memory allocation on older Xen hypervisors
and enables NAPI so that incoming packets are received correctly.
Put it under debian/binary-custom.d/xen/patchset in the source tree.
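A minimal sketch of that step, assuming an already unpacked Hardy kernel source tree and the patch saved locally (for example as 004-xen-netfront-fix.patch). The helper name install_xen_patch is hypothetical, and building the kernel afterwards is not shown.

```shell
# Hypothetical helper (not part of the Ubuntu packaging): copy a netfront
# patch into the Xen custom-flavour patchset directory of a source tree.
# Usage: install_xen_patch <source-tree> <patch-file>
install_xen_patch() {
    src_tree=$1
    patch_file=$2
    dest_dir="$src_tree/debian/binary-custom.d/xen/patchset"

    # Refuse to continue if the patch file does not exist.
    if [ ! -f "$patch_file" ]; then
        echo "patch not found: $patch_file" >&2
        return 1
    fi

    # Create the patchset directory if needed, then copy the patch into it.
    mkdir -p "$dest_dir" &&
        cp "$patch_file" "$dest_dir/$(basename "$patch_file")"
}
```

For example, `install_xen_patch linux-2.6.24 004-xen-netfront-fix.patch`, where the tree directory name is only an assumption.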

HIRANO Takahito (hiranotaka) wrote :

Sorry, the previous patch sucks.
Please use this one.

Unfortunately I can't test the fixed packages
in the same scenario, because I have removed my SL5.1 host and installed
Ubuntu as the host OS as well.

2008/4/18, HIRANO Takahito <email address hidden>:
> Sorry, again.
> Please use THIS one.
>
> The compiled package is at:
> http://www.il.is.s.u-tokyo.ac.jp/~hiranotaka/linux-image-2.6.24-16-xen_2.6.24-16.30zng1_amd64.deb
> http://www.il.is.s.u-tokyo.ac.jp/~hiranotaka/linux-headers-2.6.24-16-xen_2.6.24-16.30zng1_amd64.deb
>
>
> ** Attachment added: "patch for the netfront driver"
>
> http://launchpadlibrarian.net/13564638/004-xen-netfront-fix.patch

mkl (klein-marian) wrote :

Will the fix also go into the upstream kernel (www.kernel.org) and into Debian?

HIRANO Takahito (hiranotaka) wrote :

As far as I can tell from the source code, the netfront driver available from kernel.org has already fixed this problem.
However, the current source code in Ubuntu (and perhaps in Debian) differs greatly from the kernel.org version.

I can confirm that this fix does work for me. Thanks Hirano!

Could this please be released in time for Hardy, since this makes the difference between an essentially useless DomU and a working DomU. See Bug 204010 for the types and numbers of people that have been affected.

mkl (klein-marian) wrote :

I can now confirm that your packages helped me get the domU network working.
It is interesting that testing an Ubuntu domU on another distribution (SL5.1) as the host OS revealed the problem more precisely,
and got it fixed more quickly, than testing on Ubuntu.

Changed in linux:
assignee: nobody → hiranotaka
status: New → Fix Committed
HIRANO Takahito (hiranotaka) wrote :

The fix is not committed yet, and I have no privilege to commit it.

Changed in linux:
assignee: hiranotaka → nobody
status: Fix Committed → In Progress
Tim Gardner (timg-tpi) wrote :

Attached patch is what was committed.

Tim Gardner (timg-tpi) wrote :
Changed in linux:
assignee: nobody → timg-tpi
importance: Undecided → Medium
milestone: none → ubuntu-8.04.1
status: In Progress → Fix Committed
Matthew Grant (grantma) wrote :

Yep,

Been running Hardy Xen Dom0 here, solid as a rock, on top of Gutsy
Gibbon. I downloaded the source and backported Xen. I just added the patch
from here and recompiled, and now DomU is running sweetly too!

I have been running Debian Etch 2.6.18 DomU (maintained source tree),
and also have a 2.6.25 paravirt Ops DomU working as well.

Here are a few tips:

Hard drives are /dev/xvda1, /dev/xvda2 etc. Edit /etc/fstab
accordingly.

The console is /dev/xvc0. Edit /etc/inittab to run one getty on the console
and comment out all the rest, then add console=xvc0 to the 'extra' line in the
domain config file in /etc/xen.

Also, install udev in your DomUs, and create a minimal static /dev
underneath the one udev mounts on /dev, so that you get console messages
after init has started. The console option on the kernel command line gives
you a console while the initrd is loaded.
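The /etc/fstab edit can be sketched as below. This assumes the old entries name the disks /dev/sdaN and that the domain config now exports the same partitions under xvd names (the original report's config above still uses /dev/sda1, so it does not apply there). The helper name fstab_to_xvd is hypothetical; it writes a new file so the result can be checked before replacing the original.

```shell
# Hypothetical helper: rewrite /dev/sdXN device names to /dev/xvdXN.
# Usage: fstab_to_xvd <input-fstab> <output-fstab>
fstab_to_xvd() {
    in_file=$1
    out_file=$2
    # Only rewrite device names at the start of a line (the fs_spec field),
    # so commented-out lines and other fields are left untouched.
    sed 's#^/dev/sd\([a-z][0-9]*\)#/dev/xvd\1#' "$in_file" > "$out_file"
}
```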

To get a Linux 2.6.25 Xen kernel working, strip the vmlinux at the top
of the source tree after compiling, then compress it:

$ strip vmlinux -o vmlinux-stripped
$ gzip -c vmlinux-stripped > vmlinuz

and use that as the Xen kernel. xm create cannot load a straight bzImage
kernel yet.

Networking goes sweetly as I use my own network configuration package.

The Hardy source is now on track!

Cheers,

Matthew Grant

PS: If you are wondering, I am a Debian Maintainer

John Leach (johnleach) wrote :

Just ran into this bug and eagerly awaiting the release of the fix :)

Rithmarin (rithmarin) wrote :

I am running Hardy RC 64bit with 2.6.24-16-xen Dom0 and DomU, can confirm this bug is still present. Thanks Hirano for your compiled packages, they fix the problem.

mkl (klein-marian) wrote :

I cannot believe it.
Ubuntu has been released and the bug is still there.
This bug is critical for Xen users. That is also what Hirano said in his first response.
It is not even in the release notes.
Shame on Canonical.

Kosa (durchanek) wrote :

Many thanks to Hirano; I hope the new Xen kernel will hit the tree soon. Right now it looks like "We support KVM, so Xen is going to randomly break from now on".

I am also quite outraged that Ubuntu 8.04 LTS was released with such a major bug. This basically disqualifies 8.04 LTS for a very large number of server deployments and puts Ubuntu's reputation as a well-tested, stable server distribution at stake.

Can someone from Canonical (or whoever has some insight into the matter) please at least comment on if and when we can see a fix for this issue?

Kosa (durchanek) wrote :

Ooops, actually this should be released much sooner than in 8.04.1.

Tim Gardner (timg-tpi) on 2008-04-30
description: updated
HIRANO Takahito (hiranotaka) wrote :
Changed in linux:
status: Fix Committed → In Progress
Alessandro Gervaso (gervystar) wrote :

Hirano, since you've already built the packages could you create a PPA and host them from there in order to have a temporary fix until there is a new official release of the packages?

Thanks,
Alessandro

Tim Gardner (timg-tpi) wrote :

SRU Justification:

Impact: Xen network traffic between Dom0 and another domain causes a kernel oops.

Fix Description: Use NAPI enable/disable when the virtual network interface is enabled/disabled.

Patch: http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=commit;h=294d61f7b574d2909aec6017ab2333b26d6314c5

TEST CASE: See first bug description entry, e.g., xm create -c /etc/xen/pps-x12

HIRANO Takahito (hiranotaka) wrote :

Tim, the current fix in the hardy git tree by Chuck is not complete.
This still causes the kernel BUG() in some environments.

HIRANO Takahito (hiranotaka) wrote :

The patch against the current hardy tree.

Tim Gardner (timg-tpi) wrote :

Drat. I'll pick this up on the next upload cycle, probably next week.

Kayvan Sylvan (kayvan) wrote :

Hirano,

Thank you very much for this fix.

I was running into the exact same problem trying to run a paravirtualized Ubuntu 8.04 guest within a CentOS 5.1 Dom0. I was able to install the fixed package in the DomU (running in HVM mode), copy the kernel and initrd to the Dom0 host, point the guest configuration at the fixed kernel, and restart the DomU, and it works great.

Thank you so much.

driver (driver-megahappy) wrote :

+1 for Hirano kernel

linux-image-2.6.24-16-xen_2.6.24-16.30zng1_amd64.deb runs great (so far).

Thanks Hirano!

Colin Watson (cjwatson) wrote :

Accepted into hardy-proposed.

Changed in linux:
assignee: nobody → timg-tpi
importance: Undecided → Medium
milestone: none → ubuntu-8.04.1
status: New → Fix Committed
milestone: ubuntu-8.04.1 → none
Russell Nash (russnash37) wrote :

Do we know yet if this fix is to be released in 8.04.1 or will it (hopefully!) be seen earlier?

Thanks for the fix, Hirano!

As far as I can tell from the activity log, Colin scheduled the fix for 8.04.1.

I think waiting another two months for a fix for such a major bug, which isn't even mentioned in the release notes, is too long. This prevents production environments from being upgraded or installed and gives lots of people a headache. Please consider rethinking this decision.

Thank you.

Bart Heinsius (bheinsius) wrote :

Colin Watson wrote:

   Accepted into hardy-proposed

does this mean that the fixed kernel is now in the hardy-proposed repository and that I can safely replace Hirano's custom-made kernel with the proposed one?

On Tue, 2008-05-06 at 19:34 +0000, Bart Heinsius wrote:
> Colin Watson wrote:
>
> Accepted into hardy-proposed
>
> does this mean that the fixed kernel is now in the hard-proposed
> repository and that I can safely upgrade Hirano's custom made kernel
> with the proposed one?
>

Seems like it; I haven't tried yet.
I'll upgrade one of my test servers tomorrow and I'll let you know.
--
web: http://gervystar.net/

You can be anything you want to be, just turn yourself into anything you
think that you could ever be. (Queen)

kbe (karsten-behrens) wrote :

I can confirm that the new kernel 2.6.24-17-xen from hardy-proposed fixes this problem on my AMD64 machine (hardy Dom0, hardy DomU).

Karsten

Has anyone tried upgrading Dom0 from Gutsy to Hardy?
And DomU?
Experiences?

kibe (b-kix) wrote :

Hi,

the kernel from the proposed repository (linux-image-2.6.24-17-xen_2.6.24-17.31_i386.deb) isn't fixing the problem for us.

We have a CentOS 5.1 (x64) dom0 with a self-compiled Xen 3.2 on it. When I start the 32-bit Hardy domU with that kernel (see above),
the following is printed to stdout:

------->
[root@xen02 mnt]# xm create crm.tarent.de -c
Using config file "/etc/xen/crm.tarent.de".
Started domain crm.tarent.de
[ 0.181865] Calibrating delay using timer specific routine.. 4657.60 BogoMIPS (lpj=9315207)
[ 0.181907] Security Framework initialized
[ 0.181915] SELinux: Disabled at boot.
[ 0.181922] AppArmor: AppArmor initialized
[ 0.181926] Failure registering capabilities with primary security module.
[ 0.181940] Mount-cache hash table entries: 512
[ 0.182069] CPU: L1 I cache: 32K, L1 D cache: 32K
[ 0.182075] CPU: L2 cache: 4096K
[ 0.182087] Compat vDSO mapped to f57fe000.
[ 0.182095] Checking 'hlt' instruction... OK.
[ 0.182530] SMP alternatives: switching to UP code
[ 0.183217] Freeing SMP alternatives: 11k freed
[ 0.183297] Early unpacking initramfs... done
[ 0.188102] Brought up 1 CPUs
[ 0.188759] net_namespace: 64 bytes
[ 0.188765] failed to set up cpufreq notifier
[ 0.205887] Time: 165:165:165 Date: 165/165/65
[ 0.205916] NET: Registered protocol family 16
[ 0.207224] Brought up 1 CPUs
[ 0.207241] PCI: Fatal: No config space access function found
[ 0.207244] PCI: setting up Xen PCI frontend stub
[ 0.207758] ACPI: Interpreter disabled.
[ 0.207764] Linux Plug and Play Support v0.97 (c) Adam Belay
[ 0.207791] pnp: PnP ACPI: disabled
[ 0.208037] xen_mem: Initialising balloon driver.
[ 0.209708] Setting mem allocation to 262144 kiB
[ 0.209964] PCI: System does not support PCI
[ 0.209967] PCI: System does not support PCI
[ 0.219894] NET: Registered protocol family 8
[ 0.219899] NET: Registered protocol family 20
[ 0.219997] AppArmor: AppArmor Filesystem Enabled
[ 0.220349] NET: Registered protocol family 2
[ 0.223832] Time: xen clocksource has been installed.
[ 0.247863] IP route cache hash table entries: 4096 (order: 2, 16384 bytes)
[ 0.248046] TCP established hash table entries: 16384 (order: 5, 131072 bytes)
[ 0.248099] TCP bind hash table entries: 16384 (order: 5, 131072 bytes)
[ 0.248150] TCP: Hash tables configured (established 16384 bind 16384)
[ 0.248153] TCP reno registered
[ 0.259932] checking if image is initramfs... it is
[ 0.267054] Freeing initrd memory: 5348k freed
[ 0.267463] audit: initializing netlink socket (disabled)
[ 0.267475] audit(1210161485.575:1): initialized
[ 0.267591] VFS: Disk quotas dquot_6.5.1
[ 0.267608] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[ 0.267679] io scheduler noop registered
[ 0.267681] io scheduler anticipatory registered
[ 0.267683] io scheduler deadline registered
[ 0.267690] io scheduler cfq registered (default)
[ 0.267917] Xen virtual console successfully installed as xvc0
[ 0.267949] Event-channel device installed.
[ 0.275120] Successfully initialized TPM backend driver.
[ 0.277397] netfront: Initialising virtual ethernet driver...

Russell Nash (russnash37) wrote :

I can also confirm that the hardy-proposed kernel for 32-bit systems isn't fixing the issue here. I can see RX/TX packets on the vif1.0 interface in Dom0; however, there is still no reply to pings or other traffic from inside or outside the DomU.

janevert (j-e-van-grootheest) wrote :

For me the image from proposed (17.31) works for an amd64 domU. I can ssh into the domU.

(note that this debian etch/testing domU previously worked ok using 2.6.18 and I only replaced the kernel)

janevert (j-e-van-grootheest) wrote :

kibe,

it would be really useful to have the full trace. I suspect it is present in 'dmesg',
because here it is cut off by the prompt.

Thanks.
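For reference, pulling just the oops out of a long kernel log could look like the sketch below. It assumes the "------------[ cut here ]------------" and "---[ end trace ... ]---" markers seen in the original report; the helper name extract_oops is hypothetical.

```shell
# Hypothetical helper: print each kernel oops/BUG block found on stdin,
# from the "cut here" marker through the matching "end trace" line.
# Usage: dmesg | extract_oops
extract_oops() {
    sed -n '/-\[ cut here ]-/,/---\[ end trace/p'
}
```

Piping the result through grep 'netfront.c' would then show whether the BUG came from the netfront driver.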

janevert (j-e-van-grootheest) wrote :

Russell,

what kernel are you currently using? I.e. does that domU work with a different kernel?

Thanks.

HIRANO Takahito (hiranotaka) wrote :

Unfortunately, the fix on 17.31 was incomplete.
Try my kernel at http://www.il.is.s.u-tokyo.ac.jp/~hiranotaka/ , or wait for 17.32.

kibe (b-kix) wrote :

janevert,

sorry, that was everything I got from Xen. The machine didn't start at all, and the boot process was cut off at that point as well.
I reverted a test setting in the hwclock scripts in the VM's root filesystem, so now there is a little more output:

Now Xen says the domain already exists, but it isn't visible in 'xm top'.

------->
[root@xen02 mnt]# xm create crm.tarent.de -c
Using config file "/etc/xen/crm.tarent.de".
Error: Domain 'crm.tarent.de' already exists with ID '895'
[ 0.000000] Linux version 2.6.24-17-xen (buildd@palmer) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)) #1 SMP Thu May 1 16:58:53 UTC 2008 (Ubuntu 2.6.24-4.6-generic)
[ 0.000000] Reserving virtual address space above 0xf5800000
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] Xen: 0000000000000000 - 0000000010800000 (usable)
[ 0.000000] 0MB HIGHMEM available.
[ 0.000000] 264MB LOWMEM available.
[91148.048076] Zone PFN ranges:
[91148.048078] DMA 0 -> 4096
[91148.048080] Normal 4096 -> 67584
[91148.048081] HighMem 67584 -> 67584
[91148.048082] Movable zone start PFN for each node
[91148.048084] early_node_map[1] active PFN ranges
[91148.048086] 0: 0 -> 67584
[91148.051668] ACPI in unprivileged domain disabled
[91148.051673] Allocating PCI resources starting at 20000000 (gap: 10800000:ef800000)
[91148.051714] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 67056
[91148.051717] Kernel command line: root=/dev/xvda1 ro 2 console=xvc0
[91148.051872] Enabling fast FPU save and restore... done.
[91148.051878] Enabling unmasked SIMD FPU exception support... done.
[91148.051880] Initializing CPU#0
[91148.052014] PID hash table entries: 2048 (order: 11, 8192 bytes)
[91148.052045] Xen reported: 2327.524 MHz processor.
[ 0.115708] console [xvc0] enabled
[ 0.115760] Console: colour dummy device 80x25
[ 0.115866] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[ 0.115975] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[ 0.116014] Software IO TLB disabled
[ 0.116017] vmalloc area: d1000000-f53fe000, maxmem 2d800000
[ 0.120076] Memory: 249288k/270336k available (2224k kernel code, 12848k reserved, 1010k data, 212k init, 0k highmem)
[ 0.120085] virtual kernel memory layout:
[ 0.120085] fixmap : 0xf568d000 - 0xf57ff000 (1480 kB)
[ 0.120086] pkmap : 0xf5400000 - 0xf5600000 (2048 kB)
[ 0.120087] vmalloc : 0xd1000000 - 0xf53fe000 ( 579 MB)
[ 0.120088] lowmem : 0xc0000000 - 0xd0800000 ( 264 MB)
[ 0.120088] .init : 0xc042f000 - 0xc0464000 ( 212 kB)
[ 0.120089] .data : 0xc032c3c1 - 0xc0428d04 (1010 kB)
[ 0.120090] .text : 0xc0100000 - 0xc032c3c1 (2224 kB)
[ 0.120099] Checking if this processor honours the WP bit even in supervisor mode... Ok.
[ 0.187637] Calibrating delay using timer specific routine.. 4657.59 BogoMIPS (lpj=9315188)
[ 0.187680] Security Framework initialized
[ 0.187688] SELinux: Disabled at boot.
[ 0.187694] AppArmor: AppArmor initialized
[ 0.187698] Failure registering capabilities with primary security module.
[ 0.187713] Mount-ca...

Russell Nash (russnash37) wrote :

Thanks for the responses, Hirano & janevert.

I tried Hirano's 32-bit kernel and had no luck; now I'm on the hardy-proposed kernel and still have no luck.

To answer your question, janevert, I'm using 2.6.24-17-xen. I'm installing via xen-tools, which (to my surprise) seems to boot the domU using dom0's kernel. I'm going to investigate setting up a domU that boots its own older kernel, probably the Gutsy one.

Russ.

kibe (b-kix) wrote :

rehi,

The "domain already present" problem was a configuration fault of mine: I had on_crash=reboot in that domain's configuration, so the server kept trying to start it again.

@Russell:
I already tried a 32-bit Hardy system with the Gutsy Xen kernel as domU, with no luck.
The boot process gets stuck at "setting system clock", and the workaround mentioned for that doesn't work for me either.

PS: why is a domain that shows up as "paused" in 'xm list' not visible in 'xm top'?

HIRANO Takahito (hiranotaka) wrote :

> I tried Hirano's 32bit kernel and had no luck, now I'm on the hardy-proposed kernel and still have no luck.

Can you post the log messages from my kernel?

Russell Nash (russnash37) wrote :

Ok,

The good news here is that I'm an idiot!!! :)

I'm very, very new to xen and had accepted pretty much the defaults for the xend configuration.

I was setting up my DomU to use routed traffic, however, my xend config was configured to use bridging.

I switched from:
(network-script network-bridge)
(vif-script vif-bridge)

To:
(network-script network-route)
(vif-script vif-route)

and traffic is now flowing as it should.

Sorry for the false alarm! ...and thank you for all the responses and help.

Russ.

Steve Langasek (vorlon) wrote :

So is the conclusion here that the 32-bit 2.6.24-17.31 does work, or is there still a required fix missing?

HIRANO Takahito (hiranotaka) wrote :

> So is the conclusion here that the 32-bit 2.6.24-17.31 does work, or is there still a required fix missing?

If you are using Hardy as Dom0, I don't think you will have any problems with 2.6.24-17.31.
If you are using older Xen hypervisors, you have to wait for a new release.

kibe (b-kix) wrote :

Steve,

Yes, there is still a problem with 32-bit 2.6.24-17.31. See my log (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/218126/comments/40)...
This is with Xen 3.2, self-compiled, on 64-bit CentOS 5.1.

Benjamin

Hello,

I already posted in a duplicate bug; I hadn't noticed that this is the main thread.

I was fighting with the same error described here. I installed Hirano's kernel (32-bit version) on my AMD Athlon X2 5600+ server.
With that kernel I am now able to get my network up and running; however, network connectivity is still broken for large data transfers.

I am currently running Hardy Heron for dom0 and my domU clients, all with the patched kernel.

My problem is:

Every outgoing TCP connection (from a domU) stalls and finally hangs as soon as more than a few bytes of data go out.

Running "ls -laR /" in an SSH session is already enough. As soon as I transfer a large file from my domU guest via FTP to another physical server on the Internet in a second session, IP connectivity to the domU is no longer possible until I kill both processes and wait some time.

Transferring large files etc. TO my domU server is no problem; the problem only occurs when sending data out.

I am running the routed network with public IP addresses on all interfaces. I was now able to trace the issue a little further:

While running an scp from a virtual domU host to another physical server in the same datacenter, I had four tcpdump instances capturing.
I stopped the scp when the connection stalled.

domU eth0: reports 1627 packets (all packets to dest-ip)
dom0 vif1.0: reports 1627 packets (all packets to dest-ip)
dom0 eth0: reports 1295 packets (all packets to dest-ip)
destination: reports 1295 packets (all packets from domU-ip)

So I realized that the packet loss now happens inside dom0, no longer on the eth0-vifX.X connection.

However, communication with dom0 itself is no problem, in either direction. Comparing the dumps from both dom0 interfaces, I see that every 3rd to 6th packet is missing on eth0 and is eventually resent if it is not confirmed with an ACK.
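The capture counts above can be turned into a loss rate with simple arithmetic. The Python sketch below (the helper name `loss_stats` is hypothetical) only reuses the vif1.0 and dom0 eth0 numbers from this comment; it does not explain why dom0 drops the packets.

```python
def loss_stats(sent: int, received: int) -> tuple[int, float, float]:
    """Return (packets lost, loss fraction, one lost packet per N sent)."""
    lost = sent - received
    fraction = lost / sent
    one_in_n = sent / lost  # average spacing between dropped packets
    return lost, fraction, one_in_n


# Counts reported above: vif1.0 saw 1627 packets, dom0 eth0 only 1295.
lost, fraction, one_in_n = loss_stats(1627, 1295)
print(f"lost={lost} ({fraction:.1%}), roughly 1 in {one_in_n:.1f} packets")
# prints: lost=332 (20.4%), roughly 1 in 4.9 packets
```

About one packet in five goes missing between vif1.0 and eth0, which is consistent with the "every 3rd to 6th packet" observation.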

I have no firewalling (empty iptables) in dom0

Any ideas??

Just repeating: There is no problem in the other direction: From outside TO domU....

Regards

  Wolfgang

John Leach (johnleach) wrote :

Still crashing with linux-image-2.6.24-17-xen 2.6.24-17.31 on i386

[ 0.468544] kernel BUG at /build/buildd/linux-2.6.24/debian/build/custom-source-xen/drivers/xen/netfront/netfront.c:855!
[ 0.468551] invalid opcode: 0000 [#1] SMP

(full domU dmesg output attached)

on dom0 (CentOS 5.1):

[root@lion xen]# xm info
release : 2.6.18-53.el5xen
version : #1 SMP Mon Nov 12 03:26:12 EST 2007
machine : i686
nr_cpus : 2
nr_nodes : 1
sockets_per_node : 1
cores_per_socket : 2
threads_per_core : 1
cpu_mhz : 2672
hw_caps : bfebfbff:20100000:00000000:00000180:0000651d:00000000:00000001
total_memory : 1007
free_memory : 257
xen_major : 3
xen_minor : 1
xen_extra : .0-53.el5
xen_caps : xen-3.0-x86_32p
xen_pagesize : 4096
platform_params : virt_start=0xf5800000
xen_changeset : unavailable
cc_compiler : gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)
cc_compile_by : mockbuild
cc_compile_domain :
cc_compile_date : Mon Nov 12 02:16:12 EST 2007
xend_config_format : 2

John Leach (johnleach) wrote :

The kernel linux-image-2.6.24-16-xen_2.6.24-16.30zng1_i386.deb made by HIRANO Takahito boots successfully and networking seems to work ok. This is with a Centos 5.1 dom0 on the same i386 box as was used in my last report (for 2.6.24-17.31)

http://www.il.is.s.u-tokyo.ac.jp/~hiranotaka/

In summary: the proposed Ubuntu update 2.6.24-17.31 kernel does NOT work; 2.6.24-16.30zng1 does work.

Roger Nesbitt (mogest) wrote :

I am experiencing the same problem as Wolfgang two posts above. Unsure if it is related to bug 218126.

Running a stock Feisty dom0, I upgraded the domU to Hardy and am now having this problem. I tried Hirano Takahito's kernel on the domU, yet the problem persists.

The server is quite heavily loaded as a web and mail server, but operates fine until one of the 12-hourly rsyncs runs to back up the database. At the backup times, there is about a 50% chance that the network connection on the domU will become inactive during the transfer. The only way I have found to restore the network connection is to destroy and recreate the domU.

Through the domU console, tcpdump shows ARP packets being received by the domU but nothing sent out. ifconfig shows the number of packets received increasing slowly (with the ARP requests) and the number of packets sent not increasing.

This domU ran fine on Feisty for a considerable time, and only started to exhibit this problem after upgrading to Hardy.

chris lea (chris-lea) wrote :

I'm seeing another effect. Not sure if this is the correct bug for it or not, and my apologies if it's not.

I'm running Xen on 32-bit Hardy, on an HP ProLiant DL140 machine. Using Hirano's kernel or the 2.6.24-17-xen from hardy-upcoming, networking "works", but only sort of. If I try to transfer a file of any appreciable size (say 10M) between any of the domUs using any protocol, the network stalls after about 2M has gone through. If it's only a very small file (say 100k), it makes it.

All the domUs are in NAT mode and in the 192.168.1.0/24 address space.

Transferring files between domU and dom0 works fine.

If I disable TCP checksumming as noted here:

http://lists.xensource.com/archives/html/xen-users/2007-09/msg00584.html

then I can transfer files around. But, the lack of checksumming makes all sorts of other things break.
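For context on what this workaround changes: with transmit checksum offload enabled, the Xen guest roughly hands packets to the backend with the TCP checksum left blank, to be filled in later by dom0 or the physical NIC; with offload off, the guest computes the checksum itself. That checksum is the RFC 1071 Internet checksum. The minimal Python sketch below shows only the ones'-complement sum over a byte buffer (not the TCP pseudo-header handling), and the function name is illustrative.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: ones'-complement of the ones'-complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample bytes from the RFC 1071 numerical example.
sample = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
csum = internet_checksum(sample)
print(hex(csum))  # 0x220d
# A receiver summing the data plus its checksum gets 0 back.
print(internet_checksum(sample + csum.to_bytes(2, "big")))  # 0
```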

I'm not sure what other information would be helpful, but I'm happy to provide it if anybody lets me know. I'm attaching my xend-config.sxp file.

janevert (j-e-van-grootheest) wrote :

Chris has a point there, about checksumming.

I have no problems with the 17.31 kernel on an amd64 domU. And dom0 is running the Hardy kernel (-16.xx), also amd64. I'm using Debian's hypervisor (3.2.0-3~bpo4+2).
And I DO have checksum offload turned off (as described, using /etc/network/interfaces). It was necessary with previous kernels and I didn't even think about it.

There is a difference between checksum offload on and off. With offload OFF I get a throughput (domU->dom0) of 18M/s; with offload on it is only 12M/s (using a 36M file).
From domU to domU (same host) it doesn't matter whether offload is on or off.
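Assuming those rates hold steady over the whole 36M file (the comment does not say how they were measured), the difference translates into transfer time as follows; a trivial Python sketch with a hypothetical helper name:

```python
def transfer_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Time to move size_mb megabytes at a constant rate."""
    return size_mb / rate_mb_per_s


off = transfer_seconds(36, 18)  # checksum offload OFF
on = transfer_seconds(36, 12)   # checksum offload ON
print(off, on, on / off)  # 2.0 3.0 1.5
```

So with offload on, the same domU->dom0 copy takes about 1.5 times as long.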

But still no BUG.

chris lea (chris-lea) wrote :

Yeah, unfortunately for me, I'm planning on just wiping the box and trying to redo things using KVM at this point. Which I don't like for performance reasons. But the reason I found this issue is that I was trying to set up a mini-cluster for mogilefs to do development against. And if I can't store files of any real size (which I can't like this) then this setup is useless to me. If I turn off the checksumming, then I can move files around, but mogilefs stops working reliably in that case, so I'm pretty much up a creek. :(

mattsteven (matthew-matts) wrote :

HIRANO Takahito's x86 kernel worked for me and was extremely low hassle. dpkg -i kernel and reboot, magically works. Hope this can be figured out soon though.

chris lea (chris-lea) wrote :

D'oh! I should have read more closely.

What I'm seeing is exactly what Wolfgang notes in comment 48, and what Roger notes in comment 51.

Not that it makes it any more usable. But I'm a "me too" for this problem.

janevert (j-e-van-grootheest) wrote :

Chris, Wolfgang, Roger,

I'm not entirely sure about this, but it might be necessary to turn off checksum offload in all doms on the same machine. If I remember correctly, it is required to turn off the offload in dom0 as well. I know I have it off in all domUs and dom0.

chris lea (chris-lea) wrote :

@janevert

For the record, yes, I had to turn off checksumming on all the domU to get them to be able to transfer files reliably between themselves. I did not have to turn off checksumming on dom0 for this.

But, having said this, it doesn't really help me. With checksumming off, other things stop working as they should (MogileFS in my personal case) so I'm still not left with a viable resolution.

But, thanks much for trying to help out. :)

@janevert:

You are right: checksumming was turned on by default and my ethtool was broken, so turning off tx offloading when bringing up the interface didn't work.

So I can confirm as well that this problem was not related to the kernel itself.

Regards

  Wolfgang

I have similar issues with a Centos 5.1 Dom0 and a Hardy DomU.

I am fine if I use the gutsy Xen kernel (2.6.22-14.46), but get the same kernel panic with the vanilla Hardy Xen kernel.

I've just tried the kernel from -proposed, and whilst the VM boots a lot quicker, I still get the following error:

[ 0.420184] io scheduler cfq registered (default)
[ 0.423651] Xen virtual console successfully installed as tty1
[ 0.423856] Event-channel device installed.
[ 0.443551] Successfully initialized TPM backend driver.
[ 0.459780] netfront: Initialising virtual ethernet driver.
[ 0.461803] xen-vbd: registered block device major 8
[ 0.516216] ------------[ cut here ]------------
[ 0.516241] kernel BUG at /build/buildd/linux-2.6.24/debian/build/custom-source-xen/drivers/xen/netfront/netfront.c:855!
[ 0.516269] invalid opcode: 0000 [#1] SMP
[ 0.516304] Modules linked in:
[ 0.516323]
[ 0.516337] Pid: 18, comm: xenwatch Not tainted (2.6.24-17-xen #1)
[ 0.516350] EIP: 0061:[<c0269cc3>] EFLAGS: 00010002 CPU: 0
[ 0.516370] EIP is at network_alloc_rx_buffers+0x503/0x510
[ 0.516382] EAX: 00000040 EBX: c90219b4 ECX: 00000041 EDX: 0000003f
[ 0.516393] ESI: 00007644 EDI: c9020480 EBP: 0000023f ESP: c7ed1edc
[ 0.516407] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: e021
[ 0.516420] Process xenwatch (pid: 18, ti=c7ed0000 task=c7ecabd0 task.ti=c7ed0000)
[ 0.516431] Stack: ffffffff c902119c c9020000 000007e0 00000100 003f4500 00000040 00000000

The system is an Athlon XP 2000+ running the vanilla Centos 5.1 Xen install.

janevert (j-e-van-grootheest) wrote :

@Tim, Hirano,

I believe Hirano's last comment (comment 19) says that 17.31 only contains a partial fix.
Any idea when 17.32 with the correct fix is coming?

(BTW, Hirano is right that the line with the BUG is incorrect: the original has [i] where the fix has [j]. This BUG is what produces the traces at netfront.c:855.)

Roger Nesbitt (mogest) wrote :

I tried turning off checksumming, first just in domU, then in both dom0 and domU, but the network still stopped.

However I have figured out what the problem is. The domU's network is becoming unresponsive immediately after 4GB of data has been transmitted by the eth0 device, as reported by ifconfig, and the tx counter wraps around to 0. This happens about once every 24 hours on my machine, and most often happens when sending a big file - however it's not the throughput that is the problem.
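The 4GB threshold is exactly where a 32-bit byte counter overflows: 2^32 bytes = 4 GiB. The Python sketch below only illustrates the wraparound visible in ifconfig, assuming the tx byte counter is 32 bits wide on this guest; it does not show why the network stops at that point, which the comment leaves open. The names are hypothetical.

```python
COUNTER_BITS = 32
COUNTER_MOD = 1 << COUNTER_BITS  # 4294967296 bytes, i.e. 4 GiB


def displayed_tx_bytes(true_bytes: int) -> int:
    """Value a 32-bit byte counter shows after true_bytes have been sent."""
    return true_bytes % COUNTER_MOD


def wraps(true_bytes: int) -> int:
    """How many times the counter has rolled over to 0."""
    return true_bytes // COUNTER_MOD


sent = 4 * 1024**3 + 1500  # just past 4 GiB
print(displayed_tx_bytes(sent), wraps(sent))  # 1500 1
```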

Should I be opening a new report for this bug, or is it related to the above?

Thank you, Ubuntu developers!

On Thursday I upgraded to Hardy, and Xen networking between domU and dom0 stopped working immediately. I tried everything until this evening, when I found this bug report. After I installed Linux 2.6.24-17-xen #1 SMP x86_64 GNU/Linux from hardy-proposed, networking between domU and dom0 works as before.

This bug has been known for TWO MONTHS (see #204010) and there were no release notes about it. It is not even considered a serious bug! Lots of people have many Xen-based servers and this bug can stop all of them. And it's still not resolved, still not in the main tree. And this is an LTS release. I am really shocked; I have to re-examine my confidence in Ubuntu.

Tomorrow I will test the problem with large files that some people are talking about and let you know. Thank you again for the solution; I hope it will soon be distributed as a normal update.

mkl (klein-marian) wrote :

Jiří, there's no reason to foam at the mouth like this; wipe the foam off your lips. :)
Everyone is pissed off, me included, and I'm the one who reported this (see above).

I now understand that Canonical's policy is against the best interests of the community.
The kernel freeze should not apply to such grievous bugs.

I suspect Canonical is playing a dishonest game with its users.
They intentionally leave some grievous bugs affecting server use unfixed so that
companies have to buy support from them.

Shame on Canonical

HIRANO Takahito (hiranotaka) wrote :

My patch seems to have been rejected for 17.32.
I'm asking for the reason...

HIRANO Takahito (hiranotaka) wrote :

Not 17.32, but it will be 18.32. Sorry.

Just a 'me too'. Trying to boot the kernel from proposed updates works for a few seconds and results in an invalid opcode (0000). Dom0 is Debian Etch with Xen 3.1, domU is Hardy.

HIRANO Takahito (hiranotaka) wrote :
Changed in linux:
status: Fix Committed → In Progress
Wido den Hollander (wido) wrote :

I am running Hardy dom0 and domU's on i386 with the proposed kernel without any troubles at all.

This was just for the record.

Bruce McIntyre (bruskiza) wrote :
Hi Wido,

Good news.
Are you using LVM or loop-back devices for your DomU's?

B

On 20 May 2008, at 5:55 PM, Wido den Hollander wrote:

I am running Hardy dom0 and domU's on i386 with the proposed kernel
without any troubles at all.

This was just for the record.

Wido den Hollander (wido) wrote :

I am using iSCSI as storage for my domUs.

The iSCSI target server is also an Ubuntu Hardy machine (i386).

For the iSCSI initiator I use Open-iSCSI.

My domUs run Hardy or Debian Etch (I just found out) and have been running with the new kernel for about a week now, with no trouble found yet.

This is running in a test setup at my school at the moment (a Dutch school for higher education), and we have done some tests pumping around 10GB of traffic to the guests without any issues.

If needed, I can give some people access to the systems to take a look around.

whs (wolfram-heinen) wrote :

I am running Hardy dom0 and domUs on i386 with Hirano's kernel without any trouble at all for my mail, web and name servers.
Hardware: HP ML110 (7 domUs), ML115 (4 domUs) and Tyan GS21 (4 domUs)
I'm using domUs in TAP:AIO and LVM configurations without any problems.

Thank You Hirano for your kernel

Steve Langasek (vorlon) wrote :

Jiří,

It is not accurate to say that this bug is not considered serious; this bug is marked to be fixed before the first point release, and an attempt at fixing it has been included in the first stable release update of the kernel in 8.04.

However, the patch included in 2.6.24-17 is evidently incomplete, so a fixed kernel is contingent on another stable release update of the kernel (which is already in the works).

Based on comments, I'm not sure whether a sufficient patch is currently committed to the Ubuntu kernel team's git tree, but this will certainly be followed through on.

Bruce McIntyre (bruskiza) wrote :
Great news.

I shall give it a try.

Thanks guys, and thanks Hirano.

B
On 20 May 2008, at 10:55 PM, whs wrote:

I am running Hardy dom0 and domU's on i386 with Hiranos kernel without
any trouble at all for my mail-, web- and nameservers .
Hardware: HP ML110 (7 domU's), ML115 (4 domU's) and Tyan GS21 (4 domU's)
I'm using domU's in TAP:AIO uand LVM configurations without any
problems.

Thank You Hirano for your kernel

HIRANO Takahito (hiranotaka) wrote :
Changed in linux:
status: In Progress → Fix Committed
Martin Pitt (pitti) wrote :

linux 2.6.24-17.31 copied to hardy-updates.

Changed in linux:
status: Fix Committed → Fix Released
Andreas Jellinghaus (tolonuga) wrote :

works for me:
xen0 amd64 hardy
xenU debian 4.0 32bit (with that kernel)
xenU hardy amd64 (with that kernel)

Castang Jerome (castang) wrote :

Hello,

Running:
dom0 -> 2.6.18-92.el5xen
domU (Ubuntu 8.04- 32 bits) -> 2.6.24-17-xen

I still have this error:
[ 0.616642] EIP: [<c0269cc3>] network_alloc_rx_buffers+0x503/0x510 SS:ESP e021:de817edc
[ 0.616650] Kernel panic - not syncing: Fatal exception in interrupt

domU is up-to-date (hardy-security).

Has it been un-fixed? :)

Thanks

Castang Jerome (castang) wrote :

Sorry,
this bug is triggered at line 855 of netfront.c, not at 785 (as stated in the title).
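Reports in this thread cite several netfront.c BUG sites (785 in the title, 855 in the 2.6.24-17/-18 traces, 1460 in one trace from Hirano's build), so when comparing logs it can help to pull the file and line straight out of the console output. The helper below is a hypothetical Python sketch, not part of any Ubuntu or Xen tooling; it assumes the "kernel BUG at <path>:<line>!" format seen in the traces here.

```python
import re

# Matches lines like "kernel BUG at /path/to/netfront.c:855!"
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")


def bug_sites(log_lines):
    """Return (source file name, line number) for each 'kernel BUG at' line."""
    sites = []
    for line in log_lines:
        match = BUG_RE.search(line)
        if match:
            filename = match.group(1).rsplit("/", 1)[-1]
            sites.append((filename, int(match.group(2))))
    return sites


log = [
    "[ 0.516216] ------------[ cut here ]------------",
    "[ 0.516241] kernel BUG at /build/buildd/linux-2.6.24/debian/build/"
    "custom-source-xen/drivers/xen/netfront/netfront.c:855!",
    "[ 0.516269] invalid opcode: 0000 [#1] SMP",
]
print(bug_sites(log))  # [('netfront.c', 855)]
```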

bstempi (brian-stempin) wrote :

I'm still showing symptoms similar to Chris Lea's.

Basically, if I do a large rsync between a domU (running Gutsy) and an outside server, both my domU and dom0 network connections fail. If I sit at the console, everything works correctly (apart from networking, of course).

My Dom0 is running Hardy with a 2.6.24-28-xen kernel, using bridge networking specifically set to use only one of my NICs.

Can anyone else confirm this behavior?

Just tried with the updated kernel 2.6.24-18-xen and still have the same issues with a crash at netfront.c:855.

Centos 5.1 Dom0 running xen-3.0.3-41.el5_1.5

Full boot and crash output below

[ 0.000000] Linux version 2.6.24-18-xen (buildd@terranova) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)) #1 SMP Thu May 29 00:39:30 UTC 2008 (Ubuntu 2.6.24-4.6-generic)
[ 0.000000] Reserving virtual address space above 0xf5800000
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] Xen: 0000000000000000 - 0000000009e00000 (usable)
[ 0.000000] 0MB HIGHMEM available.
[ 0.000000] 158MB LOWMEM available.
[1226229.780011] Zone PFN ranges:
[1226229.780014] DMA 0 -> 4096
[1226229.780018] Normal 4096 -> 40448
[1226229.780020] HighMem 40448 -> 40448
[1226229.780022] Movable zone start PFN for each node
[1226229.780025] early_node_map[1] active PFN ranges
[1226229.780027] 0: 0 -> 40448
[1226229.788921] ACPI in unprivileged domain disabled
[1226229.788940] Allocating PCI resources starting at 10000000 (gap: 09e00000:f6200000)
[1226229.789057] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 40132
[1226229.789065] Kernel command line: root=/dev/sda2 ro root=UUID=87fea4c2-db1f-4743-bab0-da95e5c8e594 ro xencons=tty single
[1226229.789400] Enabling fast FPU save and restore... done.
[1226229.789409] Enabling unmasked SIMD FPU exception support... done.
[1226229.789414] Initializing CPU#0
[1226229.789643] PID hash table entries: 1024 (order: 10, 4096 bytes)
[1226229.789689] Xen reported: 1665.028 MHz processor.
[ 0.104278] console [tty0] enabled
[ 0.104498] Console: colour dummy device 80x25
[ 0.104715] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
[ 0.104996] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
[ 0.105077] Software IO TLB disabled
[ 0.105087] vmalloc area: ca800000-f53fe000, maxmem 2d800000
[ 0.110360] Memory: 129472k/161792k available (2225k kernel code, 23988k reserved, 1010k data, 212k init, 0k highmem)
[ 0.110400] virtual kernel memory layout:
[ 0.110401] fixmap : 0xf568d000 - 0xf57ff000 (1480 kB)
[ 0.110403] pkmap : 0xf5400000 - 0xf5600000 (2048 kB)
[ 0.110405] vmalloc : 0xca800000 - 0xf53fe000 ( 683 MB)
[ 0.110406] lowmem : 0xc0000000 - 0xc9e00000 ( 158 MB)
[ 0.110408] .init : 0xc042f000 - 0xc0464000 ( 212 kB)
[ 0.110409] .data : 0xc032c481 - 0xc0428d04 (1010 kB)
[ 0.110411] .text : 0xc0100000 - 0xc032c481 (2225 kB)
[ 0.110424] Checking if this processor honours the WP bit even in supervisor mode... Ok.
[ 0.177796] Calibrating delay using timer specific routine.. 3332.30 BogoMIPS (lpj=6664617)
[ 0.177933] Security Framework initialized
[ 0.177945] SELinux: Disabled at boot.
[ 0.177958] AppArmor: AppArmor initialized
[ 0.177966] Failure registering capabilities with primary security module.
[ 0.177998] Mount-cache hash table entries: 512
[ 0.178248] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 0.178253] CPU: L2 Cache: 256K (64 bytes/line)
[ 0.178273] Compat vDSO mapped to f57fe000...

I've just booted the same guest with the 2.6.24-19-xen kernel from http://www.il.is.s.u-tokyo.ac.jp/~hiranotaka/ without issue or crash.

I need to stress-test the Xen guest a bit, but it appears to have resolved the netfront:855 issue.

Can anyone provide detail on what has been patched in this kernel compared with the Ubuntu supplied one?

OK, things are not looking so good with the Xen kernel from http://www.il.is.s.u-tokyo.ac.jp/~hiranotaka/ after all.

[ 322.044307] logwatch[3122] general protection eip:b7ebd13c esp:bfb9a774 error:0
[ 542.537078] apache2[3167] general protection eip:b7ea08ab esp:bfeadaa4 error:0
[ 580.412190] logwatch[3175] general protection eip:b7ea813c esp:bfba6a84 error:0

There appears to be a conflict with the version of libc6-xen: if I move /lib/tls to /lib/tls/disabled.new, these apps start working correctly.

mattsteven (matthew-matts) wrote :

Hirano's kernels were good but still unstable with 3-4 oops a day on my dell sc1425. My own experience was that I had to use the stock xen kernel from xen.org (compiled most easily using the instructions from gentoo's wiki) before I saw any stability or decent performance on my hardy system. This might be a good choice for anyone else needing a stable system until this is properly tested. I'm disappointed and surprised that such an unstable kernel made it into Ubuntu.

aspasia (aspasia-sf) wrote :

Hello all,

I found this thread and attempted to follow it. I am quite new to Ubuntu and Xen.

I have: dom0 - Centos 5.1
I would like to install Ubuntu hardy as a domU. The only Ubuntu version I can install properly with no problem is dapper (6.06).

What I have done so far:

1. Install dapper as DomU (using the virt-install Xen CLI tool)
2. Upgrade dapper - using Ubuntu instructions to upgrade.
3. Dapper is upgraded successfully to a kernel: 2.6.15-51-amd64-generic
4. I tried to upgrade to 8.04 - gksu "update-manager -c"
5. The long process completed successfully and I rebooted.
6. Upon reboot the new kernel got stuck and would not boot; it seems it was unable to load the virtual disk driver.

7. I found this thread and downloaded the kernel and installed via: dpkg -i <file>

8. It installed the kernel:
vmlinuz-2.6.24-19-xen

and initrd image:
initrd.img-2.6.24-19-xen

9. When I attempt to boot, the process stops with the error:

"Error 13: ... unsupported executable format" ... (indicates the xen kernel)

Am I missing something in the above steps?

Please advise.

Best,

Aspasia.

MattW (matt-ender) wrote :

So I tried Hirano's -19~33 kernel, and I was able to boot, but crashed on update:

root@manticore:/home/matt# apt-get update
Get:1 http://archive.ubuntu.com hardy Release.gpg [191B]
Get:2 http://archive.ubuntu.com hardy Release [65.9kB]
Get:3 http://archive.ubuntu.com hardy/main Packages [1178kB]
Get:4 http://archive.ubuntu.com hardy/universe Packages [4297kB]
60% [3 Packages bzip2 6299648] [4 Packages 2110856/4297kB 49%][22563.394083] ------------[ cut here ]------------
[22563.394096] kernel BUG at /home/hiranotaka/src/debian/ubuntu-hardy/debian/build/custom-source-xen/drivers/xen/netfront/netfront.c:1460!
[22563.394100] invalid opcode: 0000 [#1] SMP
[22563.394103] Modules linked in: ipv6 evdev ext3 jbd mbcache
[22563.394110]
[22563.394113] Pid: 3065, comm: http Not tainted (2.6.24-19-xen #2)
[22563.394117] EIP: 0061:[<c026b907>] EFLAGS: 00010202 CPU: 0
[22563.394126] EIP is at netif_poll+0xd27/0xd80
[22563.394128] EAX: 00000001 EBX: cfc29a54 ECX: 00000000 EDX: fffffffd
[22563.394132] ESI: cfcd5ee0 EDI: cfc29a54 EBP: 00000000 ESP: cfcd5e20
[22563.394135] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[22563.394139] Process http (pid: 3065, ti=cfcd4000 task=d07e8910 task.ti=cfcd4000)
[22563.394141] Stack: cfcd5f64 cfcd5edc 00000000 cfc28480 cfcd5ee0 cfcd5ed0 cfcd5ec0 cfcd5ea8
[22563.394150] cfc2919c 80000000 cfc2ba10 0000f7f9 00000040 cfc28510 01000010 00000000
[22563.394160] cfc28000 000009f0 00000005 00000100 00000005 00000000 cfcd5edc c015f28e
[22563.394169] Call Trace:
[22563.394172] [<c015f28e>] generic_file_aio_write+0x6e/0xe0
[22563.394179] [<c02b6035>] net_rx_action+0x165/0x260
[22563.394185] [<c012bfe2>] __do_softirq+0x92/0x130
[22563.394191] [<c012c105>] do_softirq+0x85/0x90
[22563.394194] [<c0107110>] do_IRQ+0x40/0x70
[22563.394198] [<c0253b90>] evtchn_do_upcall+0xc0/0x1a0
[22563.394203] [<c0105a36>] hypervisor_callback+0x46/0x4e
[22563.394207] =======================
[22563.394208] Code: b8 01 00 00 00 8b 4c 24 34 86 41 fc c7 44 24 48 00 00 00 00 8b 44 24 48 81 c4 d4 00 00 00 5b 5e 5f 5d c3 85 c0 0f 84 08 f4 ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 64 a1 08 f0 45 c0 c1 e0 06
[22563.394258] EIP: [<c026b907>] netif_poll+0xd27/0xd80 SS:ESP 0069:cfcd5e20
[22563.394266] Kernel panic - not syncing: Fatal exception in interrupt

So it looks like my experience matches an earlier report that the kernel can boot, but will still crash on significant network traffic.

I've just upgraded the kernel in my Xen hardy guest to

linux-image-2.6.24-19-xen_2.6.24-19.33_i386.deb

I don't currently appear to have the network issues, although I haven't put a high network load on the image yet.

It booted cleanly with none of the usual issues.

Tim Gardner (timg-tpi) on 2008-06-26
Changed in linux:
assignee: timg-tpi → nobody
status: In Progress → Fix Released