XFS instability on armhf under load

Bug #1176977 reported by Ripal Nathuji on 2013-05-06
This bug affects 2 people
Affects: linux (Ubuntu)
Importance: Medium
Assigned to: Paolo Pisati

Bug Description

We've observed issues with XFS on ARM hardware on both 12.04 and 12.10. Details from ubuntu-bug (collected on a 12.10 installation) are attached. The specific load in this case is setting up a single-node Swift install and then exercising it with swift-bench for a prolonged period. After a few hours you get the following:

[361819.562519] SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled
[361819.566196] XFS (loop0): Mounting Filesystem
[361819.617186] XFS (loop0): Ending clean mount
[372224.630644] alloc_vmap_area: 18 callbacks suppressed
[372224.630656] vmap allocation for size 2097152 failed: use vmalloc=<size> to increase size.
[372224.630687] vmap allocation for size 2097152 failed: use vmalloc=<size> to increase size.
[372224.630716] vmap allocation for size 2097152 failed: use vmalloc=<size> to increase size.
[372224.630730] XFS (loop0): xfs_buf_get: failed to map pages
[372224.630730]
[372224.630745] XFS (loop0): Internal error xfs_trans_cancel at line 1466 of file /build/buildd/linux-3.5.0/fs/xfs/xfs_trans.c. Caller 0xbf0465e0
[372224.630745]
[372224.645280] [<c00164cc>] (unwind_backtrace+0x0/0x104) from [<c04ed184>] (dump_stack+0x20/0x24)
[372224.645475] [<c04ed184>] (dump_stack+0x20/0x24) from [<bf03391c>] (xfs_error_report+0x60/0x6c [xfs])
[372224.645800] [<bf03391c>] (xfs_error_report+0x60/0x6c [xfs]) from [<bf0863f8>] (xfs_trans_cancel+0xfc/0x11c [xfs])
[372224.646117] [<bf0863f8>] (xfs_trans_cancel+0xfc/0x11c [xfs]) from [<bf0465e0>] (xfs_create+0x228/0x558 [xfs])
[372224.646387] [<bf0465e0>] (xfs_create+0x228/0x558 [xfs]) from [<bf03d7cc>] (xfs_vn_mknod+0x9c/0x180 [xfs])
[372224.646648] [<bf03d7cc>] (xfs_vn_mknod+0x9c/0x180 [xfs]) from [<bf03d8f0>] (xfs_vn_create+0x1c/0x20 [xfs])
[372224.646791] [<bf03d8f0>] (xfs_vn_create+0x1c/0x20 [xfs]) from [<c01359c0>] (vfs_create+0xb4/0x120)
[372224.646805] [<c01359c0>] (vfs_create+0xb4/0x120) from [<c0137c28>] (do_last+0x860/0x9bc)
[372224.646818] [<c0137c28>] (do_last+0x860/0x9bc) from [<c0137fc8>] (path_openat+0xcc/0x428)
[372224.646829] [<c0137fc8>] (path_openat+0xcc/0x428) from [<c0138444>] (do_filp_open+0x3c/0x90)
[372224.646844] [<c0138444>] (do_filp_open+0x3c/0x90) from [<c0128240>] (do_sys_open+0xfc/0x1d0)
[372224.646855] [<c0128240>] (do_sys_open+0xfc/0x1d0) from [<c0128340>] (sys_open+0x2c/0x30)
[372224.646868] [<c0128340>] (sys_open+0x2c/0x30) from [<c000e320>] (ret_fast_syscall+0x0/0x30)
[372224.646881] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1467 of file /build/buildd/linux-3.5.0/fs/xfs/xfs_trans.c. Return address = 0xbf08640c
[372224.647010] XFS (loop0): Corruption of in-memory data detected. Shutting down filesystem
[372224.655289] XFS (loop0): Please umount the filesystem and rectify the problem(s)
[372227.273367] XFS (loop0): xfs_log_force: error 5 returned.
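
The "use vmalloc=<size>" hint in the messages above refers to the kernel command-line parameter that enlarges the vmalloc area on 32-bit ARM, which is what xfs_buf_get is failing to allocate from when it maps buffer pages. A possible (untested) workaround sketch, with a purely illustrative size, would be to boot with a larger vmalloc area and verify it afterwards:

# appended to the kernel command line in the bootloader configuration (size is illustrative)
vmalloc=512M

# after reboot, check the new vmalloc range
$ grep -i vmalloc /proc/meminfo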

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: precise quantal
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: armhf
Paolo Pisati (p-pisati) wrote :

I can easily reproduce it using a real xfs partition, after following the instructions found here to set up a single swift node:

http://docs.openstack.org/developer/swift/development_saio.html

and running swift-bench in a loop like this:

'while /bin/true; do swift-bench -x -c 20 -s 10 -n 100 -g 100 /etc/swift/swift-bench.conf; done'

flag@c13:~$ uname -a
Linux c13 3.5.0-28-highbank #48-Ubuntu SMP Wed Apr 24 03:17:37 UTC 2013 armv7l armv7l armv7l GNU/Linux
flag@c13:~$

dmesg:
...
[ 3037.399406] vmap allocation for size 2097152 failed: use vmalloc=<size> to increase size.
[ 3037.399442] vmap allocation for size 2097152 failed: use vmalloc=<size> to increase size.
[ 3037.399469] vmap allocation for size 2097152 failed: use vmalloc=<size> to increase size.
[ 3037.399485] XFS (sda5): xfs_buf_get: failed to map pages
[ 3037.399485]
[ 3037.399501] XFS (sda5): Internal error xfs_trans_cancel at line 1466 of file /build/buildd/linux-3.5.0/fs/xfs/xfs_trans.c. Caller 0xbf0235e0
[ 3037.399501]
[ 3037.413789] [<c00164cc>] (unwind_backtrace+0x0/0x104) from [<c04ed624>] (dump_stack+0x20/0x24)
[ 3037.413985] [<c04ed624>] (dump_stack+0x20/0x24) from [<bf01091c>] (xfs_error_report+0x60/0x6c [xfs])
[ 3037.414321] [<bf01091c>] (xfs_error_report+0x60/0x6c [xfs]) from [<bf0633f8>] (xfs_trans_cancel+0xfc/0x11c [xfs])
[ 3037.414654] [<bf0633f8>] (xfs_trans_cancel+0xfc/0x11c [xfs]) from [<bf0235e0>] (xfs_create+0x228/0x558 [xfs])
[ 3037.414953] [<bf0235e0>] (xfs_create+0x228/0x558 [xfs]) from [<bf01a7cc>] (xfs_vn_mknod+0x9c/0x180 [xfs])
[ 3037.415239] [<bf01a7cc>] (xfs_vn_mknod+0x9c/0x180 [xfs]) from [<bf01a8d0>] (xfs_vn_mkdir+0x20/0x24 [xfs])
[ 3037.415393] [<bf01a8d0>] (xfs_vn_mkdir+0x20/0x24 [xfs]) from [<c0135758>] (vfs_mkdir+0xc4/0x13c)
[ 3037.415410] [<c0135758>] (vfs_mkdir+0xc4/0x13c) from [<c013884c>] (sys_mkdirat+0xdc/0xe4)
[ 3037.415422] [<c013884c>] (sys_mkdirat+0xdc/0xe4) from [<c0138878>] (sys_mkdir+0x24/0x28)
[ 3037.415437] [<c0138878>] (sys_mkdir+0x24/0x28) from [<c000e320>] (ret_fast_syscall+0x0/0x30)
[ 3037.415452] XFS (sda5): xfs_do_force_shutdown(0x8) called from line 1467 of file /build/buildd/linux-3.5.0/fs/xfs/xfs_trans.c. Return address = 0xbf06340c
[ 3037.416892] XFS (sda5): Corruption of in-memory data detected. Shutting down filesystem
[ 3037.425008] XFS (sda5): Please umount the filesystem and rectify the problem(s)
[ 3047.912480] XFS (sda5): xfs_log_force: error 5 returned.

On the other hand, I cannot reproduce it on R (3.8.0-19.30), since the swift node collapses after a few test runs.

Paolo Pisati (p-pisati) wrote :

Reproducible on 3.7 (vanilla + multiv7_defconfig) too:

XFS (sda5): Mounting Filesystem
XFS (sda5): Ending clean mount
vmap allocation for size 2097152 failed: use vmalloc=<size> to increase size.
vmap allocation for size 2097152 failed: use vmalloc=<size> to increase size.
vmap allocation for size 2097152 failed: use vmalloc=<size> to increase size.
XFS (sda5): xfs_buf_get_map: failed to map pages

XFS (sda5): Internal error xfs_trans_cancel at line 1477 of file fs/xfs/xfs_trans.c. Caller 0xc016f718

[<c0013d18>] (unwind_backtrace+0x0/0xf8) from [<c01a7804>] (xfs_trans_cancel+0xe4/0x104)
[<c01a7804>] (xfs_trans_cancel+0xe4/0x104) from [<c016f718>] (xfs_create+0x1cc/0x400)
[<c016f718>] (xfs_create+0x1cc/0x400) from [<c0167bac>] (xfs_vn_mknod+0x84/0xe4)
[<c0167bac>] (xfs_vn_mknod+0x84/0xe4) from [<c00a8fc4>] (vfs_create+0x80/0xb4)
[<c00a8fc4>] (vfs_create+0x80/0xb4) from [<c00a978c>] (do_last.isra.37+0x794/0xb9c)
[<c00a978c>] (do_last.isra.37+0x794/0xb9c) from [<c00a9c3c>] (path_openat+0xa8/0x47c)
[<c00a9c3c>] (path_openat+0xa8/0x47c) from [<c00aa03c>] (do_filp_open+0x2c/0x80)
[<c00aa03c>] (do_filp_open+0x2c/0x80) from [<c009d3f4>] (do_sys_open+0xe4/0x170)
[<c009d3f4>] (do_sys_open+0xe4/0x170) from [<c000dec0>] (ret_fast_syscall+0x0/0x30)
XFS (sda5): xfs_do_force_shutdown(0x8) called from line 1478 of file fs/xfs/xfs_trans.c. Return address = 0xc01a7818
XFS (sda5): Corruption of in-memory data detected. Shutting down filesystem
XFS (sda5): Please umount the filesystem and rectify the problem(s)

flag@c13:~$ uname -a
Linux c13 3.7.0v36try4 #6 SMP Mon May 13 18:50:40 CEST 2013 armv7l armv7l armv7l GNU/Linux

Changed in linux (Ubuntu):
assignee: nobody → Paolo Pisati (p-pisati)
Paolo Pisati (p-pisati) wrote :

3.7.9 is affected, while 3.8.0 is not; I'm about to start a bisect.

Paolo Pisati (p-pisati) wrote :

Try this kernel:

http://people.canonical.com/~ppisati/linux-image-3.5.0-29-highbank_3.5.0-29.49~7c4cebe8_armhf.deb

It's a Q/generic highbank kernel plus the attached patch, and it fixes the filesystem corruption.

On the other hand, even with this fix applied, when the xfs partition reaches ~25% of its capacity, a test like this:

while /bin/true; do swift-bench -x -c 20 -s 10 -n 100 -g 100 /etc/swift/swift-bench.conf; done

collapses, and the swift-storage node reports 'out of space' while there is clearly still space and inodes left (see /dev/sda5 below):

flag@c13:~$ uname -a
Linux c13 3.5.0-29-highbank #49~7c4cebe8 SMP Tue May 14 18:44:12 UTC 2013 armv7l armv7l armv7l GNU/Linux

flag@c13:~$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 225G 2.1G 212G 1% /
none 4.0K 0 4.0K 0% /sys/fs/cgroup
udev 2.0G 4.0K 2.0G 1% /dev
tmpfs 405M 260K 404M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 2.0G 0 2.0G 0% /run/shm
none 100M 0 100M 0% /run/user
/dev/sda1 228M 30M 186M 14% /boot
/dev/sda5 2.0G 569M 1.5G 28% /mnt/sdb1

flag@c13:~$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda2 14958592 74462 14884130 1% /
none 182027 1 182026 1% /sys/fs/cgroup
udev 177378 1361 176017 1% /dev
tmpfs 182027 807 181220 1% /run
none 182027 3 182024 1% /run/lock
none 182027 1 182026 1% /run/shm
none 182027 1 182026 1% /run/user
/dev/sda1 124496 35 124461 1% /boot
/dev/sda5 524288 237184 287104 46% /mnt/sdb1

flag@c13:~$ less /var/log/swift/storage1.error
...
May 14 15:33:27 c13 container-server ERROR __call__ error with PUT /sdb1/62777/AUTH_test/777cd4065dc9446da68d511a1aeb5e8a_1 : [Errno 28] No space left on device: '/srv/1/node/sdb1/containers/62777' (txn: tx276b80075bb9429d9bebb-005192916a)

We might have uncovered another issue here.

tags: added: patch
Paolo Pisati (p-pisati) wrote :

http://people.canonical.com/~ppisati/linux-image-3.5.0-31-highbank_3.5.0-31.52~549142a_armhf.deb

In addition to the patch attached in #6, this kernel contains two more patches that address the free-space fragmentation we were seeing. It completely solves the problem for me; please test it.
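
If anyone wants to check whether free-space fragmentation is what they are hitting on their own partition, the free-space histogram can be inspected read-only with xfs_db (the device below is just the one from the logs above; the numbers are most meaningful on a quiesced or unmounted filesystem):

$ sudo xfs_db -r -c 'freesp -s' /dev/sda5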

Ripal Nathuji (ripal-nathuji) wrote :

I tried the kernel at http://people.canonical.com/~ppisati/linux-image-3.5.0-31-highbank_3.5.0-31.52~549142a_armhf.deb, and I no longer get the kernel crash. I did see the out-of-space error from swift-bench, but it seems to be XFS running out of inodes:

user1@node4:/opt/stack/data/swift/drives/sdb1$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 231G 8.4G 211G 4% /
udev 2.0G 4.0K 2.0G 1% /dev
tmpfs 809M 196K 808M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 2.0G 0 2.0G 0% /run/shm
none 100M 0 100M 0% /run/user
cgroup 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/sda1 89M 15M 69M 18% /boot
/dev/loop0 972M 385M 588M 40% /opt/stack/data/swift/drives/sdb1

user1@node4:/opt/stack/data/swift/drives/sdb1$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda2 15376384 74156 15302228 1% /
udev 177400 1360 176040 1% /dev
tmpfs 182027 792 181235 1% /run
none 182027 2 182025 1% /run/lock
none 182027 1 182026 1% /run/shm
none 182027 1 182026 1% /run/user
cgroup 182027 9 182018 1% /sys/fs/cgroup
/dev/sda1 48192 27 48165 1% /boot
/dev/loop0 249984 249984 0 100% /opt/stack/data/swift/drives/sdb1
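
Since df -i shows /dev/loop0 at 100% IUse%, it may be worth checking whether the filesystem's inode allocation limit (imaxpct) is the constraint rather than raw space. The mount point below is just the one from the output above, and raising imaxpct is only a guess at a mitigation, not a confirmed fix:

# show current geometry, including the imaxpct value
$ xfs_info /opt/stack/data/swift/drives/sdb1

# raise the maximum percentage of space usable for inodes (value is illustrative)
$ sudo xfs_growfs -m 50 /opt/stack/data/swift/drives/sdb1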
