zfs-fuse tools (zfs, zpool) become unstable when most RAM is in the 'cached' state and the free memory counter is low.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
zfs-fuse (Ubuntu) | Invalid | Undecided | Unassigned |
Bug Description
Under heavy file I/O (moving tens of gigabytes from ext4 to ZFS in a single mv operation), the zfs-fuse daemon and tools become unstable (freezing when run, reporting errors, not showing all pools), and eventually the zfs-fuse daemon can crash. No significant errors are reported to syslog, except several warnings. No core file was found.
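The intermittent failures shown in the transcripts below can be quantified with a simple retry loop. A hedged sketch — the iteration count of 10 is arbitrary, and on a healthy host every run should succeed:

```shell
#!/bin/sh
# Repeatedly run `zfs list` and tally successes vs. failures, to measure
# how often the intermittent "internal error: Bad address" abort hits.
# The iteration count (10) is arbitrary; adjust as needed.
ok=0; fail=0; i=0
while [ "$i" -lt 10 ]; do
    if zfs list >/dev/null 2>&1; then
        ok=$((ok + 1))
    else
        fail=$((fail + 1))
    fi
    i=$((i + 1))
done
echo "ok=$ok fail=$fail"
```

Running this while the large mv is in progress, versus on an idle host, would show whether the failure rate tracks the I/O load.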
root@nas01:~# dpkg -l | grep zfs
ii zfs-fuse 0.6.0-1 ZFS on FUSE
System version:
root@nas01:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 10.04.4 LTS
Release: 10.04
Codename: lucid
Kernel: Linux nas01 2.6.32-41-server #89-Ubuntu SMP Fri Apr 27 22:33:31 UTC 2012 x86_64 GNU/Linux
CPU: Pentium(R) Dual-Core CPU E6700 @ 3.20GHz
RAM: 16 GB
Freeze:
root@nas01:~# zfs list
^C
Unstable behavior:
root@nas01:~# zfs list
internal error: Bad address
Aborted
root@nas01:~# zfs list
internal error: Bad address
Aborted
root@nas01:~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
backup 130K 293G 24K /backup
backup/nas 21K 293G 21K /backup/nas
backup/ws 21K 293G 21K /backup/ws
nas 1,96T 732G 26K /nas
nas/Family 17,9G 732G 17,9G /nas/Family
nas/Store1 1,48T 732G 1,48T /nas/Store1
nas/Ph 65,9G 732G 65,9G /nas/Ph
nas/Private 8,31G 732G 8,31G /nas/Private
nas/Public 117G 732G 117G /nas/Public
nas/Alex 133G 732G 133G /nas/Alex
nas/old 150G 732G 150G /nas/old
nas/Store2 21K 732G 21K /nas/Store2
root@nas01:~# zfs list
internal error: Bad address
Aborted
root@nas01:~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
backup 130K 293G 24K /backup
backup/nas 21K 293G 21K /backup/nas
backup/ws 21K 293G 21K /backup/ws
nas 1,96T 732G 26K /nas
nas/Family 17,9G 732G 17,9G /nas/Family
nas/Store1 1,48T 732G 1,48T /nas/Store1
nas/Ph 65,9G 732G 65,9G /nas/Ph
nas/Private 8,31G 732G 8,31G /nas/Private
nas/Public 117G 732G 117G /nas/Public
nas/Alex 133G 732G 133G /nas/Alex
nas/old 150G 732G 150G /nas/old
nas/Store2 21K 732G 21K /nas/Store2
root@nas01:~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
backup 130K 293G 24K /backup
backup/nas 21K 293G 21K /backup/nas
backup/ws 21K 293G 21K /backup/ws
nas 1,96T 732G 26K /nas
nas/Family 17,9G 732G 17,9G /nas/Family
nas/Store1 1,48T 732G 1,48T /nas/Store1
nas/Ph 65,9G 732G 65,9G /nas/Ph
nas/Private 8,31G 732G 8,31G /nas/Private
nas/Public 117G 732G 117G /nas/Public
nas/Alex 133G 732G 133G /nas/Alex
nas/old 150G 732G 150G /nas/old
nas/Store2 21K 732G 21K /nas/Store2
root@nas01:~# zfs list
internal error: Bad address
Aborted
root@nas01:~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
backup 130K 293G 24K /backup
backup/nas 21K 293G 21K /backup/nas
backup/ws 21K 293G 21K /backup/ws
nas 1,96T 732G 26K /nas
nas/Family 17,9G 732G 17,9G /nas/Family
nas/Store1 1,48T 732G 1,48T /nas/Store1
nas/Ph 65,9G 732G 65,9G /nas/Ph
nas/Private 8,31G 732G 8,31G /nas/Private
nas/Public 117G 732G 117G /nas/Public
nas/Alex 133G 732G 133G /nas/Alex
nas/old 150G 732G 150G /nas/old
nas/Store2 21K 732G 21K /nas/Store2
root@nas01:~# zfs list
internal error: Bad address
Aborted
root@nas01:~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
backup 130K 293G 24K /backup
backup/nas 21K 293G 21K /backup/nas
backup/ws 21K 293G 21K /backup/ws
nas 1,96T 731G 26K /nas
nas/Family 17,9G 731G 17,9G /nas/Family
nas/Store1 1,48T 731G 1,48T /nas/Store1
nas/Ph 65,9G 731G 65,9G /nas/Ph
nas/Private 8,31G 731G 8,31G /nas/Private
nas/Public 117G 731G 117G /nas/Public
nas/Alex 133G 731G 133G /nas/Alex
nas/old 150G 731G 150G /nas/old
nas/Store2 21K 731G 21K /nas/Store2
root@nas01:~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
backup 130K 293G 24K /backup
backup/nas 21K 293G 21K /backup/nas
backup/ws 21K 293G 21K /backup/ws
nas 1,96T 731G 26K /nas
nas/Family 17,9G 731G 17,9G /nas/Family
nas/Store1 1,48T 731G 1,48T /nas/Store1
nas/Ph 65,9G 731G 65,9G /nas/Ph
nas/Private 8,31G 731G 8,31G /nas/Private
nas/Public 117G 731G 117G /nas/Public
nas/Alex 133G 731G 133G /nas/Alex
nas/old 150G 731G 150G /nas/old
nas/Store2 21K 731G 21K /nas/Store2
ZFS pool 'nas' disappears:
root@nas01:~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
backup 130K 293G 24K /backup
backup/nas 21K 293G 21K /backup/nas
backup/ws 21K 293G 21K /backup/ws
And appears again:
root@nas01:~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
backup 130K 293G 24K /backup
backup/nas 21K 293G 21K /backup/nas
backup/ws 21K 293G 21K /backup/ws
nas 1,96T 731G 26K /nas
nas/Family 17,9G 731G 17,9G /nas/Family
nas/Store1 1,48T 731G 1,48T /nas/Store1
nas/Ph 65,9G 731G 65,9G /nas/Ph
nas/Private 8,31G 731G 8,31G /nas/Private
nas/Public 117G 731G 117G /nas/Public
nas/Alex 133G 731G 133G /nas/Alex
nas/old 150G 731G 150G /nas/old
nas/Store2 21K 731G 21K /nas/Store2
root@
NAME USED AVAIL REFER MOUNTPOINT
backup 130K 293G 24K /backup
backup/nas 21K 293G 21K /backup/nas
backup/ws 21K 293G 21K /backup/ws
nas 1,70T 1001G 26K /nas
nas/Family 17,9G 1001G 17,9G /nas/Family
nas/Store1 1,22T 1001G 1,22T /nas/Store1
nas/Ph 65,9G 1001G 65,9G /nas/Ph
nas/Private 8,31G 1001G 8,31G /nas/Private
nas/Public 117G 1001G 117G /nas/Public
nas/Alex 133G 1001G 133G /nas/Alex
nas/old 150G 1001G 150G /nas/old
nas/Store2 21K 1001G 21K /nas/Store2
Crash:
root@
root@
connect: Connection refused
Please make sure that the zfs-fuse daemon is running.
internal error: failed to initialize ZFS library
root@
root 7320 31275 0 09:15 pts/3 00:00:00 grep zfs
root@
* Starting zfs-fuse zfs-fuse
...fail!
root@
-rw-r--r-- 1 root root 5 2012-06-04 21:20 /var/run/
Log:
/var/
/var/
/var/
/var/
Jun 5 06:28:16 nas01 kernel: [32913.939595] zfs-fuse: sending ioctl 2285 to a partition!
Jun 5 06:28:16 nas01 kernel: [32913.939601] zfs-fuse: sending ioctl 2285 to a partition!
Jun 5 06:28:16 nas01 kernel: [32913.956343] zfs-fuse: sending ioctl 2285 to a partition!
Jun 5 06:28:16 nas01 kernel: [32913.956349] zfs-fuse: sending ioctl 2285 to a partition!
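The rate of the kernel warnings above could be checked against the times the failures occur. A hedged sketch — an inline two-line excerpt and a /tmp path stand in for the host's real syslog here:

```shell
#!/bin/sh
# Count "sending ioctl 2285 to a partition" kernel warnings.
# The excerpt below is a stand-in for the host's real syslog file.
cat <<'EOF' > /tmp/syslog.sample
Jun  5 06:28:16 nas01 kernel: [32913.939595] zfs-fuse: sending ioctl 2285 to a partition!
Jun  5 06:28:16 nas01 kernel: [32913.939601] zfs-fuse: sending ioctl 2285 to a partition!
EOF
count=$(grep -c 'sending ioctl 2285 to a partition' /tmp/syslog.sample)
echo "warnings: $count"
```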
Maybe this is significant: at a certain point during the file move, I tried to zfs create and got something like an 'Insufficient memory' error. I could not reproduce it and kept no log of that event. But the host has 16 GB of RAM, applications use no more than 3-7 GB (and most of that seems not to be resident); all other memory is marked as 'cached' in top and /proc/meminfo. Typically during a file move to ZFS, memory is listed as about 90 MB free and 14 GB cached, like below.
top - 12:52:44 up 5:33, 6 users, load average: 0.52, 0.77, 0.71
Tasks: 246 total, 2 running, 243 sleeping, 0 stopped, 1 zombie
Cpu(s): 13.3%us, 13.8%sy, 0.0%ni, 49.4%id, 21.5%wa, 1.0%hi, 1.0%si, 0.0%st
Mem: 16467932k total, 16375684k used, 92248k free, 41964k buffers
Swap: 11718648k total, 0k used, 11718648k free, 15019772k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2050 root 20 0 1841m 277m 1604 S 38 1.7 28:00.34 zfs-fuse
21229 root 20 0 32032 4640 2404 R 15 0.0 5:10.48 mc
1729 root 20 0 97524 8836 4300 S 1 0.1 2:52.62 Xorg
36 root 20 0 0 0 0 S 1 0.0 1:59.62 kswapd0
1794 root 20 0 155m 19m 8116 S 1 0.1 1:32.27 lxdm-greeter-gt
23573 root 20 0 19356 1540 1048 R 1 0.0 0:02.70 top
2695 root 20 0 75296 3052 2220 S 0 0.0 0:00.73 cupsd
4534 root 20 0 282m 67m 33m S 0 0.4 0:11.53 hostd-worker
4896 root 20 0 0 0 0 S 0 0.0 1:12.08 vmware-rtc
5227 kng 20 0 292m 44m 32m S 0 0.3 0:11.20 vmware-tray
6341 root 20 0 930m 544m 505m S 0 3.4 0:57.73 vmware-vmx
7462 kng 20 0 263m 81m 47m S 0 0.5 0:20.40 vmware
8055 kng 20 0 120m 13m 4976 S 0 0.1 0:44.61 vmware-remotemk
8062 kng 20 0 120m 13m 4976 S 0 0.1 0:43.46 vmware-remotemk
24371 root 20 0 3512 1204 1008 S 0 0.0 0:00.01 captmon
1 root 20 0 23836 2112 1320 S 0 0.0 0:00.59 init
2 root 20 0 0 0 0 S 0 0.0 0:00.01 kthreadd
3 root RT 0 0 0 0 S 0 0.0 0:00.03 migration/0
4 root 20 0 0 0 0 S 0 0.0 0:00.97 ksoftirqd/0
5 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0
6 root RT 0 0 0 0 S 0 0.0 0:00.03 migration/1
7 root 20 0 0 0 0 S 0 0.0 0:03.83 ksoftirqd/1
8 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/1
9 root 20 0 0 0 0 S 0 0.0 0:00.64 events/0
10 root 20 0 0 0 0 S 0 0.0 0:01.09 events/1
11 root 20 0 0 0 0 S 0 0.0 0:00.00 cpuset
12 root 20 0 0 0 0 S 0 0.0 0:00.01 khelper
13 root 20 0 0 0 0 S 0 0.0 0:00.00 netns
14 root 20 0 0 0 0 S 0 0.0 0:00.00 async/mgr
15 root 20 0 0 0 0 S 0 0.0 0:00.00 pm
17 root 20 0 0 0 0 S 0 0.0 0:00.03 sync_supers
18 root 20 0 0 0 0 S 0 0.0 0:00.04 bdi-default
19 root 20 0 0 0 0 S 0 0.0 0:00.00 kintegrityd/0
20 root 20 0 0 0 0 S 0 0.0 0:00.00 kintegrityd/1
21 root 20 0 0 0 0 S 0 0.0 0:00.18 kblockd/0
22 root 20 0 0 0 0 S 0 0.0 0:00.11 kblockd/1
23 root 20 0 0 0 0 S 0 0.0 0:00.00 kacpid
24 root 20 0 0 0 0 S 0 0.0 0:00.00 kacpi_notify
25 root 20 0 0 0 0 S 0 0.0 0:00.00 kacpi_hotplug
26 root 20 0 0 0 0 S 0 0.0 0:00.01 ata/0
27 root 20 0 0 0 0 S 0 0.0 0:00.01 ata/1
28 root 20 0 0 0 0 S 0 0.0 0:00.00 ata_aux
29 root 20 0 0 0 0 S 0 0.0 0:00.26 ksuspend_usbd
30 root 20 0 0 0 0 S 0 0.0 0:00.04 khubd
31 root 20 0 0 0 0 S 0 0.0 0:00.00 kseriod
32 root 20 0 0 0 0 S 0 0.0 0:00.00 kmmcd
35 root 20 0 0 0 0 S 0 0.0 0:00.02 khungtaskd
37 root 25 5 0 0 0 S 0 0.0 0:00.00 ksmd
38 root 20 0 0 0 0 S 0 0.0 0:00.00 aio/0
39 root 20 0 0 0 0 S 0 0.0 0:00.00 aio/1
40 root 20 0 0 0 0 S 0 0.0 0:00.00 ecryptfs-kthrea
41 root 20 0 0 0 0 S 0 0.0 0:00.00 crypto/0
42 root 20 0 0 0 0 S 0 0.0 0:00.00 crypto/1
52 root 20 0 0 0 0 S 0 0.0 0:00.00 scsi_eh_0
53 root 20 0 0 0 0 S 0 0.0 0:00.00 scsi_eh_1
55 root 20 0 0 0 0 S 0 0.0 0:00.00 scsi_eh_2
59 root 20 0 0 0 0 S 0 0.0 0:00.01 scsi_eh_3
62 root 20 0 0 0 0 S 0 0.0 0:00.00 kstriped
63 root 20 0 0 0 0 S 0 0.0 0:00.00 kmpathd/0
64 root 20 0 0 0 0 S 0 0.0 0:00.00 kmpathd/1
65 root 20 0 0 0 0 S 0 0.0 0:00.00 kmpath_handlerd
66 root 20 0 0 0 0 S 0 0.0 0:00.00 ksnapd
67 root 20 0 0 0 0 S 0 0.0 0:09.51 kondemand/0
68 root 20 0 0 0 0 S 0 0.0 0:07.57 kondemand/1
69 root 20 0 0 0 0 S 0 0.0 0:00.00 kconservative/0
70 root 20 0 0 0 0 S 0 0.0 0:00.00 kconservative/1
206 root 20 0 0 0 0 S 0 0.0 0:00.00 scsi_eh_4
215 root 20 0 0 0 0 S 0 0.0 0:00.00 cciss_scan
292 root 20 0 0 0 0 S 0 0.0 0:00.00 scsi_eh_5
308 root 20 0 0 0 0 S 0 0.0 0:00.62 kdmflush
311 root 20 0 0 0 0 S 0 0.0 0:00.46 kdmflush
314 root 20 0 0 0 0 S 0 0.0 0:00.00 kdmflush
317 root 20 0 0 0 0 S 0 0.0 0:00.24 kdmflush
320 root 20 0 0 0 0 S 0 0.0 0:00.03 kdmflush
350 root 20 0 0 0 0 S 0 0.0 0:01.30 jbd2/dm-0-8
351 root 20 0 0 0 0 S 0 0.0 0:00.00 ext4-dio-unwrit
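The MemFree and Cached figures shown in the top header can be sampled directly from /proc/meminfo. A minimal sketch (a one-shot sample here; wrapping it in a loop with sleep would show the page cache growing during the move):

```shell
#!/bin/sh
# Print the MemFree and Cached counters (in kB) from /proc/meminfo,
# the same values top reports as "free" and "cached".
sample=$(awk '/^MemFree:|^Cached:/ { print $1, $2, "kB" }' /proc/meminfo)
echo "$sample"
```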
This version is no longer supported; this is not a security problem, so no backport is expected.