zfs-fuse tools (zfs, zpool) become unstable when most RAM is in the 'cached' state and the free-memory counter is low.

Bug #1008944 reported by Konstantin Gusenko
This bug affects 1 person

Affects: zfs-fuse (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Under heavy file I/O (moving tens of gigabytes from ext4 to ZFS in a single mv operation), the zfs-fuse daemon and tools become unstable (freezing when run, reporting errors, not showing all pools), and eventually the zfs-fuse daemon can crash. No significant errors are reported to syslog apart from a few warnings. No core file was found.
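For reference, a hypothetical reproduction sketch: start the large mv in the background and periodically probe the zfs tools for failures. The paths, tree size, and probe interval here are illustrative assumptions, not values taken from this report.

```shell
# Hypothetical paths: a large source tree on ext4, destination on zfs-fuse.
SRC=/mnt/ext4/data
DST=/nas/Store1
mv "$SRC" "$DST" &
mv_pid=$!
# While the move runs, probe the zfs tools and log any failure.
while kill -0 "$mv_pid" 2>/dev/null; do
    zfs list >/dev/null 2>&1 || echo "zfs list failed at $(date)"
    sleep 10
done
```

Logging failures with timestamps makes it possible to line them up against syslog and /proc/meminfo snapshots taken at the same time.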

root@nas01:~# dpkg -l | grep zfs
ii zfs-fuse 0.6.0-1 ZFS on FUSE

System version:
  root@nas01:~# lsb_release -a
  No LSB modules are available.
  Distributor ID: Ubuntu
  Description: Ubuntu 10.04.4 LTS
  Release: 10.04
  Codename: lucid

Kernel: Linux nas01 2.6.32-41-server #89-Ubuntu SMP Fri Apr 27 22:33:31 UTC 2012 x86_64 GNU/Linux

CPU: Pentium(R) Dual-Core CPU E6700 @ 3.20GHz

RAM: 16 GB

Freeze:
  root@nas01:~# zfs list
  ^C

Unstable behavior:
  root@nas01:~# zfs list
  internal error: Bad address
  Aborted

  root@nas01:~# zfs list
  internal error: Bad address
  Aborted

  root@nas01:~# zfs list
  NAME USED AVAIL REFER MOUNTPOINT
  backup 130K 293G 24K /backup
  backup/nas 21K 293G 21K /backup/nas
  backup/ws 21K 293G 21K /backup/ws
  nas 1,96T 732G 26K /nas
  nas/Family 17,9G 732G 17,9G /nas/Family
  nas/Store1 1,48T 732G 1,48T /nas/Store1
  nas/Ph 65,9G 732G 65,9G /nas/Ph
  nas/Private 8,31G 732G 8,31G /nas/Private
  nas/Public 117G 732G 117G /nas/Public
  nas/Alex 133G 732G 133G /nas/Alex
  nas/old 150G 732G 150G /nas/old
  nas/Store2 21K 732G 21K /nas/Store2

  root@nas01:~# zfs list
  internal error: Bad address
  Aborted

  root@nas01:~# zfs list
  NAME USED AVAIL REFER MOUNTPOINT
  backup 130K 293G 24K /backup
  backup/nas 21K 293G 21K /backup/nas
  backup/ws 21K 293G 21K /backup/ws
  nas 1,96T 732G 26K /nas
  nas/Family 17,9G 732G 17,9G /nas/Family
  nas/Store1 1,48T 732G 1,48T /nas/Store1
  nas/Ph 65,9G 732G 65,9G /nas/Ph
  nas/Private 8,31G 732G 8,31G /nas/Private
  nas/Public 117G 732G 117G /nas/Public
  nas/Alex 133G 732G 133G /nas/Alex
  nas/old 150G 732G 150G /nas/old
  nas/Store2 21K 732G 21K /nas/Store2

  root@nas01:~# zfs list
  NAME USED AVAIL REFER MOUNTPOINT
  backup 130K 293G 24K /backup
  backup/nas 21K 293G 21K /backup/nas
  backup/ws 21K 293G 21K /backup/ws
  nas 1,96T 732G 26K /nas
  nas/Family 17,9G 732G 17,9G /nas/Family
  nas/Store1 1,48T 732G 1,48T /nas/Store1
  nas/Ph 65,9G 732G 65,9G /nas/Ph
  nas/Private 8,31G 732G 8,31G /nas/Private
  nas/Public 117G 732G 117G /nas/Public
  nas/Alex 133G 732G 133G /nas/Alex
  nas/old 150G 732G 150G /nas/old
  nas/Store2 21K 732G 21K /nas/Store2

  root@nas01:~# zfs list
  internal error: Bad address
  Aborted

  root@nas01:~# zfs list
  NAME USED AVAIL REFER MOUNTPOINT
  backup 130K 293G 24K /backup
  backup/nas 21K 293G 21K /backup/nas
  backup/ws 21K 293G 21K /backup/ws
  nas 1,96T 732G 26K /nas
  nas/Family 17,9G 732G 17,9G /nas/Family
  nas/Store1 1,48T 732G 1,48T /nas/Store1
  nas/Ph 65,9G 732G 65,9G /nas/Ph
  nas/Private 8,31G 732G 8,31G /nas/Private
  nas/Public 117G 732G 117G /nas/Public
  nas/Alex 133G 732G 133G /nas/Alex
  nas/old 150G 732G 150G /nas/old
  nas/Store2 21K 732G 21K /nas/Store2

  root@nas01:~# zfs list
  internal error: Bad address
  Aborted

  root@nas01:~# zfs list
  NAME USED AVAIL REFER MOUNTPOINT
  backup 130K 293G 24K /backup
  backup/nas 21K 293G 21K /backup/nas
  backup/ws 21K 293G 21K /backup/ws
  nas 1,96T 731G 26K /nas
  nas/Family 17,9G 731G 17,9G /nas/Family
  nas/Store1 1,48T 731G 1,48T /nas/Store1
  nas/Ph 65,9G 731G 65,9G /nas/Ph
  nas/Private 8,31G 731G 8,31G /nas/Private
  nas/Public 117G 731G 117G /nas/Public
  nas/Alex 133G 731G 133G /nas/Alex
  nas/old 150G 731G 150G /nas/old
  nas/Store2 21K 731G 21K /nas/Store2

  root@nas01:~# zfs list
  NAME USED AVAIL REFER MOUNTPOINT
  backup 130K 293G 24K /backup
  backup/nas 21K 293G 21K /backup/nas
  backup/ws 21K 293G 21K /backup/ws
  nas 1,96T 731G 26K /nas
  nas/Family 17,9G 731G 17,9G /nas/Family
  nas/Store1 1,48T 731G 1,48T /nas/Store1
  nas/Ph 65,9G 731G 65,9G /nas/Ph
  nas/Private 8,31G 731G 8,31G /nas/Private
  nas/Public 117G 731G 117G /nas/Public
  nas/Alex 133G 731G 133G /nas/Alex
  nas/old 150G 731G 150G /nas/old
  nas/Store2 21K 731G 21K /nas/Store2

ZFS pool 'nas' disappears:
  root@nas01:~# zfs list
  NAME USED AVAIL REFER MOUNTPOINT
  backup 130K 293G 24K /backup
  backup/nas 21K 293G 21K /backup/nas
  backup/ws 21K 293G 21K /backup/ws

And appears again:
  root@nas01:~# zfs list
  NAME USED AVAIL REFER MOUNTPOINT
  backup 130K 293G 24K /backup
  backup/nas 21K 293G 21K /backup/nas
  backup/ws 21K 293G 21K /backup/ws
  nas 1,96T 731G 26K /nas
  nas/Family 17,9G 731G 17,9G /nas/Family
  nas/Store1 1,48T 731G 1,48T /nas/Store1
  nas/Ph 65,9G 731G 65,9G /nas/Ph
  nas/Private 8,31G 731G 8,31G /nas/Private
  nas/Public 117G 731G 117G /nas/Public
  nas/Alex 133G 731G 133G /nas/Alex
  nas/old 150G 731G 150G /nas/old
  nas/Store2 21K 731G 21K /nas/Store2

  root@nas01:/NAS00/2Tb01/Store1# zfs list
  NAME USED AVAIL REFER MOUNTPOINT
  backup 130K 293G 24K /backup
  backup/nas 21K 293G 21K /backup/nas
  backup/ws 21K 293G 21K /backup/ws
  nas 1,70T 1001G 26K /nas
  nas/Family 17,9G 1001G 17,9G /nas/Family
  nas/Store1 1,22T 1001G 1,22T /nas/Store1
  nas/Ph 65,9G 1001G 65,9G /nas/Ph
  nas/Private 8,31G 1001G 8,31G /nas/Private
  nas/Public 117G 1001G 117G /nas/Public
  nas/Alex 133G 1001G 133G /nas/Alex
  nas/old 150G 1001G 150G /nas/old
  nas/Store2 21K 1001G 21K /nas/Store2

Crash:
  root@nas01:/NAS00/2Tb01/Store1# zfs list
  root@nas01:/NAS00/2Tb01/Store1# zfs list
  connect: Connection refused
  Please make sure that the zfs-fuse daemon is running.
  internal error: failed to initialize ZFS library

  root@nas01:/NAS00/2Tb01/Store1# ps -ef | grep zfs
  root 7320 31275 0 09:15 pts/3 00:00:00 grep zfs

  root@nas01:/NAS00/2Tb01/Store1# /etc/init.d/zfs-fuse start
   * Starting zfs-fuse zfs-fuse
     ...fail!

  root@nas01:/NAS00/2Tb01/Store1# ls -l /var/run/zfs*
  -rw-r--r-- 1 root root 5 2012-06-04 21:20 /var/run/zfs-fuse.pid
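One possible reading of the failed restart above is that the crashed daemon left a stale pid file behind, which blocks the init script. That is an assumption, not confirmed by the report; a recovery sketch under that assumption:

```shell
# Assumption: the daemon died but left its pid file behind, and the
# init script refuses to start while the pid file names a live-looking pid.
PIDFILE=/var/run/zfs-fuse.pid
if [ -f "$PIDFILE" ] && ! kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    echo "removing stale pid file $PIDFILE"
    rm -f "$PIDFILE"
fi
# then retry: /etc/init.d/zfs-fuse start
```

`kill -0` sends no signal; it only checks whether the recorded pid still refers to a running process, so the pid file is removed only when it is genuinely stale.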

Log:
  /var/log/local7.log:Jun 5 06:28:50 nas01 zfs-fuse: put_nvlist: error Bad address on xcopyout
  /var/log/local7.log:Jun 5 06:30:15 nas01 zfs-fuse: put_nvlist: error Bad address on xcopyout
  /var/log/local7.log:Jun 5 08:57:25 nas01 zfs-fuse: put_nvlist: error Bad address on xcopyout
  /var/log/local7.log:Jun 5 08:57:49 nas01 zfs-fuse: put_nvlist: error Bad address on xcopyout

  Jun 5 06:28:16 nas01 kernel: [32913.939595] zfs-fuse: sending ioctl 2285 to a partition!
  Jun 5 06:28:16 nas01 kernel: [32913.939601] zfs-fuse: sending ioctl 2285 to a partition!
  Jun 5 06:28:16 nas01 kernel: [32913.956343] zfs-fuse: sending ioctl 2285 to a partition!
  Jun 5 06:28:16 nas01 kernel: [32913.956349] zfs-fuse: sending ioctl 2285 to a partition!

This may be significant: at a certain point during the file move, I tried to run zfs create and got something like an 'insufficient memory' error. I could not reproduce it and kept no log of that event. However, the host has 16 GB of RAM, applications use no more than 3-7 GB (and most of that appears not to be resident); all the remaining memory is marked as 'cached' in top and /proc/meminfo. Typically, during a file move to ZFS, memory is listed as roughly 90 MB free and 14 GB cached, as below.

top - 12:52:44 up 5:33, 6 users, load average: 0.52, 0.77, 0.71
Tasks: 246 total, 2 running, 243 sleeping, 0 stopped, 1 zombie
Cpu(s): 13.3%us, 13.8%sy, 0.0%ni, 49.4%id, 21.5%wa, 1.0%hi, 1.0%si, 0.0%st
Mem: 16467932k total, 16375684k used, 92248k free, 41964k buffers
Swap: 11718648k total, 0k used, 11718648k free, 15019772k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 2050 root 20 0 1841m 277m 1604 S 38 1.7 28:00.34 zfs-fuse
21229 root 20 0 32032 4640 2404 R 15 0.0 5:10.48 mc
 1729 root 20 0 97524 8836 4300 S 1 0.1 2:52.62 Xorg
   36 root 20 0 0 0 0 S 1 0.0 1:59.62 kswapd0
 1794 root 20 0 155m 19m 8116 S 1 0.1 1:32.27 lxdm-greeter-gt
23573 root 20 0 19356 1540 1048 R 1 0.0 0:02.70 top
 2695 root 20 0 75296 3052 2220 S 0 0.0 0:00.73 cupsd
 4534 root 20 0 282m 67m 33m S 0 0.4 0:11.53 hostd-worker
 4896 root 20 0 0 0 0 S 0 0.0 1:12.08 vmware-rtc
 5227 kng 20 0 292m 44m 32m S 0 0.3 0:11.20 vmware-tray
 6341 root 20 0 930m 544m 505m S 0 3.4 0:57.73 vmware-vmx
 7462 kng 20 0 263m 81m 47m S 0 0.5 0:20.40 vmware
 8055 kng 20 0 120m 13m 4976 S 0 0.1 0:44.61 vmware-remotemk
 8062 kng 20 0 120m 13m 4976 S 0 0.1 0:43.46 vmware-remotemk
24371 root 20 0 3512 1204 1008 S 0 0.0 0:00.01 captmon
    1 root 20 0 23836 2112 1320 S 0 0.0 0:00.59 init
    2 root 20 0 0 0 0 S 0 0.0 0:00.01 kthreadd
    3 root RT 0 0 0 0 S 0 0.0 0:00.03 migration/0
    4 root 20 0 0 0 0 S 0 0.0 0:00.97 ksoftirqd/0
    5 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0
    6 root RT 0 0 0 0 S 0 0.0 0:00.03 migration/1
    7 root 20 0 0 0 0 S 0 0.0 0:03.83 ksoftirqd/1
    8 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/1
    9 root 20 0 0 0 0 S 0 0.0 0:00.64 events/0
   10 root 20 0 0 0 0 S 0 0.0 0:01.09 events/1
   11 root 20 0 0 0 0 S 0 0.0 0:00.00 cpuset
   12 root 20 0 0 0 0 S 0 0.0 0:00.01 khelper
   13 root 20 0 0 0 0 S 0 0.0 0:00.00 netns
   14 root 20 0 0 0 0 S 0 0.0 0:00.00 async/mgr
   15 root 20 0 0 0 0 S 0 0.0 0:00.00 pm
   17 root 20 0 0 0 0 S 0 0.0 0:00.03 sync_supers
   18 root 20 0 0 0 0 S 0 0.0 0:00.04 bdi-default
   19 root 20 0 0 0 0 S 0 0.0 0:00.00 kintegrityd/0
   20 root 20 0 0 0 0 S 0 0.0 0:00.00 kintegrityd/1
   21 root 20 0 0 0 0 S 0 0.0 0:00.18 kblockd/0
   22 root 20 0 0 0 0 S 0 0.0 0:00.11 kblockd/1
   23 root 20 0 0 0 0 S 0 0.0 0:00.00 kacpid
   24 root 20 0 0 0 0 S 0 0.0 0:00.00 kacpi_notify
   25 root 20 0 0 0 0 S 0 0.0 0:00.00 kacpi_hotplug
   26 root 20 0 0 0 0 S 0 0.0 0:00.01 ata/0
   27 root 20 0 0 0 0 S 0 0.0 0:00.01 ata/1
   28 root 20 0 0 0 0 S 0 0.0 0:00.00 ata_aux
   29 root 20 0 0 0 0 S 0 0.0 0:00.26 ksuspend_usbd
   30 root 20 0 0 0 0 S 0 0.0 0:00.04 khubd
   31 root 20 0 0 0 0 S 0 0.0 0:00.00 kseriod
   32 root 20 0 0 0 0 S 0 0.0 0:00.00 kmmcd
   35 root 20 0 0 0 0 S 0 0.0 0:00.02 khungtaskd
   37 root 25 5 0 0 0 S 0 0.0 0:00.00 ksmd
   38 root 20 0 0 0 0 S 0 0.0 0:00.00 aio/0
   39 root 20 0 0 0 0 S 0 0.0 0:00.00 aio/1
   40 root 20 0 0 0 0 S 0 0.0 0:00.00 ecryptfs-kthrea
   41 root 20 0 0 0 0 S 0 0.0 0:00.00 crypto/0
   42 root 20 0 0 0 0 S 0 0.0 0:00.00 crypto/1
   52 root 20 0 0 0 0 S 0 0.0 0:00.00 scsi_eh_0
   53 root 20 0 0 0 0 S 0 0.0 0:00.00 scsi_eh_1
   55 root 20 0 0 0 0 S 0 0.0 0:00.00 scsi_eh_2
   59 root 20 0 0 0 0 S 0 0.0 0:00.01 scsi_eh_3
   62 root 20 0 0 0 0 S 0 0.0 0:00.00 kstriped
   63 root 20 0 0 0 0 S 0 0.0 0:00.00 kmpathd/0
   64 root 20 0 0 0 0 S 0 0.0 0:00.00 kmpathd/1
   65 root 20 0 0 0 0 S 0 0.0 0:00.00 kmpath_handlerd
   66 root 20 0 0 0 0 S 0 0.0 0:00.00 ksnapd
   67 root 20 0 0 0 0 S 0 0.0 0:09.51 kondemand/0
   68 root 20 0 0 0 0 S 0 0.0 0:07.57 kondemand/1
   69 root 20 0 0 0 0 S 0 0.0 0:00.00 kconservative/0
   70 root 20 0 0 0 0 S 0 0.0 0:00.00 kconservative/1
  206 root 20 0 0 0 0 S 0 0.0 0:00.00 scsi_eh_4
  215 root 20 0 0 0 0 S 0 0.0 0:00.00 cciss_scan
  292 root 20 0 0 0 0 S 0 0.0 0:00.00 scsi_eh_5
  308 root 20 0 0 0 0 S 0 0.0 0:00.62 kdmflush
  311 root 20 0 0 0 0 S 0 0.0 0:00.46 kdmflush
  314 root 20 0 0 0 0 S 0 0.0 0:00.00 kdmflush
  317 root 20 0 0 0 0 S 0 0.0 0:00.24 kdmflush
  320 root 20 0 0 0 0 S 0 0.0 0:00.03 kdmflush
  350 root 20 0 0 0 0 S 0 0.0 0:01.30 jbd2/dm-0-8
  351 root 20 0 0 0 0 S 0 0.0 0:00.00 ext4-dio-unwrit
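To correlate the failures with this low-free/high-cached condition, a small sketch like the following could snapshot memory state alongside each probe (the 100 MB warning threshold is an illustrative assumption):

```shell
# Read MemFree and Cached (in kB) from /proc/meminfo and warn when
# free memory drops below an illustrative 100 MB threshold.
free_kb=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
cached_kb=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
echo "free=${free_kb}kB cached=${cached_kb}kB"
if [ "$free_kb" -lt 102400 ]; then
    echo "warning: free memory below 100 MB"
fi
```

Run from cron or a loop during the move, this produces a timeline of the free-memory counter that can be matched against the 'Bad address' errors in local7.log.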

Revision history for this message
dino99 (9d9) wrote :

version no more supported; not a 'security' problem, so no backport expected

Changed in zfs-fuse (Ubuntu):
status: New → Invalid