Deal with stale network mount points (nfs, fuse, etc)

Bug #351927 reported by Andreas Hasenack on 2009-03-30
This bug affects 5 people
Affects                     Status   Importance   Assigned to    Milestone
Landscape Client                     Medium       Thomas Herve
landscape-client (Ubuntu)            Undecided    Unassigned
  Jaunty                             Undecided    Unassigned
  Karmic                             Undecided    Unassigned
  Lucid                              Undecided    Unassigned

Bug Description

landscape-sysinfo does a stat() on a bunch of file systems when it is run.

This can cause some errors when we are dealing with stale mount points for NFS, fuse (when used with a network) and others.

For example, with default mount options for NFS, if the NFS server goes away, any client stat()ing that mount point will hang forever. This can be bad if landscape-sysinfo is run on every login, and annoying if it is run via cron. I'm not sure what can be done here, probably nothing short of ignoring such filesystems entirely.

There is another case where we can improve things by handling the error. For example, if the stale NFS mount point was mounted using -o soft,bg,intr, then after a while we get an error in the system logs and a backtrace in landscape-sysinfo:

2009-03-30 17:32:57,310 ERROR Disk plugin raised an exception.
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/landscape/sysinfo/sysinfo.py", line 98, in run
    result = plugin.run()
  File "/usr/lib/python2.5/site-packages/landscape/sysinfo/disk.py", line 35, in run
    self._statvfs)
  File "/usr/lib/python2.5/site-packages/landscape/lib/disk.py", line 38, in get_filesystem_for_path
    for info in get_mount_info(mounts_file, statvfs_):
  File "/usr/lib/python2.5/site-packages/landscape/lib/disk.py", line 25, in get_mount_info
    stats = statvfs_(mount_point)
OSError: [Errno 5] Input/output error: '/mnt/nfs'

Mar 30 17:32:57 nsn2 kernel: [30339.580111] nfs: server localhost not responding, timed out
Mar 30 17:33:45 nsn2 kernel: [30387.580233] nfs: server localhost not responding, timed out

This and similar OSErrors can be caught and handled. Here is another one:
2009-03-30 20:25:37,050 ERROR Disk plugin raised an exception.
Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/landscape/sysinfo/sysinfo.py", line 98, in run
    result = plugin.run()
  File "/usr/lib/python2.6/dist-packages/landscape/sysinfo/disk.py", line 35, in run
    self._statvfs)
  File "/usr/lib/python2.6/dist-packages/landscape/lib/disk.py", line 38, in get_filesystem_for_path
    for info in get_mount_info(mounts_file, statvfs_):
  File "/usr/lib/python2.6/dist-packages/landscape/lib/disk.py", line 25, in get_mount_info
    stats = statvfs_(mount_point)
OSError: [Errno 107] Transport endpoint is not connected: '/srv/stuff/test
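
Both tracebacks above end in the statvfs call on a mount point, so one way to handle them is to catch OSError per mount point and skip that entry instead of letting the whole Disk plugin fail. The following is only a minimal sketch under that assumption: the function name get_mount_info_safe, the dictionary keys, and the parsing of the mounts file are hypothetical and are not taken from the landscape-client source. It does not help with hard mounts, where the call never returns.

```python
import os


def get_mount_info_safe(mounts_file, statvfs_=os.statvfs):
    """Yield one dict per mount point, skipping mounts that fail to stat.

    Hypothetical helper for illustration; not the landscape-client code.
    """
    with open(mounts_file) as fd:
        for line in fd:
            fields = line.split()
            if len(fields) < 3:
                continue  # ignore malformed lines
            device, mount_point, filesystem = fields[:3]
            try:
                stats = statvfs_(mount_point)
            except OSError:
                # Covers errors such as EIO ("Input/output error") and
                # ENOTCONN ("Transport endpoint is not connected").
                continue
            yield {
                "device": device,
                "mount-point": mount_point,
                "filesystem": filesystem,
                "total-space": stats.f_blocks * stats.f_bsize,
                "free-space": stats.f_bfree * stats.f_bsize,
            }
```

Calling list(get_mount_info_safe("/proc/mounts")) would then return only the mount points that could actually be queried.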

Changed in landscape-client:
importance: Undecided → Medium
Changed in landscape-client:
milestone: none → 1.0.29
Changed in landscape-client:
milestone: 1.0.29 → 1.0.x
tags: added: sooner-than-later
Changed in landscape-client:
milestone: 1.0.x → 1.5.0
Thomas Herve (therve) on 2010-03-22
Changed in landscape-client:
assignee: nobody → Thomas Herve (therve)
tags: removed: sooner-than-later
Thomas Herve (therve) on 2010-03-29
Changed in landscape-client:
status: New → Fix Committed
tags: added: testing
tags: removed: testing
tags: added: needs-testing
Andreas Hasenack (ahasenack) wrote :

Confirmed fixed, a stale nfs mount point doesn't hang landscape-sysinfo anymore.

tags: removed: needs-testing
Changed in landscape-client:
status: Fix Committed → Fix Released
affects: ubuntu → landscape-client (Ubuntu)
gadLinux (gad-aguilardelgado) wrote :

Hi,

I can confirm this problem. All my systems became inaccessible because a landscape-sysinfo process is launched on every login, as usual. The problem now is that it hangs 100% of the time because of a stale NFS filesystem, which left all my systems unable to show a bash prompt.

This is a big problem, so it MUST implement a timeout in all its operations, because it is not a critical program to run on every login.

Thank you.

Andreas Hasenack (ahasenack) wrote :

A timeout doesn't help much. If a process touches a stale NFS mount point, it will be stuck in D state forever unless the mount was made with the intr or some other non-default option. Even if we forked a child to do just that, the child would be in D state and become a zombie as soon as landscape-sysinfo finished.

What we are doing in the package that should be in -proposed (or on its way there) is ignore NFS and other network related filesystems.
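
The idea of ignoring network filesystems can be sketched as a filter on the filesystem type column of the mounts file, applied before any statvfs call so that a stale mount is never touched at all. This is an illustration only: the contents of ALLOWED_FILESYSTEMS and the function names are assumptions, and the list actually shipped in landscape-client may differ.

```python
import os

# Assumed whitelist for illustration only; the list actually shipped in
# landscape-client may differ.
ALLOWED_FILESYSTEMS = frozenset(
    ["ext2", "ext3", "ext4", "xfs", "jfs", "reiserfs", "vfat"])


def iter_allowed_mounts(mounts_file="/proc/mounts",
                        allowed=ALLOWED_FILESYSTEMS):
    """Yield (mount_point, filesystem) for whitelisted filesystem types."""
    with open(mounts_file) as fd:
        for line in fd:
            fields = line.split()
            if len(fields) < 3:
                continue  # ignore malformed lines
            mount_point, filesystem = fields[1], fields[2]
            if filesystem in allowed:
                yield mount_point, filesystem


def free_space_by_mount(mounts_file="/proc/mounts", statvfs_=os.statvfs,
                        allowed=ALLOWED_FILESYSTEMS):
    """Return {mount_point: free bytes} without touching other mounts."""
    free = {}
    for mount_point, _filesystem in iter_allowed_mounts(mounts_file, allowed):
        # statvfs is only ever called on whitelisted (local) filesystems.
        stats = statvfs_(mount_point)
        free[mount_point] = stats.f_bfree * stats.f_bsize
    return free
```

A whitelist fails safe here: an unknown or new network filesystem type is skipped by default, whereas a blacklist would have to be updated for each one.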

Accepted landscape-client into karmic-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in landscape-client (Ubuntu Karmic):
status: New → Fix Committed
tags: added: verification-needed
Changed in landscape-client (Ubuntu Jaunty):
status: New → Fix Committed
Martin Pitt (pitti) wrote :

Accepted landscape-client into jaunty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package landscape-client - 1.5.0.1-0ubuntu0.9.04.0

---------------
landscape-client (1.5.0.1-0ubuntu0.9.04.0) jaunty-proposed; urgency=low

  * New upstream version
    - Fix smart-update failing its very first run (LP: #562496)
    - Depend on pythonX.Y-dbus and pythonX.Y-pycurl (LP: #563063)
    - Make only one request at a time to retrieve EC2 instances (LP: #567515)

  * New upstream version (LP: #557244)
    - Fix package-changer running before smart-update has completed (LP: #542215)
    - Report the version of Eucalyptus used to generate topology data (LP: #554007)
    - Enable the Eucalyptus plugin by default, if supported (LP: #546531)
    - Use a whitelist of allowed filesystem types instead of a blacklist (LP: #351927)
    - Report the update-manager logs to the server (LP: #503384)
    - Turn off Curl's DNS caching for requests. (LP: #522668)
 -- Free Ekanayaka <email address hidden> Wed, 21 Apr 2010 12:31:28 +0200

Changed in landscape-client (Ubuntu Jaunty):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package landscape-client - 1.5.0.1-0ubuntu0.9.10.0

---------------
landscape-client (1.5.0.1-0ubuntu0.9.10.0) karmic-proposed; urgency=low

  * New upstream version
    - Fix smart-update failing its very first run (LP: #562496)
    - Depend on pythonX.Y-dbus and pythonX.Y-pycurl (LP: #563063)
    - Make only one request at a time to retrieve EC2 instances (LP: #567515)

  * New upstream version (LP: #557244)
    - Fix package-changer running before smart-update has completed (LP: #542215)
    - Report the version of Eucalyptus used to generate topology data (LP: #554007)
    - Enable the Eucalyptus plugin by default, if supported (LP: #546531)
    - Use a whitelist of allowed filesystem types instead of a blacklist (LP: #351927)
    - Report the update-manager logs to the server (LP: #503384)
    - Turn off Curl's DNS caching for requests. (LP: #522668)
 -- Free Ekanayaka <email address hidden> Wed, 21 Apr 2010 12:31:28 +0200

Changed in landscape-client (Ubuntu Karmic):
status: Fix Committed → Fix Released
Andreas Hasenack (ahasenack) wrote :

This was released for lucid already.

Changed in landscape-client (Ubuntu Lucid):
status: New → Fix Released
Changed in landscape-client (Ubuntu):
status: New → Fix Released
tags: removed: verification-needed