Comment 0 for bug 1821395

Mauricio Faria de Oliveira (mfo) wrote:

< NOTE: patches will be sent to the kernel-team mailing list. >

[Impact]

 * An fscache issue causes jobs to hang when the fscache disk is full.

 * Trivial upstream fix, already applied in X/D (Xenial/Disco) and
   required in B/C (Bionic/Cosmic): commit c5a94f434c82 ("fscache:
   fix race between enablement and dropping of object").

[Test Case]

 * Test kernel verified and regression-tested by the reporter.

 * There is apparently no simple test case,
   but these are the conditions required to hit the problem:

   1) The active dataset size is equal to the cache disk size,
      and the application reads the data over and over again.
   2) The disk is nearly full (90% or more).
   3) cachefilesd in userspace is culling old objects
      while new objects are being looked up.
   4) New cache files are created, and some of them fail
      due to lack of disk space.
   5) A race between the dropping-object state machine and
      the deferred-lookup state machine causes the hang.
   6) The task hangs in fscache_wait_for_deferred_lookup(), waiting
      for the FSCACHE_COOKIE_LOOKING_UP bit in cookie->flags to be cleared.
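   The hang in step 6 follows a familiar pattern: a task sleeps until a
   flag is cleared, and it only wakes if whichever path tears the object
   down both clears the flag and wakes the sleepers. The Python sketch
   below is a hypothetical userspace model of that pattern only; it is not
   the kernel code, does not reproduce the actual race or the upstream fix,
   and is not a test case for this SRU. The names Cookie, drop_object,
   clears_flag and run are illustrative assumptions.

```python
import threading


class Cookie:
    """Hypothetical stand-in for an fscache cookie. It models only the
    LOOKING_UP flag and the tasks waiting for it to be cleared."""

    def __init__(self):
        self.cond = threading.Condition()
        self.looking_up = True  # set while a deferred lookup is in flight

    def wait_for_deferred_lookup(self, timeout):
        # Sleep until LOOKING_UP is cleared. The flag is checked under the
        # lock, so a clear that happens before we sleep is not missed.
        # Returns False if the flag is still set when the timeout expires.
        with self.cond:
            return self.cond.wait_for(lambda: not self.looking_up,
                                      timeout=timeout)

    def clear_looking_up(self):
        # Clear the flag and wake every waiter, under the same lock.
        with self.cond:
            self.looking_up = False
            self.cond.notify_all()


def drop_object(cookie, clears_flag):
    # Hypothetical drop path. clears_flag=False models a path that tears
    # the object down without ever clearing LOOKING_UP for the waiters.
    if clears_flag:
        cookie.clear_looking_up()


def run(clears_flag, timeout=0.5):
    """Start one waiter, run the drop path, and report whether the waiter
    was woken (True) or gave up after `timeout` seconds (False)."""
    cookie = Cookie()
    result = {}

    def reader():
        result["woken"] = cookie.wait_for_deferred_lookup(timeout)

    t = threading.Thread(target=reader)
    t.start()
    drop_object(cookie, clears_flag)
    t.join()
    return result["woken"]
```

   With clears_flag=True, run() returns True as soon as the drop path
   clears the flag. With clears_flag=False, the waiter only returns because
   of the model's timeout, which stands in for the indefinite sleep reported
   in fscache_wait_for_deferred_lookup().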

[Regression Potential]

 * Low. The change is contained in fscache, and no follow-up fixes
   have been applied upstream.

 * This patch is applied in a stable tree (linux-4.4.y).

[Original Description]

A user reported an fscache issue where jobs hang when the fscache disk is full.

Investigation found it to be an issue already reported and fixed upstream
by commit c5a94f434c82 ("fscache: fix race between enablement and dropping of object").

This patch is required in Bionic and Cosmic; it is already applied in Xenial (via stable) and Disco.

There is apparently no simple test case, but these are the conditions required to hit the problem:

1) The active dataset size is equal to the cache disk size,
   and the application reads the data over and over again.
2) The disk is nearly full (90% or more).
3) cachefilesd in userspace is culling old objects
   while new objects are being looked up.
4) New cache files are created, and some of them fail
   due to lack of disk space.
5) A race between the dropping-object state machine and
   the deferred-lookup state machine causes the hang.
6) The task hangs in fscache_wait_for_deferred_lookup(), waiting
   for the FSCACHE_COOKIE_LOOKING_UP bit in cookie->flags to be cleared.