Enhance timeout handling to avoid error rpc error: code = DeadlineExceeded desc = context deadline exceeded while the daemon is doing work

Bug #1875564 reported by Didier Roche on 2020-04-28
18
This bug affects 3 people
Affects Status Importance Assigned to Milestone
zsys (Ubuntu)
High
Didier Roche
Focal
High
Didier Roche

Bug Description

[Impact]
 * On slow system, zsysctl client can timeout while the daemon is still doing a lot of active work.
 * The daemon has now 2 phases:
   - Not started: specific timeout for the daemon to start before the client exits.
   - After startup, when proceeding client request: pulse in the grpc command from the daemon when any log is processed (just replaced by pulse bytes).

[Test Case]

 * 1. Create a bunch of datasets so that requests always timeout
 * 2. Upgrade to the new zsys version
 * 3. Redo the same request -> it should be slow to execute but don’t timeout.

[Regression Potential]
 * GRPC exchange hasn’t changed
 * However, we are now sending "." in non debug mode to the client to pulse progress.
 * We were already running a lot of command with -vv, sending more data (debug logs)

---

The client can timeout while the daemon is still doing a lot of active work.

There are 3 cases to take into account:
- daemon not started: give a timeout for the daemon to start before the client exits. Ideally, we would pulse back to the client, but the entrypoint isn’t reached out yet
- once the call starts:
 If other calls are in progress and there is mutex, ideally pulse it to the client (or give a new timeout)
- when it’s our turn:
the pulse to the daemon can be done by the log progress which we will thus always send to the client.

Note that if we wait for too long, we can imagine a pulsing progress bar on the CLI with which steps we are at.

This needs to change GRPC messages back but will drastically reduce timeouts that people can get when having a lot of datasets or in code path we didn’t optimize yet.

Didier Roche (didrocks) on 2020-04-28
Changed in zsys (Ubuntu):
status: New → Triaged
importance: Undecided → High
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zsys - 0.5.0

---------------
zsys (0.5.0) groovy; urgency=medium

  [ Jean-Baptiste Lallement ]
  [ Didier Roche ]
  * Fix infinite GC loop (LP: #1870461)
  * Enhance timeout handling to avoid error rpc error: code = DeadlineExceeded
    desc = context deadline exceeded while the daemon is doing work
    (LP: #1875564)
  * Stop taking automated or manual snapshot when there is less than 20% of
    free disk space (LP: #1876334)
  * Enable trim support for upgrading users (LP: #1881540)
  * Only clean up previously linked user datasets when unlinked under USERDATA
    (LP: #1881538)
  * Strategy for deleted user datasets via a new hidden command called by
    userdel (LP: #1870058)
  * Get better auto snapshots message when integrated to apt (LP: #1875420)
  * Update LastUsed on shutdown via a new hidden command service call
    (LP: #1881536)
  * Prevent segfault immediately after install when zfs kernel module isn't
    loaded (LP: #1881541)
  * Don’t try to autosave gdm user (and in general non system user), even if
    systemd --user is started for them. (LP: #1881539)
  * Prevent apt printing errors when zsys is removed without purge
    (LP: #1881535)
  * Some tests enhancements:
    - new tests for all the above
    - allow setting a different local socket for debugging/tests purposes only
    - ascii order datasets in golden files
  * Typos and messages fixes. Direct prints are not prefixed with INFO
    anymore.
  * Refreshed po and readme with the above.

 -- Didier Roche <email address hidden> Mon, 01 Jun 2020 09:26:52 +0200

Changed in zsys (Ubuntu):
status: Triaged → Fix Released
satmandu (satadru-umich) wrote :

Still seeing "ERROR couldn't connect to zsys daemon: timed out waiting for server handshake " after updating to groovy, but not seeing the "context deadline exceeded while the daemon is doing work" message any more.

Didier Roche (didrocks) wrote :

@satmandu: after how long do you get this message? Is this in every command? Do you mind reporting another bug with zsys with ubuntu-bug zsys so that we can collect all needed informations (number of datasets, logs and so…)

satmandu (satadru-umich) wrote :

Happens every time I do an apt upgrade. Reported at Bug #1882090: https://bugs.launchpad.net/ubuntu/+source/zsys/+bug/1882090

Didier Roche (didrocks) on 2020-06-09
description: updated
Changed in zsys (Ubuntu Focal):
importance: Undecided → High
assignee: nobody → Didier Roche (didrocks)
Changed in zsys (Ubuntu):
assignee: nobody → Didier Roche (didrocks)

Hello Didier, or anyone else affected,

Accepted zsys into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zsys/0.4.6 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in zsys (Ubuntu Focal):
status: New → Fix Committed
tags: added: verification-needed verification-needed-focal
Jean-Baptiste Lallement (jibel) wrote :

SRU verification for Focal:
I have reproduced the problem with zsys 0.4.5 in focal and have verified that the version of zsys 0.4.6 in -proposed fixes the issue.

Marking as verification-done

tags: added: verification-done verification-done-focal
removed: verification-needed verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zsys - 0.4.6

---------------
zsys (0.4.6) focal; urgency=medium

  [ Jean-Baptiste Lallement ]
  [ Didier Roche ]
  * Fix infinite GC loop (LP: #1870461)
  * Enhance timeout handling to avoid error rpc error: code = DeadlineExceeded
    desc = context deadline exceeded while the daemon is doing work
    (LP: #1875564)
  * Stop taking automated or manual snapshot when there is less than 20% of
    free disk space (LP: #1876334)
  * Enable trim support for upgrading users (LP: #1881540)
  * Only clean up previously linked user datasets when unlinked under USERDATA
    (LP: #1881538)
  * Strategy for deleted user datasets via a new hidden command called by
    userdel (LP: #1870058)
  * Get better auto snapshots message when integrated to apt (LP: #1875420)
  * Update LastUsed on shutdown via a new hidden command service call
    (LP: #1881536)
  * Prevent segfault immediately after install when zfs kernel module isn't
    loaded (LP: #1881541)
  * Don’t try to autosave gdm user (and in general non system user), even if
    systemd --user is started for them. (LP: #1881539)
  * Prevent apt printing errors when zsys is removed without purge
    (LP: #1881535)
  * Some tests enhancements:
    - new tests for all the above
    - allow setting a different local socket for debugging/tests purposes only
    - ascii order datasets in golden files
  * Typos and messages fixes. Direct prints are not prefixed with INFO
    anymore.
  * Refreshed po and readme with the above.

 -- Didier Roche <email address hidden> Mon, 01 Jun 2020 09:26:52 +0200

Changed in zsys (Ubuntu Focal):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for zsys has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers