Enhance timeout handling to avoid error rpc error: code = DeadlineExceeded desc = context deadline exceeded while the daemon is doing work
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
zsys (Ubuntu) | Fix Released | High | Didier Roche-Tolomelli |
Focal | Fix Released | High | Didier Roche-Tolomelli |
Bug Description
[Impact]
* On slow systems, the zsysctl client can time out while the daemon is still actively working.
* The daemon now has 2 phases:
- Not started: a specific timeout for the daemon to start before the client exits.
- After startup, while processing the client request: the daemon sends a pulse in the gRPC command whenever a log is processed (the log is simply replaced by pulse bytes).
[Test Case]
* 1. Create a bunch of datasets so that requests always time out
* 2. Upgrade to the new zsys version
* 3. Redo the same request -> it should be slow to execute but should not time out.
[Regression Potential]
* The gRPC exchange hasn’t changed.
* However, we now send "." to the client in non-debug mode to pulse progress.
* We were already running a lot of commands with -vv, which sends more data (debug logs).
---
The client can time out while the daemon is still actively working.
There are 3 cases to take into account:
- daemon not started: give a timeout for the daemon to start before the client exits. Ideally, we would pulse back to the client, but the entry point isn’t reached yet
- once the call starts:
If other calls are in progress and a mutex is held, ideally pulse this to the client (or give a new timeout)
- when it’s our turn:
the pulse can be driven by the log progress, which we will thus always send to the client.
Note that if we wait for too long, we can imagine a pulsing progress bar on the CLI showing which step we are at.
This requires changing the gRPC messages sent back to the client, but it will drastically reduce the timeouts people get when they have a lot of datasets or hit code paths we haven’t optimized yet.
Changed in zsys (Ubuntu):
status: New → Triaged
importance: Undecided → High
description: updated
Changed in zsys (Ubuntu Focal):
importance: Undecided → High
assignee: nobody → Didier Roche (didrocks)
Changed in zsys (Ubuntu):
assignee: nobody → Didier Roche (didrocks)
This bug was fixed in the package zsys - 0.5.0
---------------
zsys (0.5.0) groovy; urgency=medium
[ Jean-Baptiste Lallement ]
[ Didier Roche ]
* Fix infinite GC loop (LP: #1870461)
* Enhance timeout handling to avoid error rpc error: code = DeadlineExceeded
desc = context deadline exceeded while the daemon is doing work
(LP: #1875564)
* Stop taking automated or manual snapshot when there is less than 20% of
free disk space (LP: #1876334)
* Enable trim support for upgrading users (LP: #1881540)
* Only clean up previously linked user datasets when unlinked under USERDATA
(LP: #1881538)
* Strategy for deleted user datasets via a new hidden command called by
userdel (LP: #1870058)
* Get better auto snapshots message when integrated to apt (LP: #1875420)
* Update LastUsed on shutdown via a new hidden command service call
(LP: #1881536)
* Prevent segfault immediately after install when zfs kernel module isn't
loaded (LP: #1881541)
* Don’t try to autosave gdm user (and in general non system user), even if
systemd --user is started for them. (LP: #1881539)
* Prevent apt printing errors when zsys is removed without purge
(LP: #1881535)
* Some tests enhancements:
- new tests for all the above
- allow setting a different local socket for debugging/tests purposes only
- ascii order datasets in golden files
* Typos and messages fixes. Direct prints are not prefixed with INFO
anymore.
* Refreshed po and readme with the above.
-- Didier Roche <email address hidden> Mon, 01 Jun 2020 09:26:52 +0200