ERROR Service took too long to respond. Disconnecting client.

Bug #1907782 reported by Lars
This bug affects 8 people
Affects: zsys (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi Team!

I saw the other bug, which occurs when bpool is not imported. My bpool is imported, so I think this is a different problem and therefore a separate bug.

I am running Ubuntu focal (20.04.1) with zsys 0.4.8.

When I do

  /sbin/zsysctl -vvv boot commit

I get

...
DEBUG ZFS: ending transaction
DEBUG ZFS: transaction done
DEBUG ZFS: transaction done
ZSys is adding automatic system snapshot to GRUB menu
DEBUG Sourcing file `/etc/default/grub'
DEBUG Sourcing file `/etc/default/grub.d/init-select.cfg'
DEBUG Generating grub configuration file ...
DEBUG
level=debug msg="Didn't receive any information from service in 30s"
level=error msg="Service took too long to respond. Disconnecting client."

Imported zpools:
# zpool list
NAME   SIZE   ALLOC  FREE   CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
bpool  1,88G  717M   1,17G  -        -         -     37%  1.00x  ONLINE  -
rpool  944G   570G   374G   -        -         14%   60%  1.00x  ONLINE  -

Lars (lollypop) wrote :

Clicked too early :-)

Thanks in advance for taking a look at my bug, and have a nice day!

    Lars

Jan (janhn) wrote :

After using a ZFS-on-root system for some time, several issues appear that produce the error message described here and also kill the performance of file browsers (e.g. the Save File dialog opened from a web browser).

The service zsys-gc.service is permanently in a failed state:

$ systemctl list-units --failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
  zsys-gc.service loaded failed failed Clean up old snapshots to free space

Running apt commands produces the error:

ERROR Service took too long to respond. Disconnecting client.

Eventually bpool ran out of space for new kernels, and system upgrades started reporting dpkg errors related to installing newer linux-image packages.
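
As a rough way to spot this before bpool fills up, the `zpool list` table can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function name `pools_over_capacity` and the threshold values are made up for illustration, and it parses text in the layout shown in the bug description rather than calling ZFS itself.

```python
from __future__ import annotations


def pools_over_capacity(zpool_list_output: str, threshold_pct: int = 80) -> list[tuple[str, int]]:
    """Return (pool, capacity %) pairs whose CAP column is >= threshold_pct.

    Expects the whitespace-separated table printed by `zpool list`,
    including its header row. The name and threshold are illustrative.
    """
    rows = [ln.split() for ln in zpool_list_output.strip().splitlines() if ln.strip()]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    name_idx = header.index("NAME")
    cap_idx = header.index("CAP")
    flagged = []
    for row in body:
        cap = int(row[cap_idx].rstrip("%"))  # "37%" -> 37
        if cap >= threshold_pct:
            flagged.append((row[name_idx], cap))
    return flagged


# Sample taken from the `zpool list` output in the bug description.
SAMPLE = """\
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
bpool 1,88G 717M 1,17G - - - 37% 1.00x ONLINE -
rpool 944G 570G 374G - - 14% 60% 1.00x ONLINE -
"""

if __name__ == "__main__":
    print(pools_over_capacity(SAMPLE, threshold_pct=50))
```

Only the CAP column is read, so locale-specific size formatting such as `1,88G` does not affect the result.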

---

These may be several separate issues caused by having a large number of datasets (e.g. ~1000) on rpool from intensive use of Docker, and/or a large number of snapshots (e.g. ~1300, ~400 of which are autozsys).
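
To get numbers like these, the output of `zfs list -H -t snapshot -o name` can be summarized with a short script. The following is a minimal sketch, not part of zsys: the function name `summarize_snapshots` is hypothetical, the sample snapshot names are invented, and it assumes that automatic zsys snapshots can be recognized by an `autozsys_` prefix after the `@`.

```python
from collections import Counter


def summarize_snapshots(listing: str, auto_prefix: str = "autozsys_") -> dict:
    """Count snapshots in total, per dataset, and with the automatic prefix.

    `listing` is expected to hold one `dataset@snapshot` name per line,
    as printed by `zfs list -H -t snapshot -o name`.
    The `autozsys_` prefix is an assumption about zsys naming.
    """
    per_dataset = Counter()
    total = 0
    auto = 0
    for line in listing.splitlines():
        name = line.strip()
        if "@" not in name:
            continue  # skip blank lines and anything that is not a snapshot
        dataset, snap = name.split("@", 1)
        per_dataset[dataset] += 1
        total += 1
        if snap.startswith(auto_prefix):
            auto += 1
    return {"total": total, "auto": auto, "per_dataset": dict(per_dataset)}


# Invented names, for illustration only.
EXAMPLE = """\
rpool/ROOT/ubuntu_abc123@autozsys_aaaaaa
rpool/ROOT/ubuntu_abc123@autozsys_bbbbbb
rpool/ROOT/ubuntu_abc123@manual-before-upgrade
rpool/var/lib/docker/xyz@111
"""

if __name__ == "__main__":
    summary = summarize_snapshots(EXAMPLE)
    print(summary["total"], summary["auto"])
```

On a real system the listing could be piped in from the `zfs` command. The per-dataset counts help show whether Docker datasets or the system datasets hold most of the snapshots.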

Running garbage collection manually does remove some of them:
$ sudo zsysctl -vvv service gc

Yet restarting the zsys-gc.service still fails:
$ sudo journalctl -f -u zsys-gc.service
systemd[1]: Starting Clean up old snapshots to free space...
zsysctl[1327202]: level=error msg="Service took too long to respond. Disconnecting client."
systemd[1]: zsys-gc.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: zsys-gc.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Clean up old snapshots to free space.

Is there a way to increase the timeout, and would that be meaningful?
Are there other ways to tune it so that it works, such as reducing the number of snapshots that garbage collection aims to keep?
How can the performance be improved?
What's the right way to clean old images out of bpool?

Thanks for any hints

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in zsys (Ubuntu):
status: New → Confirmed
Laurent Dinclaux (dreadlox) wrote :

For me, this happens systematically after waking up from suspend (and I need to suspend my desktop computer because of energy consumption).

I get the same "Service took too long to respond. Disconnecting client." error when running apt update or when zsys-gc.service runs.

Restarting zsysd.service and zsysd.socket doesn't restore normal behavior. Only a reboot does.

pservit (pservit)
information type: Public → Public Security
information type: Public Security → Public
Simon Sanladerer (jit-010101) wrote :

I had the same issue here.

Edit 3: I went ahead, disabled zsys, and migrated to sanoid plus a save alias. This works properly and manages space properly too.

No more issues with update-grub, bpool filling up, or three million snapshots when installing many packages.

----

So far it seems that only zsys and its services are affected, so automatic snapshotting on apt install currently does not work for me.

I should add that I created my own snapshot command to run before upgrading anything important. I did this after having quite a few issues with the way zsys restores a system to a working state: the kernel and apt packages stayed the same across snapshots restored with zsysctl, and I could not really grasp why it would do that.

Maybe that has something to do with it?

Linux derpmachine 5.13.0-22-lowlatency #22~20.04.1-Ubuntu SMP PREEMPT Tue Nov 9 16:34:04 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

For now I don't really need it ... sanoid and my command work totally fine the regular ZFS way.

I have already restored a few times after some broken kernel tests and after playing with different DEs.

---

Edit 1:

Reboot usually solves it.

Also found this here in regards to fine-tuning zsys yourself, including increasing the timeout for the service:

https://github.com/ubuntu/zsys/issues/155#issuecomment-758902487

----

Edit 2:

Despite the above from Edit 1, it does not help. Increasing the timeout doesn't help. Messing with the parameters doesn't help, even if you have sufficient space.

A few minutes after a reboot, the service crashes again. Why? Absolutely no clue. There's no error log or anything else pointing towards the cause, despite having enough space and despite ZFS itself reporting that everything is fine.

I should add that I'm also using the Docker workaround of an ext4 mount on /var/lib/docker to solve the performance issues, but I doubt that has anything to do with it.

It seems totally unreliable to me in this state, so I disabled it completely.

I destroyed all snapshots related to it and moved completely to sanoid, which seems much more battle-tested and stable for its purpose. I verified that it handles purging and snapshotting fine by itself.

If anything bad happens now, I just boot to the shell and run the restore on zpool and bpool manually.

I do not understand why a zsys daemon that works as unreliably as this is needed when ZFS works perfectly fine by itself out of the box.

It seems the implementation behind zsys needs quite some work and/or refactoring of the core concept before it can be considered stable (no offense, just guessing).

This is also judging by sanoid, which in its current implementation appears to work much better on top of ZFS, at least on the current LTS, before 22.04 arrives next year.

Oh wow, I didn't even realize this issue has persisted for a year now (first reported December 2020).

That's not trivial and not to be underestimated, considering the two-pool design (bpool and rpool), with bpool so limited in space and no easy way to grow it on devices like a laptop.

So my advice to anyone using Ubuntu's Experimental ZFS-Install for ...
