make juju-reboot work on centos

Bug #1892029 reported by james beedy on 2020-08-18
This bug affects 3 people
Affects: juju
Importance: High
Assigned to: Heather Lanigan

Bug Description

For some reason, juju-reboot only works on ubuntu. Can we make sure it works on at least centos too?

https://discourse.juju.is/t/juju-reboot-hook-tool-works-on-ubuntu-but-not-on-centos/3454/2

Thanks!

Erik Lönroth (sssler-scania) wrote :

This is really key functionality. If it doesn't work, a deployment may end up in a very bad state, because the behavior of a juju reboot is not consistent across OSes.

Pete Vander Giessen (petevg) wrote :

I was able to reproduce this issue on MAAS.

To do so:

1) Grab a model on a cloud w/ centos7 images (I used our MAAS).
2) Deploy a simple CentOS charm:

juju deploy cs:~erik-lonroth/tiny-bash-centos-0

3) Wait for a deploy, then open a debug-hooks session:

juju debug-hooks tiny-bash/0 update-status

4) Once the update-status hook fires (this may take a few minutes), dropping you into the hook context, run juju-reboot.

juju-reboot

5) After you exit out of the debugging session, the unit will reboot. You'll see the machine come back up, but the agent will be in a "lost" state. This is due to this error:

2020-08-20 20:12:51 ERROR juju.worker.proxyupdater proxyupdater.go:300 error writing apt proxy config file: open /etc/apt/apt.conf.d/95-juju-proxy-settings: no such file or directory

/etc/apt doesn't exist in CentOS, of course. Hence the failure.

I think that this isn't necessarily an issue w/ the juju-reboot command. But it is an issue with code the unit agent is running on reboot on a CentOS box.
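
To illustrate the error above, here is a minimal shell sketch of guarding the apt proxy write on a host without /etc/apt. It is not Juju's Go code; the function name `write_apt_proxy`, its arguments, and the example proxy URL are hypothetical, and only the file name comes from the log line.

```shell
#!/bin/sh
# Hypothetical helper: write an apt proxy snippet only when the apt config
# directory exists (it does not on CentOS). Not Juju's actual implementation.
write_apt_proxy() {
    conf_dir="$1"    # e.g. /etc/apt/apt.conf.d
    proxy_url="$2"   # e.g. http://proxy.example:3128 (made-up value)
    if [ ! -d "$conf_dir" ]; then
        # Nothing to configure on a host without apt; report and carry on.
        echo "skipping apt proxy config: $conf_dir not found" >&2
        return 0
    fi
    printf 'Acquire::http::Proxy "%s";\n' "$proxy_url" \
        > "$conf_dir/95-juju-proxy-settings"
}
```

Called with a missing directory, the function logs to stderr and returns success instead of failing the way the open() in the log did.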

Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.9-beta1
Changed in juju:
milestone: 2.9-beta1 → 2.9-rc1
tags: added: centos community-feedback
Changed in juju:
assignee: nobody → Heather Lanigan (hmlanigan)
Heather Lanigan (hmlanigan) wrote :

I had a different experience reproducing this issue. The machine did not reboot, but the machine agent restarted. When I rebooted the centos machine with `shutdown -r now`, the machine agent was lost for a bit, then came back as expected.

The proxyupdater error is unrelated. It happens when that worker starts, but it does not cause the machine agent to fail. I've filed a new bug for it: lp:1901069.

Changed in juju:
status: Triaged → In Progress
Heather Lanigan (hmlanigan) wrote :

Juju creates a temporary script to shutdown/reboot a machine when juju-reboot is run. The problem comes with how the script is run:

`at -f <script> now`

"at" is not installed on centos by default. Working on a fix. Workaround for now (on centos units):

sudo yum install at
sudo systemctl start atd
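
Before applying the workaround, it may help to confirm that `at` is really what is missing. A rough sketch, assuming a POSIX shell; the helper names `have_cmd` and `atd_active` are made up for illustration:

```shell
#!/bin/sh
# Return success if the named command is on PATH (`command -v` is POSIX).
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Return success if systemd reports the atd service as active.
# On hosts without systemctl this simply returns failure.
atd_active() {
    have_cmd systemctl && systemctl is-active --quiet atd
}

if have_cmd at; then
    echo "at: found"
else
    echo "at: missing (juju-reboot is expected to fail until it is installed)"
fi

if atd_active; then
    echo "atd: active"
else
    echo "atd: not active"
fi
```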

Heather Lanigan (hmlanigan) wrote :

A message was logged in the /var/log/juju/machine-#.log file on the unit's machine. It is not available via `juju debug-log` like other messages.

machine-0.log:2020-10-22 20:08:11 INFO juju.cmd.jujud machine.go:649 Reboot: Error executing reboot: exec: "at": executable file not found in $PATH.

There should be a better way to uncover this.
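
Until there is a better way, one option is to search the machine logs on the unit's machine directly. A small sketch, assuming the log location and message text shown above; the function name `find_reboot_errors` is hypothetical:

```shell
#!/bin/sh
# Print reboot failures recorded in Juju machine logs.
# The directory defaults to the path mentioned above and can be overridden.
find_reboot_errors() {
    log_dir="${1:-/var/log/juju}"
    # Errors from an unmatched glob or unreadable files are silenced.
    grep 'Reboot: Error executing reboot' "$log_dir"/machine-*.log 2>/dev/null
}
```

Running `find_reboot_errors` with no argument on the affected machine should print lines like the one quoted above, if any were logged.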

Pete Vander Giessen (petevg) wrote :

Thank you for the detective work and the correction on the root cause, @hmlanigan.

Would it be better just to factor out the dependency on at altogether? Installing via yum will fix the issue on CentOS, but won't fix the issue on other non-Ubuntu Server machines, where at isn't necessarily installed by default. (It's not installed by default on Ubuntu Desktop, for example -- it's not necessarily a package that we can rely on appearing on any given system.)

Do we use the system package manager to install other dependencies? If so, we might need to plan some work to find alternatives for those packages, lest we find ourselves with an ever expanding list of conditional package manager calls to make, depending on the detected Linux distro. Eek!

Heather Lanigan (hmlanigan) wrote :

Looking for an alternative for `at` which is installed by default on various flavors of linux.

Other dependencies installed are juju-db on controllers, lxd and kvm. There really are no alternatives to any of those. The inability to install lxd or kvm means that they would not be used for containers on that host.

Pete Vander Giessen (petevg) wrote :

Following up on the discussion this morning: I think the current plan is to use nohup and/or fork, fork, setsid, as both are more portable.

@hmlanigan: does this sound accurate to you?

Heather Lanigan (hmlanigan) wrote :

https://github.com/juju/juju/pull/12197

Juju will be using `nohup` instead of `at`
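
As a rough illustration of the idea (not the code from the pull request), a script can be started detached with `nohup` where `at -f <script> now` was used before. The helper name `run_detached` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: start a script in the background, ignoring SIGHUP,
# so it keeps running after the calling shell exits.
run_detached() {
    script="$1"
    # Output is discarded so nohup does not create a nohup.out file.
    nohup sh "$script" >/dev/null 2>&1 &
}
```

For example, `run_detached /path/to/script` (a placeholder path) returns immediately while the script continues in the background. `nohup` is specified by POSIX, which is why it is a safer assumption than `at` across distributions.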

james beedy (jamesbeedy) wrote :

Woohoo! Thank you!!
