After a machine reboot, enable-auto-restarts still requires a run-deferred-hooks action

Bug #1937307 reported by Garrett Neugent
This bug affects 3 people
Affects: charm-ovn-chassis
Status: Triaged
Importance: High
Assigned to: Liam Young

Bug Description

In a customer cloud, I applied some sysconfig and nova-compute settings (cpu-range for sysconfig and cpu-dedicated-set for nova-compute) before restarting the machines. However, after restarting the machines for these settings to take effect, the subordinate ovn-chassis units still showed the message "Hooks skipped due to disabled auto restarts: configure_ovs, install".

As a workaround, running the "run-deferred-hooks" action resolved this, but I don't believe this should be necessary.
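For anyone scripting the workaround across many hosts, the following is a minimal sketch rather than anything provided by the charm. It assumes the parsed output of `juju status --format=json` lists subordinate units under each principal unit's `subordinates` key with a `workload-status.message` field, and that the Juju 2.x `juju run-action ... --wait` syntax is in use. The helper names are hypothetical.

```python
from typing import Dict, List

# Prefix of the workload status message quoted in this bug report.
SKIPPED_PREFIX = "Hooks skipped due to disabled auto restarts"


def units_with_deferred_hooks(status: Dict) -> List[str]:
    """Return the units whose workload status message reports skipped hooks.

    `status` is assumed to be the parsed output of `juju status --format=json`.
    """
    found = []
    for app in status.get("applications", {}).values():
        for unit_name, unit in (app.get("units") or {}).items():
            # Check the principal unit and any subordinates attached to it.
            candidates = {unit_name: unit}
            candidates.update(unit.get("subordinates") or {})
            for name, data in candidates.items():
                message = data.get("workload-status", {}).get("message", "")
                if message.startswith(SKIPPED_PREFIX):
                    found.append(name)
    return sorted(found)


def run_deferred_hooks_commands(units: List[str]) -> List[List[str]]:
    """Build (but do not execute) one run-deferred-hooks command per unit."""
    return [
        ["juju", "run-action", unit, "run-deferred-hooks", "--wait"]
        for unit in units
    ]
```

The commands are only built, not run, so an operator can review the list before passing each one to `subprocess.run`.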

This cloud is running bionic/ussuri, and the ovn-chassis units are on cs:ovn-chassis-14.

Giuseppe Petralia (peppepetra) wrote :

We are hitting this frequently on a production cloud. After rebooting machines, if the operator doesn't run the deferred hooks, the customer reports random issues with VM networking.

Liam Young (gnuoy) wrote :

If you're doing planned maintenance then I would suggest setting enable-auto-restarts to True before you reboot the machine. I will take a look and see if it is feasible for the charm to detect a reboot and let the hook run if a reboot has taken place.
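One possible way to detect a reboot, sketched here purely as an illustration and not as the charm's actual implementation, is to compare the machine's boot time with the time at which the hooks were deferred. The sketch assumes the charm records a `deferred_at` timestamp (a hypothetical value) and that the host exposes uptime in `/proc/uptime`, as Linux does.

```python
def read_uptime_seconds(path: str = "/proc/uptime") -> float:
    """Read the seconds since boot from the first field of /proc/uptime."""
    with open(path) as f:
        return float(f.read().split()[0])


def boot_time(now: float, uptime_seconds: float) -> float:
    """Estimate the boot time as an epoch timestamp."""
    return now - uptime_seconds


def rebooted_since(deferred_at: float, now: float, uptime_seconds: float) -> bool:
    """Return True if the machine booted after the hooks were deferred.

    `deferred_at` is a hypothetical epoch timestamp that would have to be
    stored when the hook was skipped.
    """
    return boot_time(now, uptime_seconds) > deferred_at
```

On a real host this could be called as `rebooted_since(deferred_at, time.time(), read_uptime_seconds())`. Keeping the comparison separate from the file read makes it easy to test without a live machine.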

Changed in charm-ovn-chassis:
status: New → Triaged
importance: Undecided → High
Trent Lloyd (lathiat) wrote :

Sounds like a second bug here. It shouldn’t need a restart to work after reboot. We should triage and file that bug. Then this would be more cosmetic.

I would expect it to function after a reboot if jujud was disabled.

In terms of a hard reboot, the juju “start” hook runs on boot. Once upon a time config_changed would also run on boot, but that was removed in one of the earlier juju 2.x revisions.

Liam Young (gnuoy)
Changed in charm-ovn-chassis:
assignee: nobody → Liam Young (gnuoy)
Liam Young (gnuoy) wrote :

I agree with Trent's analysis here; we should focus on why a restart is needed after the reboot.

Is this an environment that's using either sriov or dpdk?

Changed in charm-ovn-chassis:
status: Triaged → Incomplete
Garrett Neugent (thogarre) wrote :

Thanks for looking at this! This particular customer cloud is using sriov nodes.

Changed in charm-ovn-chassis:
status: Incomplete → New
Liam Young (gnuoy) wrote :

<tl;dr> As Trent suggests, a new bug should be opened, as connectivity to the guests should not rely on the charm taking any action at all after a reboot. In the new bug please include logs from the rebooted host.</tl;dr>

I have tried to reproduce what you are seeing and I cannot reproduce it exactly. I did have some issues with connectivity after a reboot with the charm version you are using (cs:ovn-chassis-14). Re-testing with cs:ovn-chassis-21 resolved my issues irrespective of whether enable-auto-restarts was True or False. This is quite confusing, given that the point is that the charm should be irrelevant (the "jujud is disabled" scenario). I can only assume the new version of the ovn-chassis charm added a setting in the ovs db that is needed after a reboot; enable-chassis-as-gw looks the most likely. You could try upgrading to the latest version of the charm to see if that helps.

Having said all that, I agree that when you do maintenance it would be useful to change `enable-auto-restarts` at the unit level. I've proposed something here:
https://bugs.launchpad.net/charm-ovn-chassis/+bug/1943970/comments/3 and feedback would be much appreciated.

Changed in charm-ovn-chassis:
status: New → Triaged