Ceilometer-agent-compute service not running due to start-limit-hit

Bug #1947585 reported by Aurelien Lourot
This bug affects 6 people
Affects | Status | Importance | Assigned to | Milestone
OpenStack Ceilometer Agent Charm | Invalid | Undecided | Unassigned |
OpenStack Nova Compute Charm | Fix Released | High | Unassigned |
Train | Fix Committed | Undecided | Unassigned |
Ussuri | Fix Committed | Undecided | Unassigned |
Victoria | Fix Committed | Undecided | Unassigned |
Wallaby | Fix Committed | Undecided | Unassigned |
Xena | Fix Released | Undecided | Unassigned |

Bug Description

This is a spin-off of lp:1927277. On a fresh deployment, some ceilometer-agent units end up blocked because the ceilometer-agent-compute service is not running.

The ceilometer-agent-compute.log shows no problem.

`systemctl status ceilometer-agent-compute` shows the service as failed and `journalctl -u ceilometer-agent-compute` shows

Oct 18 10:14:25 solqa-lab1-server-11 systemd[1]: ceilometer-agent-compute.service: Start request repeated too quickly.
Oct 18 10:14:25 solqa-lab1-server-11 systemd[1]: ceilometer-agent-compute.service: Failed with result 'start-limit-hit'.
Oct 18 10:14:25 solqa-lab1-server-11 systemd[1]: Failed to start Ceilometer Agent Compute.

I suspect ceilometer-agent-compute was being brought up before the nova-compute service, although ceilometer-agent-compute has a dependency on nova-compute at the service level. Or maybe ceilometer-agent-compute was being brought up at a very specific moment when nova-compute was down. This seems to happen often on certain labs and pretty much never on other labs, so this smells like a race condition.

Workaround: systemctl restart ceilometer-agent-compute
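
The workaround above can be scripted so that it only acts when the unit is really in the failed state. This is a minimal sketch, not part of either charm: `restart_if_failed` and its `run` parameter are hypothetical names, and it assumes only the standard `systemctl is-failed`, `reset-failed`, and `restart` subcommands.

```python
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code without raising on failure."""
    return subprocess.run(cmd, check=False).returncode


def restart_if_failed(
    unit: str, run: Callable[[Sequence[str]], int] = _run
) -> bool:
    """Restart `unit` only if systemd reports it as failed.

    Returns True when a restart was attempted, False otherwise.
    `run` is injectable so the logic can be exercised without systemd.
    """
    # `systemctl is-failed --quiet UNIT` exits 0 when the unit is failed.
    if run(["systemctl", "is-failed", "--quiet", unit]) != 0:
        return False
    # Clear the failed state and the start rate limit counter first.
    run(["systemctl", "reset-failed", unit])
    run(["systemctl", "restart", unit])
    return True
```

Calling `restart_if_failed("ceilometer-agent-compute")` as root on an affected unit would apply the same workaround, and do nothing on a healthy one.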

Aurelien Lourot (aurelien-lourot) wrote :
description: updated
Aurelien Lourot (aurelien-lourot) wrote :
tags: added: cdo-qa foundations-engine
Vladimir Grevtsev (vlgrevtsev) wrote :

+1, have also seen this on a fresh focal/ussuri deployment.

Przemyslaw Hausman (phausman) wrote :

I'm seeing this issue with the 21.10 charms release. Restarting `ceilometer-agent-compute` on `ceilometer-agent` is a workaround that works for me too.

Camille Rodriguez (camille.rodriguez) wrote :

+1 I'm seeing this problem with focal/ussuri deployments as well

Bas de Bruijne (basdbruijne) wrote :

We are seeing this with SQA too. You can find the crashdumps here: https://solutions.qa.canonical.com/bugs/bugs/bug/1947585 under the test run ID, at the bottom of the page.

tags: added: aubergine
Aurelien Lourot (aurelien-lourot) wrote :

From a recent discussion with @gnuoy here is what we think would be the best solution. Because this seems to be a race where the ceilometer-agent-compute service gets installed before the principal charm's nova-compute service, and because the principal charm receives the list of services from its subordinates, the principal charm (nova-compute) could simply request starting all services (including the subordinate ones) right after having installed the nova-compute service.

I'm thus moving this bug to the nova-compute charm.
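
The idea described above, where the principal charm starts every service it knows about (including those reported by subordinates) once nova-compute is installed, could look roughly like the following. This is an illustrative sketch only and not the code that was later merged: `restart_stopped_services`, `is_running`, and `restart` are hypothetical names, and the service checks are passed in as callables rather than tied to any charm-helpers API.

```python
from typing import Callable, Iterable, List


def restart_stopped_services(
    services: Iterable[str],
    is_running: Callable[[str], bool],
    restart: Callable[[str], None],
) -> List[str]:
    """Restart every service in `services` that is not currently running.

    `services` is assumed to hold both the principal's own services and
    the ones advertised by its subordinates. Returns the names restarted.
    """
    restarted = []
    for name in services:
        if not is_running(name):
            # Only touch services that are down; healthy ones are left alone.
            restart(name)
            restarted.append(name)
    return restarted
```

Restarting only the services that are down avoids bouncing nova-compute itself when the hook runs again later.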

Changed in charm-ceilometer-agent:
status: New → Invalid
Changed in charm-nova-compute:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Aurelien Lourot (aurelien-lourot)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (master)
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (master)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/833886
Committed: https://opendev.org/openstack/charm-nova-compute/commit/140be9d0a99f65769899a710ea9d378a29a29e80
Submitter: "Zuul (22348)"
Branch: master

commit 140be9d0a99f65769899a710ea9d378a29a29e80
Author: Aurelien Lourot <email address hidden>
Date: Tue Mar 15 18:09:57 2022 +0100

    Restart failed subordinate services

    Change-Id: Id34e9c6f85886dbf880df0b7002110a40ef41ad6
    Closes-Bug: #1947585

Changed in charm-nova-compute:
status: In Progress → Fix Committed
Changed in charm-nova-compute:
milestone: none → 22.04
Changed in charm-nova-compute:
status: Fix Committed → Fix Released
Bas de Bruijne (basdbruijne) wrote :

This bug is still active in the SQA lab; all the recent occurrences can be found here:
https://solutions.qa.canonical.com/bugs/bugs/bug/1947585

It seems like the bug went away temporarily when the fix was released. Maybe an update about a month later interfered with the fix?

We see it on all OpenStack versions that we are testing.

Bas de Bruijne (basdbruijne) wrote :

Looking at testrun https://solutions.qa.canonical.com/testruns/testRun/3320bd48-9629-4d8b-b21d-04fd05d9fd15, I think this is still the same bug that is occurring:

```
syslog-Dec 8 12:03:06 solqa-lab1-server-07 systemd[1]: nova-compute.service: Start request repeated too quickly.
syslog:Dec 8 12:03:06 solqa-lab1-server-07 systemd[1]: nova-compute.service: Failed with result 'start-limit-hit'.
syslog:Dec 8 12:03:06 solqa-lab1-server-07 systemd[1]: Failed to start OpenStack Compute.
```

It seems like the fix never made it to the ussuri/stable channel.

Testrun https://solutions.qa.canonical.com/testruns/testRun/dfb1e0b1-afb4-4f58-8754-bbd57939467f should have the fix because it is running yoga/stable, but there we also see:
```
syslog-Dec 6 20:04:21 solqa-lab1-server-09 systemd[1]: nova-compute.service: Start request repeated too quickly.
syslog:Dec 6 20:04:21 solqa-lab1-server-09 systemd[1]: nova-compute.service: Failed with result 'exit-code'.
syslog:Dec 6 20:04:21 solqa-lab1-server-09 systemd[1]: Failed to start OpenStack Compute.
```
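
The two excerpts differ in the systemd result (`start-limit-hit` versus `exit-code`), which is easy to miss when scanning a crashdump by eye. A small sketch for pulling the unit name and result out of such syslog lines follows; `parse_failure` is a hypothetical helper, and the pattern assumes only the `Failed with result '...'` wording visible in the excerpts above.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "systemd[1]: nova-compute.service: Failed with result 'exit-code'."
_RESULT_RE = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+): Failed with result '(?P<result>[^']+)'\."
)


def parse_failure(line: str) -> Optional[Tuple[str, str]]:
    """Return (unit, result) for a systemd failure line, or None otherwise."""
    match = _RESULT_RE.search(line)
    if match is None:
        return None
    return match.group("unit"), match.group("result")
```

Running this over every matching line in a crashdump makes it possible to separate units that hit the start rate limit from units that failed for another reason.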

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/873281

Nobuto Murata (nobuto) wrote :

The task for charm-nova-compute is supposed to be "In progress" due to:
https://review.opendev.org/c/openstack/charm-nova-compute/+/875726

But I don't have permission to change the status from Fix Released.

Changed in charm-nova-compute:
status: Fix Released → In Progress
assignee: Aurelien Lourot (aurelien-lourot) → Liam Young (gnuoy)
Felipe Reyes (freyes)
Changed in charm-nova-compute:
status: In Progress → Fix Released
assignee: Liam Young (gnuoy) → nobody
Felipe Reyes (freyes)
Changed in charm-nova-compute:
status: Fix Released → In Progress
Bartosz Woronicz (mastier1) wrote :

The issue is still occurring on yoga/stable with Jammy.
Do you need any more feedback or help in solving it?

Changed in charm-nova-compute:
status: In Progress → Triaged
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/890664

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/890665

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/890666

Felipe Reyes (freyes)
Changed in charm-nova-compute:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/890699

Felipe Reyes (freyes) wrote :
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to charm-nova-compute (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/891904

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to charm-nova-compute (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/892171

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to charm-nova-compute (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/892172

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to charm-nova-compute (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/892173

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (master)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/875726
Committed: https://opendev.org/openstack/charm-nova-compute/commit/cb04103e08751107b0555b0c2b175a7097fedcc9
Submitter: "Zuul (22348)"
Branch: master

commit cb04103e08751107b0555b0c2b175a7097fedcc9
Author: Liam Young <email address hidden>
Date: Tue Feb 28 13:31:28 2023 +0000

    Do not manage subordinate service restarts

    The subordinate charms should manage the services that
    they deploys and configure, not the principle they are related to.
    This change switches the approach for restarting services
    from having the nova-compute charm doing it directly to having
    nova-compute triggering the restart by request a restart down
    the existing relations.

    Closes-Bug: #1947585

    Change-Id: I7419e39d68c70d21a11d03deeff9699421b0571e
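
The commit message above describes the principal requesting a restart over the existing relations instead of restarting subordinate services itself. One way such a request could work is sketched here under loud assumptions: plain dicts stand in for relation data and for the subordinate's local state, and `RESTART_KEY`, `request_restart`, and `handle_restart_request` are hypothetical names, not the charm's or charm-helpers' actual interface.

```python
import uuid
from typing import Callable, Dict, Iterable

RESTART_KEY = "restart-nonce"  # hypothetical relation key, for illustration


def request_restart(relation_data: Dict[str, str]) -> str:
    """Principal side: publish a fresh nonce to ask subordinates to restart."""
    nonce = str(uuid.uuid4())
    relation_data[RESTART_KEY] = nonce
    return nonce


def handle_restart_request(
    relation_data: Dict[str, str],
    local_state: Dict[str, str],
    services: Iterable[str],
    restart: Callable[[str], None],
) -> bool:
    """Subordinate side: restart own services once per new nonce.

    Returns True if the services were restarted, False if there was no
    request or the request had already been handled.
    """
    nonce = relation_data.get(RESTART_KEY)
    if not nonce or nonce == local_state.get(RESTART_KEY):
        return False
    for name in services:
        restart(name)
    # Remember the nonce so repeated hook runs do not restart again.
    local_state[RESTART_KEY] = nonce
    return True
```

Using a changing nonce rather than a boolean flag means each request is acted on exactly once, and the subordinate stays in charge of its own services.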

OpenStack Infra (hudson-openstack) wrote : Related fix merged to charm-nova-compute (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/891904
Committed: https://opendev.org/openstack/charm-nova-compute/commit/12b6f4e3d2dcfa263ab48b6ca3fb079b83f98cc4
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 12b6f4e3d2dcfa263ab48b6ca3fb079b83f98cc4
Author: Felipe Reyes <email address hidden>
Date: Thu Aug 17 16:05:09 2023 -0400

    Charm-helpers sync

    Change-Id: I512cc91b4fad7ebdbb9fcb1f7436e15eb3b2bf9f
    Related-Bug: #1947585

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/890664
Committed: https://opendev.org/openstack/charm-nova-compute/commit/e04440fd1a7287c7051c2e2e6895935bc6eb5ea4
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit e04440fd1a7287c7051c2e2e6895935bc6eb5ea4
Author: Aurelien Lourot <email address hidden>
Date: Tue Mar 15 18:09:57 2022 +0100

    Restart failed subordinate services

    Resolved Conflicts:
            hooks/nova_compute_hooks.py
            unit_tests/test_nova_compute_hooks.py

    Change-Id: Id34e9c6f85886dbf880df0b7002110a40ef41ad6
    Closes-Bug: #1947585
    (cherry picked from commit 140be9d0a99f65769899a710ea9d378a29a29e80)
    (cherry picked from commit 5465fe74b4e49e2a94e6db98cf00d6f7be2eaff6)

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Related fix merged to charm-nova-compute (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/892171
Committed: https://opendev.org/openstack/charm-nova-compute/commit/59a7973f1ae3abdd8f5b318d6428ee5505f8b457
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 59a7973f1ae3abdd8f5b318d6428ee5505f8b457
Author: Felipe Reyes <email address hidden>
Date: Mon Aug 21 14:41:25 2023 -0400

    Charm-helpers sync

    Change-Id: Ie5afedc6b88ee8d1067baeea88bd0f566d7fc286
    Related-Bug: #1947585

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/890665
Committed: https://opendev.org/openstack/charm-nova-compute/commit/59e5c578135be6bf978699651e9dfc9ed2803cfa
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 59e5c578135be6bf978699651e9dfc9ed2803cfa
Author: Aurelien Lourot <email address hidden>
Date: Tue Mar 15 18:09:57 2022 +0100

    Restart failed subordinate services

    Resolved Conflicts:
            hooks/nova_compute_hooks.py
            unit_tests/test_nova_compute_hooks.py

    Change-Id: Id34e9c6f85886dbf880df0b7002110a40ef41ad6
    Closes-Bug: #1947585
    (cherry picked from commit 140be9d0a99f65769899a710ea9d378a29a29e80)
    (cherry picked from commit 5465fe74b4e49e2a94e6db98cf00d6f7be2eaff6)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Related fix merged to charm-nova-compute (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/892172
Committed: https://opendev.org/openstack/charm-nova-compute/commit/c19011018ce7ce67ed6b86bdc8aa3c536ac5e540
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit c19011018ce7ce67ed6b86bdc8aa3c536ac5e540
Author: Felipe Reyes <email address hidden>
Date: Mon Aug 21 14:44:07 2023 -0400

    Charm-helpers sync

    Change-Id: Ib0dec545dfdf6017574426714d9ceaf70c3b5920
    Related-Bug: #1947585

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/890666
Committed: https://opendev.org/openstack/charm-nova-compute/commit/1755906cc7f986b29ec2d1365109b41f0b7d44c0
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 1755906cc7f986b29ec2d1365109b41f0b7d44c0
Author: Aurelien Lourot <email address hidden>
Date: Tue Mar 15 18:09:57 2022 +0100

    Restart failed subordinate services

    Resolved Conflicts:
            hooks/nova_compute_hooks.py
            unit_tests/test_nova_compute_hooks.py

    Change-Id: Id34e9c6f85886dbf880df0b7002110a40ef41ad6
    Closes-Bug: #1947585
    (cherry picked from commit 140be9d0a99f65769899a710ea9d378a29a29e80)
    (cherry picked from commit 5465fe74b4e49e2a94e6db98cf00d6f7be2eaff6)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Related fix merged to charm-nova-compute (stable/train)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/892173
Committed: https://opendev.org/openstack/charm-nova-compute/commit/81d4d49f8087781a7ce6491ff9cefa38881532a5
Submitter: "Zuul (22348)"
Branch: stable/train

commit 81d4d49f8087781a7ce6491ff9cefa38881532a5
Author: Felipe Reyes <email address hidden>
Date: Mon Aug 21 14:46:20 2023 -0400

    Charm-helpers sync

    Change-Id: I87d43479466e48d13ff836e9e5be760a867e411f
    Related-Bug: #1947585

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (stable/train)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/890699
Committed: https://opendev.org/openstack/charm-nova-compute/commit/a146407dd7fffd8653be8a8a5eaff75d48a7e16a
Submitter: "Zuul (22348)"
Branch: stable/train

commit a146407dd7fffd8653be8a8a5eaff75d48a7e16a
Author: Aurelien Lourot <email address hidden>
Date: Tue Mar 15 18:09:57 2022 +0100

    Restart failed subordinate services

    Resolved Conflicts:
            hooks/nova_compute_hooks.py
            unit_tests/test_nova_compute_hooks.py

    Change-Id: Id34e9c6f85886dbf880df0b7002110a40ef41ad6
    Closes-Bug: #1947585
    (cherry picked from commit 140be9d0a99f65769899a710ea9d378a29a29e80)
    (cherry picked from commit 5465fe74b4e49e2a94e6db98cf00d6f7be2eaff6)
