prometheus alertmanager fails to start on centos8 stream

Bug #1926463 reported by Radosław Piliszek
This bug affects 1 person
Affects          Status          Importance  Assigned to
kolla-ansible    Fix Released    High        Piotr Parczewski
  Wallaby        Fix Committed   High        Unassigned
  Xena           Fix Released    High        Piotr Parczewski

Bug Description

Applies to master (Xena) and Wallaby at least. Reproducible in CI (CentOS 8 Stream).

Docker logs show only:

2021-04-24T19:11:11.546436853Z level=warn ts=2021-04-24T19:11:11.546Z caller=cluster.go:154 component=cluster err="couldn't deduce an advertise address: no private IP found, explicit advertise addr not provided"
2021-04-24T19:11:11.549184209Z level=error ts=2021-04-24T19:11:11.548Z caller=main.go:256 msg="unable to initialize gossip mesh" err="create memberlist: Failed to get final advertise address: No private IP address found, and explicit IP not provided"

Piotr Parczewski (parczewski) wrote :

Happy to work on this one

Radosław Piliszek (yoctozepto) wrote :

Here you go! :-)

Piotr Parczewski (parczewski) wrote :

Could not reproduce on an Ubuntu binary all-in-one deployment (Ubuntu 20.04.2 LTS host):

level=info ts=2021-05-05T22:54:53.276Z caller=main.go:231 msg="Starting Alertmanager" version="(version=0.20.0, branch=HEAD, revision=f74be0400a6243d10bb53812d6fa408ad71ff32d)"
level=info ts=2021-05-05T22:54:53.276Z caller=main.go:232 build_context="(go=go1.13.5, user=root@00c3106655f8, date=20191211-14:13:14)"
level=info ts=2021-05-05T22:54:53.288Z caller=cluster.go:161 component=cluster msg="setting advertise address explicitly" addr=10.0.3.78 port=9094
level=info ts=2021-05-05T22:54:53.298Z caller=cluster.go:623 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2021-05-05T22:54:53.365Z caller=coordinator.go:119 component=configuration msg="Loading configuration file" file=/etc/prometheus/alertmanager.yml
level=info ts=2021-05-05T22:54:53.368Z caller=coordinator.go:131 component=configuration msg="Completed loading of configuration file" file=/etc/prometheus/alertmanager.yml
level=info ts=2021-05-05T22:54:53.375Z caller=main.go:497 msg=Listening address=10.0.3.78:9093
level=info ts=2021-05-05T22:54:55.298Z caller=cluster.go:648 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000291901s
level=info ts=2021-05-05T22:55:03.300Z caller=cluster.go:640 component=cluster msg="gossip settled; proceeding" elapsed=10.002002616s

Radosław Piliszek (yoctozepto) wrote :

I believe we need to pass the explicit IP address that the error mentions; your scenario probably had a private IP address that Alertmanager could pick up on its own.
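
The two log lines in the description come from Alertmanager's cluster setup: when no advertise address is passed, it tries to deduce one from the host's interface addresses and gives up if none of them is private. The Python sketch below approximates that decision. It is a model under stated assumptions, not Alertmanager's code: the real logic is written in Go (and, as far as I know, relies on hashicorp/go-sockaddr, whose notion of "private" may be broader than the three RFC 1918 ranges used here). `resolve_advertise_address` and the interface lists are hypothetical.

```python
import ipaddress

# RFC 1918 private IPv4 ranges. Assumption: this approximates what
# Alertmanager treats as "private"; the real check may differ.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """Return True if addr falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


def resolve_advertise_address(explicit, interface_addrs):
    """Hypothetical model of advertise-address selection.

    An explicitly configured address always wins; otherwise the first
    RFC 1918 interface address is used; otherwise startup fails with an
    error modelled on the Alertmanager log message.
    """
    if explicit:
        return explicit
    for addr in interface_addrs:
        if is_rfc1918(addr):
            return addr
    raise RuntimeError(
        "no private IP found, explicit advertise addr not provided")


# Illustrative interface list for the all-in-one host: a private address exists.
print(resolve_advertise_address(None, ["127.0.0.1", "10.0.3.78"]))  # prints 10.0.3.78
# Documentation-range address only, but passed explicitly: accepted as-is.
print(resolve_advertise_address("192.0.2.1", ["192.0.2.1"]))  # prints 192.0.2.1
```

Calling `resolve_advertise_address(None, ["192.0.2.1"])` in this model raises the "no private IP found" error, which is the situation the explicit address is meant to avoid.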

Radosław Piliszek (yoctozepto) wrote :

Updated: this only affects CentOS 8 Stream (perhaps non-Stream too, but we do not care).

summary: - prometheus alertmanager fails to start
+ prometheus alertmanager fails to start on centos8 stream
description: updated
Piotr Parczewski (parczewski) wrote :

It's starting to look like a CI-only issue: I deployed CentOS 8 Stream images on the same Ubuntu host, and it works:

++ cat /run_command
+ CMD='/opt/prometheus_alertmanager/alertmanager --config.file=/etc/prometheus/alertmanager.yml --web.listen-address=10.0.3.78:9093 --web.external-url=http://10.14.10.254:9093 --storage.path /var/lib/prometheus'
+ ARGS=
+ sudo kolla_copy_cacerts
+ [[ ! -n '' ]]
+ . kolla_extend_start
++ [[ ! -d /var/log/kolla/prometheus ]]
+++ stat -c %a /var/log/kolla/prometheus
++ [[ 2755 != \7\5\5 ]]
++ chmod 755 /var/log/kolla/prometheus
+ echo 'Running command: '\''/opt/prometheus_alertmanager/alertmanager --config.file=/etc/prometheus/alertmanager.yml --web.listen-address=10.0.3.78:9093 --web.external-url=http://10.14.10.254:9093 --storage.path /var/lib/prometheus'\'''
+ exec /opt/prometheus_alertmanager/alertmanager --config.file=/etc/prometheus/alertmanager.yml --web.listen-address=10.0.3.78:9093 --web.external-url=http://10.14.10.254:9093 --storage.path /var/lib/prometheus
Running command: '/opt/prometheus_alertmanager/alertmanager --config.file=/etc/prometheus/alertmanager.yml --web.listen-address=10.0.3.78:9093 --web.external-url=http://10.14.10.254:9093 --storage.path /var/lib/prometheus'
level=info ts=2021-05-06T09:54:17.604Z caller=main.go:231 msg="Starting Alertmanager" version="(version=0.20.0, branch=HEAD, revision=f74be0400a6243d10bb53812d6fa408ad71ff32d)"
level=info ts=2021-05-06T09:54:17.605Z caller=main.go:232 build_context="(go=go1.13.5, user=root@00c3106655f8, date=20191211-14:13:14)"
level=info ts=2021-05-06T09:54:17.623Z caller=cluster.go:161 component=cluster msg="setting advertise address explicitly" addr=10.0.3.78 port=9094
level=info ts=2021-05-06T09:54:17.628Z caller=cluster.go:623 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2021-05-06T09:54:17.706Z caller=coordinator.go:119 component=configuration msg="Loading configuration file" file=/etc/prometheus/alertmanager.yml
level=info ts=2021-05-06T09:54:17.707Z caller=coordinator.go:131 component=configuration msg="Completed loading of configuration file" file=/etc/prometheus/alertmanager.yml
level=info ts=2021-05-06T09:54:17.716Z caller=main.go:497 msg=Listening address=10.0.3.78:9093
level=info ts=2021-05-06T09:54:19.629Z caller=cluster.go:648 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000228154s
level=info ts=2021-05-06T09:54:27.630Z caller=cluster.go:640 component=cluster msg="gossip settled; proceeding" elapsed=10.001328241s
root@piotr-kayobe:~# docker exec -ti prometheus_alertmanager bash
(prometheus-alertmanager)[prometheus@piotr-kayobe /]$ cat /etc/redhat-release
CentOS Stream release 8

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/790058

Changed in kolla-ansible:
status: Triaged → In Progress
Radosław Piliszek (yoctozepto) wrote :

Yes, but note that CI simply does not use private addressing:

/opt/prometheus_alertmanager/alertmanager --config.file=/etc/prometheus/alertmanager.yml --web.listen-address=192.0.2.1:9093 --web.external-url=http://192.0.2.10:9093 --storage.path /var/lib/prometheus

These are documentation addresses (192.0.2.0/24, TEST-NET-1 from RFC 5737), used on purpose.
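
To see why the local deployment and CI diverge, the sketch below classifies the addresses that appear in this report using Python's standard `ipaddress` module. One subtlety: Python's own `is_private` also returns True for 192.0.2.0/24 (it follows the IANA special-purpose registry), so the sketch checks the RFC 1918 and RFC 5737 ranges explicitly instead of relying on it. Treating RFC 1918 as Alertmanager's definition of "private" is an assumption.

```python
import ipaddress

# Ranges relevant to this report. Assumption: Alertmanager's notion of
# "private" is close to RFC 1918; its exact rules may differ.
RANGES = {
    "RFC 1918 private": [
        ipaddress.ip_network("10.0.0.0/8"),
        ipaddress.ip_network("172.16.0.0/12"),
        ipaddress.ip_network("192.168.0.0/16"),
    ],
    "RFC 5737 documentation": [
        ipaddress.ip_network("192.0.2.0/24"),     # TEST-NET-1
        ipaddress.ip_network("198.51.100.0/24"),  # TEST-NET-2
        ipaddress.ip_network("203.0.113.0/24"),   # TEST-NET-3
    ],
}


def classify(addr: str) -> str:
    """Return the name of the first range containing addr, or 'other'."""
    ip = ipaddress.ip_address(addr)
    for name, nets in RANGES.items():
        if any(ip in net for net in nets):
            return name
    return "other"


# Addresses from the local deployment (10.x) and from CI (192.0.2.x).
for addr in ["10.0.3.78", "10.14.10.254", "192.0.2.1", "192.0.2.10"]:
    print(addr, "->", classify(addr))

# Caveat: Python's is_private is True here too, despite not being RFC 1918.
print(ipaddress.ip_address("192.0.2.1").is_private)  # prints True
```

The local addresses land in RFC 1918 and the CI ones in RFC 5737, which matches the observation that only CI hits the "no private IP found" path.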

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/790058
Committed: https://opendev.org/openstack/kolla-ansible/commit/b300f7bc40bdcdeb0a520c1f3fcce85fe1b7ca72
Submitter: "Zuul (22348)"
Branch: master

commit b300f7bc40bdcdeb0a520c1f3fcce85fe1b7ca72
Author: Piotr Parczewski <email address hidden>
Date: Thu May 6 14:45:10 2021 +0200

    Disable Alertmanager's peer gossip in non-HA deployments

    Reference:

    https://github.com/prometheus/alertmanager#turn-off-high-availability

    Closes-Bug: #1926463
    Change-Id: I60e1dedeac25fa8fe9538a3a8e582bd8cc9324d7
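
The referenced README section suggests turning off high availability by passing an empty `--cluster.listen-address=`; as I understand it, Alertmanager then skips setting up the gossip cluster entirely, and with it the advertise-address lookup that failed in CI. As a rough illustration only (this is not kolla-ansible's actual template), the hypothetical helper below builds an Alertmanager command line that disables gossip when a single host is deployed and adds `--cluster.peer` flags otherwise. The function name, host list, and the choice to exclude the node itself from its peer list are assumptions; port 9094 is the gossip port seen in the logs above.

```python
import shlex

ALERTMANAGER = "/opt/prometheus_alertmanager/alertmanager"
CLUSTER_PORT = 9094  # gossip port seen in the Alertmanager logs above


def alertmanager_command(hosts, listen_ip, external_url):
    """Hypothetical builder for the Alertmanager command line.

    With a single host, gossip is disabled via an empty
    --cluster.listen-address; with several hosts, each node listens on
    its own IP and lists every other host as a --cluster.peer.
    """
    args = [
        ALERTMANAGER,
        "--config.file=/etc/prometheus/alertmanager.yml",
        f"--web.listen-address={listen_ip}:9093",
        f"--web.external-url={external_url}",
        "--storage.path", "/var/lib/prometheus",
    ]
    if len(hosts) <= 1:
        # Non-HA: an empty value turns off peer gossip entirely.
        args.append("--cluster.listen-address=")
    else:
        args.append(f"--cluster.listen-address={listen_ip}:{CLUSTER_PORT}")
        # Peer with every other Alertmanager host (self excluded).
        args += [f"--cluster.peer={h}:{CLUSTER_PORT}"
                 for h in hosts if h != listen_ip]
    return args


# Single-node CI-style deployment using documentation addresses.
print(shlex.join(alertmanager_command(
    ["192.0.2.1"], "192.0.2.1", "http://192.0.2.10:9093")))
```

Branching on the number of Alertmanager hosts keeps HA deployments unchanged while removing the failing code path from single-node ones.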

Changed in kolla-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/790848

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/790849

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/790850

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/790851

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/train)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/790851
Committed: https://opendev.org/openstack/kolla-ansible/commit/05b551b73c040c7c303ee07f134369582e9d574f
Submitter: "Zuul (22348)"
Branch: stable/train

commit 05b551b73c040c7c303ee07f134369582e9d574f
Author: Piotr Parczewski <email address hidden>
Date: Thu May 6 14:45:10 2021 +0200

    Disable Alertmanager's peer gossip in non-HA deployments

    Reference:

    https://github.com/prometheus/alertmanager#turn-off-high-availability

    Closes-Bug: #1926463
    Change-Id: I60e1dedeac25fa8fe9538a3a8e582bd8cc9324d7
    (cherry picked from commit b300f7bc40bdcdeb0a520c1f3fcce85fe1b7ca72)

tags: added: in-stable-train
tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/790850
Committed: https://opendev.org/openstack/kolla-ansible/commit/a99debd15f989c995dbf846a1a8908eb6a10b171
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit a99debd15f989c995dbf846a1a8908eb6a10b171
Author: Piotr Parczewski <email address hidden>
Date: Thu May 6 14:45:10 2021 +0200

    Disable Alertmanager's peer gossip in non-HA deployments

    Reference:

    https://github.com/prometheus/alertmanager#turn-off-high-availability

    Closes-Bug: #1926463
    Change-Id: I60e1dedeac25fa8fe9538a3a8e582bd8cc9324d7
    (cherry picked from commit b300f7bc40bdcdeb0a520c1f3fcce85fe1b7ca72)

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/790849
Committed: https://opendev.org/openstack/kolla-ansible/commit/edd64f3c4a9d95d13a9f35d1e0d8a7d24b8d0949
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit edd64f3c4a9d95d13a9f35d1e0d8a7d24b8d0949
Author: Piotr Parczewski <email address hidden>
Date: Thu May 6 14:45:10 2021 +0200

    Disable Alertmanager's peer gossip in non-HA deployments

    Reference:

    https://github.com/prometheus/alertmanager#turn-off-high-availability

    Closes-Bug: #1926463
    Change-Id: I60e1dedeac25fa8fe9538a3a8e582bd8cc9324d7
    (cherry picked from commit b300f7bc40bdcdeb0a520c1f3fcce85fe1b7ca72)

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/790848
Committed: https://opendev.org/openstack/kolla-ansible/commit/e0fc09cded62993e019da61ea79f5e25d33129c5
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit e0fc09cded62993e019da61ea79f5e25d33129c5
Author: Piotr Parczewski <email address hidden>
Date: Thu May 6 14:45:10 2021 +0200

    Disable Alertmanager's peer gossip in non-HA deployments

    Reference:

    https://github.com/prometheus/alertmanager#turn-off-high-availability

    Closes-Bug: #1926463
    Change-Id: I60e1dedeac25fa8fe9538a3a8e582bd8cc9324d7
    (cherry picked from commit b300f7bc40bdcdeb0a520c1f3fcce85fe1b7ca72)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 9.3.2

This issue was fixed in the openstack/kolla-ansible 9.3.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 12.0.0.0rc2

This issue was fixed in the openstack/kolla-ansible 12.0.0.0rc2 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 11.1.0

This issue was fixed in the openstack/kolla-ansible 11.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 10.3.0

This issue was fixed in the openstack/kolla-ansible 10.3.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 13.0.0.0rc1

This issue was fixed in the openstack/kolla-ansible 13.0.0.0rc1 release candidate.
