veritas rabbitmq users do not get created

Bug #1739026 reported by Michele Baldessari
This bug affects 2 people
Affects    Status         Importance   Assigned to      Milestone
tripleo    Fix Released   Medium       Emilien Macchi

Bug Description

Currently, if you deploy with Veritas HyperScale, the HyperScale rabbitmq user does not get created.
The reason is that when the Veritas rabbit user gets created via puppet, there is no guarantee that rabbit is up and running, so one of two things can happen:
A) The user creation simply fails, and so does the deployment.
B) The user is created while the rabbit node is up but not yet clustered, so it is discarded once rabbit forms a proper cluster.
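The two failure modes can be illustrated with a toy model. It is not RabbitMQ code: it assumes only that creating a user needs a running node, and that a node is reset before it joins a cluster (the `rabbitmqctl join_cluster` documentation notes that the node is reset before clustering), so anything created while it was standalone is dropped. The class, node and user names are hypothetical.

```python
class Node:
    """Toy stand-in for one rabbit node; not RabbitMQ code."""

    def __init__(self, name):
        self.name = name
        self.running = False
        self.local_users = set()
        self.cluster_users = None  # shared set once the node is clustered

    def start(self):
        self.running = True

    def _users(self):
        # Clustered nodes read and write the shared cluster set.
        if self.cluster_users is not None:
            return self.cluster_users
        return self.local_users

    def add_user(self, user):
        # Failure mode A: creating a user on a node that is not up fails.
        if not self.running:
            raise RuntimeError(f"{self.name}: rabbit is not running")
        self._users().add(user)

    def join_cluster(self, cluster_users):
        # Failure mode B: the node is reset before joining, so users created
        # while it was standalone are discarded.
        self.local_users = set()
        self.cluster_users = cluster_users

    def list_users(self):
        return set(self._users())


cluster = {"guest"}              # users already replicated in the cluster
node = Node("rabbit@controller-0")
try:
    node.add_user("hyperscale")  # A: rabbit not up yet
except RuntimeError as exc:
    print(exc)
node.start()
node.add_user("hyperscale")      # B: succeeds, but the node is standalone
node.join_cluster(cluster)
print("hyperscale" in node.list_users())  # False: discarded on join
node.add_user("hyperscale")      # created after clustering
print("hyperscale" in cluster)            # True: now replicated
```

Only the last `add_user` call, made after the node has joined, survives; this is why the fix below waits for clustering rather than for rabbit merely being up.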

Changed in tripleo:
status: Triaged → In Progress
Michele Baldessari (michele) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/529470

OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/529483

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/pike)

Reviewed: https://review.openstack.org/529470
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=2baca3211a23dc723d9ffcf160cad9e0c5ee4912
Submitter: Zuul
Branch: stable/pike

commit 2baca3211a23dc723d9ffcf160cad9e0c5ee4912
Author: Michele Baldessari <email address hidden>
Date: Tue Dec 12 14:56:33 2017 +0100

    Wait for rabbitmq_ready tag

    We need to wait for rabbitmq_ready exec so that rabbit is fully
    up. This can only happen if we add the tag for it.
    Also we need to make sure that launching the epmd process cannot
    happen. The reason for this is the following:
    When the puppet-rabbitmq module gets invoked (a simple facter run
    will be sufficient) inside the rabbitmq_init_bundle container it spawns
    an epmd process.
    Now if we wait for the Exec[rabbitmq-ready], it means that this epmd
    process is staying around until rabbit is up, but then will disappear
    suddenly when the rabbitmq_init_bundle container exits, which will
    subsequently confuse the rabbitmq cluster and make it fail.

    Partial-Bug: #1739026

    Co-Authored-By: Damien Ciabrini <email address hidden>
    Co-Authored-By: John Eckersberg <email address hidden>

    Change-Id: Ie74a13a6c8181948900ea0de8ee9717f76f3ce79
    (cherry picked from commit bab6ec25323fd9a667efc08825e4bc7b1732f056)

tags: added: in-stable-pike
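The first paragraph of this commit relies on Puppet tag filtering. As we understand it, TripleO's containerized config step applies Puppet with a tag filter (puppet's `--tags` option), and resources whose tags do not match are skipped, so without a `rabbitmq_ready` tag the `Exec[rabbitmq-ready]` never runs and nothing waits for rabbit before the user is created. The sketch below is a simplified Python model of that filtering, not Puppet's implementation; the catalog, the `hyperscale` user and every tag except `rabbitmq_ready` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    ref: str                                  # e.g. "Exec[rabbitmq-ready]"
    tags: set = field(default_factory=set)


def select(resources, requested_tags):
    """Return the refs a tag-filtered run would apply (simplified model).

    Real Puppet also auto-tags resources with their type and enclosing
    class names; here every tag is listed explicitly.
    """
    if not requested_tags:
        return [r.ref for r in resources]     # no filter: everything runs
    return [r.ref for r in resources if r.tags & requested_tags]


# Hypothetical catalog: the readiness check and the user it should gate.
catalog = [
    Resource("Exec[rabbitmq-ready]", {"exec", "rabbitmq_ready"}),
    Resource("Rabbitmq_user[hyperscale]", {"rabbitmq_user"}),
]

before = select(catalog, {"rabbitmq_user"})
after = select(catalog, {"rabbitmq_user", "rabbitmq_ready"})
print(before)  # ['Rabbitmq_user[hyperscale]']: nothing waits for rabbit
print(after)   # ['Exec[rabbitmq-ready]', 'Rabbitmq_user[hyperscale]']
```

Adding the tag is what brings the readiness check back into the filtered run; the ordering between the two resources is then enforced separately in puppet-tripleo.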
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/527404
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=bab6ec25323fd9a667efc08825e4bc7b1732f056
Submitter: Zuul
Branch: master

commit bab6ec25323fd9a667efc08825e4bc7b1732f056
Author: Michele Baldessari <email address hidden>
Date: Tue Dec 12 14:56:33 2017 +0100

    Wait for rabbitmq_ready tag

    We need to wait for rabbitmq_ready exec so that rabbit is fully
    up. This can only happen if we add the tag for it.
    Also we need to make sure that launching the epmd process cannot
    happen. The reason for this is the following:
    When the puppet-rabbitmq module gets invoked (a simple facter run
    will be sufficient) inside the rabbitmq_init_bundle container it spawns
    an epmd process.
    Now if we wait for the Exec[rabbitmq-ready], it means that this epmd
    process is staying around until rabbit is up, but then will disappear
    suddenly when the rabbitmq_init_bundle container exits, which will
    subsequently confuse the rabbitmq cluster and make it fail.

    Partial-Bug: #1739026

    Co-Authored-By: Damien Ciabrini <email address hidden>
    Co-Authored-By: John Eckersberg <email address hidden>

    Change-Id: Ie74a13a6c8181948900ea0de8ee9717f76f3ce79

OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/527403
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=2f33d74173b79117c962146ac2c88fe1e3836403
Submitter: Zuul
Branch: master

commit 2f33d74173b79117c962146ac2c88fe1e3836403
Author: Michele Baldessari <email address hidden>
Date: Tue Dec 12 14:30:09 2017 +0100

    Fix up the rabbitmq-ready check

    So the current rabbitmq-ready exec has a few unexpected problems:

    1) The notify mechanism is not being called, but after discussion
    we're comfortable in calling this all the time, just like we do this
    for galera.
    2) Calling rabbitmqctl inside a container is problematic because
    the mere invocation of the cluster_status command will actually
    spawn an epmd process which will take the epmd port and which will
    subsequently make the rabbitmq-bundle started by pacemaker fail to
    form a cluster.

    For this reason (working around the rabbitmqctl issue is potentially
    doable once we upgrade to erlang 19.x but not with older versions)
    it is vital that this container gets spawned with /bin/epmd nooped
    to /bin/true.

    We now only proceed after rabbit tells us that it is part of a cluster.
    Just checking for rabbit being up is not enough because if the user gets
    created before the node joins a cluster, it might not be replicated
    (depending on the timing).

    Partial-Bug: #1739026

    Co-Authored-By: Damien Ciabrini <email address hidden>
    Co-Authored-By: John Eckersberg <email address hidden>
    Change-Id: I54c541d86782665ae0f689428a16edc155f87993
    Depends-On: Ie74a13a6c8181948900ea0de8ee9717f76f3ce79
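The gating described in the last paragraph of this commit (proceed only once rabbit reports that it is part of a cluster, not merely running) amounts to a bounded retry loop around a stricter predicate. A minimal Python sketch follows; `wait_until` and `is_clustered` are hypothetical helpers, the retry defaults are arbitrary, and how the list of cluster members is obtained (for example from the `cluster_status` output the commit mentions) is left out.

```python
import time


def wait_until(check, tries=60, try_sleep=5.0, sleep=time.sleep):
    """Retry check() until it returns True, at most `tries` times.

    Loosely modelled on a Puppet Exec with `tries`/`try_sleep`; the default
    values here are arbitrary. Returns the attempt number that succeeded.
    """
    for attempt in range(1, tries + 1):
        if check():
            return attempt
        if attempt < tries:
            sleep(try_sleep)
    raise TimeoutError(f"condition not met after {tries} tries")


def is_clustered(me, cluster_nodes):
    """Hypothetical predicate: `me` belongs to a cluster with at least one peer.

    Merely being up (a cluster of one) is not enough, because a user created
    before the node joins the real cluster may not be replicated.
    """
    return me in cluster_nodes and len(set(cluster_nodes)) > 1


# Simulated successive answers from the node: down, up but alone, clustered.
answers = iter([[], ["rabbit@c0"], ["rabbit@c0", "rabbit@c1"]])
attempts = wait_until(lambda: is_clustered("rabbit@c0", next(answers)),
                      tries=5, try_sleep=0, sleep=lambda _s: None)
print(attempts)  # 3
```

The `sleep` parameter is injectable only so the example runs instantly; a real check would keep the default.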

OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/pike)

Reviewed: https://review.openstack.org/529483
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=1bd793630d00692cede869190b8b83f68edc8caa
Submitter: Zuul
Branch: stable/pike

commit 1bd793630d00692cede869190b8b83f68edc8caa
Author: Michele Baldessari <email address hidden>
Date: Tue Dec 12 14:30:09 2017 +0100

    Fix up the rabbitmq-ready check

    So the current rabbitmq-ready exec has a few unexpected problems:

    1) The notify mechanism is not being called, but after discussion
    we're comfortable in calling this all the time, just like we do this
    for galera.
    2) Calling rabbitmqctl inside a container is problematic because
    the mere invocation of the cluster_status command will actually
    spawn an epmd process which will take the epmd port and which will
    subsequently make the rabbitmq-bundle started by pacemaker fail to
    form a cluster.

    For this reason (working around the rabbitmqctl issue is potentially
    doable once we upgrade to erlang 19.x but not with older versions)
    it is vital that this container gets spawned with /bin/epmd nooped
    to /bin/true.

    We now only proceed after rabbit tells us that it is part of a cluster.
    Just checking for rabbit being up is not enough because if the user gets
    created before the node joins a cluster, it might not be replicated
    (depending on the timing).

    Partial-Bug: #1739026

    Co-Authored-By: Damien Ciabrini <email address hidden>
    Co-Authored-By: John Eckersberg <email address hidden>
    Change-Id: I54c541d86782665ae0f689428a16edc155f87993
    Depends-On: Ie74a13a6c8181948900ea0de8ee9717f76f3ce79
    (cherry picked from commit 2f33d74173b79117c962146ac2c88fe1e3836403)

Changed in tripleo:
assignee: Michele Baldessari (michele) → Emilien Macchi (emilienm)
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/528378
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=9fbfc6889dc8d7471a1ef409eceaa8ea6db2c8c1
Submitter: Zuul
Branch: master

commit 9fbfc6889dc8d7471a1ef409eceaa8ea6db2c8c1
Author: Michele Baldessari <email address hidden>
Date: Fri Dec 15 20:24:48 2017 +0100

    Only create veritas rabbitmq users on the bootstrap node

    Currently, when deploying an overcloud with veritas hyperscale, the
    hyperscale rabbitmq user gets created on all rabbit nodes. This is
    problematic on an HA deployment for two reasons:
    A) rabbitmq already mirrors users
    B) On non-bootstrap nodes this will fail because the rabbit pcmk resource
    creation happens only on the bootstrap node, which means that when the
    hyperscale rabbitmq include kicks in on non-bootstrap nodes it will try
    to prefetch all rabbitmq users via 'rabbitmqctl -q list_users' and
    fail, because there is no guarantee that rabbit is up and running since
    no rabbitmq puppet code gets executed there. I.e. on non-bootstrap
    nodes the Exec['rabbitmq-ready'] -> Rabbitmq_user<||> ordering won't work.

    While we're at it, we move this rabbitmq user creation to step2. rabbitmq
    gets started and created at step2, so it makes no sense to invoke user
    creation before that.

    Note that this should be considered a short-term fix only. The proper
    fix would be (if possible) to move the veritas rabbitmq user
    configuration into the dedicated veritas service.

    Closes-Bug: #1739026

    Change-Id: I36950b9cec9f02a0de55292342a3a699df0b3cf1
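The condition this commit introduces (create the Veritas users only on the bootstrap node, and only from step 2 on) reduces to a small predicate. The Python below is a sketch, not the puppet-tripleo code: `bootstrap_nodeid` and `step` resemble hiera values that puppet-tripleo profiles consult, but the function and host names here are hypothetical.

```python
def should_create_veritas_users(hostname, bootstrap_nodeid, step):
    """Sketch of the gate described in the commit (names are hypothetical).

    * Bootstrap node only: RabbitMQ already mirrors users, and on the other
      nodes rabbit may not be up when the user provider lists users.
    * Step 2 or later: rabbitmq is started and created at step 2.
    """
    # Compare case-insensitively in case the node id is not lowercase.
    is_bootstrap = hostname.lower() == bootstrap_nodeid.lower()
    return is_bootstrap and step >= 2


for host in ("controller-0", "controller-1"):
    for step in (1, 2, 3):
        print(host, step, should_create_veritas_users(host, "controller-0", step))
```

Only `controller-0` at steps 2 and 3 prints `True`. Because rabbitmq mirrors users, creating them once on the bootstrap node is enough for the whole cluster.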

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/529950

OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/pike)

Reviewed: https://review.openstack.org/529950
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=7534f0c398f6eadd944174cc84e8d0df5268fa11
Submitter: Zuul
Branch: stable/pike

commit 7534f0c398f6eadd944174cc84e8d0df5268fa11
Author: Michele Baldessari <email address hidden>
Date: Fri Dec 15 20:24:48 2017 +0100

    Only create veritas rabbitmq users on the bootstrap node

    Currently, when deploying an overcloud with veritas hyperscale, the
    hyperscale rabbitmq user gets created on all rabbit nodes. This is
    problematic on an HA deployment for two reasons:
    A) rabbitmq already mirrors users
    B) On non-bootstrap nodes this will fail because the rabbit pcmk resource
    creation happens only on the bootstrap node, which means that when the
    hyperscale rabbitmq include kicks in on non-bootstrap nodes it will try
    to prefetch all rabbitmq users via 'rabbitmqctl -q list_users' and
    fail, because there is no guarantee that rabbit is up and running since
    no rabbitmq puppet code gets executed there. I.e. on non-bootstrap
    nodes the Exec['rabbitmq-ready'] -> Rabbitmq_user<||> ordering won't work.

    While we're at it, we move this rabbitmq user creation to step2. rabbitmq
    gets started and created at step2, so it makes no sense to invoke user
    creation before that.

    Note that this should be considered a short-term fix only. The proper
    fix would be (if possible) to move the veritas rabbitmq user
    configuration into the dedicated veritas service.

    Closes-Bug: #1739026

    Change-Id: I36950b9cec9f02a0de55292342a3a699df0b3cf1
    (cherry picked from commit 9fbfc6889dc8d7471a1ef409eceaa8ea6db2c8c1)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 7.4.7

This issue was fixed in the openstack/puppet-tripleo 7.4.7 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 8.2.0

This issue was fixed in the openstack/puppet-tripleo 8.2.0 release.
