pacemaker resources getting restarted while adding third party services to existing cloud

Bug #1946252 reported by Shyam
This bug affects 1 person
Affects    Status     Importance   Assigned to   Milestone
tripleo    Expired    Undecided    Unassigned

Bug Description

Description:
========================
Here is a third-party service template (Trilio datamover):
https://github.com/trilioData/triliovault-cfg-scripts/blob/master/redhat-director-scripts/rhosp16.1/services/trilio-datamover-api.yaml

Here is how we add the haproxy entry for the Trilio datamover service:
https://github.com/trilioData/triliovault-cfg-scripts/blob/master/redhat-director-scripts/rhosp16.1/services/trilio-datamover-api.yaml#L166-L176

When we run the deploy command, many pacemaker resources get restarted during its execution, and multiple resources, including the haproxy bundle resource, go into the 'FAILED' state.

Finally, the deployment fails with the following error. We have been facing this issue for the last 6 months; it did not occur earlier. We see it on both RHOSP13 and RHOSP16.1.
-------------------------------------------------------------
2021-04-12 22:12:16Z [overcloud]: UPDATE_FAILED Resource UPDATE failed: Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2

Stack overcloud UPDATE_FAILED

overcloud.AllNodesDeploySteps.ControllerDeployment_Step2.0:
resource_type: OS::Heat::StructuredDeployment
physical_resource_id: b366d035-2f3d-440c-ba44-013d371ef641
status: UPDATE_FAILED
status_reason: |
Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
deploy_stdout: |
...
"Warning: This method is deprecated, please use match expressions with Stdlib::Compat::Ipv6 instead. They are described at https://docs.puppet.com/puppet/latest/reference/lang_data_type.html#match-expressions. at [\"/etc/puppet/modules/tripleo/manifests/pacemaker/haproxy_with_vip.pp\", 72]:",
"Warning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.",
Heat Stack update failed.
Heat Stack update failed.
"Completed $ docker run --name haproxy_init_bundle --label config_id=tripleo_step2 --label container_name=haproxy_init_bundle --label managed_by=paunch --label config_data={\"start_order\": 3, \"image\": \"10.8.17.23:8787/rhosp13/openstack-haproxy:latest\", \"environment\": [\"TRIPLEO_DEPLOY_IDENTIFIER=1618261038\"], \"command\": [\"/docker_puppet_apply.sh\", \"2\", \"file,file_line,concat,augeas,tripleo::firewall::rule,pacemaker::resource::bundle,pacemaker::property,pacemaker::resource::ip,pacemaker::resource::ocf,pacemaker::constraint::order,pacemaker::constraint::colocation\", \"include ::tripleo::profile::base::pacemaker; include ::tripleo::profile::pacemaker::haproxy_bundle\", \"\"], \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/var/lib/docker-config-scripts/docker_puppet_apply.sh:/docker_puppet_apply.sh:ro\", \"/etc/puppet:/tmp/puppet-etc:ro\", \"/usr/share/openstack-puppet/modules:/usr/share/openstack-puppet/modules:ro\", \"/etc/ipa/ca.crt:/etc/ipa/ca.crt:ro\", \"/etc/pki/tls/private/haproxy:/etc/pki/tls/private/haproxy:ro\", \"/etc/pki/tls/certs/haproxy:/etc/pki/tls/certs/haproxy:ro\", \"/etc/pki/tls/private/overcloud_endpoint.pem:/etc/pki/tls/private/overcloud_endpoint.pem:ro\", \"/etc/sysconfig:/etc/sysconfig:rw\", \"/usr/libexec/iptables:/usr/libexec/iptables:ro\", \"/usr/libexec/initscripts/legacy-actions:/usr/libexec/initscripts/legacy-actions:ro\", \"/etc/corosync/corosync.conf:/etc/corosync/corosync.conf:ro\", \"/dev/shm:/dev/shm:rw\"], \"net\": \"host\", \"detach\": false, \"privileged\": true} --env=TRIPLEO_DEPLOY_IDENTIFIER=1618261038 --net=host --privileged=true --user=root --volume=/etc/hosts:/etc/hosts:ro 
--volume=/etc/localtime:/etc/localtime:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/var/lib/docker-config-scripts/docker_puppet_apply.sh:/docker_puppet_apply.sh:ro --volume=/etc/puppet:/tmp/puppet-etc:ro --volume=/usr/share/openstack-puppet/modules:/usr/share/openstack-puppet/modules:ro --volume=/etc/ipa/ca.crt:/etc/ipa/ca.crt:ro --volume=/etc/pki/tls/private/haproxy:/etc/pki/tls/private/haproxy:ro --volume=/etc/pki/tls/certs/haproxy:/etc/pki/tls/certs/haproxy:ro --volume=/etc/pki/tls/private/overcloud_endpoint.pem:/etc/pki/tls/private/overcloud_endpoint.pem:ro --volume=/etc/sysconfig:/etc/sysconfig:rw --volume=/usr/libexec/iptables:/usr/libexec/iptables:ro --volume=/usr/libexec/initscripts/legacy-actions:/usr/libexec/initscripts/legacy-actions:ro --volume=/etc/corosync/corosync.conf:/etc/corosync/corosync.conf:ro --volume=/dev/shm:/dev/shm:rw --cpuset-cpus=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23 10.8.17.23:8787/rhosp13/openstack-haproxy:latest /docker_puppet_apply.sh 2 file,file_line,concat,augeas,tripleo::firewall::rule,pacemaker::resource::bundle,pacemaker::property,pacemaker::resource::ip,pacemaker::resource::ocf,pacemaker::constraint::order,pacemaker::constraint::colocation include ::tripleo::profile::base::pacemaker; include ::tripleo::profile::pacemaker::haproxy_bundle "
]
}
to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/47067653-a812-4df9-a4d7-1d99a44ac403_playbook.retry

PLAY RECAP *********************************************************************
localhost : ok=13 changed=8 unreachable=0 failed=1

(truncated, view all with --long)
deploy_stderr: |
----------------------------------------------------------------

Steps to Reproduce:
========================

1. Deploy an RHOSP13 or RHOSP16.1 cloud.
2. Deploy the Trilio datamover service using the overcloud deploy command.
   Here is its install doc: https://docs.trilio.io/openstack/deployment/configuring-and-installing-triliovault/installing-on-rhosp
3. The deployment will fail with the error mentioned in the 'Description' section.
4. Here is the Trilio datamover integration code for reference: https://github.com/trilioData/triliovault-cfg-scripts/tree/master/redhat-director-scripts/rhosp16.1

Expected Result:
=========================

-- No pacemaker resources should get restarted during overcloud deploy/update.

Actual Result:
========================

Multiple pacemaker resources get restarted during execution of the overcloud deploy/update command.

Environment:
========================

RHOSP13 and RHOSP16.1

Logs and Configs:
=========================

Overcloud deploy command:
===============================
openstack overcloud deploy --templates \
  -e /home/stack/templates/node-info.yaml \
  -e /home/stack/templates/hostname.yaml \
  -e /home/stack/templates/environment-rhel-registration.yaml \
  -e /home/stack/templates/rhel-registration-resource-registry.yaml \
  -e /home/stack/templates/multipath_heat_resource.yaml \
  -e /home/stack/templates/overcloud_images.yaml \
  -e /home/stack/templates/custom-env.yaml \
  -e /home/stack/templates/multipath.yaml \
  -e /home/stack/templates/ceph-config.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/enable-internal-tls.yaml \
  -e /home/stack/templates/custom-domain.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/haproxy-public-tls-certmonger.yaml \
  -r /home/stack/templates/roles_data.yaml \
  --ntp-server 192.168.1.34 \
  -e /home/stack/triliovault-cfg-scripts/redhat-director-scripts/rhosp13/environments/trilio_env.yaml \
  -e /home/stack/triliovault-cfg-scripts/redhat-director-scripts/rhosp13/environments/trilio_env_tls_everywhere_dns.yaml \
  -r /home/stack/templates/roles_data.yaml \
  --libvirt-type qemu \
  --log-file overcloud_deploy.log
==========================================================

I will attach all related templates to this bug.

Here is the GitHub repo for the Trilio datamover related templates (it is public):
https://github.com/trilioData/triliovault-cfg-scripts/tree/master/redhat-director-scripts/rhosp16.1

Shyam (shyam.biradar)
description: updated
description: updated
summary: - While adding a haproxy entry in haproxy.cfg, haproxy pacemaker resource
- and other pacemaker resource getting restarted
+ pacemaker resources getting restarted while adding third party services
+ to existing cloud
Shyam (shyam.biradar) wrote :

Hi Team,

Can you please confirm this bug?

Thank you.

Shyam (shyam.biradar) wrote :

Templates used for deploy command attached.

Bogdan Dobrelya (bogdando) wrote :

The provided snippets show no real error. Please provide contents of /var/log/paunch.log and /var/log/extra
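
A minimal sketch of gathering those files on a controller into one archive for attaching here. The function name `collect_logs` and the archive name are hypothetical, chosen for illustration; only the two paths come from the request above. Paths that do not exist on a given node are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a TripleO tool): bundle whichever of the given
# paths exist into a gzip-compressed tarball.
collect_logs() {
    local archive="$1"; shift
    local existing=()
    local p
    for p in "$@"; do
        # Keep only paths that are present on this node.
        if [ -e "$p" ]; then
            existing+=("$p")
        fi
    done
    if [ "${#existing[@]}" -eq 0 ]; then
        echo "no log paths found" >&2
        return 1
    fi
    tar -czf "$archive" -- "${existing[@]}"
}

# Usage (archive name is an assumed example):
#   collect_logs "paunch-logs-$(hostname -s).tar.gz" /var/log/paunch.log /var/log/extra
```

Skipping missing paths keeps the same call usable on every controller, even if one of the locations is absent on a given release.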

Changed in tripleo:
status: New → Incomplete
Damien Ciabrini (dciabrin) wrote :

If you can consistently reproduce that, could you collect sosreports of the three controller nodes?

Normally a pacemaker resource (e.g. galera) is restarted only when:
 . a config file for the service has changed (e.g. /etc/my.cnf.d/galera.cnf)
 . the container name has changed
 . a pacemaker config has changed for this resource

Any other restart is unexpected, and I can't think of anything yet that would cause that.
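
One way to check the first condition is to fingerprint a service's config directory before and after the deploy and compare the two values. The sketch below is only an illustration: `config_fingerprint` is a hypothetical helper, and the directory in the usage comment is an assumed example location, not something confirmed in this report. It relies on GNU `find`, `sort`, `xargs` and `sha256sum`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TripleO): print a single SHA-256 digest
# summarizing every regular file under a directory, so that two runs can be
# compared to see whether any config file changed.
config_fingerprint() {
    local dir="$1"
    # Hash each file, in a stable path order, then hash the resulting list.
    (cd "$dir" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum) \
        | sha256sum | cut -d' ' -f1
}

# Usage (the path is an assumed example):
#   before=$(config_fingerprint /var/lib/config-data/puppet-generated/mysql)
#   ... run the overcloud deploy ...
#   after=$(config_fingerprint /var/lib/config-data/puppet-generated/mysql)
#   [ "$before" = "$after" ] && echo "config unchanged"
```

If the fingerprint is identical before and after, a restart of that resource would have to come from one of the other two conditions or be unexpected.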

Shyam (shyam.biradar) wrote :

Sure Bogdan, I will attach those log files.

Hi Damien, okay, I will collect the sosreports.

Launchpad Janitor (janitor) wrote :

[Expired for tripleo because there has been no activity for 60 days.]

Changed in tripleo:
status: Incomplete → Expired