tripleo fails to deploy in CI: Failed to call refresh: /usr/bin/clustercheck

Bug #1713127 reported by Joe Talerico on 2017-08-25
This bug affects 1 person
Affects: tripleo
Importance: High
Assigned to: Unassigned

Bug Description

The patch I had didn't touch clustercheck, but I see: http://logs.openstack.org/24/497524/6/check/gate-tripleo-ci-centos-7-scenario001-multinode-oooq/0b63e4b/logs/undercloud/home/jenkins/failed_deployment_list.log.txt.gz

Capturing the specific error output:

            "Error: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql/Exec[galera-ready]: Failed to call refresh: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]",
            "Error: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql/Exec[galera-ready]: /usr/bin/clustercheck >/dev/null returned 1 instead of one of [0]",
            "Error: Failed to apply catalog: Execution of '/usr/bin/mysql -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2 \"No such file or directory\")",

This seems to be happening frequently enough to track:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%20%5C%22Failed%20to%20call%20refresh%3A%20%2Fusr%2Fbin%2Fclustercheck%5C%22

Tags: ci
Changed in tripleo:
milestone: none → pike-rc2
Bogdan Dobrelya (bogdando) wrote:

There are multiple hits reported (14 across multiple tripleo CI gates); this should be a High bug to get fixed within the Pike scope.

Changed in tripleo:
importance: Medium → High
tags: added: ci
tags: added: alert
Michele Baldessari (michele) wrote:

Meh, I think we lost the CIB collection capability in the CI when we moved to oooq? :/

Michele Baldessari (michele) wrote:

So the issue seems to be that stonith-enabled=false is never set, so pacemaker does not start the DB and hence clustercheck fails.

On my working deployment I can see the following:
Aug 29 13:42:07 [19665] overcloud-controller-0 cib: info: cib_perform_op: ++ /cib/configuration/crm_config/cluster_property_set[@id='cib-bootstrap-options']: <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>

I can't see this in the failing run, so maybe we have a dependency issue. What is a bit odd is that in the pacemaker tripleo profile we have the following (at step 1):
    class { '::pacemaker':
      hacluster_pwd => hiera('hacluster_pwd'),
    }
    -> class { '::pacemaker::corosync':
      cluster_members      => $pacemaker_cluster_members,
      setup_cluster        => $pacemaker_master,
      cluster_setup_extras => $cluster_setup_extras,
      remote_authkey       => $remote_authkey,
    }
    if $pacemaker_master {
      class { '::pacemaker::stonith':
        disable => !$enable_fencing,
        tries   => $pcs_tries,
      }
    }

The creation of the galera resource happens at step 2, so we should be guaranteed to have the stonith property set to false by then.
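
For context, "setting the stonith property" at step 1 ultimately comes down to the pacemaker::stonith class declaring the Pcmk_property that shows up in the logs further down. A rough, hedged approximation of that resource (not the verbatim class body) looks like:

    # Disable fencing cluster-wide so pacemaker will start resources even
    # without a configured stonith device; approximation of what
    # Pcmk_property[property--stonith-enabled] applies.
    pcmk_property { 'property--stonith-enabled':
      property => 'stonith-enabled',
      value    => 'false',
    }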

Michele Baldessari (michele) wrote:

Oh I see the problem:
Aug 25 15:44:33 localhost os-collect-config: "Error: Execution of '/usr/bin/yum -d 0 -e 0 -y install fence-agents-all' returned 1: Error downloading packages:",
Aug 25 15:44:33 localhost os-collect-config: "Error: /Stage[main]/Pacemaker::Install/Package[fence-agents-all]/ensure: change from purged to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y install fence-agents-all' returned 1: Error downloading packages:",
Aug 25 15:44:33 localhost os-collect-config: "Notice: /Stage[main]/Pacemaker::Service/Service[pcsd]: Dependency Package[fence-agents-all] has failures: true",
Aug 25 15:44:33 localhost os-collect-config: "Notice: /Stage[main]/Pacemaker::Corosync/User[hacluster]: Dependency Package[fence-agents-all] has failures: true",
Aug 25 15:44:33 localhost os-collect-config: "Notice: /Stage[main]/Pacemaker::Corosync/Exec[reauthenticate-across-all-nodes]: Dependency Package[fence-agents-all] has failures: true",
Aug 25 15:44:33 localhost os-collect-config: "Notice: /Stage[main]/Pacemaker::Corosync/Exec[auth-successful-across-all-nodes]: Dependency Package[fence-agents-all] has failures: true",
Aug 25 15:44:33 localhost os-collect-config: "Notice: /Stage[main]/Pacemaker::Corosync/File[etc-pacemaker]: Dependency Package[fence-agents-all] has failures: true",
Aug 25 15:44:33 localhost os-collect-config: "Notice: /Stage[main]/Pacemaker::Corosync/File[etc-pacemaker-authkey]: Dependency Package[fence-agents-all] has failures: true",
Aug 25 15:44:33 localhost os-collect-config: "Notice: /Stage[main]/Pacemaker::Corosync/Exec[Create Cluster tripleo_cluster]: Dependency Package[fence-agents-all] has failures: true",
Aug 25 15:44:33 localhost os-collect-config: "Notice: /Stage[main]/Pacemaker::Corosync/Exec[Start Cluster tripleo_cluster]: Dependency Package[fence-agents-all] has failures: true",
Aug 25 15:44:33 localhost os-collect-config: "Notice: /Stage[main]/Pacemaker::Service/Service[corosync]: Dependency Package[fence-agents-all] has failures: true",
Aug 25 15:44:33 localhost os-collect-config: "Notice: /Stage[main]/Pacemaker::Service/Service[pacemaker]: Dependency Package[fence-agents-all] has failures: true",
Aug 25 15:44:33 localhost os-collect-config: "Notice: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]: Dependency Package[fence-agents-all] has failures: true",
Aug 25 15:44:33 localhost os-collect-config: "Notice: /Stage[main]/Pacemaker::Stonith/Pacemaker::Property[Disable STONITH]/Pcmk_property[property--stonith-enabled]: Dependency Package[fence-agents-all] has failures: true",

Maybe, since this is a multinode job, we should just preinstall fence-agents-all on the nodes in order to avoid these failures?
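
A hedged sketch of that workaround, assuming the package can be pulled in by an early manifest (or an equivalent image-build/CI setup step) before the pacemaker profile runs:

    # Make sure the fencing agents are already present so a transient yum
    # download failure cannot break the whole step 1 dependency chain.
    package { 'fence-agents-all':
      ensure => installed,
    }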

Michele Baldessari (michele) wrote:

Added elastic-recheck query here: https://review.openstack.org/499516

tags: removed: alert
Changed in tripleo:
milestone: pike-rc2 → queens-1
Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
status: Triaged → Incomplete
Changed in tripleo:
milestone: stein-3 → stein-rc1
Changed in tripleo:
milestone: stein-rc1 → train-1
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
milestone: train-2 → train-3
Changed in tripleo:
milestone: train-3 → ussuri-1