master promoter script looping and failing since 12/9

Bug #1737617 reported by Matt Young
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

The promoter script appears to have been looping since 12/9, and it is so unresponsive that serving its logs is very slow and sometimes fails outright.

The script runs every 10 minutes, and since 12/9 every run has failed the check for another running promoter instance. That failure is typically seen while new container images are being uploaded after a promotion, since the upload can take more than 30 minutes.
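For readers unfamiliar with this kind of check, here is a minimal sketch of a single-instance guard built on an advisory file lock. It is an assumption for illustration only: the report does not show how the promoter actually detects a running instance, and the names `try_acquire_lock` and `release_lock` and the lock path are hypothetical. It relies on `fcntl.flock`, so it applies to Linux and other Unix hosts.

```python
import fcntl
import os


def try_acquire_lock(path):
    """Return an open file holding an exclusive lock, or None if it is already held.

    Hypothetical helper: the real promoter's check is not shown in this report.
    """
    handle = open(path, "a+")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the holder.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    # Record the holder's PID so a stuck run can be identified later.
    handle.seek(0)
    handle.truncate()
    handle.write(str(os.getpid()))
    handle.flush()
    return handle


def release_lock(handle):
    """Release the lock and close the file."""
    fcntl.flock(handle, fcntl.LOCK_UN)
    handle.close()
```

A lock of this kind is released by the kernel when the holding process exits, so a run that crashes does not block later runs. A run that hangs without exiting keeps the lock, which would produce the repeated "Another promoter process is running" pattern described here.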

- http://38.145.33.13/master.log-20171208

2017-12-09 00:51:48,436 32059 DEBUG promoter rdo-master-promote-rdo_trunk-build-images at 2017-12-08T23:09:11, logs at https://thirdparty.logs.rdoproject.org/jenkins-rdo-master-promote-rdo_trunk-build-images-59/console.txt.gz
2017-12-09 00:51:48,436 32059 INFO promoter Skipping promotion of current-tripleo-rdo to current-tripleo-rdo-internal, missing successful jobs: ['periodic-master-rdo_trunk-virtbasic-1ctlr_1comp_64gb', 'promote-rhel-master-rdo_trunk-virtbasic-1ctlr_1comp_64gb', 'tripleo-quickstart-master-rdo_trunk-baremetal-dell_fc430_envb-single_nic_vlans', 'oooq-master-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans']
2017-12-09 00:51:48,436 32059 INFO promoter new hash found for {'timestamp': 1512517091, 'promote_name': 'current-tripleo-rdo', 'commit_hash': '58fc4c43eab8419bb5bd17234bcb4bc02188fdd2', 'distro_hash': 'b1259967731d6fb44ac854d1b8b5dc69279ab7b9'}: current-tripleo-rdo

2017-12-09 01:01:01,402 32149 ERROR promoter Another promoter process is running
2017-12-09 01:11:01,758 32244 ERROR promoter Another promoter process is running
2017-12-09 01:21:01,902 32320 ERROR promoter Another promoter process is running
2017-12-09 01:31:01,370 32397 ERROR promoter Another promoter process is running
2017-12-09 01:41:01,782 32479 ERROR promoter Another promoter process is running
2017-12-09 01:51:01,384 32557 ERROR promoter Another promoter process is running
2017-12-09 02:01:01,428 32651 ERROR promoter Another promoter process is running
2017-12-09 02:11:02,020 32741 ERROR promoter Another promoter process is running

...

- http://38.145.33.13/master.log-20171209
- http://38.145.33.13/master.log-20171210
- http://38.145.33.13/master.log
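To judge whether a run of "Another promoter process is running" entries exceeds the roughly 30 minutes an image upload can take, the log lines can be scanned for the first and last such entry. The sketch below assumes the line layout seen in the excerpt above (date, time with milliseconds, PID, level, logger, message); the function names are hypothetical and this is not part of the promoter itself.

```python
from datetime import datetime

BUSY_MESSAGE = "Another promoter process is running"

# Three lines copied from the log excerpt in this report.
SAMPLE = [
    "2017-12-09 01:01:01,402 32149 ERROR promoter Another promoter process is running",
    "2017-12-09 01:11:01,758 32244 ERROR promoter Another promoter process is running",
    "2017-12-09 01:21:01,902 32320 ERROR promoter Another promoter process is running",
]


def busy_timestamps(lines):
    """Collect timestamps of ERROR lines reporting another running promoter."""
    stamps = []
    for line in lines:
        # Assumed layout: "<date> <time>,<ms> <pid> <level> <logger> <message>"
        parts = line.strip().split(" ", 5)
        if len(parts) < 6 or parts[3] != "ERROR" or parts[5] != BUSY_MESSAGE:
            continue
        stamp = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S,%f")
        stamps.append(stamp)
    return stamps


def busy_span_minutes(lines):
    """Minutes between the first and last matching entry (0.0 if fewer than two)."""
    stamps = busy_timestamps(lines)
    if len(stamps) < 2:
        return 0.0
    return (max(stamps) - min(stamps)).total_seconds() / 60.0


if __name__ == "__main__":
    print(len(busy_timestamps(SAMPLE)), "busy entries spanning",
          round(busy_span_minutes(SAMPLE), 1), "minutes")
```

A span well beyond 30 minutes, as in the full logs linked above, suggests a stuck instance rather than a long upload.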

The last link promoted for master was

- https://trunk.rdoproject.org/centos7-master/c8/cc/c8cceebf8e648ce46219026f926047491135a66e_fcf8d179

on 12/7

Matt Young (halcyondude) wrote :
tags: added: ci promotion-blocker
Changed in tripleo:
importance: Undecided → Critical
milestone: none → queens-3
Matt Young (halcyondude)
tags: added: alert
Changed in tripleo:
status: New → Triaged
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

Removed the alert tag: rdo cloud is still in upgrade mode, so promotion is not expected to work.

tags: removed: alert
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

I commented out all cron jobs that start promotion until rdo cloud is upgraded and stable. Please don't forget to uncomment them.

wes hayutin (weshayutin)
tags: added: quickstart
Alan Pevec (apevec) wrote :

The rdocloud upgrade has finished. What is left to do here?

wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Fix Released