pike promotion errors, overcloud images and containers were not promoted properly

Bug #1722640 reported by wes hayutin
This bug affects 1 person
Affects   Status         Importance   Assigned to       Milestone
tripleo   Fix Released   Critical     Gabriele Cerami

Bug Description

New promotion on current-tripleo/ 2017-10-10 15:41

Commit hash
commits:
- commit_branch: master
  commit_hash: 39c29e7c7facf47a6131570d9602f8a7737fa4be
  distgit_dir: /home/centos-pike/data/openstack-rally_distro/
  distro_hash: baed7867b233f1e9b49723f7a8636da681a0e324
  dt_build: '1507649820'
  dt_commit: '1507649566'

Errors: in http://38.145.33.13/pike.log
http://paste.openstack.org/show/623274/

Images are available, but not promoted
https://images.rdoproject.org/pike/rdo_trunk/?C=M;O=D

current-tripleo is pointing to 10-07 not 10-10
rdo container registry is down

<dmsimard> weshay, trown, adarazs: I put the registry in read only for a bit, trying a solution from upstream to resolve an issue
<trown> dmsimard: ack

Revision history for this message
David Moreau Simard (dmsimard) wrote :

The *container* image upload/tag/promotion/push failure is most certainly my fault.

I am trying to implement a definitive solution to the image pruning issues we have been having with the help of upstream openshift devs. This should not reproduce.

Revision history for this message
John Trowbridge (trown) wrote :

logs for the image failure:

2017-10-10 20:02:45,020 28459 INFO promoter Promoting the qcow image for dlrn hash 39c29e7c7facf47a6131570d9602f8a7737fa4be_baed7867 on pike to current-tripleo
2017-10-10 20:02:45,026 28459 ERROR promoter QCOW IMAGE UPLOAD FAILED LOGS BELOW:
2017-10-10 20:02:45,026 28459 ERROR promoter + RELEASE=pike
+ PROMOTED_HASH=39c29e7c7facf47a6131570d9602f8a7737fa4be_baed7867
+ LINK_NAME=current-tripleo
+ image_path=pike/rdo_trunk/current-tripleo
+ ssh_cmd='ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
+ pushd /tmp/
/tmp ~
+ mkdir -p 39c29e7c7facf47a6131570d9602f8a7737fa4be_baed7867
+ ln -s 39c29e7c7facf47a6131570d9602f8a7737fa4be_baed7867 stable
ln: failed to create symbolic link ‘stable/39c29e7c7facf47a6131570d9602f8a7737fa4be_baed7867’: File exists

2017-10-10 20:02:45,026 28459 ERROR promoter Command '['bash', '/home/centos/ci-config/ci-scripts/promote-images.sh', 'pike', '39c29e7c7facf47a6131570d9602f8a7737fa4be_baed7867', 'current-tripleo']' returned non-zero exit status 1
Traceback (most recent call last):
  File "/home/centos/ci-config/ci-scripts/dlrnapi_promoter/dlrnapi_promoter.py", line 127, in tag_qcow_images
    stderr=subprocess.STDOUT
  File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['bash', '/home/centos/ci-config/ci-scripts/promote-images.sh', 'pike', '39c29e7c7facf47a6131570d9602f8a7737fa4be_baed7867', 'current-tripleo']' returned non-zero exit status 1
2017-10-10 20:02:45,026 28459 ERROR promoter END OF QCOW IMAGE UPLOAD FAILURE
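
A note on the failure mode in the log above: when the link name given to `ln -s` already exists and resolves to a directory, `ln` tries to create the new link inside that directory, which would explain the `stable/<hash>` path in the error. The following is only a minimal sketch of an idempotent alternative, not the code from promote-images.sh or from the proposed fix. The helper name `force_symlink` is hypothetical, and it assumes Python 3 (the promoter in the traceback runs on Python 2.7, where `os.replace` is not available).

```python
import os
import tempfile


def force_symlink(target, link_path):
    """Point link_path at target, replacing an existing symlink atomically.

    Hypothetical helper for illustration; not part of the promoter scripts.
    """
    link_dir = os.path.dirname(os.path.abspath(link_path))
    # Create the new link under a unique temporary name on the same
    # filesystem, then rename it over the old link. rename() replaces the
    # symlink itself instead of descending into the directory it points to.
    tmp_dir = tempfile.mkdtemp(dir=link_dir)
    tmp_link = os.path.join(tmp_dir, "link")
    try:
        os.symlink(target, tmp_link)
        os.replace(tmp_link, link_path)
    finally:
        # On success tmp_link has been renamed away; clean up otherwise.
        if os.path.lexists(tmp_link):
            os.remove(tmp_link)
        os.rmdir(tmp_dir)
```

Calling `force_symlink(promoted_hash, "stable")` twice in a row leaves a single `stable` link pointing at the latest hash, so a rerun of the promotion would not hit "File exists".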

Revision history for this message
John Trowbridge (trown) wrote :

https://review.rdoproject.org/r/10085 should fix the qcow image promote part of this bug.

Revision history for this message
David Moreau Simard (dmsimard) wrote :

The registry is available again. I've posted details of the issue here: https://www.redhat.com/archives/rdo-list/2017-October/msg00037.html

Revision history for this message
wes hayutin (weshayutin) wrote :

Removing the alert. Hopefully one of the 4hr jobs (pike, master) will show this working by tomorrow; if not, we'll poke at it by hand to confirm. The next build job should give us the feedback we need.

tags: removed: alert
Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

I don't think that's the right solution for the images problem, because the real issue is that the promoter script can't detect that the new and old hashes are the same and tries to promote anyway. I'm trying to solve it here: https://review.rdoproject.org/r/10094

Let's revert the change that uses the tmp dir: https://review.rdoproject.org/r/#/c/10095/
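
To illustrate the check described in this comment, here is a rough sketch of skipping a promotion when the candidate and the currently promoted hashes are identical. It is not the code from review 10094. The function names are hypothetical, and the identifier format (commit hash, underscore, first 8 characters of the distro hash) is inferred from the values in the bug description and the promoter log.

```python
def full_hash(commit_hash, distro_hash):
    """Build the identifier seen in the logs: <commit_hash>_<distro_hash[:8]>."""
    return "{}_{}".format(commit_hash, distro_hash[:8])


def needs_promotion(candidate, promoted):
    """Return True only if candidate differs from what is already promoted.

    Both arguments are dicts with 'commit_hash' and 'distro_hash' keys
    (an assumed shape); promoted may be None when nothing was promoted yet.
    """
    if promoted is None:
        return True
    new = full_hash(candidate["commit_hash"], candidate["distro_hash"])
    old = full_hash(promoted["commit_hash"], promoted["distro_hash"])
    return new != old


candidate = {
    "commit_hash": "39c29e7c7facf47a6131570d9602f8a7737fa4be",
    "distro_hash": "baed7867b233f1e9b49723f7a8636da681a0e324",
}
# Promoting the same hash pair again should be a no-op.
print(needs_promotion(candidate, dict(candidate)))  # False
```

Comparing both hashes matters because the same commit can be rebuilt with a different distro hash, and that rebuild is a different promotion candidate.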

Revision history for this message
Attila Darazs (adarazs) wrote :

https://review.rdoproject.org/r/10094 reviewed -- needs some adjustment, but that's definitely a problem. I didn't notice this when full_hash was added. Thanks for noticing.

Also I agree with reverting that change with the /tmp/ dir.

Arx Cruz (arxcruz)
Changed in tripleo:
assignee: John Trowbridge (trown) → Gabriele Cerami (gcerami)
Revision history for this message
Gabriele Cerami (gcerami) wrote :

The cloud images link was corrected manually yesterday. The fix for the link, https://review.rdoproject.org/api/10099, is merged.
The container images were uploaded by launching the upload script manually. The crontab had been modified in the setup playbook, but not on the promoter server. It should be OK now.
Waiting for https://review.openstack.org/511334 to fix the URLs of the images and the use of overcloud images as the undercloud image.

Changed in tripleo:
milestone: queens-1 → queens-2
Revision history for this message
Gabriele Cerami (gcerami) wrote :

It happened again: space problems on the image server. I'm adding cleanup code to every promotion.
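
The cleanup code itself is not shown in this report. As a rough idea of what a per-promotion cleanup could look like, the sketch below removes the oldest image directories while keeping the newest few and anything a symlink such as current-tripleo still points to. The function name, the `keep` parameter, and the retention policy are all assumptions made for illustration.

```python
import os
import shutil


def prune_image_dirs(base_dir, keep=3):
    """Delete old image directories under base_dir; return the names removed.

    Hypothetical retention policy: keep the `keep` newest directories by
    modification time, plus any directory that a symlink in base_dir targets.
    """
    entries = [os.path.join(base_dir, name) for name in os.listdir(base_dir)]
    # Directories referenced by a symlink (e.g. current-tripleo) must survive.
    linked = {os.path.realpath(p) for p in entries if os.path.islink(p)}
    dirs = [p for p in entries if os.path.isdir(p) and not os.path.islink(p)]
    # Newest first, so everything past the first `keep` entries is a candidate.
    dirs.sort(key=os.path.getmtime, reverse=True)
    removed = []
    for path in dirs[keep:]:
        if os.path.realpath(path) in linked:
            continue
        shutil.rmtree(path)
        removed.append(os.path.basename(path))
    return removed
```

Protecting symlink targets is the important design choice here: a purely age-based cleanup could delete the images that the promoted link still serves when no newer promotion has succeeded.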

Revision history for this message
Alan Pevec (apevec) wrote :

@Gabriele what is the preventive action and who owns it?

Revision history for this message
Attila Darazs (adarazs) wrote :

This just happened again. We need to correct it manually and fix the promoter ASAP.

Revision history for this message
Attila Darazs (adarazs) wrote :

Only the container image upload failed. I recreated the command run in the promoter script and got the same error:

failed: [localhost] (item=cinder-backup) => {"failed": true, "item": "cinder-backup", "msg": "Error pulling trunk.registry.rdoproject.org/pike/centos-binary-cinder-backup - code: None message: Get https://trunk.registry.rdoproject.org/v2/pike/centos-binary-cinder-backup/manifests/1e0dd6786e3a4326a37e47541844657524e689a9_81220c5f: unauthorized: authentication required"}

Something probably changed on the RDO registry side that prevents us from accessing it.

Revision history for this message
Attila Darazs (adarazs) wrote :

This seems to be a separate issue from this bug, something to do with the RDO registry not allowing anonymous pulls, so I'm opening a different bug for it.

Changed in tripleo:
milestone: queens-2 → queens-3
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Fix Released