Missing tags in dockerhub, impossible to deploy a containerized undercloud

Bug #1764870 reported by Emilien Macchi
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Gabriele Cerami

Bug Description

http://logs.openstack.org/77/561377/22/check/tripleo-ci-centos-7-undercloud-containers/de6d319/logs/undercloud/home/zuul/undercloud_install.log.txt.gz#_2018-04-17_18_18_05

Impossible to deploy from current right now:

Exception: Image docker.io/tripleomaster/centos-binary-neutron-dhcp-agent has no tag 5d2dbadc0f0706f842df64957cd93004d45180df_d032039d.

Available tags: 17a57d99b10f5872520e1a395214f80c9cfd1a65_c30c070f, 1fc826411a407c006b76cf6bdfcd3beeb34a1c26_f1a4ee5a, 26e8be8e812511ba01531802ced768b9e405dcba_5d2515c6, 35414701c176a6288fc2ad141dad0f73624dcb94_43527485, 3b718f3fecc866332ec0663fa77e758f8346ab93_4204ba89, 3b85715c6488a692fcdf4e4c6a589d77c60eb5a6_a3792c8d, 465e46159f59def9ba550cef0833f0e6469ac072_00a5f8f9, 58fc4c43eab8419bb5bd17234bcb4bc02188fdd2_b1259967, a257e03a4068120a252e0e6a0e4b2cfb679b35e8_5889b624, a2e69c2c44417c85334944a4c46f91648aa0b97f_3791bf5d, a453b3fd9ef6472c8ff3b07311c8f365a610297b_8285f3e2, ac82ea9271a4ae3860528eaf8a813da7209e62a6_28eeb6c7, ad1779c561edabdd6446c5fa39f05e121a327d7f_710b9460, adbdd4ce7dbb10eaf1110c93f0c7fdee5f79b3c4_9f6039a1, b23b33707ba4f4bd0682e58be30e1d16c6232992_d032039d, bbcea6372c1c2cef1f46e77ebb4774b6b41a22ae_c198a819, bc4a4d35820cfa20be20587efaf07d6431716fa5_1f757656, c8cceebf8e648ce46219026f926047491135a66e_fcf8d179, current-tripleo-rdo-internal, current-tripleo-rdo, current-tripleo, d16c2eda6093b476a75ce13460371a7d3ac33857_e9ad9896, d25d88fb42c444bda718e61a9cd2042c6e5a397c_435ac047, dc2fd615b580f4ff5c7dfc103dada370ecdf4099_a1e63efa, f442a3aa35981c3d6d7e312599dde2a1b1d202c9_0468cca4, passed-ci, tripleo-ci-testing

The tool that pushed the tag probably stopped partway through: this tag is missing from a number of containers, neutron and nova included.
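To see how far a partial push got, one option is to compare the expected tag against the tag list of each image. The following is only a rough sketch: `images_missing_tag` is a hypothetical helper, the tag sets are a small hand-copied subset of the list in the exception above, and `example-image-with-tag` is a made-up entry for contrast. It does not query any registry.

```python
def images_missing_tag(image_tags, wanted_tag):
    """Return the sorted names of images whose tag set lacks wanted_tag."""
    return sorted(
        name for name, tags in image_tags.items() if wanted_tag not in tags
    )


wanted = "5d2dbadc0f0706f842df64957cd93004d45180df_d032039d"

# Sample data only: the first entry copies a few tags from the exception
# above; the second entry is a made-up image that does carry the tag.
image_tags = {
    "centos-binary-neutron-dhcp-agent": {
        "b23b33707ba4f4bd0682e58be30e1d16c6232992_d032039d",
        "current-tripleo",
        "passed-ci",
        "tripleo-ci-testing",
    },
    "example-image-with-tag": {
        "5d2dbadc0f0706f842df64957cd93004d45180df_d032039d",
        "current-tripleo",
    },
}

missing = images_missing_tag(image_tags, wanted)
print(missing)
```

In practice the tag sets would have to come from the registry itself; how to fetch them is left out here on purpose.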

wes hayutin (weshayutin) wrote :

<weshay> dmsimard, centos-binary-neutron-dhcp-agent
<weshay> in master
<weshay> should have 5d2dbadc0f0706f842df64957cd93004d45180df
<weshay> not seeing it
<weshay> tristanC, do you have access to the rdo registry?
<weshay> dmsimard, tristanC looking for errors.. seeing the following
<weshay> for example.. diff container
<weshay> failed: [localhost] (item=[u'cinder-api', u'5d2dbadc0f0706f842df64957cd93004d45180df_d032039d']) => {"changed": false, "item": ["cinder-api", "5d2dbadc0f0706f842df64957cd93004d45180df_d032039d"], "msg": "Error searching for image trunk.registry.rdoproject.org/tripleomaster/centos-binary-cinder-api - 500 Server Error: Internal Server Error (\"{\"message\":\"layer does not exist\"}\")"}
<weshay> tristanC, dmsimard EmilienM ok.. rdoregistry just got the updates to tripleomaster

wes hayutin (weshayutin) wrote :

Looks like there was some hiccup, but luckily we retry pushing containers every 10 minutes if there is a failure.
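For readers unfamiliar with that kind of retry, here is a minimal sketch of the idea. It is not the promoter script: `push_with_retry`, `flaky_push`, and the attempt limit are assumptions made for illustration, and only the 10-minute (600 second) delay comes from this comment. The `sleep` function is injectable so the example runs without actually waiting.

```python
import time


def push_with_retry(push, max_attempts=3, delay_seconds=600, sleep=time.sleep):
    """Call push() until it succeeds; return the number of attempts used.

    Re-raises the last error once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            push()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
            # Wait before the next attempt (10 minutes by default).
            sleep(delay_seconds)


# Hypothetical push that fails twice with a registry error, then succeeds.
calls = {"count": 0}


def flaky_push():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("500 Server Error: Internal Server Error")


waits = []  # records requested delays instead of sleeping
attempts_used = push_with_retry(flaky_push, sleep=waits.append)
print(attempts_used, waits)
```

Catching every exception is a simplification; a real job would probably retry only on registry or network errors.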

http://38.145.34.55/master.log

Looks like the rdo registry updated, and we're seeing updates in docker.io. I think we're ok.

Closing for now

Changed in tripleo:
status: Triaged → Invalid
wes hayutin (weshayutin) wrote :

Deployments may be working now; however, the promotion is not working due to the 500 errors from both the rdo-registry and docker.io.

Changed in tripleo:
status: Invalid → Triaged
wes hayutin (weshayutin)
Changed in tripleo:
assignee: nobody → Gabriele Cerami (gcerami)
Quique Llorente (quiquell) wrote :

I see only 500 errors at docker.io now; "Push images to rdoproject registry with named label" is working fine.

Alan Pevec (apevec) wrote :

https://hub.docker.com/r/tripleomaster/centos-binary-neutron-server/tags/

now has 5d2dbadc0f0706f842df64957cd93004d45180df_d032039d and a few newer tags, so it should be resolved, but it would be good to look at the root cause if you have exact timestamps for when the uploads failed.

PA is already built into the promoter script, since it repushes containers in case of failure.

Gabriele Cerami (gcerami) wrote :

There is a chain of problems:
1) The promoter server is failing to pull and push some images. We are investigating problems with the device mapper and file system.
2) The job is not using the correct tag: it's trying tags that are not fully promoted, and it shouldn't do that.

Gabriele Cerami (gcerami) wrote :

Opening new bug for 1): https://bugs.launchpad.net/tripleo/+bug/1765084
This bug should try to address 2)
That is, why is the undercloud container job trying to pull hashes that are not fully promoted?
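One way to picture the check for 2), as a sketch only: treat a hash as fully promoted when every required image carries it, and have the job pick the newest candidate that passes. The function names, the image names, and the placeholder tags below are all hypothetical; this is not how the job or the promoter is actually written.

```python
def is_fully_promoted(tag, image_tags):
    """True only if every image in image_tags carries the given tag."""
    if not image_tags:
        return False  # nothing to check means nothing was promoted
    return all(tag in tags for tags in image_tags.values())


def newest_fully_promoted(candidates, image_tags):
    """Return the first fully promoted tag from candidates, or None.

    candidates is assumed to be ordered newest first.
    """
    for tag in candidates:
        if is_fully_promoted(tag, image_tags):
            return tag
    return None


# Placeholder data: "hash-new" was pushed to only one of the two images.
image_tags = {
    "example-image-a": {"hash-new", "hash-old"},
    "example-image-b": {"hash-old"},
}

chosen = newest_fully_promoted(["hash-new", "hash-old"], image_tags)
print(chosen)
```

With this data the job would fall back to the older hash instead of failing on an image that never received the newer one.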

wes hayutin (weshayutin) wrote :

Will track this in https://bugs.launchpad.net/tripleo/+bug/1765084

Closing the bug, as we now single-thread the promotions in various releases.

tags: removed: alert
Changed in tripleo:
status: Triaged → Fix Released