python-docker-py package conflict causing jobs to fail

Bug #1806853 reported by Marios Andreou
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Marios Andreou

Bug Description

A "Transaction check error" for python-docker-py is causing jobs to fail. It has been seen in the standalone scenario jobs at [1] and [2], and in RDO promotion at [3]. The trace looks like:

    Transaction check error:\\n file /usr/lib/python2.7/site-packages/docker/__init__.py from install of python-docker-py-1:1.10.6-7.el7.noarch conflicts with file from package python2-docker-3.3.0-1.el7.noarch\\n file /usr/lib/python2.7/site-packages/docker/api/__init__.py from install of python-docker-py-1:1.10.6-7.el7.noarch conflicts with file from package python2-docker-3.3.0-1.el7.noarch\\n file /usr/lib/python2.7/site-packages/docker/transport/__init__.py from install of python-docker-py-1:1.10.6-7.el7.noarch conflicts wit
    ...

[1] http://logs.openstack.org/08/619508/11/check/tripleo-ci-centos-7-scenario001-standalone/87e6424/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz#_2018-12-04_20_06_18
[2] http://logs.openstack.org/20/619520/11/check/tripleo-ci-centos-7-scenario004-standalone/b5bd260/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz#_2018-12-04_08_34_25
[3] https://ci.centos.org/job/tripleo-quickstart-promote-rocky-rdo_trunk-minimal/45/consoleText
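
To see at a glance which package pairs collide in a trace like the one above, the conflict lines can be grouped with a short script. This is only a minimal sketch and not part of the original report: the regular expression and the name `summarize_conflicts` are assumptions made for illustration, and the pattern only matches the message shape quoted in the trace.

```python
import re
from collections import defaultdict

# Matches one "file X from install of A conflicts with file from package B" entry.
# Backslashes are excluded so the literal "\\n" separators in the log are not captured.
CONFLICT_RE = re.compile(
    r"file (?P<path>[^\s\\]+) from install of (?P<new>[^\s\\]+) "
    r"conflicts with file from package (?P<old>[^\s\\]+)"
)


def summarize_conflicts(log_text):
    """Group conflicting file paths by (package being installed, installed package)."""
    pairs = defaultdict(list)
    for match in CONFLICT_RE.finditer(log_text):
        pairs[(match.group("new"), match.group("old"))].append(match.group("path"))
    return dict(pairs)
```

Feeding it the text of standalone_deploy.log should yield one key per colliding package pair, with the list of shared files as the value.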

Changed in tripleo:
importance: Undecided → High
milestone: none → stein-2
yatin (yatinkarel)
tags: added: promotion-blocker
Marios Andreou (marios-b) wrote :

Spent some time digging today. This is blocking both the scenario 1 [1] and scenario 4 [2] standalone jobs, because both deploy ceph. The error is coming from ceph-ansible at [3]. Using git blame, I am wondering whether commit [4] was previously hiding the error by listing python-docker-py before docker. I'm really not sure that is why we are seeing this, though; it's just the most recent and possibly relevant commit there. Going to ping the ceph folks to check now.

[1] https://review.openstack.org/#/c/619508/
[2] https://review.openstack.org/#/c/619520
[3] https://github.com/ceph/ceph-ansible/blob/bc2daaeb716dda686d484ff2b7e282a606bbf91c/roles/ceph-container-common/tasks/pre_requisites/prerequisites.yml#L45
[4] https://github.com/ceph/ceph-ansible/commit/d72340abbe8b2ba71e1b41696140725b736158fd#diff-cf71a52b1e4dd836cf2162cfea92c9f1R49

Marios Andreou (marios-b) wrote :

Forgot to add: both scenario 1 and scenario 4 have recently passed the deployment (e.g. at [1]), which is what leads me to believe that some recently committed code is causing this.

[1] http://logs.openstack.org/08/619508/10/check/tripleo-ci-centos-7-scenario001-standalone/d3d5873/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz#_2018-11-30_14_30_26

Giulio Fidente (gfidente) wrote :

ceph-ansible should be skipping the with_pkg tasks [1].

Will check the config-download dir to see whether --skip-tags is passed as expected; maybe the parameter's default value is overridden in some env file.

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/ceph-ansible/ceph-base.yaml#L65

Marios Andreou (marios-b) wrote :

Adding a note here: I had a chat with gfidente in oooq, and he reckons the problem is that we are overriding the skip tags in [1]. This is bad because the tasks in [2], which is where the error is coming from, are meant to be skipped. They are not skipped because our override replaces the default skip tags.

Just posted https://review.openstack.org/623217 to test this, and added Depends-On to the scenario 1/4 reviews https://review.openstack.org/619508 and https://review.openstack.org/619520. Let's see.

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/ci/environments/scenario004-standalone.yaml#L50
[2] https://github.com/ceph/ceph-ansible/blob/bc2daaeb716dda686d484ff2b7e282a606bbf91c/roles/ceph-container-common/tasks/pre_requisites/prerequisites.yml#L45
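
The override behaviour described above can be sketched roughly as follows. This is an illustration only, not tripleo or ceph-ansible code: the function names are made up, and every tag value except `with_pkg` (named earlier in this thread) is hypothetical. The assumption being shown is that a parameter set in an environment file replaces the template default outright instead of being merged with it.

```python
def effective_params(template_defaults, env_file_params):
    """Environment file values replace template defaults key by key (no merging)."""
    merged = dict(template_defaults)
    merged.update(env_file_params)
    return merged


def skip_tags(params):
    """Split the comma-separated CephAnsibleSkipTags value into a set of tags."""
    return {t.strip() for t in params["CephAnsibleSkipTags"].split(",") if t.strip()}


def task_is_skipped(task_tags, params):
    """A task is skipped when it carries at least one of the skip tags."""
    return bool(set(task_tags) & skip_tags(params))


# Hypothetical values: only "with_pkg" comes from the discussion in this bug.
TEMPLATE_DEFAULTS = {"CephAnsibleSkipTags": "with_pkg,hypothetical_default_tag"}
CI_ENV_OVERRIDE = {"CephAnsibleSkipTags": "hypothetical_ci_tag"}

# With the default, the package-install task is skipped.
print(task_is_skipped(["with_pkg"], effective_params(TEMPLATE_DEFAULTS, {})))
# With the CI override, "with_pkg" is no longer in the skip list, so the task runs.
print(task_is_skipped(["with_pkg"], effective_params(TEMPLATE_DEFAULTS, CI_ENV_OVERRIDE)))
```

Under that assumption, dropping the override from the CI environment file is enough to restore the default skip list.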

Changed in tripleo:
status: Triaged → In Progress
yatin (yatinkarel) wrote :

<<< forgot to add.. both scenario1 and 4 have previously/recently passed the deployment (e.g. at [1]) which is what leads me to believe this might be some recently committed code causing this

<<< [1] http://logs.openstack.org/08/619508/10/check/tripleo-ci-centos-7-scenario001-standalone/d3d5873/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz#_2018-11-30_14_30_26

As for why they passed on the 30th and then started failing: I am not sure that the ceph-ansible commit you mentioned is the cause of all the issues. (Note that the issue is also seen in the -minimal job in RDO phase1, where I don't think ceph-ansible is used; there it just fails while installing tripleoclient.) What I can say is that it is related to the CentOS 7.6 release around 2nd December, which created a conflict between the python-docker we ship in RDO (python2-docker-3.3.0-1.el7.noarch) and python-docker-py from the CentOS base repo (python-docker-py-1.10.6-7.el7.noarch.rpm in 7.6 vs python-docker-py-1.10.6-4.el7.noarch.rpm in 7.5).

The reason is that in CentOS 7.6 they bumped the epoch [3] for python-docker-py, which we used to obsolete in RDO [1]. With epoch 1, the base package no longer falls under the "python-docker-py < 2" Obsoletes. We need to handle the versions from both base and RDO. This might need an update of python-docker to handle Obsoletes for python-docker-py, and we should also consider the overcloud pkg-map entry for docker-py. I don't know the history of python-docker-py vs python-docker, so I need to check the git history or ask someone who has context (maybe apevec or amoralej).

python-docker-py is listed for installation in the overcloud image [2]. Before 7.6, python2-docker was installed instead because of [1]. After 7.6, python-docker-py (with the epoch) is installed, and the later installation of python2-docker fails.

[3]
diff --git a/python-docker-py.spec b/python-docker-py.spec
index 2c2e14b..3d1496c 100644
--- a/python-docker-py.spec
+++ b/python-docker-py.spec
@@ -8,7 +8,8 @@

 Name: python-%{project}
 Version: %{docker_py_version}
-Release: 5%{?dist}
+Epoch: 1
+Release: 6%{?dist}
 Summary: An API client for docker written in Python
 License: ASL 2.0
 URL: https://github.com/%{owner}/%{project}/
@@ -90,6 +91,9 @@ popd

 %changelog
+* Thu Oct 25 2018 Tomas Tomecek <email address hidden> - 1:1.10.6-6
+- increase epoch so pycreds can be updated
+
 * Thu Oct 25 2018 Tomas Tomecek <email address hidden> - 1.10.6-5
 - use correct version for pycreds #1641795

[1] [centos@dlrn ~]$ rpm -qp --obsoletes https://trunk.rdoproject.org/centos7-master/deps/latest/noarch/python2-docker-3.3.0-1.el7.noarch.rpm
python-docker < 3.3.0-1.el7
python-docker-py < 2

[2] http://git.openstack.org/cgit/openstack/tripleo-puppet-elements/tree/elements/overcloud-base/pkg-map#n9
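
The effect of the epoch bump on the `python-docker-py < 2` Obsoletes in [1] can be illustrated with a simplified version comparison. This is a sketch under stated assumptions, not RPM's actual implementation: it ignores `~` and `^` handling, and the function names are made up for illustration. What it shows is that the epoch is compared first, so 1:1.10.6 sorts after 2 (implicit epoch 0) and stops being obsoleted.

```python
import re
from itertools import zip_longest


def parse_evr(evr):
    """Split 'E:V-R' into (epoch, version, release); a missing epoch counts as 0."""
    epoch, sep, rest = evr.partition(":")
    if not sep:
        epoch, rest = "0", evr
    version, _, release = rest.partition("-")
    return int(epoch), version, release


def _cmp_part(a, b):
    """Compare two version or release strings segment by segment (simplified)."""
    segs_a = re.findall(r"\d+|[a-zA-Z]+", a)
    segs_b = re.findall(r"\d+|[a-zA-Z]+", b)
    for x, y in zip_longest(segs_a, segs_b):
        if x is None:
            return -1  # a ran out of segments first, so a is older
        if y is None:
            return 1
        if x.isdigit() != y.isdigit():
            return 1 if x.isdigit() else -1  # numeric segments sort after alphabetic
        if x.isdigit():
            x, y = int(x), int(y)
        if x != y:
            return -1 if x < y else 1
    return 0


def compare_evr(a, b):
    """Return -1, 0 or 1; epoch first, then version, then release if both have one."""
    ea, va, ra = parse_evr(a)
    eb, vb, rb = parse_evr(b)
    if ea != eb:
        return -1 if ea < eb else 1
    result = _cmp_part(va, vb)
    if result or not ra or not rb:
        return result
    return _cmp_part(ra, rb)


def is_obsoleted(installed_evr, upper_bound):
    """True if installed_evr satisfies an 'Obsoletes: name < upper_bound' entry."""
    return compare_evr(installed_evr, upper_bound) < 0


print(is_obsoleted("1.10.6-4.el7", "2"))    # True: the 7.5 package is obsoleted
print(is_obsoleted("1:1.10.6-7.el7", "2"))  # False: epoch 1 beats implicit epoch 0
```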

Marios Andreou (marios-b) wrote :

It seems there are two issues then, @ykarel. The commit I pointed to in comment #1 was not relevant to the issue I saw with the scenario 1/4 standalone jobs; rather, the problem was that we were overriding something we should not have (see comment #4). With the fix in https://review.openstack.org/623217 the jobs get past this error (ceph-ansible was trying to install something it should not have been installing, so we had conflicts).

Maybe we then need a new bug for the CentOS issue. Freaky that both were seen around the same time, but they sound unrelated. WDYT?

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/623217
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=57c4f03c0d70d434507f87c3a8821613a230b602
Submitter: Zuul
Branch: master

commit 57c4f03c0d70d434507f87c3a8821613a230b602
Author: Marios Andreou <email address hidden>
Date: Thu Dec 6 16:17:41 2018 +0200

    Remove CephAnsibleSkipTags from scenario1/4 standalone ci envs

    We should not be overriding the skip tags otherwise we get the bug below

    Closes-Bug: #1806853
    Change-Id: I5a549eca8d2a750c751b193c24f77d3466acc2f9

Changed in tripleo:
status: In Progress → Fix Released
Alan Pevec (apevec) wrote :

There is still a packaging issue. The RDO deps update with the proposed fix is
https://review.rdoproject.org/r/17746 ("Update python-docker with fixed upgrade path").

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 10.3.0

This issue was fixed in the openstack/tripleo-heat-templates 10.3.0 release.
