Bug #1786764: Wrong versions of tripleo-common in container images updated in CI

Gabriele Cerami (gcerami) on 2018-08-13

Changed in tripleo:
status:	New → Triaged
importance:	Undecided → Critical
milestone:	none → rocky-rc1
assignee:	nobody → Gabriele Cerami (gcerami)

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2018-08-13:

#2

The longer we're running w/o promotions, the higher the chances are for a timeout. That is inevitable as we update packages in containers for CI.

tags:

added: ci

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2018-08-13:

#3

Shall we push for promotions, even having the promotion blocked, then fix the blockers? I think we should, otherwise everything will become blocked like this issue

Revision history for this message

Jose Luis Franco (jfrancoa) wrote on 2018-08-13:

#4

After checking the logs for some time I found this in the mistral logs:

2018-08-10 17:12:35.386 7 ERROR mistral.engine.task_handler [req-2a2f73f7-9bab-4e1a-8517-18a2cdbb4b78 8472d43aa8d8497389896dd99b217bbc 333c5e13d8ab41dab559f11853625ce3 - default default] Failed to run task [error=Invalid input [name=tripleo.package_update.update_stack, class=tripleo_common.actions.package_update.UpdateStackAction, missing=['ceph_ansible_playbook']], wf=tripleo.package_update.v1.package_update_plan, task=update]:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mistral/engine/task_handler.py", line 63, in run_task
    task.run()
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
    result = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 390, in run
    self._run_new()
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
    result = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 419, in _run_new
    self._schedule_actions()
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 488, in _schedule_actions
    action.validate_input(input_dict)
  File "/usr/lib/python2.7/site-packages/mistral/engine/actions.py", line 326, in validate_input
    self.action_def.action_class
  File "/usr/lib/python2.7/site-packages/mistral/engine/utils.py", line 66, in validate_input
    raise exc.InputException(msg % tuple(msg_props))
InputException: Invalid input [name=tripleo.package_update.update_stack, class=tripleo_common.actions.package_update.UpdateStackAction, missing=['ceph_ansible_playbook']]
: InputException: Invalid input [name=tripleo.package_update.update_stack, class=tripleo_common.actions.package_update.UpdateStackAction, missing=['ceph_ansible_playbook']]
2018-08-10 17:12:35.395 7 INFO workflow_trace [req-2a2f73f7-9bab-4e1a-8517-18a2cdbb4b78 8472d43aa8d8497389896dd99b217bbc 333c5e13d8ab41dab559f11853625ce3 - default default] Task 'update' (a91e8766-d6d1-408d-950f-8d7892fd1fa7) [RUNNING -> ERROR, msg=Failed to run task [error=Invalid input [name=tripleo.package_update.update_stack, class=tripleo_common.actions.package_update.UpdateStackAction, missing=['ceph_ansible_playbook']], wf=tripleo.package_update.v1.package_update_plan, task=update]:

http://logs.openstack.org/83/590683/1/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades/d4d4120/logs/undercloud/var/log/containers/mistral/engine.log.txt.gz#_2018-08-10_17_12_35_386
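For readers unfamiliar with this kind of error, here is a minimal sketch of how a "missing=[...]" list can arise when a caller's input is checked against an action class's constructor. It is not Mistral's actual validation code (the logs above are from Python 2.7, and this uses Python 3's inspect.signature); the class below is a hypothetical stand-in that still requires the old argument.

```python
import inspect


class UpdateStackAction:
    """Hypothetical stand-in for an action that still requires the old argument."""

    def __init__(self, timeout, ceph_ansible_playbook, container="overcloud"):
        self.timeout = timeout
        self.ceph_ansible_playbook = ceph_ansible_playbook
        self.container = container


def missing_inputs(action_class, input_dict):
    """Return required __init__ parameters that the caller did not supply."""
    signature = inspect.signature(action_class.__init__)
    required = [
        name
        for name, param in signature.parameters.items()
        if name != "self"
        and param.default is inspect.Parameter.empty
        and param.kind
        in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    ]
    return [name for name in required if name not in input_dict]


# A newer caller that no longer sends ceph_ansible_playbook.
print(missing_inputs(UpdateStackAction, {"timeout": 240, "container": "overcloud"}))
```

The point is only that the error is raised by whichever side still has the old signature, so it indicates a version mismatch between caller and action code rather than a bad user input.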

Revision history for this message

Jose Luis Franco (jfrancoa) wrote on 2018-08-13:

#5

The error was probably inserted by some of these patches: https://review.openstack.org/#/q/topic:external-update-upgrade

Revision history for this message

Jiří Stránský (jistr) wrote on 2018-08-13:

#6

tripleo_common.actions.package_update.UpdateStackAction, missing=['ceph_ansible_playbook']

^ that must be caused by some stale content in containers. In the latest code there is no ceph_ansible_playbook in anything update/upgrade related, both in tripleo-common and in tripleoclient.

Revision history for this message

Jiří Stránský (jistr) wrote on 2018-08-13:

#7

Here are CI results from 20 minutes ago and they're green:

https://review.openstack.org/#/c/591374/

Gabriele Cerami (gcerami) on 2018-08-14

tags:

added: alert

Revision history for this message

Jiří Stránský (jistr) wrote on 2018-08-14:

#8

The current issue is different, now it complains of missing 'container_registry' instead of 'ceph_ansible_playbook'. It's desync between tripleo-common and tripleoclient versions after removal of deprecations we did recently. The content of repos is fine but apparently containers are not. The fix is simple -- we need containers with fresh RPM content.

Revision history for this message

Alan Bishop (alan-bishop) wrote on 2018-08-15:

#9

I also encountered "missing container registry"

http://logs.openstack.org/08/589208/1/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/aa98c24/logs/undercloud/var/log/containers/mistral/engine.log.txt.gz#_2018-08-14_16_48_11_805

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2018-08-16: Related fix proposed to tripleo-common (master)

#10

Related fix proposed to branch: master
Review: https://review.openstack.org/592241

Revision history for this message

wes hayutin (weshayutin) wrote on 2018-08-16: Re: tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates times out on prepare

#11

The src for that error [2] I think is here [1]

[1] http://git.openstack.org/cgit/openstack/tripleo-common/tree/workbooks/package_update.yaml#n14
[2] http://logs.openstack.org/08/589208/1/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/aa98c24/logs/undercloud/var/log/containers/mistral/engine.log.txt.gz#_2018-08-14_16_48_11_805

Is that input container_registry even used? Would something like this help?
https://review.openstack.org/#/c/592241/

Revision history for this message

Jiří Stránský (jistr) wrote on 2018-08-16:

#12

It's not about input to the *workflow*, it's about input to the *action*. I think what we have is a desync within mistral containers. Likely we have this patch in one of the mistral containers (e.g. API or engine) but we don't have it in another (e.g. executor):

https://review.openstack.org/#/c/571186/

What i said earlier on this bug still holds true i think -- all we need is fresh content of containers, AFAICT. When i deployed with `update_containers: true`, the problem disappeared in my dev env.

If we promoted recently and it didn't fix the problem, we should check our promotion process (did we leave out some image by accident maybe?).

Revision history for this message

Jiří Stránský (jistr) wrote on 2018-08-16:

#13

Also, let's land this patch please, finishing the parameter removal: https://review.openstack.org/#/c/589487

Landing it is not a prerequisite to fixing the bug though. The state in repositories is already correct, likely it's container content which is broken.

Also notice that the job only fails on t-h-t patches and not on tripleo-common patches (on tripleo-common patches we actually do pull latest tripleo-common -- the one being tested -- into containers, i presume).

Revision history for this message

Jiří Stránský (jistr) wrote on 2018-08-16:

#14

I saw a broken job here, the container images are "updated-20180815220725":

http://logs.openstack.org/02/573102/11/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/f89af9f/logs/undercloud/var/log/extra/docker/docker_allinfo.log.txt.gz

The error message talks about expecting this parameter which we recently removed:

https://review.openstack.org/#/c/571186/6/tripleo_common/actions/package_update.py

In other words, the error message talks about code which is not present anywhere since Aug 13. It shouldn't even know about that code. We have some problem in building the updated containers it seems.

Revision history for this message

Jiří Stránský (jistr) wrote on 2018-08-16:

#15

The mistral action definitions are in its database. Is it possible that we populate the database with non-updated mistral code (pre- August 13) and then we actually run Mistral processes with an updated container (post- August 15), without re-running `sudo mistral-db-manage populate`? That would result in such error messages i think.
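The hypothesis above can be illustrated with a small sketch that compares a stored snapshot of an action's required inputs with what the currently installed code requires. The snapshot dictionary and the stand-in class are hypothetical; how Mistral actually stores action definitions is not shown in this thread.

```python
import inspect

# Hypothetical snapshot of required inputs, recorded when the database was populated.
STORED_REQUIRED_INPUTS = {
    "tripleo.package_update.update_stack": ["timeout", "container_registry"],
}


class UpdateStackAction:
    """Hypothetical stand-in for the action code after the parameter removal."""

    def __init__(self, timeout, container="overcloud"):
        self.timeout = timeout
        self.container = container


def required_inputs(action_class):
    """Names of __init__ parameters that have no default value."""
    params = inspect.signature(action_class.__init__).parameters
    return [
        name
        for name, param in params.items()
        if name != "self" and param.default is inspect.Parameter.empty
    ]


def definition_drift(stored, action_name, action_class):
    """Return (only_in_db, only_in_code) required inputs for one action."""
    in_db = set(stored[action_name])
    in_code = set(required_inputs(action_class))
    return sorted(in_db - in_code), sorted(in_code - in_db)


print(definition_drift(
    STORED_REQUIRED_INPUTS, "tripleo.package_update.update_stack", UpdateStackAction
))
```

A non-empty first element would mean the database still expects an input that the running code has dropped, which is the situation described in this comment.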

Revision history for this message

Jiří Stránský (jistr) wrote on 2018-08-16:

#16

I pulled latest mistral-executor:current-tripleo image and it indeed has stale content.

docker.io/tripleomaster/centos-binary-mistral-executor current-tripleo bfb76e4acb94 47 hours ago 1.29 GB

()[mistral@57e2751fa018 /]$ grep -ri container_registry /usr/lib/python2.7/site-packages/tripleo_common
/usr/lib/python2.7/site-packages/tripleo_common/actions/package_update.py: def __init__(self, timeout, container_registry,
/usr/lib/python2.7/site-packages/tripleo_common/actions/package_update.py: self.container_registry = container_registry
/usr/lib/python2.7/site-packages/tripleo_common/actions/package_update.py: if self.container_registry is not None:
/usr/lib/python2.7/site-packages/tripleo_common/actions/package_update.py: update_env.update(self.container_registry)
/usr/lib/python2.7/site-packages/tripleo_common/actions/package_update.py: if self.container_registry is not None:
/usr/lib/python2.7/site-packages/tripleo_common/actions/package_update.py: parameters.update(self.container_registry['parameter_defaults'])

()[mistral@57e2751fa018 /]$ rpm -qa | grep tripleo-common
openstack-tripleo-common-9.2.1-0.20180811014734.336cd3c.el7.noarch
openstack-tripleo-common-containers-9.2.1-0.20180811014734.336cd3c.el7.noarch
python2-tripleo-common-9.2.1-0.20180811014734.336cd3c.el7.noarch
openstack-tripleo-common-container-base-9.2.1-0.20180811014734.336cd3c.el7.noarch

Revision history for this message

Jiří Stránský (jistr) wrote on 2018-08-16:

#17

And then i ran with my locally updated mistral-executor image, "updated-20180814162422", and compared the results:

()[mistral@913f62985e6b /]$ grep -ri container_registry /usr/lib/python2.7/site-packages/tripleo_common

^ nothing :)

Revision history for this message

Marios Andreou (marios-b) wrote on 2018-08-16:

#18

folks adding a comment as i came from https://bugs.launchpad.net/tripleo/+bug/1787226 which is marked duplicate of this. I added a comment about the missing container registry issue as reported above from jfrancoa abishop and others so duplicating here. If the ceph issue is different then let's use the two bugs, one for each?

(copy paste from https://bugs.launchpad.net/tripleo/+bug/1787226):

* for the container registry parameter removal [1,2] it could only happen if you are using a python-tripleoclient with the change, and then tripleo-common w/out it. There is a Depends-On though, and they both merged 3 days ago. The multinode-oooq-container-updates job is green there too. However here you can see the error in [4,5] and it looks like

2018-08-15 10:11:40.190 ERROR /var/log/containers/mistral/engine.log: 7 ERROR mistral.engine.task_handler [req-fd8fbc0e-350e-4297-8a63-6eb0060ff25a 2bea853d47d643b38acfbd8f7c91504b 7f74b99fd5cc4cf0abaefee63f693317 - default default] Failed to run task [error=Invalid input [name=tripleo.package_update.update_stack, class=tripleo_common.actions.package_update.UpdateStackAction, missing=['container_registry']], wf=tripleo.package_update.v1.package_update_plan, task=update]:

* for the nova cert error examples are at [5] (immediately following that container registry issue) and also [6] and looks like

2018-08-15 09:32:20.998 ERROR /var/log/containers/mistral/mistral-db-manage.log: 11 ERROR mistral.actions.openstack.action_generator.base [-] Failed to create action: nova.certs_convert_into_with_meta: AttributeError: 'Client' object has no attribute 'certs'

[0] http://zuul.openstack.org/builds.html?job_name=tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates
[1] https://review.openstack.org/#/c/570893/ python-tripleoclient
[2] https://review.openstack.org/#/c/571186/ tripleo-common
[3] https://bugs.launchpad.net/tripleo/+bug/1787227
[4] http://logs.openstack.org/48/588148/3/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/8a5b8b2/logs/undercloud/var/log/extra/errors.txt.gz#_2018-08-16_08_22_08_249
[5] http://logs.openstack.org/70/585370/2/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/027eb5f/logs/undercloud/var/log/extra/errors.txt.gz#_2018-08-15_09_32_20_998
[6] http://logs.openstack.org/48/588148/3/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/8a5b8b2/logs/undercloud/var/log/extra/errors.txt.gz#_2018-08-16_07_33_58_362

Revision history for this message

Jiří Stránský (jistr) wrote on 2018-08-16:

#19

The working container has:

()[mistral@af2c5169e7cf /]$ rpm -qa | grep tripleo-common
openstack-tripleo-common-container-base-9.2.1-0.20180814123159.042f43d.el7.noarch
openstack-tripleo-common-containers-9.2.1-0.20180814123159.042f43d.el7.noarch
openstack-tripleo-common-9.2.1-0.20180814123159.042f43d.el7.noarch
python2-tripleo-common-9.2.1-0.20180814123159.042f43d.el7.noarch

It would probably help if we'd be able to get to the containers which are used in CI and check the RPM versions there. Note that the containers in CI have the "updated" part of name *later* than what i have in my working local env, yet they still hit the problem. There might be something fishy in how we build/update containers for CI...
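To make the comparison between the stale and working containers easier to automate, here is a rough sketch that pulls the build timestamp and commit hash out of package strings like the ones pasted above. The regular expression is an assumption derived only from the names shown in this bug, not a general RPM or DLRN parser.

```python
import re
from datetime import datetime

# Assumed layout: <name>-<version>-0.<YYYYmmddHHMMSS>.<7-char commit>.<dist>.<arch>
DLRN_NVR = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-0\.(?P<timestamp>\d{14})\."
    r"(?P<commit>[0-9a-f]{7})\.(?P<dist>[^.]+)\.(?P<arch>[^.]+)$"
)


def parse_dlrn_package(nvr):
    """Split a package string into name, version, build time and commit."""
    match = DLRN_NVR.match(nvr)
    if match is None:
        raise ValueError("unrecognised package string: %r" % nvr)
    info = match.groupdict()
    info["built"] = datetime.strptime(info["timestamp"], "%Y%m%d%H%M%S")
    return info


stale = parse_dlrn_package(
    "python2-tripleo-common-9.2.1-0.20180811014734.336cd3c.el7.noarch"
)
working = parse_dlrn_package(
    "python2-tripleo-common-9.2.1-0.20180814123159.042f43d.el7.noarch"
)
print(stale["commit"], stale["built"], "<", working["commit"], working["built"])
```

Running the same parse over `rpm -qa` output from the CI containers would show directly whether their tripleo-common build predates the parameter removal.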

Revision history for this message

Jiří Stránský (jistr) wrote on 2018-08-16:

#20

Ok we may be closing in on the root cause:

https://github.com/openstack/tripleo-quickstart-extras/blob/ff959379665658b454df5507ea09cc265fffdb9e/roles/undercloud-deploy/templates/containers-prepare-parameter.yaml.j2#L15

https://github.com/openstack/ansible-role-tripleo-modify-image#role-variables

"If set, packages from this repo will be updated. Other repos will only be used for dependencies of these updates."

Perhaps thanks to ^^ jobs running on tripleo-common patches get correct/fresh tripleo-common, and update job passes, but jobs running on t-h-t patches don't get fresh tripleo-common, and update job fails. We either need to tag `current-tripleo` images much more often, or we need to use ansible-role-tripleo-modify-image to update more packages than just the one we're gating.

Revision history for this message

Jiří Stránský (jistr) wrote on 2018-08-16:

#21

Had a call with Wes about this bug, it's probably two issues:

* Aside from updating RPMs from the gating repo, we should update the packages from delorean-current repo too.

* We should update all container images and not just some. (What triggered this bug was probably that different containers had different versions of tripleo-common. Patch interdependencies probably played no part in this. One patch without any depends-on is enough to break things, if it happens to be installed e.g. in mistral-api container but not in mistral-executor container.)
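The second point could be checked mechanically with something like the sketch below, which groups containers by the version of one package they carry. The input mapping is illustrative: the two version strings come from this bug, but which container had which version in CI is an assumption.

```python
from collections import defaultdict


def find_version_skew(packages_by_container, package_name):
    """Map each distinct version of package_name to the containers carrying it.

    Returns an empty dict when every container that has the package agrees.
    """
    containers_by_version = defaultdict(list)
    for container, packages in packages_by_container.items():
        version = packages.get(package_name)
        if version is not None:
            containers_by_version[version].append(container)
    if len(containers_by_version) > 1:
        return dict(containers_by_version)
    return {}


# Illustrative data only; the container-to-version assignment is assumed.
observed = {
    "mistral_api": {"python2-tripleo-common": "9.2.1-0.20180814123159.042f43d.el7"},
    "mistral_engine": {"python2-tripleo-common": "9.2.1-0.20180814123159.042f43d.el7"},
    "mistral_executor": {"python2-tripleo-common": "9.2.1-0.20180811014734.336cd3c.el7"},
}
print(find_version_skew(observed, "python2-tripleo-common"))
```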

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2018-08-16: Fix proposed to tripleo-quickstart-extras (master)

#22

Fix proposed to branch: master
Review: https://review.openstack.org/592577

Changed in tripleo:
assignee:	Gabriele Cerami (gcerami) → wes hayutin (weshayutin)
status:	Triaged → In Progress

Revision history for this message

wes hayutin (weshayutin) wrote on 2018-08-16: Re: tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates times out on prepare

#23

The problem is that any container that is affected by [1] will get updated by dlrn-current too. This will mismatch the versions of [2]. So to ensure that the packages in [2] are all at the same level across all of the containers, we need to make the change in [3].

The concern is that the package update will take too long, and it might. However the includepkgs protects us a bit from that [4].

[1] packages_for_update="$(repoquery --disablerepo='*' --enablerepo={{ gating_repo_name }} --qf %{NAME} -a 2>{{ working_dir }}/repoquery.err.log | sort -u | xargs)"

[2] #includepkgs=instack,instack-undercloud,os-apply-config,os-collect-config,os-net-config,os-refresh-config,python-tripleoclient*,python*-tripleo-common,openstack-tripleo-*,puppet-*,python-paunch

[3] https://review.openstack.org/592577

[4] [root@undercloud yum.repos.d]# cat delorean-current.repo | grep include
includepkgs=instack,instack-undercloud,os-apply-config,os-collect-config,os-net-config,os-refresh-config,python-tripleoclient*,python*-tripleo-common,openstack-tripleo-*,puppet-*,python-paunch
[root@undercloud yum.repos.d]# repoquery -q --repoid=delorean-current | wc -l
221
[root@undercloud yum.repos.d]# vi delorean-current.repo
[root@undercloud yum.repos.d]# repoquery -q --repoid=delorean-current | wc -l
1091
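The effect of includepkgs in [4] can be approximated with shell-style glob matching. The sketch below uses Python's fnmatch as a stand-in for yum's own matching, which is an assumption; the pattern list is the one quoted above, and the sample package names are illustrative.

```python
from fnmatch import fnmatchcase

# Pattern list copied from the delorean-current.repo line quoted above.
INCLUDEPKGS = (
    "instack,instack-undercloud,os-apply-config,os-collect-config,"
    "os-net-config,os-refresh-config,python-tripleoclient*,"
    "python*-tripleo-common,openstack-tripleo-*,puppet-*,python-paunch"
)


def filter_includepkgs(package_names, includepkgs=INCLUDEPKGS):
    """Keep only the package names that match one of the includepkgs globs."""
    patterns = [p.strip() for p in includepkgs.split(",") if p.strip()]
    return [
        name
        for name in package_names
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


sample = [
    "python2-tripleo-common",
    "openstack-nova-api",
    "openstack-tripleo-common",
    "puppet-nova",
    "python-paunch",
]
print(filter_includepkgs(sample))
```

This is why the repoquery count drops from 1091 to 221 with the filter in place: anything outside the listed globs is invisible to the update.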

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2018-08-17: Related fix proposed to tripleo-quickstart-extras (master)

#24

Related fix proposed to branch: master
Review: https://review.openstack.org/592784

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2018-08-17: Change abandoned on tripleo-common (master)

#25

Change abandoned by wes hayutin (<email address hidden>) on branch: master
Review: https://review.openstack.org/592241

OpenStack Infra (hudson-openstack) on 2018-08-17

Changed in tripleo:
assignee:	wes hayutin (weshayutin) → Jiří Stránský (jistr)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2018-08-17: Related fix proposed to ansible-role-tripleo-modify-image (master)

#26

Related fix proposed to branch: master
Review: https://review.openstack.org/593169

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2018-08-21: Related fix merged to ansible-role-tripleo-modify-image (master)

#27

Reviewed: https://review.openstack.org/593169
Committed: https://git.openstack.org/cgit/openstack/ansible-role-tripleo-modify-image/commit/?id=7b587fe0f5b0b9ead2adbd9d450109f5fe1c6696
Submitter: Zuul
Branch: master

commit 7b587fe0f5b0b9ead2adbd9d450109f5fe1c6696
Author: Alex Schultz <email address hidden>
Date: Fri Aug 17 14:21:27 2018 -0600

Only do yum update when needed

    Currently the logic for this results in a full yum update when no
    package updates are found in the provided repository. This can lead to
    job timeouts when nothing was built in CI because it effectively does a
    yum update on every container and applies other system packages rather
    than ones that actually changed.

    There is a larger issue in that we can still get out of sync with the
    host OS.

    Change-Id: Iaf41691ea3cb6e78186741ac5e15614fb73f89ff
    Related-Bug: #1786764
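The guard described in this commit message can be sketched as follows. This is not the role's actual implementation (which is Ansible, not Python); the function name is hypothetical and it only illustrates why an empty package list must short-circuit instead of falling through to a bare `yum update`.

```python
def yum_update_command(packages_for_update):
    """Return the yum argv to run, or None when there is nothing to update.

    packages_for_update is a whitespace-separated string of package names,
    like the output of the repoquery pipeline quoted earlier in this bug.
    """
    packages = packages_for_update.split()
    if not packages:
        # With no package arguments, `yum update` would update everything.
        return None
    return ["yum", "-y", "update", *packages]


print(yum_update_command(""))
print(yum_update_command("python2-tripleo-common openstack-tripleo-common"))
```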

OpenStack Infra (hudson-openstack) on 2018-08-21

Changed in tripleo:
assignee:	Jiří Stránský (jistr) → Sorin Sbarnea (ssbarnea)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2018-08-22: Fix merged to tripleo-quickstart-extras (master)

#28

Reviewed: https://review.openstack.org/592577
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=5d8f67f74aa2e6d54263b0455ad16c35347a1bb5
Submitter: Zuul
Branch: master

commit 5d8f67f74aa2e6d54263b0455ad16c35347a1bb5
Author: Wes Hayutin <email address hidden>
Date: Thu Aug 16 12:46:33 2018 -0400

add delorean-current to repolist for updates

    The list of containers that are updated are generated
    from a list of rpms from the gating repo. These
    containers are updated w/ the gating repo and dlrn-current.

    That makes the above set of containers out of sync w/
    the rest of the containers. The list of containers
    that are updated needs to include changes required
    by dlrn-current AND the gating repo.

    The --enablerepo parameter for repoquery seems to support
    comma-delimited lists, we'll take advantage of that so that we don't
    need edit ansible-role-tripleo-modify-image parameter interface.

    Co-Authored-By: Jiri Stransky <email address hidden>
    Closes-Bug: #1786764
    Change-Id: Ie12021ace7e9eb1695aa97ac5d97f3b948be9d86
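A small sketch of the comma-delimited --enablerepo idea from this commit message, building the same repoquery invocation quoted in comment #23 for more than one repo. The helper name is hypothetical, and the command is only constructed here, not executed.

```python
def repoquery_command(repos):
    """Build a repoquery argv listing package names from the given repos only."""
    if not repos:
        raise ValueError("at least one repo name is required")
    return [
        "repoquery",
        "--disablerepo=*",
        "--enablerepo=" + ",".join(repos),
        "--qf",
        "%{NAME}",
        "-a",
    ]


# Repo names as they appear elsewhere in this bug.
print(" ".join(repoquery_command(["gating-repo", "delorean-current"])))
```

Passing the list as separate argv items avoids the shell quoting around `'*'` that the original one-liner needs.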

Changed in tripleo:
status:	In Progress → Fix Released

Revision history for this message

wes hayutin (weshayutin) wrote on 2018-08-22: Re: tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates times out on prepare

#29

This needs to be fixed for all upstream releases.

include_pkgs can be used on repos that have tripleo jobs in their gate.
do not use wild cards, be explicit

Changed in tripleo:
status:	Fix Released → Triaged

Revision history for this message

Sorin Sbarnea (ssbarnea) wrote on 2018-08-23:

#30

I am working now on updating the patch to avoid using wildcards for all versions.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2018-08-24: Related fix proposed to tripleo-upgrade (master)

#31

Related fix proposed to branch: master
Review: https://review.openstack.org/596381

Alex Schultz (alex-schultz) on 2018-08-24

Changed in tripleo:
milestone:	rocky-rc1 → rocky-rc2

Revision history for this message

Sorin Sbarnea (ssbarnea) wrote on 2018-08-25: Re: tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates times out on prepare

#32

I am not sure what happens but while trying to configure a timeout for the update command in order to avoid failure to collect logs, I found the last line from http://logs.openstack.org/81/596381/2/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/41a018f/logs/undercloud/home/zuul/overcloud_update_prepare.log.txt.gz

WARNING tripleoclient.plugin [-] Waiting for messages on queue 'tripleo' with no timeout.

This makes me believe that this will never finish. My bug will improve the output and ease reading build logs but will not fix this bug.
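As an illustration of the difference a timeout makes, here is a generic sketch of a bounded wait. It uses an in-process queue.Queue as a stand-in; it is not how tripleoclient talks to its 'tripleo' queue, and the function name is hypothetical.

```python
import queue


def wait_for_message(messages, timeout_seconds):
    """Return the next message, or None if nothing arrives within the timeout."""
    try:
        return messages.get(timeout=timeout_seconds)
    except queue.Empty:
        # The caller can now fail fast and still collect logs.
        return None


pending = queue.Queue()
print(wait_for_message(pending, 0.05))
pending.put({"status": "FAILED"})
print(wait_for_message(pending, 0.05))
```

With an unbounded wait, a workflow that errors out before posting any message leaves the caller blocked until the job-level timeout kills it.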

Revision history for this message

Jiří Stránský (jistr) wrote on 2018-08-29:

#33

The patch did not fix the issue, it's still the very same problem:

InputException: Invalid input [name=tripleo.package_update.update_stack, class=tripleo_common.actions.package_update.UpdateStackAction, missing=['container_registry']]

There's still something broken in the way we run the ansible modify-image role, or in the modify-image role itself.

Revision history for this message

Jiří Stránský (jistr) wrote on 2018-08-30:

#34

After promotion the updates job is now passing, but there's no reason to believe that the root cause is now fixed. I changed the bug title.

summary:

- tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates times
- out on prepare
+ Wrong versions of tripleo-common in container images updated in CI

Revision history for this message

Jiří Stránský (jistr) wrote on 2018-08-30:

#35

Sagi most likely found the root cause, here's what will hopefully fix things: https://review.openstack.org/#/c/598089/

Sorin Sbarnea (ssbarnea) on 2018-09-02

description:

updated

Revision history for this message

Sorin Sbarnea (ssbarnea) wrote on 2018-09-02:

#36

I don't think that we can close this because the scenario has never run successfully since Sagi's patch was merged two days ago.

http://cistatus.tripleo.org/#tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates

Still, the breakages could be caused by other reasons as I was not able to see the same error.

Revision history for this message

yatin (yatinkarel) wrote on 2018-09-04:

#37

<<< Still, the breakages could be caused by other reasons as I was not able to see the same error.

>> Correct, The current failures are after https://review.openstack.org/#/c/573476/, which merged around the time for sagi's patch, updates jobs need to adopt changes as per bp/container-prepare-workflow(https://review.openstack.org/#/q/topic:bp/container-prepare-workflow+(status:open+OR+status:merged))

Revision history for this message

Alex Schultz (alex-schultz) wrote on 2018-09-04:

#38

Moving milestone to Stein-1 as this is not required for Rocky RC2.

Changed in tripleo:
milestone:	rocky-rc2 → stein-1

Sorin Sbarnea (ssbarnea) on 2018-09-04

Changed in tripleo:
assignee:	Sorin Sbarnea (ssbarnea) → nobody

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2018-09-05: Fix proposed to ansible-role-tripleo-modify-image (master)

#39

Fix proposed to branch: master
Review: https://review.openstack.org/600273

Changed in tripleo:
assignee:	nobody → Steve Baker (steve-stevebaker)
status:	Triaged → In Progress

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2018-09-05: Fix proposed to tripleo-quickstart-extras (master)

#40

Fix proposed to branch: master
Review: https://review.openstack.org/600277

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2018-09-10: Fix merged to tripleo-quickstart-extras (master)

#41

Reviewed: https://review.openstack.org/600277
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=a8b048e1162846d1460c51de706b4c08bf0a9473
Submitter: Zuul
Branch: master

commit a8b048e1162846d1460c51de706b4c08bf0a9473
Author: Steve Baker <email address hidden>
Date: Thu Sep 6 11:20:42 2018 +1200

Don't set compare_host_packages:True

    Now that the undercloud is containerized, there will be very few host
    packages to compare to, so there is a high risk that required package
    updates will be skipped.

    This is a strategy inherited from container-update.py that was
    intended to avoid unnecessary calls to yum update, however we now have
    a better approach using the repoquery, so host package comparison is
    no longer required, and probably causing some of the instances of bug

    This strategy is removed from the role in change Iab7b9d6377494001d904bb84b058ea293d73110c

    Change-Id: I3bb0ba1f56daf475b7498283a5b7e6dcd1540e7d
    Partial-Bug: #1786764
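The risk described in this commit message can be illustrated with a sketch. It assumes the strategy decided whether to update by comparing versions of packages installed both in the container and on the host; that reading is an assumption, and the function and package data below are hypothetical.

```python
def update_needed_by_host_comparison(container_packages, host_packages):
    """True if any package installed in both places differs in version.

    Both arguments map package name -> version string.
    """
    shared = container_packages.keys() & host_packages.keys()
    return any(container_packages[name] != host_packages[name] for name in shared)


container = {"python2-tripleo-common": "9.2.1-0.20180811014734.336cd3c.el7"}

# Non-containerized undercloud: the host carries the same package, newer build.
full_host = {"python2-tripleo-common": "9.2.1-0.20180814123159.042f43d.el7"}
# Containerized undercloud: the host has almost nothing to compare against.
thin_host = {"python-paunch": "illustrative-version"}

print(update_needed_by_host_comparison(container, full_host))
print(update_needed_by_host_comparison(container, thin_host))
```

Under this assumption the stale container goes unnoticed on a thin host because there is no shared package to compare, which is the skipped-update risk the commit message refers to.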

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2018-09-13:

#42

This is probably fixed with https://review.openstack.org/#/c/599315/ finally?

OpenStack Infra (hudson-openstack) on 2018-09-17

Changed in tripleo:
assignee:	Steve Baker (steve-stevebaker) → Emilien Macchi (emilienm)

Revision history for this message

wes hayutin (weshayutin) wrote on 2018-09-18:

#43

openstack-tripleo-common-container-base.noarch
9.3.1-0.20180918151848.c794510.el7 @gating-repo

http://logs.openstack.org/22/603322/3/check/tripleo-ci-centos-7-scenario002-multinode-oooq-container/edb58a0/logs/undercloud/var/log/extra/docker/containers/heat_api/docker_info.log.txt.gz

Changed in tripleo:
status:	In Progress → Fix Released

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2018-09-28: Fix merged to ansible-role-tripleo-modify-image (master)

#44

Reviewed: https://review.openstack.org/600273
Committed: https://git.openstack.org/cgit/openstack/ansible-role-tripleo-modify-image/commit/?id=c9d085729f62dfcfeeaecccf36c3c0161414afb7
Submitter: Zuul
Branch: master

commit c9d085729f62dfcfeeaecccf36c3c0161414afb7
Author: Steve Baker <email address hidden>
Date: Thu Sep 6 10:35:38 2018 +1200

Remove compare_host_packages strategy

    Now that the undercloud is containerized, there will be very few host
    packages to compare to, so there is a high risk that required package
    updates will be skipped.

    This is a strategy inherited from container-update.py that was
    intended to avoid unnecessary calls to yum update, however we now have
    a better approach using the repoquery, so host package comparison is
    no longer required, and probably causing some of the instances of bug

    Change-Id: Iab7b9d6377494001d904bb84b058ea293d73110c
    Partial-Bug: #1786764

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-01-09: Change abandoned on tripleo-upgrade (master)

#45

Change abandoned by Sorin Sbarnea (<email address hidden>) on branch: master
Review: https://review.openstack.org/596381
Reason: True, timestamper_cmd would need escaping to work inside a subshell and this is a real challenge.

I will abandon it as I no longer have time to address it, I will revive it if I see other timeouts happening.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-05-14: Fix included in openstack/tripleo-quickstart-extras 2.1.1

#46

This issue was fixed in the openstack/tripleo-quickstart-extras 2.1.1 release.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-08-25: Change abandoned on tripleo-quickstart-extras (master)

#47

Change abandoned by Sagi Shnaidman (<email address hidden>) on branch: master
Review: https://review.opendev.org/592784
Reason: because long time passed since last update
