Overcloud deploy error: Timed out waiting for messages from Execution

Bug #1792296 reported by Quique Llorente
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Juan Antonio Osorio Robles

Bug Description

In the featureset019 master promotion jobs, the overcloud deploy fails at the following point:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset019-master/308d696/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2018-09-13_02_53_40
       " (at /etc/puppet/modules/stdlib/lib/puppet/Timed out waiting for messages from Execution (ID: 99ab8566-aec2-46be-b83c-e60fd7e1eda1, State: RUNNING). The WebSocket timed out before the Workflow completed.
2018-09-13 02:53:40 |
2018-09-13 02:53:40 | END return value: 1
2018-09-13 02:53:40 | functions/deprecation.rb:28:in `deprecation')",
2018-09-13 02:53:40 | " with Stdlib::Compat::Absolute_Path. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 55]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
2018-09-13 02:53:40 | " with Stdlib::Compat::String. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 56]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
2018-09-13 02:53:40 | " with Stdlib::Compat::Array. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 66]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
2018-09-13 02:53:40 | " with Pattern[]. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 68]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
2018-09-13 02:53:40 | " with Stdlib::Compat::Numeric. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 76]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
2018-09-13 02:53:40 | "Warning: tag is a metaparam; this value will inherit to all contained resources in the tripleo::firewall::rule definition",
2018-09-13 02:53:40 |

summary: - Overcloude deploy error: cannot load such file --
- hiera/backend/module_data_backend"
+ Overcloude deploy error:Timed out waiting for messages from Execution
description: updated
chandan kumar (chkumar246) wrote :

On subnode-2, the manila and glance API services died because of a DB connection error, which resulted in the timeout:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset019-master/308d696/logs/subnode-2/var/log/containers/glance/api.log.txt.gz#_2018-09-13_01_56_30_564

2018-09-13 01:56:30.564 20 ERROR glance Traceback (most recent call last):
2018-09-13 01:56:30.564 20 ERROR glance File "/usr/bin/glance-manage", line 10, in <module>
2018-09-13 01:56:30.564 20 ERROR glance sys.exit(main())
2018-09-13 01:56:30.564 20 ERROR glance File "/usr/lib/python2.7/site-packages/glance/cmd/manage.py", line 563, in main
2018-09-13 01:56:30.564 20 ERROR glance return CONF.command.action_fn()
2018-09-13 01:56:30.564 20 ERROR glance File "/usr/lib/python2.7/site-packages/glance/cmd/manage.py", line 395, in sync
2018-09-13 01:56:30.564 20 ERROR glance self.command_object.sync(CONF.command.version)
2018-09-13 01:56:30.564 20 ERROR glance File "/usr/lib/python2.7/site-packages/glance/cmd/manage.py", line 154, in sync
2018-09-13 01:56:30.564 20 ERROR glance curr_heads = alembic_migrations.get_current_alembic_heads()
2018-09-13 01:56:30.564 20 ERROR glance File "/usr/lib/python2.7/site-packages/glance/db/sqlalchemy/alembic_migrations/__init__.py", line 52, in get_current_alembic_heads
2018-09-13 01:56:30.564 20 ERROR glance engine = db_api.get_engine()
2018-09-13 01:56:30.564 20 ERROR glance File "/usr/lib/python2.7/site-packages/glance/db/sqlalchemy/api.py", line 97, in get_engine
2018-09-13 01:56:30.564 20 ERROR glance facade = _create_facade_lazily()
2018-09-13 01:56:30.564 20 ERROR glance File "/usr/lib/python2.7/site-packages/glance/db/sqlalchemy/api.py", line 87, in _create_facade_lazily
2018-09-13 01:56:30.564 20 ERROR glance _FACADE = session.EngineFacade.from_config(CONF)
2018-09-13 01:56:30.564 20 ERROR glance File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 1352, in from_config
2018-09-13 01:56:30.564 20 ERROR glance expire_on_commit=expire_on_commit, _conf=conf)
2018-09-13 01:56:30.564 20 ERROR glance File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 1273, in __init__
2018-09-13 01:56:30.564 20 ERROR glance slave_connection=slave_connection)
2018-09-13 01:56:30.564 20 ERROR glance File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 496, in _start
2018-09-13 01:56:30.564 20 ERROR glance engine_args, maker_args)
2018-09-13 01:56:30.564 20 ERROR glance File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 520, in _setup_for_connection
2018-09-13 01:56:30.564 20 ERROR glance sql_connection=sql_connection, **engine_kwargs)
2018-09-13 01:56:30.564 20 ERROR glance File "/usr/lib/python2.7/site-packages/debtcollector/renames.py", line 43, in decorator
2018-09-13 01:56:30.564 20 ERROR glance return wrapped(*args, **kwargs)
2018-09-13 01:56:30.564 20 ERROR glance File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py", line 202, in create_engine
2018-09-13 01:56:30.564 20 ERR...


Quique Llorente (quiquell) wrote :
Changed in tripleo:
status: New → Triaged
Michele Baldessari (michele) wrote :

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset019-master/36edbf9/logs/subnode-2/var/log/cluster/corosync.log.txt.gz

1. pacemaker failed to start the haproxy bundle container
Sep 13 08:45:51 [30345] upstream-centos-7-rdo-cloud-tripleo-0000174334 pengine: info: common_print: ip-192.168.24.11 (ocf::heartbeat:IPaddr2): Stopped
Sep 13 08:45:51 [30345] upstream-centos-7-rdo-cloud-tripleo-0000174334 pengine: info: common_print: ip-192.168.24.12 (ocf::heartbeat:IPaddr2): Stopped
Sep 13 08:45:51 [30345] upstream-centos-7-rdo-cloud-tripleo-0000174334 pengine: info: container_print: Docker container: haproxy-bundle [192.168.24.1:8787/tripleomaster/centos-binary-haproxy:pcmklatest]
Sep 13 08:45:51 [30345] upstream-centos-7-rdo-cloud-tripleo-0000174334 pengine: info: common_print: haproxy-bundle-docker-0 (ocf::heartbeat:docker): Stopped

Sep 13 08:00:26 [30341] upstream-centos-7-rdo-cloud-tripleo-0000174334 cib: info: cib_perform_op: + /cib: @num_updates=1
Sep 13 08:00:26 [30341] upstream-centos-7-rdo-cloud-tripleo-0000174334 cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='haproxy-bundle-docker-0']/lrm_rsc_op[@id='haproxy-bundle-docker-0_last_0']: @operation_key=haproxy-bundle-docker-0_start_0, @operation=start, @transition-key=60:27:0:df70d12c-f2cc-45fb-a677-64838b575c74, @transition-magic=-1:193;60:27:0:df70d12c-f2cc-45fb-a677-64838b575c74, @call-id=-1, @rc-code=193, @op-status=-1, @las
Sep 13 08:00:26 [30341] upstream-centos-7-rdo-cloud-tripleo-0000174334 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=upstream-centos-7-rdo-cloud-tripleo-0000174334/crmd/226, version=0.28.1)
Sep 13 08:00:26 docker(haproxy-bundle-docker-0)[73283]: INFO: checking for nsenter, which is required when 'monitor_cmd' is specified
Sep 13 08:00:26 docker(haproxy-bundle-docker-0)[73283]: INFO: running container haproxy-bundle-docker-0 for the first time
Sep 13 08:00:27 docker(haproxy-bundle-docker-0)[73283]: INFO: monitor cmd exit code = 137
Sep 13 08:00:27 docker(haproxy-bundle-docker-0)[73283]: INFO: stdout/stderr:
Sep 13 08:00:27 docker(haproxy-bundle-docker-0)[73283]: ERROR: waiting on monitor_cmd to pass after start
Sep 13 08:00:28 docker(haproxy-bundle-docker-0)[73283]: ERROR: Newly created docker container exited after start
Sep 13 08:00:28 [30343] upstream-centos-7-rdo-cloud-tripleo-0000174334 lrmd: notice: operation_finished: haproxy-bundle-docker-0_start_0:73283:stderr [ ocf-exit-reason:waiting on monitor_cmd to pass after start ]
Sep 13 08:00:28 [30343] upstream-centos-7-rdo-cloud-tripleo-0000174334 lrmd: notice: operation_finished: haproxy-bundle-docker-0_start_0:732...
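
A note on the "monitor cmd exit code = 137" line above: by the usual shell convention, an exit status above 128 means the process was ended by signal (status - 128), so 137 would correspond to signal 9. A minimal sketch of that decoding follows; the helper name describe_exit_code is made up for illustration, and whether the resource agent follows this convention here is an assumption, not something the log confirms.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe an exit status using the shell's 128+N signal convention.

    Hypothetical helper, not part of pacemaker or the docker resource agent.
    """
    if code > 128:
        signum = code - 128
        try:
            # Signal names are platform dependent; this lookup assumes a POSIX host.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
```

On a Linux host this reports SIGKILL for 137, which would fit the container exiting right after start, though it does not say what sent the signal.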


Michele Baldessari (michele) wrote :

Via https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset019-master/36edbf9/logs/subnode-2/etc/ceph/mycephcluster.conf.txt.gz we can see that THT appears to have set up the right address for rados. The THT code in docker/services/ceph-ansible/ceph-rgw.yaml does:
  CephRgwAnsibleVars:
    type: OS::Heat::Value
    properties:
      type: json
      value:
        vars:
          radosgw_keystone: true
          radosgw_keystone_ssl: false
          radosgw_address_block: {get_param: [ServiceData, net_cidr_map, {get_param: [ServiceNetMap, CephRgwNetwork]}]}
          radosgw_civetweb_port: {get_param: [EndpointMap, CephRgwInternal, port]}

And indeed the network is correct:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset019-master/36edbf9/logs/undercloud/var/lib/mistral/config-download-latest/ceph-ansible/group_vars/rgws.yml.txt.gz

radosgw_address_block: 192.168.24.0/24
radosgw_civetweb_port: '8080'
radosgw_keystone: true
radosgw_keystone_ssl: false

Giulio suspects an issue with the container itself

rgw frontends = civetweb port=192.168.24.3:8080 num_threads=100
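
For anyone who wants to double check the rendered config by hand, here is a rough sketch that pulls the address and port out of an "rgw frontends" line like the one above and tests the address against radosgw_address_block. The function names frontend_endpoint and in_block are invented for this example, and the regex only covers the IPv4 "port=ADDR:PORT" and plain "port=PORT" forms; it is not how ceph-ansible or the container parses the option.

```python
import ipaddress
import re


def frontend_endpoint(line: str):
    """Return (address, port) from a 'port=ADDR:PORT' or 'port=PORT' option.

    Hypothetical helper; address is None when only a port is given.
    """
    match = re.search(r"\bport=(?:(\d{1,3}(?:\.\d{1,3}){3}):)?(\d+)", line)
    if match is None:
        raise ValueError("no port= option found")
    address, port = match.groups()
    return address, int(port)


def in_block(address: str, cidr: str) -> bool:
    """Check whether an IP address falls inside a CIDR block."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


if __name__ == "__main__":
    line = "rgw frontends = civetweb port=192.168.24.3:8080 num_threads=100"
    address, port = frontend_endpoint(line)
    # Values taken from the conf line and rgws.yml quoted in this comment.
    print(address, port, in_block(address, "192.168.24.0/24"))
```

With the values quoted in this comment the address is inside the block and the port matches radosgw_civetweb_port, which is consistent with the configuration being right and the problem sitting elsewhere.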

Giulio Fidente (gfidente) wrote :

Should be fixed in ceph-ansible 3.1.3, which appeared in CBS on Sept 12th.

Quique Llorente (quiquell) wrote :
Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart-extras (master)

Fix proposed to branch: master
Review: https://review.openstack.org/603323

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/603367

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/603369

wes hayutin (weshayutin) wrote :

Removed alert. Use alert for gate failures or other failures blocking tripleo upstream that have not yet been identified and patched.

https://logs.rdoproject.org/43/13943/33/check/legacy-periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset019-rocky/684a22c/

tags: removed: alert
wes hayutin (weshayutin) wrote :

Sorry, a little more context: it sounds like we are very confident that the patch resolves the issue. The alert may just be noise at this point; earlier it was very helpful :)

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/603583

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-common (stable/queens)

Change abandoned by Quique Llorente (<email address hidden>) on branch: stable/queens
Review: https://review.openstack.org/603583

Changed in tripleo:
assignee: Quique Llorente (quiquell) → wes hayutin (weshayutin)
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/rocky)

Reviewed: https://review.openstack.org/603367
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=ed76b7681943370e6ceaef412804516ef0abb0da
Submitter: Zuul
Branch: stable/rocky

commit ed76b7681943370e6ceaef412804516ef0abb0da
Author: Quique Llorente <email address hidden>
Date: Tue Sep 18 14:49:56 2018 +0200

    Upgrade docker ceph container

    Upgrades to v3.1.0-stable-3.1-luminous-centos-7-x86_64 to read config
    from ceph conf file [1]

    [1] https://github.com/ceph/ceph-container/commit/24bd34a6ed748d390c889bf07acc8b02c931f37d

    Change-Id: I2658011d5d50cd6f7c6d12e1923e4a3b15f64010
    Closes-Bug: 1792296
    (cherry picked from commit c7945101b360c01d988c5e991926389037a8ff7e)

tags: added: in-stable-rocky
Changed in tripleo:
assignee: wes hayutin (weshayutin) → Quique Llorente (quiquell)
Changed in tripleo:
assignee: Quique Llorente (quiquell) → Juan Antonio Osorio Robles (juan-osorio-robles)
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.openstack.org/603323
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=a33a92ef9f6f95423b6a7ed458b0d4d2aa50ad6b
Submitter: Zuul
Branch: master

commit a33a92ef9f6f95423b6a7ed458b0d4d2aa50ad6b
Author: Quique Llorente <email address hidden>
Date: Tue Sep 18 10:28:52 2018 +0200

    Upgrade docker ceph container

    Upgrades to v3.1.0-stable-3.1-luminous-centos-7-x86_64 to read config
    from ceph conf file [1]

    [1] https://github.com/ceph/ceph-container/commit/24bd34a6ed748d390c889bf07acc8b02c931f37d

    Change-Id: I9c4b6626655b69ea783aa341d6bff2906dd25f1e
    Closes-Bug: 1792296

Changed in tripleo:
status: In Progress → Fix Released
Quique Llorente (quiquell) wrote :

Can't believe it...

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/queens)

Reviewed: https://review.openstack.org/603369
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=80768092ca8077be68ed1c2e24c3b9ad0560b5b8
Submitter: Zuul
Branch: stable/queens

commit 80768092ca8077be68ed1c2e24c3b9ad0560b5b8
Author: Quique Llorente <email address hidden>
Date: Tue Sep 18 15:00:33 2018 +0200

    Upgrade docker ceph container

    Upgrades to v3.1.0-stable-3.1-luminous-centos-7-x86_64 to read config
    from ceph conf file [1]

    [1] https://github.com/ceph/ceph-container/commit/24bd34a6ed748d390c889bf07acc8b02c931f37d

    Closes-Bug: 1792296
    Change-Id: I2658011d5d50cd6f7c6d12e1923e4a3b15f64010
    (cherry picked from commit c7945101b360c01d988c5e991926389037a8ff7e)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/603322
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=baf8b364cbfec83cbd2d8dfa7436736a772fe622
Submitter: Zuul
Branch: master

commit baf8b364cbfec83cbd2d8dfa7436736a772fe622
Author: Quique Llorente <email address hidden>
Date: Tue Sep 18 10:27:53 2018 +0200

    Upgrade docker ceph container

    Upgrades to v3.1.0-stable-3.1-luminous-centos-7-x86_64 to read config
    from ceph conf file [1]

    [1] https://github.com/ceph/ceph-container/commit/24bd34a6ed748d390c889bf07acc8b02c931f37d

    Change-Id: I2658011d5d50cd6f7c6d12e1923e4a3b15f64010
    Closes-Bug: 1792296

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 10.0.0

This issue was fixed in the openstack/tripleo-common 10.0.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 8.6.7

This issue was fixed in the openstack/tripleo-common 8.6.7 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 9.5.0

This issue was fixed in the openstack/tripleo-common 9.5.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-quickstart-extras 2.1.1

This issue was fixed in the openstack/tripleo-quickstart-extras 2.1.1 release.
