ci: OVB-based jobs are not collecting logs from OC nodes

Bug #1755891 reported by Matt Young
This bug affects 1 person
Affects  Status        Importance  Assigned to  Milestone
tripleo  Fix Released  High        Matt Young

Bug Description

As of an as-yet-unknown date, logs are not being collected and/or persisted for OC (overcloud) nodes in OVB jobs.

Example:

- https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/1107bd8

We do have logs for the UC (undercloud), but not for the OC nodes. In this specific case, the OC deployment was also successful:

- https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/1107bd8/console.txt.gz#_2018-03-14_11_41_28_692

Tags: ci quickstart
wes hayutin (weshayutin)
tags: added: alert
Matt Young (halcyondude)
Changed in tripleo:
milestone: none → rocky-3
milestone: rocky-3 → rocky-1
wes hayutin (weshayutin)
tags: added: quickstart
Matt Young (halcyondude)
description: updated
Matt Young (halcyondude) wrote :

It looks like the OC nodes are unreachable at the point when collect-logs is invoked. Root cause is unknown (so far).

---

https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/1107bd8/console.txt.gz#_2018-03-14_10_01_33_327

+(./toci_quickstart.sh:66): QUICKSTART_COLLECTLOGS_CMD='

/home/jenkins/workspace/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/.quickstart/bin/ansible-playbook
/home/jenkins/workspace/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/.quickstart/playbooks/collect-logs.yml
-vv
--extra-vars @/home/jenkins/workspace/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/.quickstart/config/release/tripleo-ci/promotion-testing-hash-queens.yml
--extra-vars @/home/jenkins/workspace/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/.quickstart/config/general_config/featureset020.yml
--extra-vars @/opt/stack/new/tripleo-ci/toci-quickstart/config/testenv/ovb.yml --extra-vars @/opt/stack/new/tripleo-ci/toci-quickstart/config/testenv/ovb-rdocloud.yml
--extra-vars local_working_dir=/home/jenkins/workspace/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/.quickstart
--extra-vars virthost=undercloud
--inventory /home/jenkins/workspace/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/.quickstart/hosts
--extra-vars tripleo_root=/opt/stack/new
--extra-vars working_dir=/home/jenkins
--extra-vars validation_args='\''--validation-errors-nonfatal'\''
--extra-vars @/opt/stack/new/tripleo-ci/toci-quickstart/config/collect-logs.yml
--extra-vars artcl_collect_dir=/home/jenkins/workspace/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/logs
--tags all
--skip-tags teardown-all '

---

https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/1107bd8/quickstart_collect_logs.log

fatal: [overcloud-controller-foo-0]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Warning: Permanently added '127.0.0.2' (ECDSA) to the list of known hosts.
channel 0: open failed: connect failed: No route to host
stdio forwarding failed
ssh_exchange_identification: Connection closed by remote host
", "unreachable": true}
fatal: [overcloud-novacompute-bar-0]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Warning: Permanently added '127.0.0.2' (ECDSA) to the list of known hosts.
channel 0: open failed: connect failed: No route to host
stdio forwarding failed
ssh_exchange_identification: Connection closed by remote host
", "unreachable": true}

Matt Young (halcyondude) wrote :

The last change to where collect-logs is invoked landed 8 days ago:

- https://github.com/openstack-infra/tripleo-ci/commit/765389d1782fe7e05eac106e8a27554f876bf1f9

This was merged on 5 March:

- https://github.com/openstack-infra/tripleo-ci/commit/dc4a553f3a55492bf09d1b801fe699a0a44af97b
- https://review.rdoproject.org/r/#/c/12442

There's no referenced bug or other details beyond the commit message:

---

Run logs collection after job finished in RDO CI

We run logs collection in post playbook in upstream infra CI,
do the same in third-party CI with OVB and multinode jobs.
Collect logs after job is finished.

---

Looking at the job history (in this case, for queens):

https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens

17b22fe/ 2018-03-08 03:50
5234057/ 2018-03-07 19:08
88c8758/ 2018-03-07 13:02
b949520/ 2018-03-07 04:24

# b949520 is the first missing OC logs
# f24ab30 is the last job with OC logs

f24ab30/ 2018-03-06 19:08
3a7dcbe/ 2018-03-06 12:56
e724251/ 2018-03-06 07:38
0d4c7d0/ 2018-03-05 22:43
b86511d/ 2018-03-05 13:08

Matt Young (halcyondude) wrote :

Cross-referencing with the master periodic job logs, things started failing at the same point in time:

https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-master/?C=M;O=D

6b8323e 2018-03-08 05:04
8263cc3 2018-03-07 19:55
02f4bc0 2018-03-07 13:48
b9b0389 2018-03-07 05:11

# b9b0389 is the first missing OC logs
# 4c5a355 is the last job with OC logs

4c5a355 2018-03-06 20:07
a049ff7 2018-03-06 13:11
70df410 2018-03-06 07:41
43182cd 2018-03-05 22:47
bf3f34e 2018-03-05 12:58

Matt Young (halcyondude) wrote :

This could be fallout from recent tripleo-ci squad work:

- https://trello.com/c/2e1WPhLn/581-add-logs-collection-after-job-in-rdo-jobs

However, this is just a working theory; investigating.

Matt Young (halcyondude) wrote :

Discussion about this in IRC (#oooq):

http://paste.openstack.org/show/700998

yatin (yatinkarel) wrote :
John Trowbridge (trown) wrote :
yatin (yatinkarel) wrote :
Changed in tripleo:
milestone: rocky-1 → rocky-2
Alex Schultz (alex-schultz) wrote :

Logs are being captured; closing this out. Feel free to reopen if it happens again.

https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/9a4960c/

tags: removed: alert
Changed in tripleo:
status: Triaged → Fix Released