In sc000 overcloud upgrade job, the upgrade step ran for an hour before it got killed:
2018-05-18 12:38:06.244828 | primary | TASK [tripleo-upgrade : run overcloud major upgrade in each of the roles/hostgroups] ***
2018-05-18 12:38:06.285055 | primary | Friday 18 May 2018 12:38:06 +0000 (0:00:00.128) 0:17:21.847 ************
2018-05-18 13:36:40.456113 | primary | +(./toci_quickstart.sh:101): exit_value=143
#####################################################################
However, in package_update.log i can see that Ansible only ran for 10 minutes before getting stuck here:
2018-05-18 12:48:21,823 p=25658 u=mistral | TASK [Check-mode for Run deployment InstanceIdDeployment] **********************
2018-05-18 12:48:21,846 p=25658 u=mistral | skipping: [centos-7-inap-mtl01-0004085172] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
2018-05-18 12:48:21,866 p=25658 u=mistral | TASK [Lookup deployment UUID] **************************************************
2018-05-18 12:48:22,008 p=25658 u=mistral | ok: [centos-7-inap-mtl01-0004085172] => {"ansible_facts": {"deployment_uuid": "13a7b14b-f8a7-4111-b8cb-32ec1299bb3d"}, "changed": false, "failed": false}
2018-05-18 12:48:22,029 p=25658 u=mistral | TASK [Render deployment file for NetworkDeployment] ****************************
2018-05-18 12:48:23,287 p=25658 u=mistral | changed: [centos-7-inap-mtl01-0004085172] => {"changed": true, "checksum": "8a0681ea339a48aaef58f4e8a9ee82be718329a5", "dest": "/var/lib/heat-config/tripleo-config-download/NetworkDeployment-13a7b14b-f8a7-4111-b8cb-32ec1299bb3d", "failed": false, "gid": 0, "group": "root", "md5sum": "f7b354dd15fa143d0384da7366bfc2f5", "mode": "0644", "owner": "root", "secontext": "system_u:object_r:var_lib_t:s0", "size": 8650, "src": "/home/tripleo-admin/.ansible/tmp/ansible-tmp-1526647702.16-142551103574098/source", "state": "file", "uid": 0}
2018-05-18 12:48:23,310 p=25658 u=mistral | TASK [Check if deployed file exists for NetworkDeployment] *********************
2018-05-18 12:48:23,918 p=25658 u=mistral | ok: [centos-7-inap-mtl01-0004085172] => {"changed": false, "failed": false, "stat": {"exists": false}}
2018-05-18 12:48:23,939 p=25658 u=mistral | TASK [Check previous deployment rc for NetworkDeployment] **********************
2018-05-18 12:48:23,968 p=25658 u=mistral | skipping: [centos-7-inap-mtl01-0004085172] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
2018-05-18 12:48:23,995 p=25658 u=mistral | TASK [Remove deployed file for NetworkDeployment when previous deployment failed] ***
2018-05-18 12:48:24,028 p=25658 u=mistral | skipping: [centos-7-inap-mtl01-0004085172] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
2018-05-18 12:48:24,050 p=25658 u=mistral | TASK [Force remove deployed file for NetworkDeployment] ************************
2018-05-18 12:48:24,079 p=25658 u=mistral | skipping: [centos-7-inap-mtl01-0004085172] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
2018-05-18 12:48:24,099 p=25658 u=mistral | TASK [Run deployment NetworkDeployment] ****************************************
#####################################################################
And in journal on the overcloud there seem to be issues with network from that point on:
May 18 12:48:24 centos-7-inap-mtl01-0004085172 sudo[132607]: tripleo-admin : TTY=pts/0 ; PWD=/home/tripleo-admin ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-ejafdcinazmlurfyjpdrifvituqdmpyo; HEAT_SHELL_CONFIG=/var/lib/heat-config/tripleo-config-download/NetworkDeployment-13a7b14b-f8a7-4111-b8cb-32ec1299bb3d /usr/bin/python /home/tripleo-admin/.ansible/tmp/ansible-tmp-1526647704.2-18691295782855/command.py; rm -rf "/home/tripleo-admin/.ansible/tmp/ansible-tmp-1526647704.2-18691295782855/" > /dev/null 2>&1
May 18 12:48:24 centos-7-inap-mtl01-0004085172 ansible-command[132610]: Invoked with warn=True executable=None _uses_shell=True _raw_params=/usr/libexec/os-refresh-config/configure.d/55-heat-config
exit $(jq .deploy_status_code /var/lib/heat-config/deployed/13a7b14b-f8a7-4111-b8cb-32ec1299bb3d.notify.json) removes=None creates=None chdir=None stdin=None
May 18 12:48:24 centos-7-inap-mtl01-0004085172 ovs-vsctl[132627]: ovs|00001|vsctl|INFO|Called as ovs-vsctl del-br br-ex
May 18 12:48:24 centos-7-inap-mtl01-0004085172 kernel: device br-ex left promiscuous mode
May 18 12:48:24 centos-7-inap-mtl01-0004085172 kernel: device vxlan_sys_4789 left promiscuous mode
May 18 12:48:24 centos-7-inap-mtl01-0004085172 kernel: device ovs-system left promiscuous mode
May 18 12:48:25 centos-7-inap-mtl01-0004085172 corosync[123189]: [TOTEM ] The network interface is down.
May 18 12:48:25 centos-7-inap-mtl01-0004085172 corosync[123189]: [TOTEM ] adding new UDPU member {192.168.24.11}
May 18 12:48:26 centos-7-inap-mtl01-0004085172 ntpd[9327]: Deleting interface #16 br-ex, 192.168.24.13#123, interface stats: received=0, sent=0, dropped=0, active_time=263 secs
May 18 12:48:26 centos-7-inap-mtl01-0004085172 ntpd[9327]: Deleting interface #15 br-ex, 192.168.24.7#123, interface stats: received=0, sent=0, dropped=0, active_time=263 secs
May 18 12:48:26 centos-7-inap-mtl01-0004085172 ntpd[9327]: Deleting interface #8 vxlan_sys_4789, fe80::a4fe:1dff:fe00:4f6d#123, interface stats: received=0, sent=0, dropped=0, active_time=2675 secs
May 18 12:48:26 centos-7-inap-mtl01-0004085172 ntpd[9327]: Deleting interface #7 br-ex, fe80::8427:bff:feea:bc4e#123, interface stats: received=0, sent=0, dropped=0, active_time=2675 secs
May 18 12:48:26 centos-7-inap-mtl01-0004085172 ntpd[9327]: Deleting interface #4 br-ex, 192.168.24.11#123, interface stats: received=0, sent=0, dropped=0, active_time=2675 secs
May 18 12:48:28 centos-7-inap-mtl01-0004085172 dockerd-current[9480]: time="2018-05-18T12:48:28.365359775Z" level=debug msg="Running health check for container 17ad03eb2fc01e447c99c5ba473965406f13bd5c495cda3e0a7df65fd1597652 ..."
May 18 12:48:28 centos-7-inap-mtl01-0004085172 dockerd-current[9480]: time="2018-05-18T12:48:28.368264345Z" level=debug msg="starting exec command 7cfa08a9c9e71417a0e4f5d06a88b107b4a0d86816f9b46575e65352a9cc8d06 in container 17ad03eb2fc01e447c99c5ba473965406f13bd5c495cda3e0a7df65fd1597652"
May 18 12:48:28 centos-7-inap-mtl01-0004085172 dockerd-current[9480]: time="2018-05-18T12:48:28.370045078Z" level=debug msg="attach: stdout: begin"
May 18 12:48:28 centos-7-inap-mtl01-0004085172 dockerd-current[9480]: time="2018-05-18T12:48:28.370194263Z" level=debug msg="attach: stderr: begin"
May 18 12:48:28 centos-7-inap-mtl01-0004085172 dockerd-current[9480]: time="2018-05-18T12:48:28.508910392Z" level=debug msg="libcontainerd: received containerd event: &types.Event{Type:\"start-process\", Id:\"17ad03eb2fc01e447c99c5ba473965406f13bd5c495cda3e0a7df65fd1597652\", Status:0x0, Pid:\"7cfa08a9c9e71417a0e4f5d06a88b107b4a0d86816f9b46575e65352a9cc8d06\", Timestamp:(*timestamp.Timestamp)(0xc421eba7d0)}"
May 18 12:48:28 centos-7-inap-mtl01-0004085172 dockerd-current[9480]: time="2018-05-18T12:48:28.509054443Z" level=debug msg="libcontainerd: event unhandled: type:\"start-process\" id:\"17ad03eb2fc01e447c99c5ba473965406f13bd5c495cda3e0a7df65fd1597652\" pid:\"7cfa08a9c9e71417a0e4f5d06a88b107b4a0d86816f9b46575e65352a9cc8d06\" timestamp:<seconds:1526647708 nanos:508332080 > "
May 18 12:48:28 centos-7-inap-mtl01-0004085172 IPaddr2(ip-192.168.24.13)[132759]: ERROR: Unable to find nic or netmask.
May 18 12:48:28 centos-7-inap-mtl01-0004085172 IPaddr2(ip-192.168.24.7)[132760]: ERROR: Unable to find nic or netmask.
May 18 12:48:28 centos-7-inap-mtl01-0004085172 IPaddr2(ip-192.168.24.13)[132771]: ERROR: [findif] failed
May 18 12:48:28 centos-7-inap-mtl01-0004085172 IPaddr2(ip-192.168.24.7)[132772]: ERROR: [findif] failed
May 18 12:48:28 centos-7-inap-mtl01-0004085172 lrmd[123201]: notice: ip-192.168.24.13_monitor_10000:132714:stderr [ ocf-exit-reason:Unable to find nic or netmask. ]
May 18 12:48:28 centos-7-inap-mtl01-0004085172 lrmd[123201]: notice: ip-192.168.24.7_monitor_10000:132713:stderr [ ocf-exit-reason:Unable to find nic or netmask. ]
May 18 12:48:28 centos-7-inap-mtl01-0004085172 crmd[123204]: notice: centos-7-inap-mtl01-0004085172-ip-192.168.24.13_monitor_10000:30 [ ocf-exit-reason:Unable to find nic or netmask.\n ]
May 18 12:48:28 centos-7-inap-mtl01-0004085172 crmd[123204]: notice: centos-7-inap-mtl01-0004085172-ip-192.168.24.7_monitor_10000:29 [ ocf-exit-reason:Unable to find nic or netmask.\n ]
May 18 12:48:29 centos-7-inap-mtl01-0004085172 kernel: iptables dropped: IN=eth0 OUT= MAC=fa:16:3e:37:b2:4d:e8:04:62:1c:a6:80:08:00 SRC=39.135.17.40 DST=198.72.124.15 LEN=40 TOS=0x00 PREC=0x00 TTL=39 ID=58492 DF PROTO=TCP SPT=161 DPT=28392 WINDOW=0 RES=0x00 ACK RST URGP=0
May 18 12:48:38 centos-7-inap-mtl01-0004085172 dockerd-current[9480]: time="2018-05-18T12:48:38.538695466Z" level=debug msg="containerd: process exited" id=17ad03eb2fc01e447c99c5ba473965406f13bd5c495cda3e0a7df65fd1597652 pid=7cfa08a9c9e71417a0e4f5d06a88b107b4a0d86816f9b46575e65352a9cc8d06 status=1 systemPid=132696
May 18 12:48:38 centos-7-inap-mtl01-0004085172 dockerd-current[9480]: time="2018-05-18T12:48:38.539849970Z" level=debug msg="libcontainerd: received containerd event: &types.Event{Type:\"exit\", Id:\"17ad03eb2fc01e447c99c5ba473965406f13bd5c495cda3e0a7df65fd1597652\", Status:0x1, Pid:\"7cfa08a9c9e71417a0e4f5d06a88b107b4a0d86816f9b46575e65352a9cc8d06\", Timestamp:(*timestamp.Timestamp)(0xc421ebab90)}"
May 18 12:48:38 centos-7-inap-mtl01-0004085172 dockerd-current[9480]: time="2018-05-18T12:48:38.540206880Z" level=debug msg="attach: stderr: end"
May 18 12:48:38 centos-7-inap-mtl01-0004085172 dockerd-current[9480]: time="2018-05-18T12:48:38.540245986Z" level=debug msg="attach: stdout: end"
May 18 12:48:38 centos-7-inap-mtl01-0004085172 dockerd-current[9480]: time="2018-05-18T12:48:38.540378589Z" level=debug msg="Health check for container 17ad03eb2fc01e447c99c5ba473965406f13bd5c495cda3e0a7df65fd1597652 done (exitCode=1)"
May 18 12:48:38 centos-7-inap-mtl01-0004085172 IPaddr2(ip-192.168.24.13)[132898]: ERROR: Unable to find nic or netmask.
May 18 12:48:38 centos-7-inap-mtl01-0004085172 IPaddr2(ip-192.168.24.7)[132899]: ERROR: Unable to find nic or netmask.
May 18 12:48:38 centos-7-inap-mtl01-0004085172 IPaddr2(ip-192.168.24.13)[132910]: ERROR: [findif] failed
May 18 12:48:38 centos-7-inap-mtl01-0004085172 IPaddr2(ip-192.168.24.7)[132911]: ERROR: [findif] failed
May 18 12:48:38 centos-7-inap-mtl01-0004085172 lrmd[123201]: notice: ip-192.168.24.13_monitor_10000:132852:stderr [ ocf-exit-reason:Unable to find nic or netmask. ]
May 18 12:48:38 centos-7-inap-mtl01-0004085172 lrmd[123201]: notice: ip-192.168.24.7_monitor_10000:132853:stderr [ ocf-exit-reason:Unable to find nic or netmask. ]
May 18 12:48:43 centos-7-inap-mtl01-0004085172 haproxy[125073]: Server keystone_public/centos-7-inap-mtl01-0004085172.internalapi.localdomain is DOWN, reason: Layer4 timeout, check duration: 2001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 18 12:48:43 centos-7-inap-mtl01-0004085172 haproxy[125073]: proxy keystone_public has no server available!
May 18 12:48:44 centos-7-inap-mtl01-0004085172 haproxy[125073]: Server keystone_admin/centos-7-inap-mtl01-0004085172.ctlplane.localdomain is DOWN, reason: Layer4 timeout, check duration: 2001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 18 12:48:44 centos-7-inap-mtl01-0004085172 haproxy[125073]: proxy keystone_admin has no server available!
May 18 12:48:46 centos-7-inap-mtl01-0004085172 os-collect-config[4045]: HTTPConnectionPool(host='192.168.24.1', port=8080): Max retries exceeded with url: /v1/AUTH_e6f8fb51f05d4913bc88f5c1007e8c88/ov-ntroller-4i3nvifbtabn-deployed-server-xwvxubp7c3qr/754fede3-7932-42fa-adf6-f5c69161a4a7?temp_url_sig=e2110c15b086458fb8c32e4cf58df8944bc3b8ae&temp_url_expires=2147483586 (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection object at 0x7fa128698f50>, 'Connection to 192.168.24.1 timed out. (connect timeout=10.0)'))
May 18 12:48:46 centos-7-inap-mtl01-0004085172 os-collect-config[4045]: Source [request] Unavailable.
May 18 12:48:49 centos-7-inap-mtl01-0004085172 IPaddr2(ip-192.168.24.7)[133047]: ERROR: Unable to find nic or netmask.
May 18 12:48:49 centos-7-inap-mtl01-0004085172 IPaddr2(ip-192.168.24.13)[133048]: ERROR: Unable to find nic or netmask.
May 18 12:48:49 centos-7-inap-mtl01-0004085172 IPaddr2(ip-192.168.24.13)[133059]: ERROR: [findif] failed
May 18 12:48:49 centos-7-inap-mtl01-0004085172 IPaddr2(ip-192.168.24.7)[133060]: ERROR: [findif] failed
May 18 12:48:49 centos-7-inap-mtl01-0004085172 lrmd[123201]: notice: ip-192.168.24.13_monitor_10000:133001:stderr [ ocf-exit-reason:Unable to find nic or netmask. ]
May 18 12:48:49 centos-7-inap-mtl01-0004085172 lrmd[123201]: notice: ip-192.168.24.7_monitor_10000:133002:stderr [ ocf-exit-reason:Unable to find nic or netmask. ]
May 18 12:48:54 centos-7-inap-mtl01-0004085172 dockerd-current[9480]: time="2018-05-18T12:48:54.328870686Z" level=debug msg="clean 10 unused exec commands"
May 18 12:48:59 centos-7-inap-mtl01-0004085172 IPaddr2(ip-192.168.24.7)[133187]: ERROR: Unable to find nic or netmask.
May 18 12:48:59 centos-7-inap-mtl01-0004085172 IPaddr2(ip-192.168.24.13)[133186]: ERROR: Unable to find nic or netmask.
May 18 12:48:59 centos-7-inap-mtl01-0004085172 IPaddr2(ip-192.168.24.7)[133198]: ERROR: [findif] failed
May 18 12:48:59 centos-7-inap-mtl01-0004085172 IPaddr2(ip-192.168.24.13)[133199]: ERROR: [findif] failed
May 18 12:48:59 centos-7-inap-mtl01-0004085172 lrmd[123201]: notice: ip-192.168.24.7_monitor_10000:133141:stderr [ ocf-exit-reason:Unable to find nic or netmask. ]
May 18 12:48:59 centos-7-inap-mtl01-0004085172 lrmd[123201]: notice: ip-192.168.24.13_monitor_10000:133140:stderr [ ocf-exit-reason:Unable to find nic or netmask. ]
May 18 12:48:59 centos-7-inap-mtl01-0004085172 kernel: iptables dropped: IN=eth0 OUT= MAC=fa:16:3e:37:b2:4d:e8:04:62:1c:a6:80:08:00 SRC=5.188.9.25 DST=198.72.124.15 LEN=40 TOS=0x00 PREC=0x00 TTL=247 ID=32945 PROTO=TCP SPT=56669 DPT=45848 WINDOW=1024 RES=0x00 SYN URGP=0
http:// logs.openstack. org/25/ 558425/ 14/experimental /tripleo- ci-centos- 7-scenario000- multinode- oooq-container- upgrades/ 4cccec5/