2022-06-21 07:36:03 |
Takashi Kajinami |
description |
Description
===========
The puppet-glance-tripleo-standalone job started to fail consistently.
Example:
https://zuul.opendev.org/t/openstack/build/4757380fddac4d59a02f778887727c0e
Looking at the deployment log, it seems the ovn-dbs-bundle resource fails to start
and pacemaker does not start the vip resource because of a location constraint.
https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_475/846784/8/check/puppet-glance-tripleo-standalone/4757380/logs/undercloud/var/log/extra/pcs.txt
```
Full List of Resources:
* ip-192.168.24.3 (ocf:heartbeat:IPaddr2): Stopped
* Container bundle: haproxy-bundle [127.0.0.1:5001/tripleomastercentos9/openstack-haproxy:pcmklatest]:
* haproxy-bundle-podman-0 (ocf:heartbeat:podman): Started standalone
* Container bundle: galera-bundle [127.0.0.1:5001/tripleomastercentos9/openstack-mariadb:pcmklatest]:
* galera-bundle-0 (ocf:heartbeat:galera): Promoted standalone
* Container bundle: rabbitmq-bundle [127.0.0.1:5001/tripleomastercentos9/openstack-rabbitmq:pcmklatest]:
* rabbitmq-bundle-0 (ocf:heartbeat:rabbitmq-cluster): Started standalone
* Container bundle: ovn-dbs-bundle [127.0.0.1:5001/tripleomastercentos9/openstack-ovn-northd:pcmklatest]:
* ovn-dbs-bundle-0 (ocf:ovn:ovndb-servers): Unpromoted standalone
Failed Resource Actions:
* ovndb_servers promote on ovn-dbs-bundle-0 could not be executed (Timed Out: Resource agent did not complete within 2m) at Tue Jun 21 06:41:09 2022 after 2m1ms
```
Looking at the journal log, it seems the ovn-nbctl command crashes with a core dump.
```
Jun 21 06:41:08 standalone.localdomain kernel: traps: ovn-nbctl[212704] trap invalid opcode ip:55d658f09ba8 sp:7ffcdc0e3140 error:0 in ovn-nbctl[55d658f05000+5c000]
Jun 21 06:41:08 standalone.localdomain systemd[1]: Started Process Core Dump (PID 212705/UID 0).
Jun 21 06:41:08 standalone.localdomain systemd-coredump[212707]: Process 212704 (ovn-nbctl) of user 0 dumped core.
Module /usr/bin/ovn-nbctl with build-id 2798d30ce0833d6e0fcabb6d8a0a98cba4da707d
Module linux-vdso.so.1 with build-id 932e8861e1b4a3fa34f93ff803210fc441bcd188
Module libnghttp2.so.14 with build-id 7eadbd56a0e5bcd3d8a6b39b9bab2327e380283a
Module libpython3.9.so.1.0 with build-id bbe909b82db5ae1835b0022275d690951734a378
Module libevent-2.1.so.7 with build-id af406c254338ff6ceff47360cba92cdcf233cf14
Module libprotobuf-c.so.1 with build-id 46661ae5d66cbaa2aa82b1b765472bdfa4712a24
Module ld-linux-x86-64.so.2 with build-id 1d95aae3e4174446d3b885ad234d4f7e573e71db
Module libz.so.1 with build-id 25486226566596e403da5485fb0ec85deed6b9fa
Module libc.so.6 with build-id 14830f7e71953d5f0dac317543ac1e3fcdd874f5
Module libunbound.so.8 with build-id def32d1bb7a7d99c59bf62e00c628af0246afa91
Module libm.so.6 with build-id 3eb525d2e163793ef2e888d5bb46e104d11a3201
Module libcap-ng.so.0 with build-id fdca0a301667e15db99d726152b57feeb35e4dbe
Module libcrypto.so.3 with build-id ea50b2486363fd2ce58686de4fe12956a9fa4622
Module libssl.so.3 with build-id 6a3692862938d5df4111a2474b84f3ee9124f941
Stack trace of thread 4928:
#0 0x000055d658f09ba8 n/a (/usr/bin/ovn-nbctl + 0x16ba8)
ELF object binary architecture: AMD x86-64
```
Steps to reproduce
==================
* Deploy standalone with ml2+ovn enabled
Expected result
===============
* Deployment should succeed without any error
Actual result
=============
* Deployment fails because vip is not started
Environment
===========
* The problem is observed only in master so far
Logs & Configs
==============
See https://zuul.opendev.org/t/openstack/build/4757380fddac4d59a02f778887727c0e |
Description
===========
The puppet-glance-tripleo-standalone job started failing consistently.
Example:
https://zuul.opendev.org/t/openstack/build/4757380fddac4d59a02f778887727c0e
Looking at the deployment log, it seems the ovn-dbs-bundle resource fails to start
and pacemaker does not start the vip resource because of a location constraint.
https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_475/846784/8/check/puppet-glance-tripleo-standalone/4757380/logs/undercloud/var/log/extra/pcs.txt
```
Full List of Resources:
* ip-192.168.24.3 (ocf:heartbeat:IPaddr2): Stopped
* Container bundle: haproxy-bundle [127.0.0.1:5001/tripleomastercentos9/openstack-haproxy:pcmklatest]:
* haproxy-bundle-podman-0 (ocf:heartbeat:podman): Started standalone
* Container bundle: galera-bundle [127.0.0.1:5001/tripleomastercentos9/openstack-mariadb:pcmklatest]:
* galera-bundle-0 (ocf:heartbeat:galera): Promoted standalone
* Container bundle: rabbitmq-bundle [127.0.0.1:5001/tripleomastercentos9/openstack-rabbitmq:pcmklatest]:
* rabbitmq-bundle-0 (ocf:heartbeat:rabbitmq-cluster): Started standalone
* Container bundle: ovn-dbs-bundle [127.0.0.1:5001/tripleomastercentos9/openstack-ovn-northd:pcmklatest]:
* ovn-dbs-bundle-0 (ocf:ovn:ovndb-servers): Unpromoted standalone
Failed Resource Actions:
* ovndb_servers promote on ovn-dbs-bundle-0 could not be executed (Timed Out: Resource agent did not complete within 2m) at Tue Jun 21 06:41:09 2022 after 2m1ms
```
Looking at the journal log, it seems the ovn-nbctl command crashes with a core dump.
```
Jun 21 06:41:08 standalone.localdomain kernel: traps: ovn-nbctl[212704] trap invalid opcode ip:55d658f09ba8 sp:7ffcdc0e3140 error:0 in ovn-nbctl[55d658f05000+5c000]
Jun 21 06:41:08 standalone.localdomain systemd[1]: Started Process Core Dump (PID 212705/UID 0).
Jun 21 06:41:08 standalone.localdomain systemd-coredump[212707]: Process 212704 (ovn-nbctl) of user 0 dumped core.
Module /usr/bin/ovn-nbctl with build-id 2798d30ce0833d6e0fcabb6d8a0a98cba4da707d
Module linux-vdso.so.1 with build-id 932e8861e1b4a3fa34f93ff803210fc441bcd188
Module libnghttp2.so.14 with build-id 7eadbd56a0e5bcd3d8a6b39b9bab2327e380283a
Module libpython3.9.so.1.0 with build-id bbe909b82db5ae1835b0022275d690951734a378
Module libevent-2.1.so.7 with build-id af406c254338ff6ceff47360cba92cdcf233cf14
Module libprotobuf-c.so.1 with build-id 46661ae5d66cbaa2aa82b1b765472bdfa4712a24
Module ld-linux-x86-64.so.2 with build-id 1d95aae3e4174446d3b885ad234d4f7e573e71db
Module libz.so.1 with build-id 25486226566596e403da5485fb0ec85deed6b9fa
Module libc.so.6 with build-id 14830f7e71953d5f0dac317543ac1e3fcdd874f5
Module libunbound.so.8 with build-id def32d1bb7a7d99c59bf62e00c628af0246afa91
Module libm.so.6 with build-id 3eb525d2e163793ef2e888d5bb46e104d11a3201
Module libcap-ng.so.0 with build-id fdca0a301667e15db99d726152b57feeb35e4dbe
Module libcrypto.so.3 with build-id ea50b2486363fd2ce58686de4fe12956a9fa4622
Module libssl.so.3 with build-id 6a3692862938d5df4111a2474b84f3ee9124f941
Stack trace of thread 4928:
#0 0x000055d658f09ba8 n/a (/usr/bin/ovn-nbctl + 0x16ba8)
ELF object binary architecture: AMD x86-64
```
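The kernel trap line in the journal excerpt above carries the faulting instruction pointer together with the base address and size of the mapping it falls in. As a rough aid for triage, here is a minimal sketch that parses such a line and computes the offset of the faulting instruction within that mapping. It assumes the line format matches the one shown in this report; the function name `parse_trap_line` and the regular expression are illustrative only and are not part of any existing tool.

```python
import re

# Pattern written against the "traps:" line format seen in this report
# (assumption: other kernels or trap types may format the line differently).
TRAP_RE = re.compile(
    r"traps: (?P<comm>\S+)\[(?P<pid>\d+)\] trap (?P<reason>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+)"
    r" in (?P<object>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_trap_line(line):
    """Return details of a kernel trap line, or None if it does not match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "reason": m["reason"],
        "object": m["object"],
        # Offset of the faulting instruction from the start of the mapping.
        "offset": ip - base,
        # Sanity check that the ip really lies inside the reported mapping.
        "in_mapping": base <= ip < base + size,
    }


line = (
    "Jun 21 06:41:08 standalone.localdomain kernel: traps: ovn-nbctl[212704] "
    "trap invalid opcode ip:55d658f09ba8 sp:7ffcdc0e3140 error:0 "
    "in ovn-nbctl[55d658f05000+5c000]"
)
info = parse_trap_line(line)
print(info["reason"], hex(info["offset"]))
```

Note that this offset is relative to the start of the mapping named in the kernel line, so it is not expected to equal the `+ 0x16ba8` offset that systemd-coredump prints in the stack frame, which appears to be measured from the start of the module.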
Steps to reproduce
==================
* Deploy standalone with ml2+ovn enabled
Expected result
===============
* Deployment should succeed without any error
Actual result
=============
* Deployment fails because vip is not started
Environment
===========
* The problem is observed only in master so far
Logs & Configs
==============
See https://zuul.opendev.org/t/openstack/build/4757380fddac4d59a02f778887727c0e |
|