tripleo

Bug #1979276
Takashi Kajinami (kajinamit) wrote on 2022-06-21:

Description
===========

The puppet-glance-tripleo-standalone job started to fail consistently.

Example:
https://zuul.opendev.org/t/openstack/build/4757380fddac4d59a02f778887727c0e

Looking at the deployment log, it seems the ovn-dbs-bundle resource fails to be
promoted, and Pacemaker does not start the VIP resource because of a location constraint.

https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_475/846784/8/check/puppet-glance-tripleo-standalone/4757380/logs/undercloud/var/log/extra/pcs.txt

```
Full List of Resources:
  * ip-192.168.24.3	(ocf:heartbeat:IPaddr2):	 Stopped
  * Container bundle: haproxy-bundle [127.0.0.1:5001/tripleomastercentos9/openstack-haproxy:pcmklatest]:
    * haproxy-bundle-podman-0	(ocf:heartbeat:podman):	 Started standalone
  * Container bundle: galera-bundle [127.0.0.1:5001/tripleomastercentos9/openstack-mariadb:pcmklatest]:
    * galera-bundle-0	(ocf:heartbeat:galera):	 Promoted standalone
  * Container bundle: rabbitmq-bundle [127.0.0.1:5001/tripleomastercentos9/openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0	(ocf:heartbeat:rabbitmq-cluster):	 Started standalone
  * Container bundle: ovn-dbs-bundle [127.0.0.1:5001/tripleomastercentos9/openstack-ovn-northd:pcmklatest]:
    * ovn-dbs-bundle-0	(ocf:ovn:ovndb-servers):	 Unpromoted standalone

Failed Resource Actions:
  * ovndb_servers promote on ovn-dbs-bundle-0 could not be executed (Timed Out: Resource agent did not complete within 2m) at Tue Jun 21 06:41:09 2022 after 2m1ms
```
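To make the relevant part of the status output above easier to spot, here is a minimal Python sketch that lists resources whose state is neither `Started` nor `Promoted`. The names `RESOURCE_RE` and `unhealthy_resources` are hypothetical helpers written for this report, not part of pcs or Pacemaker. Treating `Unpromoted` as unhealthy is an assumption that only makes sense on a single-node standalone deployment, where the sole instance is expected to be promoted.

```python
import re

# Status output as quoted in this report (tabs replaced by spaces).
PCS_STATUS = """\
Full List of Resources:
  * ip-192.168.24.3 (ocf:heartbeat:IPaddr2): Stopped
  * Container bundle: haproxy-bundle [127.0.0.1:5001/tripleomastercentos9/openstack-haproxy:pcmklatest]:
    * haproxy-bundle-podman-0 (ocf:heartbeat:podman): Started standalone
  * Container bundle: galera-bundle [127.0.0.1:5001/tripleomastercentos9/openstack-mariadb:pcmklatest]:
    * galera-bundle-0 (ocf:heartbeat:galera): Promoted standalone
  * Container bundle: rabbitmq-bundle [127.0.0.1:5001/tripleomastercentos9/openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0 (ocf:heartbeat:rabbitmq-cluster): Started standalone
  * Container bundle: ovn-dbs-bundle [127.0.0.1:5001/tripleomastercentos9/openstack-ovn-northd:pcmklatest]:
    * ovn-dbs-bundle-0 (ocf:ovn:ovndb-servers): Unpromoted standalone

Failed Resource Actions:
  * ovndb_servers promote on ovn-dbs-bundle-0 could not be executed (Timed Out: Resource agent did not complete within 2m) at Tue Jun 21 06:41:09 2022 after 2m1ms
"""

# Matches lines of the form "* <name> (<agent>): <state> [<node>]".
# Bundle header lines and failed-action lines do not have "(<agent>)"
# directly after the first word, so they are skipped.
RESOURCE_RE = re.compile(
    r"^\s*\*\s+(?P<name>\S+)\s+\((?P<agent>[^)]+)\):"
    r"\s+(?P<state>\w+)(?:\s+(?P<node>\S+))?\s*$"
)


def unhealthy_resources(text, healthy=("Started", "Promoted")):
    """Return (name, state) pairs for resources not in a healthy state."""
    found = []
    for line in text.splitlines():
        match = RESOURCE_RE.match(line)
        if match and match.group("state") not in healthy:
            found.append((match.group("name"), match.group("state")))
    return found


if __name__ == "__main__":
    for name, state in unhealthy_resources(PCS_STATUS):
        print(f"{name}: {state}")
```

Run against the output in this report, the sketch flags only the VIP resource and ovn-dbs-bundle-0, which matches the failure described here.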

Looking at the journal log, it seems the ovn-nbctl command hits an invalid opcode trap and crashes with a core dump.

```
Jun 21 06:41:08 standalone.localdomain kernel: traps: ovn-nbctl[212704] trap invalid opcode ip:55d658f09ba8 sp:7ffcdc0e3140 error:0 in ovn-nbctl[55d658f05000+5c000]
Jun 21 06:41:08 standalone.localdomain systemd[1]: Started Process Core Dump (PID 212705/UID 0).
Jun 21 06:41:08 standalone.localdomain systemd-coredump[212707]: Process 212704 (ovn-nbctl) of user 0 dumped core.
                                                                 
                                                                 Module /usr/bin/ovn-nbctl with build-id 2798d30ce0833d6e0fcabb6d8a0a98cba4da707d
                                                                 Module linux-vdso.so.1 with build-id 932e8861e1b4a3fa34f93ff803210fc441bcd188
                                                                 Module libnghttp2.so.14 with build-id 7eadbd56a0e5bcd3d8a6b39b9bab2327e380283a
                                                                 Module libpython3.9.so.1.0 with build-id bbe909b82db5ae1835b0022275d690951734a378
                                                                 Module libevent-2.1.so.7 with build-id af406c254338ff6ceff47360cba92cdcf233cf14
                                                                 Module libprotobuf-c.so.1 with build-id 46661ae5d66cbaa2aa82b1b765472bdfa4712a24
                                                                 Module ld-linux-x86-64.so.2 with build-id 1d95aae3e4174446d3b885ad234d4f7e573e71db
                                                                 Module libz.so.1 with build-id 25486226566596e403da5485fb0ec85deed6b9fa
                                                                 Module libc.so.6 with build-id 14830f7e71953d5f0dac317543ac1e3fcdd874f5
                                                                 Module libunbound.so.8 with build-id def32d1bb7a7d99c59bf62e00c628af0246afa91
                                                                 Module libm.so.6 with build-id 3eb525d2e163793ef2e888d5bb46e104d11a3201
                                                                 Module libcap-ng.so.0 with build-id fdca0a301667e15db99d726152b57feeb35e4dbe
                                                                 Module libcrypto.so.3 with build-id ea50b2486363fd2ce58686de4fe12956a9fa4622
                                                                 Module libssl.so.3 with build-id 6a3692862938d5df4111a2474b84f3ee9124f941
                                                                 Stack trace of thread 4928:
                                                                 #0  0x000055d658f09ba8 n/a (/usr/bin/ovn-nbctl + 0x16ba8)
                                                                 ELF object binary architecture: AMD x86-64
```
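The addresses in the kernel trap line and in the core dump frame can be cross-checked with a little arithmetic. The following Python sketch parses the trap line quoted above and computes the offset of the faulting instruction within the reported mapping. `TRAP_RE` and `parse_trap` are hypothetical helpers for illustration, and the regular expression is written only against the line format seen in this log.

```python
import re

# Kernel trap line as quoted in this report.
TRAP_LINE = (
    "Jun 21 06:41:08 standalone.localdomain kernel: traps: ovn-nbctl[212704] "
    "trap invalid opcode ip:55d658f09ba8 sp:7ffcdc0e3140 error:0 "
    "in ovn-nbctl[55d658f05000+5c000]"
)

TRAP_RE = re.compile(
    r"traps: (?P<comm>\S+)\[(?P<pid>\d+)\] trap (?P<reason>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+) "
    r"in (?P<image>\S+)\[(?P<start>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_trap(line):
    """Extract the trap fields and the fault offset inside the mapping."""
    match = TRAP_RE.search(line)
    if match is None:
        raise ValueError("not a recognized kernel trap line")
    ip = int(match.group("ip"), 16)
    start = int(match.group("start"), 16)
    size = int(match.group("size"), 16)
    return {
        "comm": match.group("comm"),
        "pid": int(match.group("pid")),
        "reason": match.group("reason"),
        "ip": ip,
        "start": start,
        "size": size,
        "offset": ip - start,  # offset of the fault within the mapping
    }


if __name__ == "__main__":
    info = parse_trap(TRAP_LINE)
    print(f"{info['comm']}[{info['pid']}]: {info['reason']}")
    print(f"offset in mapping: {info['offset']:#x}")
    # Frame #0 in the core dump reports /usr/bin/ovn-nbctl + 0x16ba8.
    print(f"difference to frame offset: {0x16ba8 - info['offset']:#x}")
```

The instruction pointer in the trap line (0x55d658f09ba8) is the same address as frame #0 in the stack trace, so both entries describe the same fault. The offset within the mapping is 0x4ba8, while the stack trace reports 0x16ba8 relative to the binary. The 0x12000 difference would be consistent with the executable mapping starting 0x12000 bytes after the load base, but that is an inference and is not confirmed by the logs.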

Steps to reproduce
==================
* Deploy standalone with ml2+ovn enabled

Expected result
===============
* Deployment should succeed without any error

Actual result
=============
* Deployment fails because the VIP resource is not started

Environment
===========
* The problem has been observed only on master so far

Logs & Configs
==============
See https://zuul.opendev.org/t/openstack/build/4757380fddac4d59a02f778887727c0e