StarlingX

Bug #1837792
Comment #8

Angie Wang (angiewang) wrote on 2019-08-16:

The application-apply abort issue was reproduced on WCP-3_6.

Some of the db-sync pods failed, e.g. neutron-db-sync and placement-db-sync.
During the neutron and placement db syncs, the mariadb ingress was reloaded unexpectedly (see the following logs), which terminated the DB connections and caused the db syncs to fail.

mariadb-ingress-6ff964556d-8pg62
I0802 16:55:20.751467 38 controller.go:211] backend reload required
I0802 16:55:20.813445 38 controller.go:220] ingress backend successfully reloaded...
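
To correlate reloads like these with the db-sync failures, the ingress log can be scanned for reload events. This is only a rough sketch: the regex assumes the klog-style header seen in the two lines above, and `find_reloads` is a hypothetical helper name, not part of any StarlingX or ingress tooling.

```python
import re

# Assumed klog-style header: severity + MMDD, time, pid, source file:line] message
KLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+(?P<pid>\d+) "
    r"(?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def find_reloads(lines):
    """Return (month, day, time, message) for every log line mentioning a reload."""
    events = []
    for line in lines:
        match = KLOG_RE.match(line.strip())
        if match and "reload" in match.group("msg"):
            events.append(
                (
                    match.group("month"),
                    match.group("day"),
                    match.group("time"),
                    match.group("msg"),
                )
            )
    return events


# The two log lines quoted in this comment
sample = [
    "I0802 16:55:20.751467 38 controller.go:211] backend reload required",
    "I0802 16:55:20.813445 38 controller.go:220] ingress backend successfully reloaded...",
]

for event in find_reloads(sample):
    print(event)
```

The timestamps returned can then be compared by hand against the times the db-sync pods lost their connections.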

Restarting db-sync does not work because the DB tables already exist (the alembic_version table had not yet been updated to the right migration version when the DB connection was lost).

From the logs, I was unable to determine the root cause of the mariadb ingress reloads. Usually, when the ingress reloads, there are related log entries.

Note that this issue only happened on the initial installation of the system, when stx-openstack was applied right after the system came up.

I found something interesting: the OSDs of WCP-3_6 for both controllers are on HDDs, which are slow disks.
Mariadb is on the OSDs, and there are lots of DB reads and writes while applying nova, placement and neutron.
After changing the OSDs to SSDs for that lab, I am no longer able to reproduce the issue; the apply works well.

But we may still want some chart changes that can handle a db sync failure, e.g. drop the DB/tables and retry.
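
As a rough illustration of the "drop and retry" idea, something along these lines could wrap the sync step. This is only a sketch under assumptions: `run_sync` and `reset_db` are hypothetical callables standing in for whatever the chart's db-sync job and a drop-DB/tables step would actually do, and nothing here reflects how the existing charts are implemented.

```python
def sync_with_reset(run_sync, reset_db, max_attempts=3):
    """Run run_sync(); if it fails, call reset_db() and try again.

    run_sync and reset_db are hypothetical callables supplied by the caller.
    Returns the number of attempts used. Raises RuntimeError (chained to the
    last failure) if every attempt fails.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            run_sync()
            return attempt
        except Exception as exc:
            last_error = exc
            # Only reset when another attempt will follow, so the final
            # failed state is left in place for debugging.
            if attempt < max_attempts:
                reset_db()
    raise RuntimeError(
        f"db sync failed after {max_attempts} attempts"
    ) from last_error
```

Dropping the DB is only safe on a fresh install where there is no data to keep, which matches the case seen here (initial apply right after the system is up); an upgrade path would need something more careful.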