centos8 standalone-upgrade-ussuri fails tempest ping router IP

Bug #1895822 reported by Marios Andreou
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Alex Schultz

Bug Description

At [1] the tripleo-ci-centos-8-standalone-upgrade-ussuri job is failing tempest after a seemingly successful deployment [2] and upgrade [3]. The top-level error trace looks like:

        2020-09-15 15:02:07.057448 | primary | TASK [os_tempest : Ping router ip address] *************************************
        2020-09-15 15:02:07.057502 | primary | Tuesday 15 September 2020 15:02:07 +0000 (0:00:00.065) 1:10:33.351 *****
        2020-09-15 15:02:10.745010 | primary | FAILED - RETRYING: Ping router ip address (5 retries left).
        2020-09-15 15:02:24.365896 | primary | FAILED - RETRYING: Ping router ip address (4 retries left).
        2020-09-15 15:02:38.005903 | primary | FAILED - RETRYING: Ping router ip address (3 retries left).
        2020-09-15 15:02:51.638488 | primary | FAILED - RETRYING: Ping router ip address (2 retries left).
        2020-09-15 15:03:05.266932 | primary | FAILED - RETRYING: Ping router ip address (1 retries left).
        2020-09-15 15:03:18.902046 | primary | fatal: [undercloud]: FAILED! => {
        2020-09-15 15:03:18.902122 | primary | "attempts": 5,

Poking a little further, I see a few errors related to networking and mysql; however, I am not sure which is the original/root cause.

In the [4] neutron and [5] keystone container logs we see the following many times:

        2020-09-11 07:41:16.044 139 ERROR oslo_db.sqlalchemy.engines pymysql.err.OperationalError: (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")

In [6] ovn_controller.log the following is repeated many times:

        2020-09-15T14:07:34.872418652+00:00 stderr F 2020-09-15T14:07:34Z|00014|main|INFO|OVNSB commit failed, force recompute next time.

In [7] container-puppet-mysql the following is repeated many times:

        2020-09-15T13:05:17.914381835+00:00 stderr F <13>Sep 15 13:05:17 puppet-user: Init failed, could not perform requested operations

In the [8] pacemaker log we see the following a few times:

        Sep 15 14:12:32 standalone.localdomain pacemaker-controld [325957] (services_os_action_execute) warning: Cannot execute '/usr/lib/ocf/resource.d/ovn/ovndb-servers': No such file or directory (2)
        Sep 15 14:12:32 standalone.localdomain pacemaker-controld [325957] (lrmd_api_get_metadata_params) error: Failed to retrieve meta-data for ocf:ovn:ovndb-servers

[1] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/job-output.txt
[2] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/logs/undercloud/home/zuul/standalone_deploy.log
[3] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/logs/undercloud/home/zuul/standalone_upgrade.log
[4] https://e453f1d8808c5b6bd184-223d8b88d73ea59070ac36b627fdc3bc.ssl.cf2.rackcdn.com/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ad8724d/logs/undercloud/var/log/containers/neutron/server.log
[5] https://e453f1d8808c5b6bd184-223d8b88d73ea59070ac36b627fdc3bc.ssl.cf2.rackcdn.com/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ad8724d/logs/undercloud/var/log/containers/keystone/keystone.log
[6] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/logs/undercloud/var/log/containers/stdouts/ovn_controller.log
[7] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/logs/undercloud/var/log/containers/stdouts/container-puppet-mysql.log
[8] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/logs/undercloud/var/log/pacemaker/pacemaker.log

tags: added: promotion-blocker
Revision history for this message
Marios Andreou (marios-b) wrote :

Spent some more time digging through the logs. I am not clear yet whether this is an issue with HA/mysql or with ovs/ovn networking. I am leaning towards networking at the moment.

I'll reach out to the network and pidone squads to check here - adding pointers to some error messages in the logs I came across just now:

        * https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/logs/undercloud/var/log/containers/mysql/mysqld.log

                * 2020-09-15 14:32:08 0 [Note] InnoDB: Starting shutdown...
                * 2020-09-15 14:32:09 0 [Note] /usr/libexec/mysqld: Shutdown complete
                * 2020-09-15 14:32:30 0 [Note] WSREP: Found saved state: cebd6089-f754-11ea-ac23-9b5df17a204a:8702, safe_to_bootstrap: 1
                * 2020-09-15 14:32:30 0 [Note] /usr/libexec/mysqld: ready for connections.
        Version: '10.3.17-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
                  2020-09-15 14:32:31 0 [Note] InnoDB: Buffer pool(s) load completed at 200915 14:32:31
                * 2020-09-15 14:38:29 259 [Warning] Aborted connection 259 to db: 'nova_api' user: 'nova_api' host: '192.168.24.1' (Got an error reading communication packets)

        * https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/logs/undercloud/var/log/containers/openvswitch/ovn-controller.log

                * 2020-09-15T14:41:08.575Z|00051|lflow|WARN|Dropped 19 log messages in last 1622 seconds (most recently, 1607 seconds ago) due to excessive rate
                * 2020-09-15T14:50:41.820Z|00004|fatal_signal(ovn_pinctrl0)|WARN|terminating with signal 15 (Terminated)

        * https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/logs/undercloud/var/log/containers/openvswitch/ovsdb-server-sb.log
                * 2020-09-15T13:16:17.971Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.12.0
                * 2020-09-15T13:16:19.947Z|00005|reconnect|WARN|unix#6: connection dropped (Connection reset by peer)
                * 2020-09-15T14:12:32.939Z|00005|reconnect|WARN|unix#0: connection dropped (Broken pipe)
                * 2020-09-15T14:59:56.235Z|00002|daemon_unix(monitor)|INFO|pid 152 died, exit status 0, exiting

        * https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/logs/undercloud/var/log/containers/stdouts/ovn-dbs-bundle.log

                * 2020-09-15T13:16:18.332804574+00:00 stderr F (operation_finished) notice: ovndb_servers_start_0:48:stderr [ ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"} ]

Revision history for this message
Martin Mágr (mmagr) wrote :

I'm facing the same issue with OVS-based networking:

Deploy failure:
2020-09-16 10:52:58,277 p=83091 u=mistral n=ansible | TASK [tripleo-keystone-resources : Check Keystone public endpoint status] ******
2020-09-16 10:52:58,277 p=83091 u=mistral n=ansible | Wednesday 16 September 2020 10:52:58 -0400 (0:00:07.246) 0:28:33.359 ***
2020-09-16 10:52:58,899 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:52:59,255 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:52:59,611 p=83091 u=mistral n=ansible | failed: [undercloud] (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:00,016 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:00,271 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:00,677 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:00,981 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:01,338 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:01,695 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:02,049 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:02,050 p=83091 u=mistral n=ansible | fatal: [undercloud]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:02,055 p=83091 u=mistral n=ansible | NO MORE HOSTS LEFT *************************************************************

/var/log/containers/httpd/keystone/keystone_wsgi_error.log:
[Tue Sep 15 23:52:38.988613 2020] [wsgi:error] [pid 1293] [remote 172.17.1.149:47570] File "/usr/lib64/python3.6/site-packages/sqlalchemy/pool...

Revision history for this message
Marios Andreou (marios-b) wrote :

@martin, what in particular made you suspect you have the same issue? I can't tell from the logs in comment #2.

Revision history for this message
Marios Andreou (marios-b) wrote :

Did some more digging - leaning more towards this being a neutron issue, but still no closer to understanding why.

I used https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_62c/750456/3/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/62c4529/logs/ as an example and tried to line up the timings. After the upgrade starts I see a number of errors in the openvswitch/ovn logs, in particular "OVNSB commit failed, force recompute next time" followed by "WARN|tcp:192.168.24.1:6642: connection dropped (Broken pipe)":

"UPGRADE start/end": 20:33:00.699450 -> 21:33:44

        * https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_62c/750456/3/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/62c4529/job-output.txt
        * 2020-09-17 20:33:00.699450 | primary | TASK [standalone-upgrade : Upgrade the standalone] *****************************
        2020-09-17 20:33:00.699475 | primary | Thursday 17 September 2020 20:33:00 +0000 (0:00:02.279) 0:07:32.348 ****
        2020-09-17 21:33:44.045462 | primary | changed: [undercloud]

"MYSQL SHUTDOWN/STARTUP/aborted connection"

        * https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_62c/750456/3/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/62c4529/logs/undercloud/var/log/containers/mysql/mysqld.log
        * 2020-09-17 20:40:13 0 [Note] /usr/libexec/mysqld: Shutdown complete
        * 2020-09-17 20:45:28 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
        * 2020-09-17 21:02:12 0 [Note] /usr/libexec/mysqld: Shutdown complete
        * 2020-09-17 21:02:32 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
        * 2020-09-17 21:02:33 0 [Note] /usr/libexec/mysqld: ready for connections.
        * 2020-09-17 21:09:50 299 [Warning] Aborted connection 299 to db: 'glance' user: 'glance' host: '192.168.24.1' (Got an error reading communication packets)

"OVS/OVN logs:"

        * https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_62c/750456/3/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/62c4529/logs/undercloud/var/log/containers/openvswitch/ovn-controller.log

          * 2020-09-17T20:40:10.632Z|00087|reconnect|INFO|tcp:192.168.24.1:6642: connection closed by peer
            2020-09-17T20:40:11.633Z|00088|reconnect|INFO|tcp:192.168.24.1:6642: connecting...
            2020-09-17T20:40:11.634Z|00089|reconnect|INFO|tcp:192.168.24.1:6642: connected
            2020-09-17T20:40:11.634Z|00090|main|INFO|OVNSB commit failed, force recompute next time.
            2020-09-17T20:40:11.653Z|00091|main|INFO|OVNSB IDL reconnected, force recompute.
            2020-09-17T20:41:11.444Z|00092|jsonrpc|WARN|tcp:192.168.24.1:6642: send error: Broken pipe
            2020-09-17T20:41:11.445Z|00093|main|INFO|OVNSB commit failed, force recompute next time.
            2020-09-17T20:41:11.445Z|00094|reconnect|WARN|tcp:192.168.24.1:6642: connection dropped (Broken pipe)

        * https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_62c/7504...


Revision history for this message
Michele Baldessari (michele) wrote :

Here is my analysis of https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/job-output.txt

Timeline:
1) Standalone upgrade starts at 2020-09-15 13:59:31 and completes successfully at 2020-09-15 15:00:42

Note that towards the end of the upgrade we can observe a number of scary messages such as:
2020-09-15 15:00:03.942 8 ERROR nova.servicegroup.drivers.db [-] Unexpected error while reporting service status 2003, "Can't connect to MySQL server on '192.168.24.3'

The reason for these error messages is that one of the post-upgrade tasks in tht restarts the ovn-dbs-bundle (Ia7cf78e1f5e46235147bdf67c03b58d774244774), which brings down both VIPs (I expected only one VIP to go down but apparently both 24.1 and 24.3 get restarted; I will investigate that separately. It does not seem too important just yet).

2) The ovn-dbs restart is in any case fully completed at 15:00:04:
Sep 15 15:00:00 standalone.localdomain pacemaker-execd [325954] (log_finished) info: finished - rsc:ovn-dbs-bundle-podman-0 action:start call_id:100 pid:586688 exit-code:0 exec-time:2309ms queue-time:0ms
Sep 15 15:00:04 standalone.localdomain pacemaker-controld [325957] (process_lrm_event) notice: Result of start operation for ip-192.168.24.1 on standalone: 0 (ok) | call=102 key=ip-192.168.24.1_start_0 confirmed=true cib-update=385
Sep 15 15:00:04 standalone.localdomain pacemaker-controld [325957] (process_lrm_event) notice: Result of start operation for ip-192.168.24.3 on standalone: 0 (ok) | call=103 key=ip-192.168.24.3_start_0 confirmed=true cib-update=387

3) The router for the failing os_tempest ping gets successfully created at:
2020-09-15 15:02:01.467358 | primary | TASK [os_tempest : Create router] **********************************************
2020-09-15 15:02:01.467377 | primary | Tuesday 15 September 2020 15:02:01 +0000 (0:00:02.308) 1:10:27.761 *****
2020-09-15 15:02:04.475258 | primary | ok: [undercloud -> 127.0.0.2]
2020-09-15 15:02:04.504699 | primary |
2020-09-15 15:02:04.504764 | primary | TASK [os_tempest : Get router admin state and ip address] **********************
2020-09-15 15:02:04.504777 | primary | Tuesday 15 September 2020 15:02:04 +0000 (0:00:03.037) 1:10:30.799 *****
2020-09-15 15:02:04.557379 | primary | ok: [undercloud -> 127.0.0.2]

4) The ping itself fails at 15:02:07:
2020-09-15 15:02:07.057448 | primary | TASK [os_tempest : Ping router ip address] *************************************
2020-09-15 15:02:07.057502 | primary | Tuesday 15 September 2020 15:02:07 +0000 (0:00:00.065) 1:10:33.351 *****
2020-09-15 15:02:10.745010 | primary | FAILED - RETRYING: Ping router ip address (5 retries left).
2020-09-15 15:02:24.365896 | primary | FAILED - RETRYING: Ping router ip address (4 retries left).

After the failure, during the log collection, we do see in the OVN logs that we have a port corresponding to the IP tempest is pinging (192.168.24.122):
router 5e5e16a8-7c81-4aea-a56f-9edbb3343a34 (neutron-28fab0bc-bb0b-4e75-9204-cb19cc28246f) (aka router)
    port lrp-9b278e0...


Changed in tripleo:
assignee: nobody → Sergii Golovatiuk (sgolovatiuk)
Revision history for this message
Martin Mágr (mmagr) wrote :

The reason why I think I had the same issue is that when you check the mysqld.log in the job [1] you can see that all cloud services have issues using the DB. After the upgrade on my env the cloud more or less worked, but apparently there was a network issue, which in my case made a redeploy after FFU fail 100% of the time on the Keystone check step.

[1] https://e453f1d8808c5b6bd184-223d8b88d73ea59070ac36b627fdc3bc.ssl.cf2.rackcdn.com/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ad8724d/logs/undercloud/var/log/containers/mysql/mysqld.log

Revision history for this message
Jakub Libosvar (libosvar) wrote :

Sergii provided me an environment and that was extremely helpful, thanks for that. I looked there and saw that the external network subnet is the same as the subnet used for the control plane. That means there was a route for the FIP via br-ctlplane instead of br-ex, while br-ex was used in the bridge mappings for the given provider network.

When I changed the bridge mappings to use br-ctlplane instead, the ping works. However, this is not the solution, because I think the control plane subnet should differ from the external public subnet. The network configuration should be changed for the external network so this can work properly (separate external and control plane networks).
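
For reference, a minimal sketch of how that can be checked and tried on the standalone node, assuming the ML2/OVN setup this job uses (the router IP is the one from the failing tempest task; this is illustrative, not necessarily how it was changed on the environment):

    # which physical bridge does the 'datacentre' provider network map to? (br-ex here)
    sudo ovs-vsctl get Open_vSwitch . external_ids:ovn-bridge-mappings
    # which device would the kernel use to reach the router/FIP address?
    ip route get 192.168.24.122
    # workaround described above: point the mapping at br-ctlplane, then re-test
    sudo ovs-vsctl set Open_vSwitch . external_ids:ovn-bridge-mappings="datacentre:br-ctlplane"
    ping -c2 192.168.24.122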

Changed in tripleo:
assignee: Sergii Golovatiuk (sgolovatiuk) → Alex Schultz (alex-schultz)
Changed in tripleo:
assignee: Alex Schultz (alex-schultz) → nobody
Revision history for this message
Alex Schultz (alex-schultz) wrote :

This is a job config problem. The initial installation does not include --control-virtual-ip in its execution. The normal standalone job has this defined; the upgrade job is missing it as part of the initial deployment. The upgrade execution does include --control-virtual-ip.

Normal standalone tripleo_deploy.sh:
https://0522a0f118ced5ed6a93-4c6fae75ca48c2a9b52e94b381f06ed2.ssl.cf2.rackcdn.com/753546/6/check/tripleo-ci-centos-8-standalone/eabe4ae/logs/undercloud/home/zuul/tripleo_deploy.sh

#!/bin/bash
# This file is managed by ansible
set -xeo pipefail

export DEPLOY_CONTROL_VIP=192.168.24.3
export DEPLOY_DEPLOYMENT_USER=zuul
export DEPLOY_LOCAL_IP=192.168.24.1/24
export DEPLOY_OUTPUT_DIR=/home/zuul
export DEPLOY_ROLES_FILE=/usr/share/openstack-tripleo-heat-templates/roles/Standalone.yaml
export DEPLOY_STACK=standalone
export DEPLOY_STANDALONE_ROLE=Standalone
export DEPLOY_TEMPLATES=/usr/share/openstack-tripleo-heat-templates
export DEPLOY_TIMEOUT_ARG=90
openstack tripleo deploy --templates $DEPLOY_TEMPLATES --standalone --yes --output-dir $DEPLOY_OUTPUT_DIR --stack $DEPLOY_STACK --standalone-role $DEPLOY_STANDALONE_ROLE --timeout $DEPLOY_TIMEOUT_ARG -e /usr/share/openstack-tripleo-heat-templates/environments/standalone/standalone-tripleo.yaml -e /home/zuul/containers-prepare-parameters.yaml -e /home/zuul/standalone_parameters.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml -r $DEPLOY_ROLES_FILE --deployment-user $DEPLOY_DEPLOYMENT_USER --local-ip $DEPLOY_LOCAL_IP --control-virtual-ip $DEPLOY_CONTROL_VIP >/home/zuul/standalone_deploy.log 2>&1

Upgrade tripleo_deploy.sh:
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/logs/undercloud/home/zuul/tripleo_deploy.sh

#!/bin/bash
# This file is managed by ansible
set -xeo pipefail

export DEPLOY_DEPLOYMENT_USER=zuul
export DEPLOY_LOCAL_IP=192.168.24.1/24
export DEPLOY_OUTPUT_DIR=/home/zuul
export DEPLOY_ROLES_FILE=/usr/share/openstack-tripleo-heat-templates/roles/Standalone.yaml
export DEPLOY_STACK=standalone
exp...


Revision history for this message
Alex Schultz (alex-schultz) wrote :

Likely caused by https://review.opendev.org/#/c/725782/, since the last successful run of this was on 8/18, when that patch was merged. The issue is likely that on train we don't have pacemaker enabled and on ussuri we do, so the upgrade from non-pacemaker to pacemaker is failing.
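
A quick, hedged way to confirm that mismatch on the node (podman-based standalone assumed; the commands are illustrative):

    # a non-HA train deploy has no pacemaker-managed bundle containers at all
    sudo podman ps --format '{{.Names}}' | grep bundle || echo 'no pacemaker bundles running'
    # once pacemaker is in place (what the ussuri upgrade expects), pcs lists them
    sudo pcs status | head -n 30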

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart (master)

Fix proposed to branch: master
Review: https://review.opendev.org/753817

Changed in tripleo:
assignee: nobody → Alex Schultz (alex-schultz)
status: Triaged → In Progress
Revision history for this message
Marios Andreou (marios-b) wrote :
Revision history for this message
Marios Andreou (marios-b) wrote :

OK, after more discussion just now on IRC with @Sergii... so the fix from comment #11 makes the *deployment* have HA, and then https://review.opendev.org/#/c/753817/ will add the docker-ha for the upgrade commands too.

So we need both. I added /#/c/753817/ to the test at https://review.opendev.org/739457; let's see if we get a green run.

Revision history for this message
Marios Andreou (marios-b) wrote :

Per comment #12, unfortunately the test still fails at:

        * https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_cbd/739457/30/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/cbdd981/job-output.txt
        * 2020-09-24 14:22:39.653261 | primary | FAILED - RETRYING: Ping router ip address (2 retries left).

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart (master)

Reviewed: https://review.opendev.org/753817
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart/commit/?id=e8567839a7f144cb00be141a176f50cacb48877e
Submitter: Zuul
Branch: master

commit e8567839a7f144cb00be141a176f50cacb48877e
Author: Alex Schultz <email address hidden>
Date: Wed Sep 23 12:26:39 2020 -0600

    Enable HA always for upgrade job

    In Ussuri we enabled pacemaker by default so when we landed a change[0]
    in quickstart to handle this logic, it broke the upgrade job because
    the ussuri job uses train initially and gets a non-ha standalone which
    it tries to upgrade to HA. This results in an incorrect network config.
    Since really the expectation is that we'd always be upgrading HA to HA,
    let's test that instead.

    [0] https://review.opendev.org/#/c/725782/
    Closes-Bug: #1895822

    Change-Id: I78c1a0cf68534e574b14ad505404139d93983324

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
Rafael Folco (rafaelfolco) wrote :

Apparently the issue hasn't been fixed yet:

https://zuul.openstack.org/builds?job_name=tripleo-ci-centos-8-standalone-upgrade-ussuri

https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_177/754366/1/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/1775d17/job-output.txt

2020-09-28 06:11:57.107266 | primary | TASK [os_tempest : Ping router ip address] *************************************
2020-09-28 06:11:57.107329 | primary | Monday 28 September 2020 06:11:57 +0000 (0:00:00.081) 1:08:12.313 ******
2020-09-28 06:12:00.758247 | primary | FAILED - RETRYING: Ping router ip address (5 retries left).
2020-09-28 06:12:14.388017 | primary | FAILED - RETRYING: Ping router ip address (4 retries left).
2020-09-28 06:12:28.019650 | primary | FAILED - RETRYING: Ping router ip address (3 retries left).
2020-09-28 06:12:41.588559 | primary | FAILED - RETRYING: Ping router ip address (2 retries left).
2020-09-28 06:12:55.219646 | primary | FAILED - RETRYING: Ping router ip address (1 retries left).
2020-09-28 06:13:08.863241 | primary | fatal: [undercloud]: FAILED! => {
2020-09-28 06:13:08.863322 | primary | "attempts": 5,
2020-09-28 06:13:08.863351 | primary | "changed": true,
2020-09-28 06:13:08.863375 | primary | "cmd": "set -e\nping -c2 \"192.168.24.129\"\n",
2020-09-28 06:13:08.863417 | primary | "delta": "0:00:03.101290",
2020-09-28 06:13:08.863443 | primary | "end": "2020-09-28 06:13:08.821880",
2020-09-28 06:13:08.863469 | primary | "rc": 1,
2020-09-28 06:13:08.863492 | primary | "start": "2020-09-28 06:13:05.720590"
2020-09-28 06:13:08.863516 | primary | }
2020-09-28 06:13:08.863540 | primary |
2020-09-28 06:13:08.863564 | primary | STDOUT:
2020-09-28 06:13:08.863605 | primary |
2020-09-28 06:13:08.863630 | primary | PING 192.168.24.129 (192.168.24.129) 56(84) bytes of data.
2020-09-28 06:13:08.863655 | primary | From 192.168.24.1 icmp_seq=1 Destination Host Unreachable
2020-09-28 06:13:08.863678 | primary | From 192.168.24.1 icmp_seq=2 Destination Host Unreachable

Changed in tripleo:
status: Fix Released → Triaged
Revision history for this message
Sergii Golovatiuk (sgolovatiuk) wrote :

If we look at the job closely, it fails on https://opendev.org/openstack/openstack-ansible-os_tempest/src/branch/master/tasks/tempest_resources.yml#L283-L291
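
For reference, a hedged reproduction of that task from the standalone host (after sourcing the credentials file; the router created by os_tempest is named "router", per the analysis earlier in this bug, and jq is only used to pull out the gateway IP):

    ROUTER_IP=$(openstack router show router -f json \
      | jq -r '.external_gateway_info.external_fixed_ips[0].ip_address')
    ping -c2 "$ROUTER_IP"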

If we look at the interfaces before the upgrade, they are as follows:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:ef:f4:eb brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.23/24 brd 192.168.122.255 scope global dynamic noprefixroute ens3
       valid_lft 3147sec preferred_lft 3147sec
    inet6 fe80::5054:ff:feef:f4eb/64 scope link
       valid_lft forever preferred_lft forever
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:25:a0:f8 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.229/24 brd 192.168.122.255 scope global dynamic noprefixroute ens4
       valid_lft 3271sec preferred_lft 3271sec
    inet6 fe80::5054:ff:fe25:a0f8/64 scope link
       valid_lft forever preferred_lft forever
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c6:c0:26:55:35:d3 brd ff:ff:ff:ff:ff:ff
5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 02:85:da:68:e5:4e brd ff:ff:ff:ff:ff:ff
    inet 192.168.24.2/24 scope global br-ex
       valid_lft forever preferred_lft forever
    inet6 fe80::85:daff:fe68:e54e/64 scope link
       valid_lft forever preferred_lft forever
6: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 02:85:da:68:e5:4e brd ff:ff:ff:ff:ff:ff
    inet 192.168.24.1/24 brd 192.168.24.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.168.24.3/32 brd 192.168.24.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::85:daff:fe68:e54e/64 scope link
       valid_lft forever preferred_lft forever
7: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether fa:6f:46:31:8f:42 brd ff:ff:ff:ff:ff:ff

After the upgrade run they are as follows:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:ef:f4:eb brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.23/24 brd 192.168.122.255 scope global dynamic noprefixroute ens3
       valid_lft 2369sec preferred_lft 2369sec
    inet6 fe80::5054:ff:feef:f4eb/64 scope link
       valid_lft forever preferred_lft forever
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 q...


Revision history for this message
wes hayutin (weshayutin) wrote :
Revision history for this message
Marios Andreou (marios-b) wrote :

Digging some more today - per comment https://bugs.launchpad.net/tripleo/+bug/1895822/comments/8 there *is* a --control-virtual-ip being passed in both cases AFAICS.

Per https://bugs.launchpad.net/tripleo/+bug/1895822/comments/16, and as just discussed with Sergii on Freenode #oooq - I think the same env files are being passed in both cases?

I agree there is something off about the network config going train->ussuri. We are not getting this bug for the ussuri->master job - see the example at [1], which has the 'Ping router ip address' task executed twice (after deploy & after upgrade) without issue.

We changed something train -> ussuri - it might still be related to the switch to 'ha by default' [2].

I've been sanity-checking the deploy vs upgrade network config but haven't spotted anything yet, e.g. [3] vs [4]; the main diff is

    {"network_config": [{"addresses": [{"ip_netmask": "{{ ctlplane_ip }}/24"}],

vs

    {"network_config": [{"addresses": [{"ip_netmask": "{{ ctlplane_ip }}/24"}, {"ip_netmask": "192.168.24.3/32"}, {"ip_netmask": "192.168.24.1/32"}],

[1] https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e30/752419/12/check/tripleo-ci-centos-8-standalone-upgrade/e30bfb8/job-output.txt
[2] https://review.opendev.org/#/c/359060/
[3] https://ac0934616631b7150b73-eba45f55476b984e1cae6ece42d21924.ssl.cf2.rackcdn.com/739457/30/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/3933f4c/logs/undercloud/home/zuul/standalone-ansible-i8z8idi8/Standalone/NetworkConfig
[4] https://ac0934616631b7150b73-eba45f55476b984e1cae6ece42d21924.ssl.cf2.rackcdn.com/739457/30/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/3933f4c/logs/undercloud/home/zuul/standalone-ansible-v1_y03ip/Standalone/NetworkConfig

Revision history for this message
Marios Andreou (marios-b) wrote :

Really not sure about this yet, but ...

I started looking for things that are in Ussuri but not in Train... I compared these:

        * https://opendev.org/openstack/tripleo-heat-templates/commits/branch/stable/ussuri/environments/standalone/standalone-tripleo.yaml
        * https://opendev.org/openstack/tripleo-heat-templates/commits/branch/stable/train/environments/standalone/standalone-tripleo.yaml

In particular, this commit seems interesting and is missing in train: https://opendev.org/openstack/tripleo-heat-templates/commit/c712355e4bae4ef2fc1b83e5603c0364dbd50a78 ("Deprecate Keepalived service", https://review.opendev.org/#/c/657067/)

I just cherry-picked it to train at https://review.opendev.org/755059 for testing/sanity, but as I said... not sure that's it yet.

Revision history for this message
Marios Andreou (marios-b) wrote :

This bit in particular is what caught my eye for comment #19: https://review.opendev.org/#/c/657067/49/net-config-standalone.j2.yaml

Revision history for this message
Marios Andreou (marios-b) wrote :

After the upgrade, the resulting network configuration has no br-ex [1], only br-ctlplane. This may explain what Jakub saw and commented on in comment #7 above.
Not sure if this is *why* there is no br-ex, but I noticed that the os-net-config data is different in the deployment [2] vs the upgrade [3]. In particular, the upgrade os-net-config config.json has the extra control_virtual_ip ("192.168.24.3/32") and public_virtual_ip ("192.168.24.1/32") passed in. I think the patch I attempted to cherry-pick to Train at [4] would add those, but I don't know if that's the reason or if it's somehow related (a small diff sketch of [2] vs [3] follows the links below).

I also see these 'martian source' br-ex messages in the journal [5], which make sense if we are removing br-ex during the upgrade:

        Sep 28 16:41:44 standalone.localdomain kernel: IPv4: martian source 192.168.24.119 from 192.168.24.119, on dev br-ex

I am hoping to point some of the DF folks at this; perhaps it will help get us closer to the issue. Grateful for any thoughts here, thank you.

[1] https://ac0934616631b7150b73-eba45f55476b984e1cae6ece42d21924.ssl.cf2.rackcdn.com/739457/30/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/3933f4c/logs/undercloud/var/log/extra/network.txt
[2] https://ac0934616631b7150b73-eba45f55476b984e1cae6ece42d21924.ssl.cf2.rackcdn.com/739457/30/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/3933f4c/logs/undercloud/etc/os-net-config/config.json.2020-09-28T16%3A05%3A24
[3] https://ac0934616631b7150b73-eba45f55476b984e1cae6ece42d21924.ssl.cf2.rackcdn.com/739457/30/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/3933f4c/logs/undercloud/etc/os-net-config/config.json
[4] https://review.opendev.org/#/c/755059/1/net-config-standalone.j2.yaml
[5] https://ac0934616631b7150b73-eba45f55476b984e1cae6ece42d21924.ssl.cf2.rackcdn.com/739457/30/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/3933f4c/logs/undercloud/var/log/extra/journal.txt
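
As a small aid, a hedged sketch of how the two renderings from [2] and [3] can be compared once downloaded locally (the file names below are made up; jq is only used to normalize key order):

    diff <(jq -S . config.json.deploy) <(jq -S . config.json.upgrade)
    # in this job the upgrade-side rendering adds the /32 VIP addresses
    # (192.168.24.3 and 192.168.24.1) to the bridge, per the comment above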

Revision history for this message
Alex Schultz (alex-schultz) wrote :

There is a br-ex in the network config.

https://799a0e5cbc4205600635-877a0f70cfb612990952d1399a198d7a.ssl.cf5.rackcdn.com/750595/1/check/tripleo-ci-centos-8-standalone-upgrade/2ecbaf6/logs/undercloud/etc/os-net-config/config.json

You'd want to check the NetworkConfig when comparing but br-ex is being configured in both.

For train we have:

https://799a0e5cbc4205600635-877a0f70cfb612990952d1399a198d7a.ssl.cf5.rackcdn.com/750595/1/check/tripleo-ci-centos-8-standalone-upgrade/2ecbaf6/logs/undercloud/home/zuul/standalone-ansible-s__he90q/Standalone/standalone/NetworkConfig

br-ex should get added via:
    sed -i "s/: \"bridge_name/: \"${bridge_name:-''}/g" /etc/os-net-config/config.json
    sed -i "s/interface_name/${interface_name:-''}/g" /etc/os-net-config/config.json

These values get invoked with br-ex from:

        tripleo_network_config_bridge_name: "{{ neutron_physical_bridge_name }}"
        tripleo_network_config_interface_name: "{{ neutron_public_interface_name }}"

This is defined in the core playbook. https://799a0e5cbc4205600635-877a0f70cfb612990952d1399a198d7a.ssl.cf5.rackcdn.com/750595/1/check/tripleo-ci-centos-8-standalone-upgrade/2ecbaf6/logs/undercloud/home/zuul/standalone-ansible-s__he90q/deploy_steps_playbook.yaml

In ussuri we don't have that stuff anymore because we backported the network config generation, so it no longer relies on that.

https://799a0e5cbc4205600635-877a0f70cfb612990952d1399a198d7a.ssl.cf5.rackcdn.com/750595/1/check/tripleo-ci-centos-8-standalone-upgrade/2ecbaf6/logs/undercloud/home/zuul/standalone-ansible-oewpzx4m/Standalone/NetworkConfig
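
For illustration, a self-contained sketch of what that substitution does; the JSON fragment below is made up for the example and is not the real CI file:

    printf '%s\n' '{"network_config": [{"type": "ovs_bridge", "name": "bridge_name", "members": [{"type": "interface", "name": "interface_name"}]}]}' > /tmp/config.json
    bridge_name=br-ex
    interface_name=eth1
    sed -i "s/: \"bridge_name/: \"${bridge_name:-''}/g" /tmp/config.json
    sed -i "s/interface_name/${interface_name:-''}/g" /tmp/config.json
    cat /tmp/config.json   # the bridge is now named "br-ex" with member interface "eth1"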

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/755607

Revision history for this message
Marios Andreou (marios-b) wrote :
Revision history for this message
Alex Schultz (alex-schultz) wrote :

It's there in [1], it just shows up with only an IPv6 address because there is no IPv4 address associated with it.

5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UNKNOWN qlen 1000

The network config just seems wrong. Per the tripleo-docs we always use br-ctlplane instead of br-ex, but I'm trying to remember why it's configured that way. In upstream CI br-ex was always already configured via the undercloud-setup bits for multinode, so I'm wondering if we're just running into a poor CI setup vs. there actually being a problem.

Revision history for this message
Alex Schultz (alex-schultz) wrote :

Ok so in Train, the os-net-config network config only includes the ctlplane/24 for the address. In ussuri this changed to also include the ctlplane and public vips as /32 because of the deprecation of keepalived.

Because ctlplane_ip is listed twice with a /24 and a /32 we get both. I'm trying to track down what changed (or if this has always been broken). It seems like we need to exclude a vip address if it's already the ctlplane_ip.

It looks like in train, we only added the ctlplane vip as a vip and not the public vip like we do in ussuri.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/756521

Changed in tripleo:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https://review.opendev.org/755607
Reason: https://review.opendev.org/#/c/756521 is likely the correct fix. we can keep using br-ex, we just need to configure it correctly

tags: added: train-backport-potential ussuri-backport-potential
Revision history for this message
Alex Schultz (alex-schultz) wrote :

This shouldn't affect train; it was a change in ussuri to get rid of keepalived. Unless we backport that, this is not needed.

tags: removed: train-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/756532

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (master)

Fix proposed to branch: master
Review: https://review.opendev.org/756562

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https://review.opendev.org/756521
Reason: https://review.opendev.org/#/c/756562/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/ussuri)

Change abandoned by Alex Schultz (<email address hidden>) on branch: stable/ussuri
Review: https://review.opendev.org/756532

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/756563

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/756577

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/756579

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/756706

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/756707

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/756715

Revision history for this message
Alex Schultz (alex-schultz) wrote :

Ok so I think it's the bridgemap & network config. Doing the upgrade using the documentation with the following configuration works fine:

  NeutronPublicInterface: eth1
  NeutronBridgeMappings: "datacentre:br-ctlplane"
  NeutronPhysicalBridge: br-ctlplane

In CI, we only configure NeutronPublicInterface: br-ex. NeutronPhysicalBridge is br-ex by default. So it appears that the br-ctlplane interface that we use with br-ex is not being properly connected.

I think what happens is we have:

br-ex -> br-ctlplane (with ips)
 ^- neutron

So routing gets weird. I think we need

br-ex -> br-ctlplane (with ips)
            ^- neutron
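
A hedged way to see that "weird" routing from the node itself (addresses taken from this job; the example outputs are illustrative):

    ip -4 addr show dev br-ex
    ip -4 addr show dev br-ctlplane
    ip route show 192.168.24.0/24
    # two connected routes for the same /24, one per bridge, e.g.:
    #   192.168.24.0/24 dev br-ctlplane proto kernel scope link src 192.168.24.1
    #   192.168.24.0/24 dev br-ex proto kernel scope link src 192.168.24.2
    ip route get 192.168.24.122   # which device does traffic to the router/FIP actually use?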

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/757119

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https://review.opendev.org/756577
Reason: we need this defined because the framework assumes there is always a control_virtual_ip

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https://review.opendev.org/756706
Reason: we need this defined because the framework assumes there is always a control_virtual_ip

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/ussuri)

Change abandoned by Alex Schultz (<email address hidden>) on branch: stable/ussuri
Review: https://review.opendev.org/756707

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Alex Schultz (<email address hidden>) on branch: stable/ussuri
Review: https://review.opendev.org/756579

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https://review.opendev.org/757119
Reason: we don't need to change ips if we can change the br-ex allocation

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on python-tripleoclient (master)

Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https://review.opendev.org/756562
Reason: This actually doesn't cause a problem (TM)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on python-tripleoclient (stable/ussuri)

Change abandoned by Alex Schultz (<email address hidden>) on branch: stable/ussuri
Review: https://review.opendev.org/756563

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/757756

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/757900

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ansible (master)

Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https://review.opendev.org/756715

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/757900
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=c86fa4fb4ee6f4bf86252ca3e1fde25fa55c3677
Submitter: Zuul
Branch: master

commit c86fa4fb4ee6f4bf86252ca3e1fde25fa55c3677
Author: Alex Schultz <email address hidden>
Date: Tue Oct 13 09:42:28 2020 -0600

    Don't manage bridge mappings in scenario file

    The bridge mappings should be managed in the standalone parameters. This
    bridge mapping prevents us from being able to change the datacentre
    mapping in CI.

    Change-Id: I6b5b9db75a11c2347720258a39b03aa28702dbf1
    Related-Bug: #1895822

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/757756
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=eaf976ae0e1795b47b372ed23125f35a86b518f0
Submitter: Zuul
Branch: master

commit eaf976ae0e1795b47b372ed23125f35a86b518f0
Author: yatinkarel <email address hidden>
Date: Tue Oct 13 13:55:13 2020 +0530

    Handle migration of br-ex network

    [1] changing br-ex network from 192.168.24 to 172.16.1
    for standalone jobs.

    In order to allow the migration we need to adjust
    the tempest configuration. This can be reverted once
    [1] and [2] land.

    [1] https://review.opendev.org/#/c/757605
    [2] https://review.opendev.org/#/c/755607

    Related-Bug: #1895822
    Change-Id: I1865db911661092debb133fd2638c9a6a9bd2e47

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/755607
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=fa1bd4ad28d4e7aaec7c2cde29603118d840247a
Submitter: Zuul
Branch: master

commit fa1bd4ad28d4e7aaec7c2cde29603118d840247a
Author: Alex Schultz <email address hidden>
Date: Thu Oct 1 12:03:33 2020 -0600

    Standalone configure neutron bridge correctly

    This change updates the default bridge mapping from datacentre:br-ex to
    datacentre:br-ctlplane. We're doing this because in the standalone in
    CI, we configure a br-ex before running the standalone (via
    undercloud-setup) and want to attach our br-ctlplane to it. We then want
    to ensure that we use br-ctlplane for the neutron access to the external
    network to prevent weird routing issues when we have two bridges on the
    same subnet.

    Depends-On: https://review.opendev.org/#/c/757605/
    Change-Id: I0e5aa3f58746dc0b92bd35ade7792f323b5647f7
    Related-Bug: #1895822

Revision history for this message
Sandeep Yadav (sandeepyadav93) wrote :

Hello Alex,

I know you are already working on sc12 along with this bug, but just in case:

tripleo-ci-centos-8-scenario012-standalone is now also failing at tempest for ussuri/train, for both check and periodic jobs, with the same error:

~~~
2020-10-22 07:40:54.215940 | primary | TASK [os_tempest : Ping router ip address] *************************************
2020-10-22 07:40:54.215980 | primary | Thursday 22 October 2020 07:40:54 +0000 (0:00:00.111) 0:46:43.145 ******
2020-10-22 07:40:58.032949 | primary | FAILED - RETRYING: Ping router ip address (5 retries left).
2020-10-22 07:41:11.729033 | primary | FAILED - RETRYING: Ping router ip address (4 retries left).
2020-10-22 07:41:25.488429 | primary | FAILED - RETRYING: Ping router ip address (3 retries left).
2020-10-22 07:41:39.185617 | primary | FAILED - RETRYING: Ping router ip address (2 retries left).
2020-10-22 07:41:52.881059 | primary | FAILED - RETRYING: Ping router ip address (1 retries left).
2020-10-22 07:42:06.576933 | primary | fatal: [undercloud]: FAILED! => {
2020-10-22 07:42:06.577289 | primary | "attempts": 5,
2020-10-22 07:42:06.577361 | primary | "changed": true,
2020-10-22 07:42:06.577407 | primary | "cmd": "set -e\nping -c2 \"192.168.24.120\"\n",
2020-10-22 07:42:06.577449 | primary | "delta": "0:00:03.066736",
2020-10-22 07:42:06.577492 | primary | "end": "2020-10-22 07:42:06.538032",
2020-10-22 07:42:06.577534 | primary | "rc": 1,
2020-10-22 07:42:06.577573 | primary | "start": "2020-10-22 07:42:03.471296"
2020-10-22 07:42:06.577612 | primary | }
~~~

Logs:- https://zuul.openstack.org/build/83960e41edca4d64b9d43cf8f168a08a/log/job-output.txt

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/759295

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/759296

Revision history for this message
Alex Schultz (alex-schultz) wrote :

cherry picked the CI change back for scenario012 https://review.opendev.org/759295 https://review.opendev.org/759296

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/759296
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=f07ca38a82c659988d4630f9e5ce87da474a861d
Submitter: Zuul
Branch: stable/train

commit f07ca38a82c659988d4630f9e5ce87da474a861d
Author: Alex Schultz <email address hidden>
Date: Tue Oct 13 09:42:28 2020 -0600

    Don't manage bridge mappings in scenario file

    The bridge mappings should be managed in the standalone parameters. This
    bridge mapping prevents us from being able to change the datacentre
    mapping in CI.

    Change-Id: I6b5b9db75a11c2347720258a39b03aa28702dbf1
    Related-Bug: #1895822
    (cherry picked from commit c86fa4fb4ee6f4bf86252ca3e1fde25fa55c3677)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/ussuri)

Reviewed: https://review.opendev.org/759295
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=805fe6e4196306883a66c0d1e3adc8a43d9702d2
Submitter: Zuul
Branch: stable/ussuri

commit 805fe6e4196306883a66c0d1e3adc8a43d9702d2
Author: Alex Schultz <email address hidden>
Date: Tue Oct 13 09:42:28 2020 -0600

    Don't manage bridge mappings in scenario file

    The bridge mappings should be managed in the standalone parameters. This
    bridge mapping prevents us from being able to change the datacentre
    mapping in CI.

    Change-Id: I6b5b9db75a11c2347720258a39b03aa28702dbf1
    Related-Bug: #1895822
    (cherry picked from commit c86fa4fb4ee6f4bf86252ca3e1fde25fa55c3677)

tags: added: in-stable-ussuri
Changed in tripleo:
milestone: victoria-3 → wallaby-1
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Changed in tripleo:
milestone: wallaby-3 → wallaby-rc1
Changed in tripleo:
milestone: wallaby-rc1 → xena-1
Changed in tripleo:
milestone: xena-1 → xena-2
Changed in tripleo:
milestone: xena-2 → xena-3
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ci (master)

Change abandoned by "Alex Schultz <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/757895

Revision history for this message
Rabi Mishra (rabi) wrote :
Revision history for this message
Ronelle Landy (rlandy) wrote :
Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/878747
Committed: https://opendev.org/openstack/tripleo-quickstart-extras/commit/6887104038ba5d0152c77abe4a38e98d72693b89
Submitter: "Zuul (22348)"
Branch: master

commit 6887104038ba5d0152c77abe4a38e98d72693b89
Author: yatinkarel <email address hidden>
Date: Tue Mar 28 14:03:21 2023 +0530

    Fix neutron_bridge_mappings default for standalone

    br-tenant is not created as part of standalone
    deployments but bridge_mapping was referring to
    it. This results in an unnecessary Warning [1] in
    ovn-controller logs; this patch fixes it.

    [1] Bridge 'br-tenant' not found for network 'tenant'

    Related-Bug: #1895822
    Change-Id: I9b23d6842cd518971b325ffd29b51d171c353b4f
