centos8 standalone-upgrade-ussuri fails tempest ping router IP
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | | Critical | Alex Schultz |
Bug Description
At [1] the tripleo-
2020-09-15 15:02:07.057448 | primary | TASK [os_tempest : Ping router ip address] *******
2020-09-15 15:02:07.057502 | primary | Tuesday 15 September 2020 15:02:07 +0000 (0:00:00.065) 1:10:33.351 *****
2020-09-15 15:02:10.745010 | primary | FAILED - RETRYING: Ping router ip address (5 retries left).
2020-09-15 15:02:24.365896 | primary | FAILED - RETRYING: Ping router ip address (4 retries left).
2020-09-15 15:02:38.005903 | primary | FAILED - RETRYING: Ping router ip address (3 retries left).
2020-09-15 15:02:51.638488 | primary | FAILED - RETRYING: Ping router ip address (2 retries left).
2020-09-15 15:03:05.266932 | primary | FAILED - RETRYING: Ping router ip address (1 retries left).
2020-09-15 15:03:18.902046 | primary | fatal: [undercloud]: FAILED! => {
2020-09-15 15:03:18.902122 | primary | "attempts": 5,
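The os_tempest task retries this ping several times with a delay between attempts; the retry behaviour can be sketched as a small helper (hypothetical Python, not the role's actual Ansible code):

```python
import time
from typing import Callable


def retry(check: Callable[[], bool], attempts: int = 5, delay: float = 10.0) -> bool:
    """Run `check` up to `attempts` times, sleeping `delay` seconds
    between failed attempts, mirroring Ansible's retries/delay."""
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            print(f"FAILED - RETRYING ({attempts - attempt} retries left).")
            time.sleep(delay)
    return False


# In the job, `check` would shell out to something like `ping -c2 <router ip>`.
```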
Poking a little further I see a few errors related to network and mysql; however, I am not sure which is the original/root cause.
In the neutron [4] and keystone [5] container logs we see the following many times:
2020-09-11 07:41:16.044 139 ERROR oslo_db.
At [6] ovn_controller.log there is the following repeated many times:
At [7] container-
At [8] pacemaker log we have the following a few times:
Sep 15 14:12:32 standalone.
Sep 15 14:12:32 standalone.
[1] https:/
[2] https:/
[3] https:/
[4] https:/
[5] https:/
[6] https:/
[7] https:/
[8] https:/
tags: | added: promotion-blocker |
Marios Andreou (marios-b) wrote : | #1 |
Martin Mágr (mmagr) wrote : | #2 |
I'm facing the same issue with OVS based networking:
Deploy fail:
2020-09-16 10:52:58,277 p=83091 u=mistral n=ansible | TASK [tripleo-
2020-09-16 10:52:58,277 p=83091 u=mistral n=ansible | Wednesday 16 September 2020 10:52:58 -0400 (0:00:07.246) 0:28:33.359 ***
2020-09-16 10:52:58,899 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:52:59,255 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:52:59,611 p=83091 u=mistral n=ansible | failed: [undercloud] (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:00,016 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:00,271 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:00,677 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:00,981 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:01,338 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:01,695 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:02,049 p=83091 u=mistral n=ansible | ok: [undercloud] => (item=None) => {"attempts": 1, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:02,050 p=83091 u=mistral n=ansible | fatal: [undercloud]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-09-16 10:53:02,055 p=83091 u=mistral n=ansible | NO MORE HOSTS LEFT *******
/var/log/
[Tue Sep 15 23:52:38.988613 2020] [wsgi:error] [pid 1293] [remote 172.17.1.149:47570] File "/usr/lib64/
Marios Andreou (marios-b) wrote : | #3 |
@martin what in particular made you suspect you have the same issue? I can't tell from the logs in comment #2.
Marios Andreou (marios-b) wrote : | #4 |
did some more digging - leaning more towards this being a neutron issue but still not closer to understanding why.
I used https:/
"UPGRADE start/end": 20:33:00.699450 -> 21:33:44
* https:/
* 2020-09-17 20:33:00.699450 | primary | TASK [standalone-upgrade : Upgrade the standalone] *******
2020-09-17 20:33:00.699475 | primary | Thursday 17 September 2020 20:33:00 +0000 (0:00:02.279) 0:07:32.348 ****
2020-09-17 21:33:44.045462 | primary | changed: [undercloud]
"MYSQL SHUTDOWN/
* https:/
* 2020-09-17 20:40:13 0 [Note] /usr/libexec/
* 2020-09-17 20:45:28 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
* 2020-09-17 21:02:12 0 [Note] /usr/libexec/
* 2020-09-17 21:02:32 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
* 2020-09-17 21:02:33 0 [Note] /usr/libexec/
* 2020-09-17 21:09:50 299 [Warning] Aborted connection 299 to db: 'glance' user: 'glance' host: '192.168.24.1' (Got an error reading communication packets)
"OVS/OVN logs:"
* 2020-09-
Michele Baldessari (michele) wrote : | #5 |
Here is my analysis of https:/
Timeline:
1) Standalone upgrade starts at 2020-09-15 13:59:31 and completes successfully at 2020-09-15 15:00:42
Note that towards the end of the upgrade we can observe a number of scary messages such as:
2020-09-15 15:00:03.942 8 ERROR nova.servicegro
The reason for these error messages is that one of the post-upgrade tasks in tht restarts the ovn-dbs-bundle (Ia7cf78e1f5e46
2) The ovn-dbs restart is in any case fully completed at 15:00:04:
Sep 15 15:00:00 standalone.
Sep 15 15:00:04 standalone.
Sep 15 15:00:04 standalone.
3) The router for the failing os_tempest ping gets successfully created at:
2020-09-15 15:02:01.467358 | primary | TASK [os_tempest : Create router] *******
2020-09-15 15:02:01.467377 | primary | Tuesday 15 September 2020 15:02:01 +0000 (0:00:02.308) 1:10:27.761 *****
2020-09-15 15:02:04.475258 | primary | ok: [undercloud -> 127.0.0.2]
2020-09-15 15:02:04.504699 | primary |
2020-09-15 15:02:04.504764 | primary | TASK [os_tempest : Get router admin state and ip address] *******
2020-09-15 15:02:04.504777 | primary | Tuesday 15 September 2020 15:02:04 +0000 (0:00:03.037) 1:10:30.799 *****
2020-09-15 15:02:04.557379 | primary | ok: [undercloud -> 127.0.0.2]
4) The ping itself fails at 15:02:07:
2020-09-15 15:02:07.057448 | primary | TASK [os_tempest : Ping router ip address] *******
2020-09-15 15:02:07.057502 | primary | Tuesday 15 September 2020 15:02:07 +0000 (0:00:00.065) 1:10:33.351 *****
2020-09-15 15:02:10.745010 | primary | FAILED - RETRYING: Ping router ip address (5 retries left).
2020-09-15 15:02:24.365896 | primary | FAILED - RETRYING: Ping router ip address (4 retries left).
After the failure, during log collection, we do see in the ovn logs that there is a port corresponding to the IP tempest is pinging (192.168.24.122):
router 5e5e16a8-
port lrp-9b278e0...
Changed in tripleo: | |
assignee: | nobody → Sergii Golovatiuk (sgolovatiuk) |
Martin Mágr (mmagr) wrote : | #6 |
The reason why I think I had the same issue is that when you check the mysqld.log in the job [1] you can see that all cloud services have issues using the DB. After the upgrade on my env the cloud more or less worked, but apparently there was a network issue, which in my case made the redeploy after FFU fail 100% of the time on the Keystone check step.
Jakub Libosvar (libosvar) wrote : | #7 |
Sergii provided me an environment and that was extremely helpful, thanks for that. I looked there and saw that the external network subnet is the same as the subnet used for the control plane. That meant there was a route for the FIP via br-ctplane instead of br-ex, while br-ex was used in the bridge mappings for the given provider network.
When I changed the bridge mappings to use br-ctplane instead, the ping works. However, this is not the solution, because I think the control plane subnet should differ from the external public subnet. The network configuration should be changed for the external network so this can work properly (separate external and control plane networks).
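The conflict described above is easy to check mechanically; a minimal sketch using Python's ipaddress module (the function name is illustrative; 192.168.24.0/24 is the control plane subnet seen in the job, 172.16.1.0/24 the kind of separate external subnet later proposed):

```python
import ipaddress


def subnets_conflict(ctlplane_cidr: str, external_cidr: str) -> bool:
    """Return True when the external (FIP) subnet overlaps the control
    plane subnet, which routes FIP traffic via br-ctlplane instead of
    the bridge named in the provider network's bridge mappings."""
    ctlplane = ipaddress.ip_network(ctlplane_cidr, strict=False)
    external = ipaddress.ip_network(external_cidr, strict=False)
    return ctlplane.overlaps(external)


# The failing job used the same subnet for both networks:
print(subnets_conflict("192.168.24.0/24", "192.168.24.0/24"))  # True
print(subnets_conflict("192.168.24.0/24", "172.16.1.0/24"))    # False
```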
Changed in tripleo: | |
assignee: | Sergii Golovatiuk (sgolovatiuk) → Alex Schultz (alex-schultz) |
Changed in tripleo: | |
assignee: | Alex Schultz (alex-schultz) → nobody |
Alex Schultz (alex-schultz) wrote : | #8 |
This is a job config problem. The initial installation does not include --control-
Normal standalone tripleo_deploy.sh:
https:/
#!/bin/bash
# This file is managed by ansible
set -xeo pipefail
export DEPLOY_
export DEPLOY_
export DEPLOY_
export DEPLOY_
export DEPLOY_
export DEPLOY_
export DEPLOY_
export DEPLOY_
export DEPLOY_
openstack tripleo deploy --templates $DEPLOY_TEMPLATES --standalone --yes --output-dir $DEPLOY_OUTPUT_DIR --stack $DEPLOY_STACK --standalone-role $DEPLOY_
Upgrade tripleo_deploy.sh:
https:/
#!/bin/bash
# This file is managed by ansible
set -xeo pipefail
export DEPLOY_
export DEPLOY_
export DEPLOY_
export DEPLOY_
export DEPLOY_
exp...
Alex Schultz (alex-schultz) wrote : | #9 |
Likely caused by https:/
Fix proposed to branch: master
Review: https:/
Changed in tripleo: | |
assignee: | nobody → Alex Schultz (alex-schultz) |
status: | Triaged → In Progress |
Marios Andreou (marios-b) wrote : | #11 |
@Alex
just commented @ https:/
Marios Andreou (marios-b) wrote : | #12 |
OK after more discussion just now on irc with @Sergii... so the fix from comment #11 makes the *deployment* have HA but then https:/
so we need both. I added /#/c/753817/ on the test at https:/
Marios Andreou (marios-b) wrote : | #13 |
per comment #12, unfortunately test still fails @
* https:/
* 2020-09-24 14:22:39.653261 | primary | FAILED - RETRYING: Ping router ip address (2 retries left).
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit e8567839a7f144c
Author: Alex Schultz <email address hidden>
Date: Wed Sep 23 12:26:39 2020 -0600
Enable HA always for upgrade job
In Ussuri we enabled pacemaker by default so when we landed a change[0]
in quickstart to handle this logic, it broke the upgrade job because
the ussuri job uses train initially and gets a non-ha standalone which
it tries to upgrade to HA. This results in an incorrect network config.
Since really the expectation is that we'd always be upgrading HA to HA,
let's test that instead.
[0] https:/
Closes-Bug: #1895822
Change-Id: I78c1a0cf68534e
Changed in tripleo: | |
status: | In Progress → Fix Released |
Rafael Folco (rafaelfolco) wrote : | #15 |
Apparently the issue hasn't been fixed yet:
https:/
2020-09-28 06:11:57.107266 | primary | TASK [os_tempest : Ping router ip address] *******
2020-09-28 06:11:57.107329 | primary | Monday 28 September 2020 06:11:57 +0000 (0:00:00.081) 1:08:12.313 ******
2020-09-28 06:12:00.758247 | primary | FAILED - RETRYING: Ping router ip address (5 retries left).
2020-09-28 06:12:14.388017 | primary | FAILED - RETRYING: Ping router ip address (4 retries left).
2020-09-28 06:12:28.019650 | primary | FAILED - RETRYING: Ping router ip address (3 retries left).
2020-09-28 06:12:41.588559 | primary | FAILED - RETRYING: Ping router ip address (2 retries left).
2020-09-28 06:12:55.219646 | primary | FAILED - RETRYING: Ping router ip address (1 retries left).
2020-09-28 06:13:08.863241 | primary | fatal: [undercloud]: FAILED! => {
2020-09-28 06:13:08.863322 | primary | "attempts": 5,
2020-09-28 06:13:08.863351 | primary | "changed": true,
2020-09-28 06:13:08.863375 | primary | "cmd": "set -e\nping -c2 \"192.168.
2020-09-28 06:13:08.863417 | primary | "delta": "0:00:03.101290",
2020-09-28 06:13:08.863443 | primary | "end": "2020-09-28 06:13:08.821880",
2020-09-28 06:13:08.863469 | primary | "rc": 1,
2020-09-28 06:13:08.863492 | primary | "start": "2020-09-28 06:13:05.720590"
2020-09-28 06:13:08.863516 | primary | }
2020-09-28 06:13:08.863540 | primary |
2020-09-28 06:13:08.863564 | primary | STDOUT:
2020-09-28 06:13:08.863605 | primary |
2020-09-28 06:13:08.863630 | primary | PING 192.168.24.129 (192.168.24.129) 56(84) bytes of data.
2020-09-28 06:13:08.863655 | primary | From 192.168.24.1 icmp_seq=1 Destination Host Unreachable
2020-09-28 06:13:08.863678 | primary | From 192.168.24.1 icmp_seq=2 Destination Host Unreachable
Changed in tripleo: | |
status: | Fix Released → Triaged |
Sergii Golovatiuk (sgolovatiuk) wrote : | #16 |
If we look at the job closely, it fails on https:/
If we look at the interfaces before the upgrade, they are as follows:
1: lo: <LOOPBACK,
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,
link/ether 52:54:00:ef:f4:eb brd ff:ff:ff:ff:ff:ff
inet 192.168.122.23/24 brd 192.168.122.255 scope global dynamic noprefixroute ens3
valid_lft 3147sec preferred_lft 3147sec
inet6 fe80::5054:
valid_lft forever preferred_lft forever
3: ens4: <BROADCAST,
link/ether 52:54:00:25:a0:f8 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.229/24 brd 192.168.122.255 scope global dynamic noprefixroute ens4
valid_lft 3271sec preferred_lft 3271sec
inet6 fe80::5054:
valid_lft forever preferred_lft forever
4: ovs-system: <BROADCAST,
link/ether c6:c0:26:55:35:d3 brd ff:ff:ff:ff:ff:ff
5: br-ex: <BROADCAST,
link/ether 02:85:da:68:e5:4e brd ff:ff:ff:ff:ff:ff
inet 192.168.24.2/24 scope global br-ex
valid_lft forever preferred_lft forever
inet6 fe80::85:
valid_lft forever preferred_lft forever
6: br-ctlplane: <BROADCAST,
link/ether 02:85:da:68:e5:4e brd ff:ff:ff:ff:ff:ff
inet 192.168.24.1/24 brd 192.168.24.255 scope global br-ctlplane
valid_lft forever preferred_lft forever
inet 192.168.24.3/32 brd 192.168.24.255 scope global br-ctlplane
valid_lft forever preferred_lft forever
inet6 fe80::85:
valid_lft forever preferred_lft forever
7: br-int: <BROADCAST,
link/ether fa:6f:46:31:8f:42 brd ff:ff:ff:ff:ff:ff
After the upgrade run, they are as follows:
1: lo: <LOOPBACK,
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,
link/ether 52:54:00:ef:f4:eb brd ff:ff:ff:ff:ff:ff
inet 192.168.122.23/24 brd 192.168.122.255 scope global dynamic noprefixroute ens3
valid_lft 2369sec preferred_lft 2369sec
inet6 fe80::5054:
valid_lft forever preferred_lft forever
3: ens4: <BROADCAST,
wes hayutin (weshayutin) wrote : | #17 |
Marios Andreou (marios-b) wrote : | #18 |
digging some more today - per comment https:/
per https:/
I agree there is something off about the network config going train->ussuri. We are not getting this bug for ussuri-master jobs - example at [1] which has the 'Ping router ip address' task executed twice (after deploy & after upgrade) without issue.
We changed something train -> ussuri - it might still be related to the switch to 'ha by default' [2].
I've been sanity-checking the deploy vs upgrade network config but haven't spotted anything yet, e.g. [3] vs [4] where the main diff is
{"network_
vs
{"network_
[1] https:/
[2] https:/
[3] https:/
[4] https:/
Marios Andreou (marios-b) wrote : | #19 |
really not sure about this yet but ...
I started looking for things that are in Ussuri but not in Train... I compared:
* https:/
* https:/
in particular this commit seems interesting and missing in train: https:/
I just cherry-picked it to train
https:/
Marios Andreou (marios-b) wrote : | #20 |
this bit in particular is what caught my eye for comment #19 https:/
Marios Andreou (marios-b) wrote : | #21 |
After the upgrade, the resulting network configuration has no br-ex [1], only br-ctlplane. This may explain what Jakub saw and described in comment #7 above.
Not sure if this is *why* there is no br-ex, but I noticed that the os-net-config data is different in deployment [2] vs upgrade [3]. In particular the upgrade os-net-config config.json has the extra control_virtual_ip ("192.168.24.3/32") and public_virtual_ip passed in ("192.168.
I also see these 'martian source' br-ex messages in the journal [5], which make sense if we are removing br-ex during the upgrade:
Sep 28 16:41:44 standalone.
I'm hoping to point some of the DF folks at this; perhaps it will help get us closer to the issue - grateful for any thoughts here, thank you.
[1] https:/
[2] https:/
[3] https:/
[4] https:/
[5] https:/
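One way to surface the difference between the deployment and upgrade os-net-config data is to diff the per-bridge address lists from the two config.json files; a sketch assuming the usual os-net-config layout (`network_config` members with `addresses` entries) - the sample dicts below are shaped like, not copied from, the real configs:

```python
def bridge_addresses(config: dict) -> dict:
    """Map each bridge/interface name in an os-net-config document to the
    set of ip_netmask values assigned to it."""
    result = {}
    for member in config.get("network_config", []):
        addrs = {a["ip_netmask"] for a in member.get("addresses", [])}
        result[member.get("name")] = addrs
    return result


# Illustrative shapes resembling the deploy vs upgrade configs:
deploy = {"network_config": [
    {"name": "br-ctlplane", "addresses": [{"ip_netmask": "192.168.24.1/24"}]}]}
upgrade = {"network_config": [
    {"name": "br-ctlplane", "addresses": [
        {"ip_netmask": "192.168.24.1/24"},
        {"ip_netmask": "192.168.24.3/32"}]}]}

for name, addrs in bridge_addresses(upgrade).items():
    extra = addrs - bridge_addresses(deploy).get(name, set())
    if extra:
        print(f"{name} gained: {sorted(extra)}")
```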
Alex Schultz (alex-schultz) wrote : | #22 |
There is a br-ex in the network config.
You'd want to check the NetworkConfig when comparing but br-ex is being configured in both.
For train we have:
br-ex should get added via:
sed -i "s/: \"bridge_name/: \"${bridge_
sed -i "s/interface_
These values get invoked with br-ex from:
This is defined in the core playbook. https:/
In ussuri we don't have that stuff anymore because we backported the network config generation so it doesn't rely on that anymore.
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master) | #23 |
Related fix proposed to branch: master
Review: https:/
Marios Andreou (marios-b) wrote : | #24 |
@Alex per comment #22, agreed, it *is* there in the config - if you look at my comment #21, [2] and [3] (repeated here) point to the os-net-config for deployment [2] and upgrade [3], and indeed both have br-ex.
However, the resulting configuration *on the node* is missing br-ex, e.g. looking at [1] there is no mention of br-ex. Compare that with [4] from a standalone job, where you can see br-ex configured and with routes for it too.
[1] https:/
[2] https:/
[3] https:/
[4] https:/
Alex Schultz (alex-schultz) wrote : | #25 |
It's there in [1], just under ipv6 because there is no ipv4 address associated with it.
5: br-ex: <BROADCAST,
The network config just seems wrong. Per the tripleo-docs we always use br-ctlplane instead of br-ex, but I'm trying to remember why it's configured that way. In upstream CI, br-ex was always already configured via the undercloud-setup bits for multinode, so I'm wondering if we're just running into a poor CI setup vs. there actually being a problem.
Alex Schultz (alex-schultz) wrote : | #26 |
Ok so in Train, the os-net-config network config only includes the ctlplane /24 address. In Ussuri this changed to also include the ctlplane and public VIPs as /32s because of the deprecation of keepalived.
Because ctlplane_ip is listed twice, with a /24 and a /32, we get both. I'm trying to track down what changed (or whether this has always been broken). It seems like we need to exclude a VIP address if it's already the ctlplane_ip.
It looks like in Train we only added the ctlplane VIP as a VIP, and not the public VIP like we do in Ussuri.
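The exclusion described above - don't emit a /32 VIP entry when the VIP is already the interface's ctlplane_ip - can be sketched as follows (hypothetical helper, not the actual tripleo-heat-templates logic):

```python
import ipaddress


def addresses_for_interface(ctlplane_cidr: str, vips: list) -> list:
    """Build the address list for the ctlplane bridge: the host /24 first,
    then each VIP as a /32, skipping any VIP that equals the host IP."""
    host_ip = ipaddress.ip_interface(ctlplane_cidr).ip
    addresses = [{"ip_netmask": ctlplane_cidr}]
    for vip in vips:
        if ipaddress.ip_address(vip) == host_ip:
            continue  # already covered by the /24 entry; avoid the duplicate
        addresses.append({"ip_netmask": f"{vip}/32"})
    return addresses


print(addresses_for_interface("192.168.24.1/24", ["192.168.24.1", "192.168.24.3"]))
```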
Fix proposed to branch: master
Review: https:/
Changed in tripleo: | |
status: | Triaged → In Progress |
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master) | #28 |
Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https:/
Reason: https:/
tags: | added: train-backport-potential ussuri-backport-potential |
Alex Schultz (alex-schultz) wrote : | #29 |
Shouldn't affect train; it was a change in ussuri to get rid of keepalived. Unless we backport that, this is not needed.
tags: | removed: train-backport-potential |
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/ussuri) | #30 |
Fix proposed to branch: stable/ussuri
Review: https:/
Fix proposed to branch: master
Review: https:/
Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https:/
Reason: https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/ussuri) | #33 |
Change abandoned by Alex Schultz (<email address hidden>) on branch: stable/ussuri
Review: https:/
Fix proposed to branch: stable/ussuri
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master) | #35 |
Related fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/ussuri) | #36 |
Related fix proposed to branch: stable/ussuri
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master) | #37 |
Related fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/ussuri) | #38 |
Related fix proposed to branch: stable/ussuri
Review: https:/
Related fix proposed to branch: master
Review: https:/
Alex Schultz (alex-schultz) wrote : | #40 |
Ok so I think it's the bridgemap & network config. Doing the upgrade using the documentation with the following configuration works fine:
NeutronPublic
NeutronBridge
NeutronPhysic
In CI, we only configure NeutronPublicIn
I think what happens is we have:
br-ex -> br-ctlplane (with ips)
^- neutron
So routing gets weird. I think we need
br-ex -> br-ctlplane (with ips)
^- neutron
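For reference, the direction the fix that later merged took (mapping datacentre to br-ctlplane instead of a second bridge on the same subnet) could be expressed in a standalone environment file roughly like this; the parameter names are standard tripleo-heat-templates ones, but the snippet is illustrative, not the merged change itself:

```yaml
parameter_defaults:
  # Use the bridge that already carries the control plane IPs for the
  # datacentre physical network, avoiding two bridges on one subnet.
  NeutronBridgeMappings: datacentre:br-ctlplane
  NeutronPhysicalBridge: br-ctlplane
```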
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master) | #41 |
Related fix proposed to branch: master
Review: https:/
Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https:/
Reason: we need this defined because the framework assumes there is always a control_virtual_ip
Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https:/
Reason: we need this defined because the framework assumes there is always a control_virtual_ip
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/ussuri) | #44 |
Change abandoned by Alex Schultz (<email address hidden>) on branch: stable/ussuri
Review: https:/
Change abandoned by Alex Schultz (<email address hidden>) on branch: stable/ussuri
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master) | #46 |
Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https:/
Reason: we don't need to change ips if we can change the br-ex allocation
Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https:/
Reason: This actually doesn't cause a problem (TM)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on python-tripleoclient (stable/ussuri) | #48 |
Change abandoned by Alex Schultz (<email address hidden>) on branch: stable/ussuri
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master) | #49 |
Related fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master) | #50 |
Related fix proposed to branch: master
Review: https:/
Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master) | #52 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit c86fa4fb4ee6f4b
Author: Alex Schultz <email address hidden>
Date: Tue Oct 13 09:42:28 2020 -0600
Don't manage bridge mappings in scenario file
The bridge mappings should be managed in the standalone parameters. This
bridge mapping prevents us from being able to change the datacentre
mapping in CI.
Change-Id: I6b5b9db75a11c2
Related-Bug: #1895822
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master) | #53 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit eaf976ae0e1795b
Author: yatinkarel <email address hidden>
Date: Tue Oct 13 13:55:13 2020 +0530
Handle migration of br-ex network
[1] changing br-ex network from 192.168.24 to 172.16.1
for standalone jobs.
In order to allow the migration need to adjust
tempest configuration. This can be reverted once
[1] and [2] lands.
[1] https:/
[2] https:/
Related-Bug: #1895822
Change-Id: I1865db91166109
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit fa1bd4ad28d4e7a
Author: Alex Schultz <email address hidden>
Date: Thu Oct 1 12:03:33 2020 -0600
Standalone configure neutron bridge correctly
This change updates the default bridge mapping from datacentre:br-ex to
datacentre:
CI, we configure a br-ex before running the standalone (via
undercloud-
to ensure that we use br-ctlplane for the neutron access to the external
network to prevent weird routing issues when we have two bridges on the
same subnet.
Depends-On: https:/
Change-Id: I0e5aa3f58746dc
Related-Bug: #1895822
Sandeep Yadav (sandeepyadav93) wrote : | #55 |
Hello Alex,
I know you are already working on sc12 along with this bug, but just in case:-
tripleo-
~~~
2020-10-22 07:40:54.215940 | primary | TASK [os_tempest : Ping router ip address] *******
2020-10-22 07:40:54.215980 | primary | Thursday 22 October 2020 07:40:54 +0000 (0:00:00.111) 0:46:43.145 ******
2020-10-22 07:40:58.032949 | primary | FAILED - RETRYING: Ping router ip address (5 retries left).
2020-10-22 07:41:11.729033 | primary | FAILED - RETRYING: Ping router ip address (4 retries left).
2020-10-22 07:41:25.488429 | primary | FAILED - RETRYING: Ping router ip address (3 retries left).
2020-10-22 07:41:39.185617 | primary | FAILED - RETRYING: Ping router ip address (2 retries left).
2020-10-22 07:41:52.881059 | primary | FAILED - RETRYING: Ping router ip address (1 retries left).
2020-10-22 07:42:06.576933 | primary | fatal: [undercloud]: FAILED! => {
2020-10-22 07:42:06.577289 | primary | "attempts": 5,
2020-10-22 07:42:06.577361 | primary | "changed": true,
2020-10-22 07:42:06.577407 | primary | "cmd": "set -e\nping -c2 \"192.168.
2020-10-22 07:42:06.577449 | primary | "delta": "0:00:03.066736",
2020-10-22 07:42:06.577492 | primary | "end": "2020-10-22 07:42:06.538032",
2020-10-22 07:42:06.577534 | primary | "rc": 1,
2020-10-22 07:42:06.577573 | primary | "start": "2020-10-22 07:42:03.471296"
2020-10-22 07:42:06.577612 | primary | }
~~~
Logs:- https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/ussuri) | #56 |
Related fix proposed to branch: stable/ussuri
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/train) | #57 |
Related fix proposed to branch: stable/train
Review: https:/
Alex Schultz (alex-schultz) wrote : | #58 |
cherry picked the CI change back for scenario012 https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/train) | #59 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/train
commit f07ca38a82c6599
Author: Alex Schultz <email address hidden>
Date: Tue Oct 13 09:42:28 2020 -0600
Don't manage bridge mappings in scenario file
The bridge mappings should be managed in the standalone parameters. This
bridge mapping prevents us from being able to change the datacentre
mapping in CI.
Change-Id: I6b5b9db75a11c2
Related-Bug: #1895822
(cherry picked from commit c86fa4fb4ee6f4b
tags: | added: in-stable-train |
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/ussuri) | #60 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/ussuri
commit 805fe6e41963068
Author: Alex Schultz <email address hidden>
Date: Tue Oct 13 09:42:28 2020 -0600
Don't manage bridge mappings in scenario file
The bridge mappings should be managed in the standalone parameters. This
bridge mapping prevents us from being able to change the datacentre
mapping in CI.
Change-Id: I6b5b9db75a11c2
Related-Bug: #1895822
(cherry picked from commit c86fa4fb4ee6f4b
tags: | added: in-stable-ussuri |
Changed in tripleo: | |
milestone: | victoria-3 → wallaby-1 |
Changed in tripleo: | |
milestone: | wallaby-1 → wallaby-2 |
Changed in tripleo: | |
milestone: | wallaby-2 → wallaby-3 |
Changed in tripleo: | |
milestone: | wallaby-3 → wallaby-rc1 |
Spent some more time digging through the logs. I am not clear yet whether this is an issue with HA/mysql or with ovs/ovn networking. I am leaning towards networking at the moment.
I'll reach out to the network and pidone squads to check here - adding pointers to some error messages in the logs I came across just now:
* https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/logs/undercloud/var/log/containers/mysql/mysqld.log
* 2020-09-15 14:32:08 0 [Note] InnoDB: Starting shutdown...
* 2020-09-15 14:32:09 0 [Note] /usr/libexec/mysqld: Shutdown complete
* 2020-09-15 14:32:30 0 [Note] WSREP: Found saved state: cebd6089-f754-11ea-ac23-9b5df17a204a:8702, safe_to_bootstrap: 1
* 2020-09-15 14:32:30 0 [Note] /usr/libexec/mysqld: ready for connections.
Version: '10.3.17-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
* 2020-09-15 14:32:31 0 [Note] InnoDB: Buffer pool(s) load completed at 200915 14:32:31
* 2020-09-15 14:38:29 259 [Warning] Aborted connection 259 to db: 'nova_api' user: 'nova_api' host: '192.168.24.1' (Got an error reading communication packets)
* https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/logs/undercloud/var/log/containers/openvswitch/ovn-controller.log
* 2020-09-15T14:41:08.575Z|00051|lflow|WARN|Dropped 19 log messages in last 1622 seconds (most recently, 1607 seconds ago) due to excessive rate
* 2020-09-15T14:50:41.820Z|00004|fatal_signal(ovn_pinctrl0)|WARN|terminating with signal 15 (Terminated)
* https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/logs/undercloud/var/log/containers/openvswitch/ovsdb-server-sb.log
* 2020-09-15T13:16:17.971Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.12.0
* 2020-09-15T13:16:19.947Z|00005|reconnect|WARN|unix#6: connection dropped (Connection reset by peer)
* 2020-09-15T14:12:32.939Z|00005|reconnect|WARN|unix#0: connection dropped (Broken pipe)
* 2020-09-15T14:59:56.235Z|00002|daemon_unix(monitor)|INFO|pid 152 died, exit status 0, exiting
* https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/logs/undercloud/var/log/containers/stdouts/ovn-dbs-bundle.log
* 2020-09-15T13:16:18.332804574+00:00 stderr F (operation_finished) notice: ovndb_servers_start_0:48:stderr [ ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"} ]