some L3 HA routers does not work
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Invalid | Undecided | Unassigned |
Bug Description
Pike
DVR + L3_HA
L2population enabled
Some of our L3 HA routers are not working correctly. They are not reachable from instances.
After deep investigation, I've found that "HA port tenant <tenant id>" ports are in state DOWN.
They are DOWN because they don't have binding information.
They don't have binding information because the 'HA network tenant <tenant_id>' network is corrupted.
Specifically, its provider: attributes are empty.
The weird thing is that this network was OK and worked, but at some point it became corrupted. I don't have any logs from that time.
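The two symptoms above (HA ports that are DOWN without a binding, and an HA network without provider attributes) can be checked across many tenants with a small script. The following is only a sketch: it assumes the port and network records have already been exported as dictionaries (for example from the Neutron API as JSON), and it assumes the truncated "provider:" field refers to the standard `provider:network_type` attribute and that an unbound port has an empty `binding:host_id`. The sample data and the function names are hypothetical.

```python
HA_NET_PREFIX = "HA network tenant "
HA_PORT_PREFIX = "HA port tenant "


def unbound_ha_ports(ports):
    """Return HA ports that are DOWN and have no host binding."""
    return [
        p for p in ports
        if (p.get("name") or "").startswith(HA_PORT_PREFIX)
        and p.get("status") == "DOWN"
        and not p.get("binding:host_id")
    ]


def ha_networks_without_provider(networks):
    """Return HA networks whose provider:network_type is empty."""
    return [
        n for n in networks
        if (n.get("name") or "").startswith(HA_NET_PREFIX)
        and not n.get("provider:network_type")
    ]


# Hypothetical sample records, not taken from the affected deployment.
sample_networks = [
    {"id": "net-1", "name": "HA network tenant aaaa",
     "provider:network_type": "vxlan"},
    {"id": "net-2", "name": "HA network tenant bbbb",
     "provider:network_type": None},
    {"id": "net-3", "name": "private", "provider:network_type": None},
]
sample_ports = [
    {"id": "port-1", "name": "HA port tenant aaaa",
     "status": "ACTIVE", "binding:host_id": "node-1"},
    {"id": "port-2", "name": "HA port tenant bbbb",
     "status": "DOWN", "binding:host_id": ""},
]

if __name__ == "__main__":
    for net in ha_networks_without_provider(sample_networks):
        print("suspect HA network:", net["id"], net["name"])
    for port in unbound_ha_ports(sample_ports):
        print("unbound HA port:", port["id"], port["name"])
```

Only `provider:network_type` is tested because other provider fields can legitimately be empty for some network types.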
For comparison, a working HA tenant network:
+------
| Field | Value |
+------
| admin_state_up | True |
| availability_
| availability_zones | nova |
| created_at | 2018-02-
| description | |
| id | fa2fea5c-
| ipv4_address_scope | |
| ipv6_address_scope | |
| mtu | 9000 |
| name | HA network tenant afeeb372d793479
| port_security_
| project_id | |
| provider:
| provider:
| provider:
| revision_number | 3 |
| router:external | False |
| shared | False |
| status | ACTIVE |
| subnets | 5cbc612d-
| tags | |
| tenant_id | |
| updated_at | 2018-02-
+------
and a non-working HA tenant network:
+------
| Field | Value |
+------
| admin_state_up | True |
| availability_
| availability_zones | |
| created_at | 2018-01-
| description | |
| id | 6390c381-
| ipv4_address_scope | |
| ipv6_address_scope | |
| mtu | 9000 |
| name | HA network tenant 3e88cffb9dbb4e1
| port_security_
| project_id | |
| provider:
| provider:
| provider:
| revision_number | 5 |
| router:external | False |
| shared | False |
| status | ACTIVE |
| subnets | 4d579b00-
| tags | |
| tenant_id | |
| updated_at | 2018-01-
+------
I've found that all working networks have revision_number = 3 and all non-working networks have revision_number = 5.
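To check whether this revision_number pattern holds across every HA network, the records can be tallied by revision and by whether the provider information is present. This is a rough sketch under the same assumptions as before: the networks are available as dictionaries, and "working" is approximated by a non-empty `provider:network_type`. The helper name and the sample data are hypothetical, and a matching tally would only show a correlation, not the cause.

```python
from collections import Counter


def tally_by_revision(networks):
    """Count networks per (revision_number, has_provider_info) pair."""
    tally = Counter()
    for net in networks:
        has_provider = bool(net.get("provider:network_type"))
        tally[(net.get("revision_number"), has_provider)] += 1
    return tally


# Hypothetical sample records mirroring the pattern described above.
sample = [
    {"name": "HA network tenant aaaa", "revision_number": 3,
     "provider:network_type": "vxlan"},
    {"name": "HA network tenant bbbb", "revision_number": 3,
     "provider:network_type": "vxlan"},
    {"name": "HA network tenant cccc", "revision_number": 5,
     "provider:network_type": None},
]

if __name__ == "__main__":
    for (revision, has_provider), count in sorted(tally_by_revision(sample).items()):
        state = "provider info present" if has_provider else "provider info missing"
        print(f"revision_number={revision}: {count} network(s), {state}")
```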
When the HA tenant network is corrupted, ALL L3-HA routers in that tenant stop working.
Is there any way to fix this without removing all existing L3-HA routers in this tenant?
Unfortunately, I can't find any code responsible for updating or modifying the "HA network tenant" network, so I hit a wall in my debugging process.
It is probable that the network was corrupted during some automatic provisioning of network resources using a Heat stack, but I can't reproduce this.
summary:
- some L3 HA routrers does not work
+ some L3 HA routers does not work
Changed in neutron:
status: New → Invalid
The HA network is created in [1]. I wonder if deleting the HA network or the HA port and restarting the master/standby agents will trigger a rebuild. It's been a while since I looked at this part of the code, and I am not 100% sure what's in there anymore.
[1] http://git.openstack.org/cgit/openstack/neutron/tree/neutron/db/l3_hamode_db.py#n197