Deleting csnat port fails due to no fixed ips
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | | High | Miguel Lavalle |
Bug Description
This code [1] raises an "IndexError: list index out of range" exception, producing a trace like this [2]. Essentially, there are no fixed IPs on the port. It is not yet clear how the port got into this state. This failure is linked to various tempest failures in gate-tempest-
tempest.
tempest.
[1] https:/
[2] http://
[3] http://
Carl Baldwin (carl-baldwin) wrote : | #1 |
Changed in neutron: | |
status: | New → Confirmed |
importance: | Undecided → Critical |
tags: | added: gate-failure ipv6 l3-dvr-backlog l3-ipam-dhcp |
Changed in neutron: | |
assignee: | nobody → Carl Baldwin (carl-baldwin) |
Carl Baldwin (carl-baldwin) wrote : | #2 |
The dualnet aspect of the test causes tempest to create a different network for the ipv6 subnets [1]. The multi_prefix aspect creates two ipv6 subnets instead of one. There are other tests which test both of these aspects individually [2] and with neither aspect but I haven't seen those tests fail in this way.
[1] https:/
[2] https:/
Fix proposed to branch: master
Review: https:/
Changed in neutron: | |
assignee: | Carl Baldwin (carl-baldwin) → Kevin Benton (kevinbenton) |
status: | Confirmed → In Progress |
Changed in neutron: | |
assignee: | Kevin Benton (kevinbenton) → Carl Baldwin (carl-baldwin) |
Changed in neutron: | |
assignee: | Carl Baldwin (carl-baldwin) → Kevin Benton (kevinbenton) |
Carl Baldwin (carl-baldwin) wrote : | #4 |
It occurred to me that regular router ports also use a single port for multiple ipv6 subnets. The csnat ports were likely designed to mirror this behavior. I also have a vague recollection of asking why this was the case. I think we were in the Adobe office in Utah for the Neutron mid-cycle when I asked. I don't recall the response. With that in mind, I dug this up [1].
Carl Baldwin (carl-baldwin) wrote : | #5 |
When the proposed fix merges, we should file a follow-up bug to keep an eye on the issue using the new log message.
Carl Baldwin (carl-baldwin) wrote : | #6 |
I looked a little bit in the logs for a failure [1]. The router interface that got the 500 was the ipv4 one. Yet, this seems to happen only when coupled with another network (dualnet) with two ipv6 subnets. I still haven't figured it out and need to step away.
[1] http://
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit bcef61703061c57
Author: Kevin Benton <email address hidden>
Date: Tue Aug 2 18:46:11 2016 -0700
Fix indexerror in delete_csnat_port
The code assumed every port returned from the csnat port query
would have a fixed_ip that it could compare the subnet it is
looking for to. This should be a valid assumption however there
is a path leading to a condition where it has no IPs. This makes
the cleanup code handle this case and dump a warning until we can
figure out what causes the interface to lose the IP.
Partial-Bug: #1609540
Change-Id: Ida024a231bb3fc
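The guard described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the merged Neutron code: the function name `find_csnat_ports_for_subnet` and the plain-dict port layout are assumptions made for the example, and the debug log level is taken from later comments in this thread rather than from the patch itself.

```python
import logging

LOG = logging.getLogger(__name__)


def find_csnat_ports_for_subnet(ports, subnet_id):
    """Return the ports whose first fixed IP is on subnet_id.

    Hypothetical helper: ports are plain dicts with "id" and "fixed_ips"
    keys. A port with an empty fixed_ips list is logged and skipped
    instead of triggering IndexError on fixed_ips[0].
    """
    matches = []
    for port in ports:
        fixed_ips = port.get("fixed_ips") or []
        if not fixed_ips:
            # Assumed message text, matching the string searched for in logstash.
            LOG.debug("CSNAT port has no IPs: %s", port.get("id"))
            continue
        if fixed_ips[0]["subnet_id"] == subnet_id:
            matches.append(port)
    return matches
```

With this shape, a port that has lost its IPs is skipped during cleanup rather than aborting the whole router interface removal, which is the behavior the commit message describes.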
Carl Baldwin (carl-baldwin) wrote : | #8 |
I'm going to watch this to see if "CSNAT port has no IPs" appears anywhere in logstash. So far, no occurrences of it but we should start to see them.
Changed in neutron: | |
assignee: | Kevin Benton (kevinbenton) → Carl Baldwin (carl-baldwin) |
Miguel Lavalle (minsel) wrote : | #9 |
We need to keep watching logstash for occurrences of "CSNAT port has no IPs". I'll check with infra to make sure we get debug level messages from logstash.
If this is no longer offending in the gate can we demote its severity?
Carl Baldwin (carl-baldwin) wrote : | #11 |
This isn't causing gate failures anymore. But, we still need to treat it like a bug because we haven't found the root cause yet.
Changed in neutron: | |
importance: | Critical → High |
tags: | removed: gate-failure |
Carl Baldwin (carl-baldwin) wrote : | #12 |
I wonder if my logstash queries are not searching debug log messages.
Miguel Lavalle (minsel) wrote : | #13 |
I checked with the infra team today. Logstash / kibana doesn't capture debug level messages. That is why we never see the "CSNAT port has no IPs" message.
Carl Baldwin (carl-baldwin) wrote : | #14 |
I was afraid of that, especially since it was my suggestion to make it debug level in the first place. So, now to get any kind of data about this problem, we'd have to bump up the log level of that message.
Related fix proposed to branch: master
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit 268e10ef4d3106e
Author: Carl Baldwin <email address hidden>
Date: Thu Sep 15 09:47:40 2016 -0600
Raise level of message to info
We need to be able to search for occurrences of this issue in the gate.
But, logstash doesn't index debug messages. So, they are impossible to
search for. Hence, raising the level to info which, I'm told, are
indexed.
Change-Id: Ie9116c362f3e3d
Related-Bug: #1609540
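The effect of the level change above can be illustrated with Python's standard `logging` module. This is only a sketch under the assumption that the indexing pipeline behaves like a handler that accepts INFO and above; it does not reflect how logstash is actually configured in the gate, and the helper name `capture` and the logger names are made up for the example.

```python
import io
import logging


def capture(handler_level):
    """Emit one debug and one info message; return what the handler kept."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setLevel(handler_level)  # stands in for the indexer's threshold

    logger = logging.getLogger("csnat_demo_%d" % handler_level)
    logger.setLevel(logging.DEBUG)   # the service itself emits everything
    logger.propagate = False
    logger.addHandler(handler)

    logger.debug("CSNAT port has no IPs (debug)")
    logger.info("CSNAT port has no IPs (info)")

    logger.removeHandler(handler)
    return stream.getvalue().splitlines()


indexed = capture(logging.INFO)
everything = capture(logging.DEBUG)
```

Here `indexed` holds only the info-level line, while `everything` holds both, which is why a search against the indexed logs could never match the original debug-level message.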
Oleg Bondarev (obondarev) wrote : | #17 |
Comment 13 from https:/
Miguel Lavalle (minsel) wrote : | #18 |
Following up https:/
Changed in neutron: | |
assignee: | Carl Baldwin (carl-baldwin) → Miguel Lavalle (minsel) |
Miguel Lavalle (minsel) wrote : | #19 |
Not seen over the past 4 weeks by triggering tests with this patchset: https:/
Changed in neutron: | |
status: | In Progress → Invalid |
tags: | added: neutron-proactive-backport-potential |
Kevin mentioned that he suspects that the code that handles multiple ipv6 addresses on a single csnat port may be the culprit. It is difficult to follow and complicates things. It was added in this patch [1].
Each ipv4 subnet gets its own port, so why not the ipv6 ones?
[1] https://review.openstack.org/#/c/225319