Invalid input for operation: IP allocation requires subnets for network

Bug #1583759 reported by Carl Baldwin
This bug affects 2 people
Affects   Status   Importance   Assigned to    Milestone
neutron   New      High         Bharat Kumar

Bug Description

Several people, including Brian Haley and me, have been chasing down this stack trace [1] for a few weeks. We see it in failed jobs and start chasing it down, only to find out that it is a red herring.

I'm filing this bug because we ought to capture what we know about it, figure out whether it is correlated with any failures, and hopefully eliminate the trace so that it no longer distracts us from other problems.

I was poking through the stack trace on GitHub. Since I had the links handy, I thought I'd include them here [2-11]. Also, this logstash query might be helpful [12].

[1] http://paste.openstack.org/show/497738/
[2] https://github.com/openstack/neutron/blob/79c1d7efc1/neutron/api/rpc/handlers/dhcp_rpc.py#L211
[3] https://github.com/openstack/neutron/blob/79c1d7efc1/neutron/api/rpc/handlers/dhcp_rpc.py#L93
[4] https://github.com/openstack/neutron/blob/79c1d7efc1/neutron/plugins/common/utils.py#L162
[5] https://github.com/openstack/neutron/blob/79c1d7efc1/neutron/plugins/ml2/plugin.py#L1137
[6] https://github.com/openstack/neutron/blob/79c1d7efc1/neutron/plugins/ml2/plugin.py#L1106
[7] https://github.com/openstack/neutron/blob/79c1d7efc1/neutron/db/db_base_plugin_v2.py#L1247
[8] https://github.com/openstack/neutron/blob/79c1d7efc1/neutron/db/ipam_non_pluggable_backend.py#L204
[9] https://github.com/openstack/neutron/blob/79c1d7efc1/neutron/db/ipam_non_pluggable_backend.py#L362
[10] https://github.com/openstack/neutron/blob/79c1d7efc1/neutron/db/ipam_non_pluggable_backend.py#L245
[11] https://github.com/openstack/neutron/blob/79c1d7efc1/neutron/db/ipam_backend_mixin.py#L335-L337
[12] http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%20%5C%22InvalidInput%3A%20Invalid%20input%20for%20operation%3A%20IP%20allocation%20requires%20subnets%20for%20network%5C%22
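
The logstash link [12] is hard to read in its percent-encoded form. As a small sketch, assuming the query sits in the URL fragment exactly as shown above, the standard library can decode it to show the message string being searched for (the helper name `decode_logstash_query` is made up for illustration):

```python
from urllib.parse import parse_qs, urlsplit

LOGSTASH_URL = (
    "http://logstash.openstack.org/#dashboard/file/logstash.json"
    "?query=message%3A%20%5C%22InvalidInput%3A%20Invalid%20input%20for"
    "%20operation%3A%20IP%20allocation%20requires%20subnets%20for"
    "%20network%5C%22"
)


def decode_logstash_query(url):
    """Return the decoded 'query' parameter stored in the URL fragment."""
    fragment = urlsplit(url).fragment
    # Everything after the first '?' in the fragment is the query string.
    _, _, query_string = fragment.partition("?")
    return parse_qs(query_string)["query"][0]


decoded = decode_logstash_query(LOGSTASH_URL)
print(decoded)
```

The decoded value is a `message:` search for the quoted InvalidInput string from the bug title, with the quotes escaped by backslashes.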

Tags: l3-ipam-dhcp
Changed in neutron:
importance: Undecided → Medium
Changed in neutron:
assignee: nobody → phani regalla (divyaregalla)
Changed in neutron:
assignee: phani regalla (divyaregalla) → nobody
Changed in neutron:
assignee: nobody → Bharat Kumar (bharatkumar)
Brian Haley (brian-haley) wrote :

This error seems to be happening more and more, especially in the dvr-multinode-full job.

The typical symptom is that the DHCP port allocation fails to get an IP, and it appears the subnet is getting deleted simultaneously with the request.

For example, here is the request on the server to allocate an IP for DHCP:

http://logs.openstack.org/14/356714/3/check/gate-tempest-dsvm-neutron-dvr-multinode-full/31d68fa/logs/screen-q-svc.txt.gz#_2016-08-19_03_01_02_267

But less than a second before that, the subnet was deleted due to the network being deleted:

http://logs.openstack.org/14/356714/3/check/gate-tempest-dsvm-neutron-dvr-multinode-full/31d68fa/logs/screen-q-svc.txt.gz#_2016-08-19_03_01_01_933

So this failure is a symptom of some other issue. For example, why is tempest deleting this resource? We really need to figure this out...
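
The ordering described above can be replayed with a toy model. This is only a sketch under stated assumptions: `FakeNetworkDB` and its methods are hypothetical stand-ins, not neutron code, and the real check lives in the IPAM backend linked in the bug description. It shows how a DHCP port request that arrives just after the network delete ends in the same error message.

```python
class InvalidInput(Exception):
    """Simplified stand-in for the InvalidInput error in the trace."""


class FakeNetworkDB:
    """Hypothetical in-memory store: network id -> list of subnet CIDRs."""

    def __init__(self):
        self.subnets = {}

    def create_network(self, net_id, cidrs):
        self.subnets[net_id] = list(cidrs)

    def delete_network(self, net_id):
        # Deleting the network takes its subnets with it.
        self.subnets.pop(net_id, None)

    def allocate_dhcp_ip(self, net_id):
        # No subnets left on the network means no IP can be allocated.
        if not self.subnets.get(net_id):
            raise InvalidInput(
                "Invalid input for operation: "
                "IP allocation requires subnets for network")
        return "allocated from " + self.subnets[net_id][0]


def replay_race():
    """Run the delete first, then the DHCP allocation, as in the logs."""
    db = FakeNetworkDB()
    db.create_network("net-1", ["10.0.0.0/24"])
    db.delete_network("net-1")        # the earlier log entry: network delete
    try:
        db.allocate_dhcp_ip("net-1")  # the later log entry: DHCP port request
    except InvalidInput as exc:
        return str(exc)
    return None


print(replay_race())
```

In this model the allocation succeeds whenever it runs before the delete, which is consistent with the trace being a symptom of the deletion rather than a fault in the allocation itself.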

Changed in neutron:
importance: Medium → High
Bharat Kumar (bharatkumar) wrote :

Hi Brian,

Thanks for the note. Can I have the setup details and the steps to reproduce?
I agree with your analysis, i.e. that we need to find out why the resource is getting deleted.

Thanks
Bharat

Bharat Kumar (bharatkumar) wrote :

The problem is caused by a deadlock. After the deadlock, the network cleanup happened, but the same network was still in use by a port creation in progress. This results in the given traceback.

DBDeadlock: (pymysql.err.InternalError) (1213, u'Deadlock found when trying to get lock; try restarting transaction') [SQL: u'UPDATE ports SET status=%(status)s WHERE ports.id = %(ports_id)s'] [parameters: {'status': 'ACTIVE', 'ports_id': u'a9bed451-d126-4f3a-aecc-f3965b1a6252'}]
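
MySQL's own message for error 1213 suggests restarting the transaction. A generic retry wrapper along those lines might look like the sketch below. It is an illustration only, not neutron's actual retry code: `DBDeadlock` here is a local stand-in class, and `retry_on_deadlock` and `set_port_active` are hypothetical names.

```python
import functools
import time


class DBDeadlock(Exception):
    """Local stand-in for the DBDeadlock exception shown in the trace."""


def retry_on_deadlock(max_retries=3, delay=0.0):
    """Re-run the wrapped function when it raises DBDeadlock."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_retries:
                        raise  # out of retries, surface the deadlock
                    # Back off a little longer on each attempt.
                    time.sleep(delay * (2 ** attempt))
        return wrapper
    return decorator


attempts = []


@retry_on_deadlock(max_retries=3)
def set_port_active(port_id):
    """Simulated status update that deadlocks on the first two calls."""
    attempts.append(port_id)
    if len(attempts) < 3:
        raise DBDeadlock("(1213, 'Deadlock found when trying to get lock')")
    return {"id": port_id, "status": "ACTIVE"}
```

A retry only helps if the whole transaction is re-run from the start; it would not by itself explain or prevent the network cleanup that followed the deadlock here.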

Brian Haley (brian-haley) wrote :

Ok, so it's an internal neutron error that's causing an async deletion of a resource during a test. Was that deadlock from the same log as above?

Bharat Kumar (bharatkumar) wrote :

Yes Brian. It is in the same log.

Bharat Kumar (bharatkumar) wrote :

Is it a setup issue? Can you please clarify? If it is not a setup issue, can you please provide reproduction steps?

Brian Haley (brian-haley) wrote :

Bharat - I don't have the exact reproduction steps; it's just seen a lot in the logs. If you could come up with a reproducer, that would be great.

Armando Migliaccio (armando-migliaccio) wrote :

Might have been trampled by a more recent bug report?
