l3 agent migration tests failed with l3 agent hosting router with id not found.

Bug #1539707 reported by Tatyanka
Affects                 Status         Importance   Assigned to     Milestone
Fuel for OpenStack
  8.0.x                 Fix Released   High         Oleg Bondarev
  Mitaka                Fix Released   High         Oleg Bondarev
Mirantis OpenStack      Fix Released   High         Oleg Bondarev

Bug Description

Steps:

Scenario:
1. Revert snapshot with neutron cluster
2. Create an instance with a key pair
3. Manually reschedule router from primary controller to another one
4. Destroy controller with l3-agent
5. Check l3-agent was rescheduled
6. Check network connectivity from instance via dhcp namespace
7. Run OSTF

Actual:
l3 agent hosting router with id:cb0cfa86-457c-4dfa-8ab3-c27079a425b0 not found.

Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/usr/lib/python2.7/unittest/case.py", line 1043, in runTest
    self._testFunc()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 296, in testng_method_mistake_capture_func
    compatability.capture_type_error(s_func)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/compatability/exceptions_2_6.py", line 27, in capture_type_error
    func()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 350, in func
    func(test_case.state.get_state())
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.ha_neutron_destructive_vlan/fuelweb_test/helpers/decorators.py", line 83, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.ha_neutron_destructive_vlan/fuelweb_test/tests/tests_strength/test_neutron.py", line 114, in neutron_l3_migration_after_destroy_vlan
    super(self.__class__, self).neutron_l3_migration_after_destroy()
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.ha_neutron_destructive_vlan/fuelweb_test/tests/tests_strength/test_neutron_base.py", line 357, in neutron_l3_migration_after_destroy
    self.reschedule_router_manually(os_conn, router_id)
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.ha_neutron_destructive_vlan/fuelweb_test/__init__.py", line 57, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.ha_neutron_destructive_vlan/fuelweb_test/tests/tests_strength/test_neutron_base.py", line 58, in reschedule_router_manually
    " not found.".format(router_id))
NotFound: l3 agent hosting router with id:cb0cfa86-457c-4dfa-8ab3-c27079a425b0 not found.
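
The last frames of the traceback show the test helper raising NotFound when it cannot find an L3 agent hosting the router. A minimal sketch of such a lookup is given here, assuming a client object that exposes python-neutronclient's `list_l3_agent_hosting_routers(router_id)` call. The function name `find_hosting_l3_agent`, the local `NotFound` class, and `StubClient` are hypothetical and are not the fuelweb_test code.

```python
class NotFound(Exception):
    """Hypothetical stand-in for the exception type used by the test suite."""


def find_hosting_l3_agent(client, router_id):
    """Return the first L3 agent hosting router_id, or raise NotFound."""
    # The neutron client answers with a dict of the form {"agents": [...]}.
    agents = client.list_l3_agent_hosting_routers(router_id).get("agents", [])
    if not agents:
        # Say explicitly that the agent binding is missing, so the reader
        # does not have to guess whether the router or the agent was not found.
        raise NotFound(
            "no l3 agent is hosting router {0}".format(router_id))
    return agents[0]


class StubClient(object):
    """Hypothetical in-memory client used only to exercise the sketch."""

    def __init__(self, bindings):
        self.bindings = bindings  # router_id -> list of agent dicts

    def list_l3_agent_hosting_routers(self, router_id):
        return {"agents": self.bindings.get(router_id, [])}
```

With a real deployment the stub would be replaced by an authenticated neutron client. An empty "agents" list corresponds to the failure reported above.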

Tatyanka (tatyana-leontovich) wrote :
tags: added: area-library
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
status: New → Confirmed
ruhe (ruhe)
Changed in fuel:
status: Confirmed → Triaged
status: Triaged → Confirmed
tags: added: team-bugfix
tags: added: swarm-blocker
Stanislaw Bogatkin (sbogatkin) wrote :

I think this is a problem in the neutron Python client. If we remove the router manually with the console neutron client, it works:

root@node-2:~# neutron router-list
+--------------------------------------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------+
| id | name | external_gateway_info | distributed | ha |
+--------------------------------------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------+
| 5b2710c4-9634-41fc-a7ae-fefa04c95030 | router04 | {"network_id": "d3be38c2-6747-4094-ab6c-ac033bc6c210", "enable_snat": true, "external_fixed_ips": [{"subnet_id": "96961479-e144-46d7-8186-338cb4a07394", "ip_address": "10.109.3.128"}]} | False | False |
+--------------------------------------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------+

root@node-4:~# neutron router-port-list 5b2710c4-9634-41fc-a7ae-fefa04c95030
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------+
| id | name | mac_address | fixed_ips |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------+
| e285498e-8c17-438c-96e1-9c564d68de21 | | fa:16:3e:26:65:d9 | {"subnet_id": "1e45b5cb-f5f9-4941-a521-dd4c7f2d9d8c", "ip_address": "10.109.4.1"} |
| f873c8e2-a33e-4db1-ac85-bddc75bd8a4b | | fa:16:3e:e2:70:6b | {"subnet_id": "96961479-e144-46d7-8186-338cb4a07394", "ip_address": "10.109.3.128"} |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------+

root@node-4:~# neutron router-interface-delete 5b2710c4-9634-41fc-a7ae-fefa04c95030 1e45b5cb-f5f9-4941-a521-dd4c7f2d9d8c
Removed interface from router 5b2710c4-9634-41fc-a7ae-fefa04c95030.
root@node-4:~# neutron router-interface-delete 5b2710c4-9634-41fc-a7ae-fefa04c95030 96961479-e144-46d7-8186-338cb4a07394
Router 5b2710c4-9634-41fc-a7ae-fefa04c95030 has no interface on subnet 96961479-e144-46d7-8186-338cb4a07394

root@node-4:~# neutron router-delete 5b2710c4-9634-41fc-a7ae-fefa04c95030
Deleted router: 5b2710c4-9634-41fc-a7ae-fefa04c95030
root@node-4:~# neutron router-list

root@node-4:~#

So it seems to be a bug in python-neutronclient. Moving it to the MOS team.

tags: added: area-ci
removed: area-library
tags: added: area-mos
removed: area-ci
Andrey Maximov (maximov)
tags: removed: team-bugfix
Oleg Bondarev (obondarev) wrote :

Not sure I understand what the last comment means: how is deleting the router with id 5b2710c4-9634-41fc-a7ae-fefa04c95030 related to the error "l3 agent hosting router with id:cb0cfa86-457c-4dfa-8ab3-c27079a425b0 not found."? Please clarify why you think it is a neutron client error.

Stanislaw Bogatkin (sbogatkin) wrote :

Okay, just removing the router from one agent and adding it to another agent works too:
root@node-2:/etc/puppet/modules/osnailyfacter/modular/openstack-network# neutron l3-agent-router-remove 7216354c-8c1b-4419-b894-9372ce81dcdb 940028d4-f7a0-4307-b99c-b5a842342f38
Removed router 940028d4-f7a0-4307-b99c-b5a842342f38 from L3 agent
root@node-2:/etc/puppet/modules/osnailyfacter/modular/openstack-network# neutron l3-agent-router-add 94fc000c-0e09-469f-b697-5a3c39cd9e64 940028d4-f7a0-4307-b99c-b5a842342f38
Added router 940028d4-f7a0-4307-b99c-b5a842342f38 to L3 agent
root@node-2:/etc/puppet/modules/osnailyfacter/modular/openstack-network#
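
The same two-step move can be sketched in Python. This is only an illustration, assuming a client object that provides python-neutronclient's `remove_router_from_l3_agent(l3_agent, router_id)` and `add_router_to_l3_agent(l3_agent, body)` calls. The helper name `reschedule_router` and the `RecordingClient` stub are hypothetical.

```python
def reschedule_router(client, router_id, from_agent_id, to_agent_id):
    """Move a router from one L3 agent to another in two API calls."""
    # Step 1: unbind the router from the agent that currently hosts it.
    client.remove_router_from_l3_agent(from_agent_id, router_id)
    # Step 2: bind it to the target agent; the API expects a request body.
    client.add_router_to_l3_agent(to_agent_id, {"router_id": router_id})


class RecordingClient(object):
    """Hypothetical stub that records calls instead of talking to neutron."""

    def __init__(self):
        self.calls = []

    def remove_router_from_l3_agent(self, l3_agent, router_id):
        self.calls.append(("remove", l3_agent, router_id))

    def add_router_to_l3_agent(self, l3_agent, body):
        self.calls.append(("add", l3_agent, body["router_id"]))
```

Between the two calls the router is not hosted by any agent, so a check that runs in that window would see it as unscheduled.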

Stanislaw Bogatkin (sbogatkin) wrote :

It means there is no problem with the console client, but there is a problem with the way the QA team uses the neutron client. Technically, moving a router from one l3 agent to another is possible; I provided proof one comment ago.

Moving the bug back to the OSTF team.

Tatyanka (tatyana-leontovich) wrote :

Actually, the test failed before any attempt to remove (or migrate) something; it fails when trying to find the router after deployment.
So moving it back to you, Stas )

Oleg Bondarev (obondarev) wrote :

"NotFound: l3 agent hosting router with id:cb0cfa86-457c-4dfa-8ab3-c27079a425b0 not found." Is this the failure message from an attempt to find a router? Not to mention that the message is confusing by itself (what is not found: the router or the agent?). The bug should describe more clearly at which step the failure happens.

Stanislaw Bogatkin (sbogatkin) wrote :

So, here is the data from the failed environment:

root@node-1:/etc/puppet/modules/osnailyfacter/modular/openstack-network# neutron router-list
+--------------------------------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------+-------------+-------+
| id | name | external_gateway_info | distributed | ha |
+--------------------------------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------+
| b4f76c72-659d-47ec-a90c-68a2c82e577b | router04 | {"network_id": "7821444d-d0de-4b58-ae1d-a81925484f0b", "enable_snat": true, "external_fixed_ips": [{"subnet_id": "1a2b80c7-189e-41e9-86e9-284ab9b5555a", "ip_address": "10.109.35.128"}]} | False | False |
+--------------------------------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------+
root@node-1:/etc/puppet/modules/osnailyfacter/modular/openstack-network# neutron agent-list
+--------------------------------------+--------------------+--------------------------+-------+----------------+---------------------------+
| id | agent_type | host | alive | admin_state_up | binary |
+--------------------------------------+--------------------+--------------------------+-------+----------------+---------------------------+
| 04b9bd72-71fc-4da1-a480-3ec960345548 | L3 agent | node-3.test.domain.local | :-) | True | neutron-l3-agent |
| 085939b3-8666-4e4f-a129-81270acca211 | L3 agent | node-1.test.domain.local | :-) | True | neutron-l3-agent |
| 1ca981b4-7d5f-4caf-8236-e4431dcae5ab | DHCP agent | node-2.test.domain.local | :-) | True | neutron-dhcp-agent |
| 1eb3966c-8623-423f-b1d9-2fc23acf679c | Metadata agent | node-2.test.domain.local | :-) | True | neutron-metadata-agent |
| 3717fa76-bc95-4d0e-9dd2-dcecc967fc5c | DHCP agent | node-3.test.domain.local | :-) | True | neutron-dhcp-agent |
| 54823daf-292c-42a2-98cf-eead056b1eca | DHCP agent | node-1.test.domain.local | :-) | True | neutron-dhcp-agent |
| 5fd26d91-b375-4368-b1bd-561e22587fa4 | Open vSwitch agent | node-6.test.domain.local | :-) | True | neutron-openvswitch-agent |
| 6348f626-4a87-4baa-99bc-7729d45a260f | Open vSwitch agent | node-2.test.domain.local | :-) | True | neutron-openvswitch-agent |
| 7619848c-6ea0-4696-9f71-d43c65c271d6 | Open vSwitch agent | node-3.test.domain.local | :-) | T...

Stanislaw Bogatkin (sbogatkin) wrote :

QA code is okay. The router wasn't bound to any agent on the failed env.

After looking into the logs, it turns out that when the router was created there were no l3 agents alive, so it wasn't scheduled to any of them. Why the router wasn't scheduled after that is still a question. I suppose it may be a bug in neutron itself, so I'm moving this bug to the neutron team. If no root cause is found, we will try to create a workaround.
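
A check for routers that were never bound to an agent could look roughly like the sketch below. It assumes a client with python-neutronclient's `list_routers()` and `list_l3_agent_hosting_routers(router_id)` calls. The function name `find_unscheduled_routers` and the `StubClient` class are hypothetical and exist only to illustrate the idea.

```python
def find_unscheduled_routers(client):
    """Return the ids of routers that no L3 agent is hosting."""
    unscheduled = []
    for router in client.list_routers().get("routers", []):
        hosting = client.list_l3_agent_hosting_routers(router["id"])
        # An empty "agents" list means the router has no agent binding.
        if not hosting.get("agents"):
            unscheduled.append(router["id"])
    return unscheduled


class StubClient(object):
    """Hypothetical in-memory client standing in for a real neutron client."""

    def __init__(self, bindings):
        self.bindings = bindings  # router_id -> list of agent dicts

    def list_routers(self):
        return {"routers": [{"id": rid} for rid in sorted(self.bindings)]}

    def list_l3_agent_hosting_routers(self, router_id):
        return {"agents": self.bindings.get(router_id, [])}
```

Run against the failed environment, such a check would be expected to report router04, since it was created while no L3 agent was alive.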

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/neutron (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/16645

Oleg Bondarev (obondarev) wrote :

It turned out to be a regression in the routers auto scheduling logic (see fix ^^), which our deployment relies on when creating resources before starting neutron agents. A corresponding bug was filed to fix this logic as well: https://bugs.launchpad.net/fuel/+bug/1541297

tags: added: hit-hcf
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/neutron (openstack-ci/fuel-8.0/liberty)

Reviewed: https://review.fuel-infra.org/16645
Submitter: Pkgs Jenkins <email address hidden>
Branch: openstack-ci/fuel-8.0/liberty

Commit: 408d2e6f19de60d059dc8bddadb62a0dbbfaf4e7
Author: Oleg Bondarev <email address hidden>
Date: Wed Feb 3 10:14:03 2016

Fix regression in routers auto scheduling logic

    Routers auto scheduling works when an l3 agent starts and performs
    a full sync with neutron server. Neutron server looks for all
    unscheduled routers and schedules them to that agent if applicable.
    This was broken by commit 062d16ac0163f921d4255d9b77d6f903a7c5f110
    which changed full sync logic a bit: now l3 agent requests all ids
    of routers scheduled to it first. get_router_ids() didn't call
    routers auto scheduling which caused the regression.
    This patch adds routers auto scheduling to get_router_ids().

Closes-Bug: #1539707
Change-Id: If6d4e7b3a4839c93296985e169631e5583d9fa12
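
The regression described in the commit message can be illustrated with a toy model. This is not Neutron's actual code: the `ToyServer` class, its `bindings` dict, and the `auto_schedule` flag are assumptions made for the sketch. Only the names `get_router_ids` and "routers auto scheduling" come from the commit message.

```python
class ToyServer(object):
    """Toy model of a server tracking router-to-agent bindings."""

    def __init__(self, router_ids):
        # router_id -> host of the L3 agent it is bound to, or None
        self.bindings = dict.fromkeys(router_ids)

    def auto_schedule_routers(self, host):
        """Bind every unscheduled router to the agent on the given host."""
        for router_id in list(self.bindings):
            if self.bindings[router_id] is None:
                self.bindings[router_id] = host

    def get_router_ids(self, host, auto_schedule=True):
        """Return ids of routers bound to host, as asked for on full sync."""
        # auto_schedule=False models the regressed behaviour, in which
        # routers created before any agent was alive are never picked up.
        if auto_schedule:
            self.auto_schedule_routers(host)
        return sorted(r for r, h in self.bindings.items() if h == host)
```

In this model, a router created before any agent starts stays unbound when `auto_schedule` is false, which mirrors the "not found" failure in this bug. With auto scheduling enabled, the first agent to sync picks the router up.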

Roman Podoliaka (rpodolyaka) wrote :

Fix merged ^

Tatyanka (tatyana-leontovich) wrote :

Verified on ISO 523.

tags: added: wait-for-stable
Alexander Ignatov (aignatov) wrote :

@Tatyanka, we expect that the fix is in the ISO, which now contains updated packages from upstream. Reassigning this bug to you for additional verification on the swarm suite.

Tatyanka (tatyana-leontovich) wrote :

Verified on ISO 76. Works fine, thank you guys.

Tatyanka (tatyana-leontovich) wrote :
affects: fuel → mos
Changed in mos:
milestone: 9.0 → none
milestone: none → 9.0