[DVR] Deleting qrouter namespace from compute after destroying non-primary controller

Bug #1493739 reported by Kristina Berezovskaia
This bug affects 1 person
Affects             Status        Importance  Assigned to    Milestone
Mirantis OpenStack  Fix Released  High        Oleg Bondarev
7.0.x               Won't Fix     Medium      Sergii Rizvan
8.0.x               Fix Released  High        Oleg Bondarev

Bug Description

After destroying the controller that hosts SNAT, the qrouter interface disappeared from the compute node running the VM, and the VM no longer has a connection to the internet.

Steps to reproduce:
1) Deploy an environment with DVR
2) Create a new network net1 and subnet subnet1
3) Create DVR router router1 with a gateway to the external network
4) Connect router1 to subnet1
5) Boot a VM in net1 without a floating IP
6) Find the controller hosting SNAT for router1
7) Destroy that controller
8) Wait 10 minutes
9) Check whether SNAT was rescheduled

Expected result: the router is scheduled on all healthy controllers and on the compute node running the VM, SNAT is rescheduled, and ping 8.8.8.8 from the VM works
Current result: there is no qrouter interface on the compute node and the connection to 8.8.8.8 fails

iso:
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "287"
  build_id: "287"
  nailgun_sha: "46a7a2177a0b7ef91422284c1c90295fee8f5c84"
  python-fuelclient_sha: "1ce8ecd8beb640f2f62f73435f4e18d1469979ac"
  fuel-agent_sha: "082a47bf014002e515001be05f99040437281a2d"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "a717657232721a7fafc67ff5e1c696c9dbeb0b95"
  fuel-library_sha: "43224223dab8cf9627b5ecf737e60216a3fdd114"
  fuel-ostf_sha: "1f08e6e71021179b9881a824d9c999957fcc7045"
  fuelmain_sha: "6b83d6a6a75bf7bca3177fcf63b2eebbf1ad0a85"

vxlan, 3 controllers, 2 compute

Tags: neutron dvr
Kristina Berezovskaia (kkuznetsova) wrote :
Changed in mos:
status: New → Confirmed
Oleg Bondarev (obondarev) wrote :

Trace seen in server logs:

2015-09-09 06:08:59.732 3394 ERROR neutron.db.l3_agentschedulers_db [req-271baca5-bad6-4bd7-a1da-9ea7a87a8944 ] Exception encountered during router rescheduling.
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db Traceback (most recent call last):
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db File "/usr/lib/python2.7/dist-packages/neutron/db/l3_agentschedulers_db.py", line 121, in reschedule_routers_from_down_agents
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db self.reschedule_router(context, binding.router_id)
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db File "/usr/lib/python2.7/dist-packages/neutron/db/l3_agentschedulers_db.py", line 265, in reschedule_router
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db self.schedule_router(context, router_id, candidates=candidates)
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db File "/usr/lib/python2.7/dist-packages/neutron/db/l3_agentschedulers_db.py", line 528, in schedule_router
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db self, context, router, candidates=candidates)
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db File "/usr/lib/python2.7/dist-packages/neutron/scheduler/l3_agent_scheduler.py", line 358, in schedule
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db plugin, context, router_id, candidates=candidates)
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db File "/usr/lib/python2.7/dist-packages/neutron/scheduler/l3_agent_scheduler.py", line 240, in _schedule_router
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db self.bind_router(context, router_id, chosen_agent)
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db File "/usr/lib/python2.7/dist-packages/neutron/scheduler/l3_agent_scheduler.py", line 216, in bind_router
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db context.session.add(binding)
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 470, in __exit__
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db self.rollback()
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db File "/usr/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db compat.reraise(exc_type, exc_value, exc_tb)
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 467, in __exit__
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db self.commit()
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 377, in commit
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db self._prepare_impl()
2015-09-09 06:08:59.732 3394 TRACE neutron.db.l3_agentschedulers_db File "/usr/l...


Oleg Bondarev (obondarev) wrote :

This was reproduced only once; all other attempts were successful. There is also a workaround: start a new VM on the affected compute node. I have a fix in mind, but it's probably a bit risky after HCF. Moving to High.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/neutron (openstack-ci/fuel-7.0/2015.1.0)

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/11413

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/neutron (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/13897

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/neutron (openstack-ci/fuel-8.0/liberty)

Reviewed: https://review.fuel-infra.org/13897
Submitter: Pkgs Jenkins <email address hidden>
Branch: openstack-ci/fuel-8.0/liberty

Commit: b41a04f475cb22d50fd55e5c95fd5bd38bea83fc
Author: Oleg Bondarev <email address hidden>
Date: Tue Nov 24 13:36:35 2015

DVR: do not unschedule router from agents on computes while rescheduling

Scheduling/unscheduling of DVR routers with l3 agents in 'dvr' mode
running on a compute nodes is done according to DVR serviced ports
created/deleted on those compute nodes.
It doesn't make sense to unschedule router from l3 agent on compute
node - no other l3 agent can handle VMs running on that compute node.
Commit 68da5fa28e8e904a7f0e48fb6d8e9345fcb9c534 fixed one case, but
there is another case which this patch is fixing.

Closes-Bug: #1493739
Change-Id: Ie96878cef44cc595b94130ee5da91ecdb45bba53
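
The idea in the commit message above can be sketched roughly as follows. This is only an illustration of the rule "do not unbind a DVR router from agents in 'dvr' mode while rescheduling", not the actual patch. The `L3Agent` class, the `agents_to_unbind` helper, and the host names are hypothetical and are not taken from the Neutron code base; only the agent mode values 'dvr' and 'dvr_snat' correspond to real L3 agent modes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class L3Agent:
    """Hypothetical, simplified stand-in for an L3 agent record."""
    host: str
    agent_mode: str  # 'dvr' on compute nodes, 'dvr_snat' on controllers


def agents_to_unbind(hosting_agents):
    """Pick the agents a DVR router may be unbound from while rescheduling.

    Agents in 'dvr' mode run on compute nodes and serve the VMs located
    there, so no other agent can take over for them. They are left bound.
    """
    return [a for a in hosting_agents if a.agent_mode != "dvr"]


# Assumed scenario from this bug: the router is hosted on the destroyed
# controller (SNAT) and on the compute node that runs the VM.
hosting = [
    L3Agent(host="controller-1", agent_mode="dvr_snat"),
    L3Agent(host="compute-1", agent_mode="dvr"),
]
print([a.host for a in agents_to_unbind(hosting)])
```

Under these assumptions only the controller binding is a candidate for removal, so the compute node keeps serving the router while SNAT is moved elsewhere.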

Sergii Rizvan (srizvan) wrote :

The bug was reproduced only in a very rare case; it has no affected customers and a workaround exists. That's why we are about to decrease the priority to Medium and close it as Won't Fix for 7.0.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/neutron (openstack-ci/fuel-7.0/2015.1.0)

Change abandoned by Sergii Rizvan <email address hidden> on branch: openstack-ci/fuel-7.0/2015.1.0
Review: https://review.fuel-infra.org/11413
Reason: Change abandoned due to https://bugs.launchpad.net/mos/+bug/1493739/comments/7

Kristina Berezovskaia (kkuznetsova) wrote :

Verified on:
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "361"
  build_id: "361"
  fuel-nailgun_sha: "53c72a9600158bea873eec2af1322a716e079ea0"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "7463551bc74841d1049869aaee777634fb0e5149"
  fuel-nailgun-agent_sha: "92ebd5ade6fab60897761bfa084aefc320bff246"
  astute_sha: "c7ca63a49216744e0bfdfff5cb527556aad2e2a5"
  fuel-library_sha: "ba8063d34ff6419bddf2a82b1de1f37108d96082"
  fuel-ostf_sha: "889ddb0f1a4fa5f839fd4ea0c0017a3c181aa0c1"
  fuel-mirror_sha: "8adb10618bb72bb36bb018386d329b494b036573"
  fuelmenu_sha: "824f6d3ebdc10daf2f7195c82a8ca66da5abee99"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "9f0ba4577915ce1e77f5dc9c639a5ef66ca45896"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "07d5f1c3e1b352cb713852a3a96022ddb8fe2676"
(neutron+dvr+vlan, neutron+dvr+vxlan, 3 controllers, 2 compute)

After destroying the controller hosting SNAT many times, the qrouter interface on the compute node did not disappear and the connection to 8.8.8.8 did not fail.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/neutron (9.0/mitaka)

Fix proposed to branch: 9.0/mitaka
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/18403

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/neutron (9.0/mitaka)

Change abandoned by Oleg Bondarev <email address hidden> on branch: 9.0/mitaka
Review: https://review.fuel-infra.org/18403
Reason: Not needed since bp/improve-dvr-l3-agent-binding was implemented in Mitaka
