FixedIPsTestJson fails server build with "was re-scheduled: operation failed: filter 'nova-no-nd-reflection' already exists with uuid"

Bug #1612875 reported by Matt Riedemann
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
High
Matt Riedemann
Newton
Fix Committed
Medium
Matt Riedemann

Bug Description

Seen here:

http://logs.openstack.org/98/355098/1/check/gate-tempest-dsvm-full-ubuntu-xenial/3c301f3/logs/screen-n-cond.txt.gz?level=TRACE#_2016-08-13_00_53_27_621

2016-08-13 00:53:27.621 15971 ERROR nova.scheduler.utils [req-e7b83619-01ae-43f2-b293-af02c5cb35a8 tempest-FixedIPsTestJson-2017152444 tempest-FixedIPsTestJson-2017152444] [instance: ac2c9a4a-1b07-43e1-8f4c-b75541331307] Error from last host: ubuntu-xenial-rax-ord-3453779 (node ubuntu-xenial-rax-ord-3453779): [u'Traceback (most recent call last):\n', u' File "/opt/stack/new/nova/nova/compute/manager.py", line 1778, in _do_build_and_run_instance\n filter_properties)\n', u' File "/opt/stack/new/nova/nova/compute/manager.py", line 1973, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance ac2c9a4a-1b07-43e1-8f4c-b75541331307 was re-scheduled: operation failed: filter 'nova-no-nd-reflection' already exists with uuid 1f47eeb2-d473-481e-998a-c4d64a44ac5e\n"]
2016-08-13 00:53:27.676 15971 WARNING nova.scheduler.utils [req-e7b83619-01ae-43f2-b293-af02c5cb35a8 tempest-FixedIPsTestJson-2017152444 tempest-FixedIPsTestJson-2017152444] Failed to compute_task_build_instances: No valid host was found. There are not enough hosts available.
Traceback (most recent call last):

  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 199, in inner
    return func(*args, **kwargs)

  File "/opt/stack/new/nova/nova/scheduler/manager.py", line 104, in select_destinations
    dests = self.driver.select_destinations(ctxt, spec_obj)

  File "/opt/stack/new/nova/nova/scheduler/filter_scheduler.py", line 74, in select_destinations
    raise exception.NoValidHost(reason=reason)

NoValidHost: No valid host was found. There are not enough hosts available.

2016-08-13 00:53:27.676 15971 WARNING nova.scheduler.utils [req-e7b83619-01ae-43f2-b293-af02c5cb35a8 tempest-FixedIPsTestJson-2017152444 tempest-FixedIPsTestJson-2017152444] [instance: ac2c9a4a-1b07-43e1-8f4c-b75541331307] Setting instance to ERROR state.

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22was%20re-scheduled%3A%20operation%20failed%3A%20filter%20'nova-no-nd-reflection'%20already%20exists%20with%20uuid%5C%22%20AND%20tags%3A%5C%22screen-n-cond.txt%5C%22&from=7d

5 hits in 7 days, check queue only, but multiple changes.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Failed differently here:

http://logs.openstack.org/95/325895/13/gate/gate-tempest-dsvm-postgres-full-ubuntu-xenial/bb138b1/logs/screen-n-cond.txt.gz?level=TRACE

2016-08-12 19:46:39.498 12354 ERROR nova.scheduler.utils [req-efd43cdc-26fa-4391-a65c-687239446e74 tempest-FloatingIPsTestJSON-1799526450 tempest-FloatingIPsTestJSON-1799526450] [instance: ec8ea43e-b71a-48d1-8a71-826b1f5133a2] Error from last host: ubuntu-xenial-ovh-bhs1-3445846 (node ubuntu-xenial-ovh-bhs1-3445846): [u'Traceback (most recent call last):\n', u' File "/opt/stack/new/nova/nova/compute/manager.py", line 1778, in _do_build_and_run_instance\n filter_properties)\n', u' File "/opt/stack/new/nova/nova/compute/manager.py", line 1973, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance ec8ea43e-b71a-48d1-8a71-826b1f5133a2 was re-scheduled: operation failed: filter 'nova-base' already exists with uuid 95b42da7-c704-4eca-bcd0-3d7c8168804c\n"]

Changed in nova:
status: New → Confirmed
tags: added: gate-failure
Revision history for this message
Matt Riedemann (mriedem) wrote :
Download full text (5.4 KiB)

Ah it's not a scheduler filter (which is what I thought was related to this), it's a libvirt firewall driver network filter:

http://logs.openstack.org/24/374324/1/gate/gate-tempest-dsvm-cells-ubuntu-xenial/81f45de/logs/screen-n-cpu.txt.gz?level=TRACE#_2016-09-21_21_20_24_446

2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [req-462fe36b-6d85-4b74-9eab-4d2a0ea9c3d5 tempest-ListImageFiltersTestJSON-1517708761 tempest-ListImageFiltersTestJSON-1517708761] [instance: 9103de54-6d83-4877-8075-9807ec099778] Instance failed to spawn
2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [instance: 9103de54-6d83-4877-8075-9807ec099778] Traceback (most recent call last):
2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [instance: 9103de54-6d83-4877-8075-9807ec099778] File "/opt/stack/new/nova/nova/compute/manager.py", line 2078, in _build_resources
2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [instance: 9103de54-6d83-4877-8075-9807ec099778] yield resources
2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [instance: 9103de54-6d83-4877-8075-9807ec099778] File "/opt/stack/new/nova/nova/compute/manager.py", line 1920, in _build_and_run_instance
2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [instance: 9103de54-6d83-4877-8075-9807ec099778] block_device_info=block_device_info)
2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [instance: 9103de54-6d83-4877-8075-9807ec099778] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2583, in spawn
2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [instance: 9103de54-6d83-4877-8075-9807ec099778] post_xml_callback=gen_confdrive)
2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [instance: 9103de54-6d83-4877-8075-9807ec099778] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 4816, in _create_domain_and_network
2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [instance: 9103de54-6d83-4877-8075-9807ec099778] network_info)
2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [instance: 9103de54-6d83-4877-8075-9807ec099778] File "/opt/stack/new/nova/nova/virt/libvirt/firewall.py", line 339, in setup_basic_filtering
2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [instance: 9103de54-6d83-4877-8075-9807ec099778] self.nwfilter.setup_basic_filtering(instance, network_info)
2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [instance: 9103de54-6d83-4877-8075-9807ec099778] File "/opt/stack/new/nova/nova/virt/libvirt/firewall.py", line 122, in setup_basic_filtering
2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [instance: 9103de54-6d83-4877-8075-9807ec099778] self._ensure_static_filters()
2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [instance: 9103de54-6d83-4877-8075-9807ec099778] File "/opt/stack/new/nova/nova/virt/libvirt/firewall.py", line 225, in _ensure_static_filters
2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [instance: 9103de54-6d83-4877-8075-9807ec099778] self._define_filter(self.nova_no_nd_reflection_filter())
2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [instance: 9103de54-6d83-4877-8075-9807ec099778] File "/opt/s...

Read more...

Changed in nova:
importance: Undecided → High
tags: added: firewall libvirt network
Revision history for this message
Matt Riedemann (mriedem) wrote :

I see that right before the failure we lookup the filter by name and get the uuid:

http://logs.openstack.org/24/374324/1/gate/gate-tempest-dsvm-cells-ubuntu-xenial/81f45de/logs/screen-n-cpu.txt.gz#_2016-09-21_21_20_24_441

2016-09-21 21:20:24.441 12207 DEBUG nova.virt.libvirt.firewall [req-462fe36b-6d85-4b74-9eab-4d2a0ea9c3d5 tempest-ListImageFiltersTestJSON-1517708761 tempest-ListImageFiltersTestJSON-1517708761] UUID for filter 'nova-no-nd-reflection' is '0e742215d36e4ebb8830ef09d513c9a1' _get_filter_uuid /opt/stack/new/nova/nova/virt/libvirt/firewall.py:256

And then 5ms after that it fails because the filter exists with some other uuid:

2016-09-21 21:20:24.446 12207 ERROR nova.compute.manager [instance: 9103de54-6d83-4877-8075-9807ec099778] libvirtError: operation failed: filter 'nova-no-nd-reflection' already exists with uuid e740c5ec-c715-4f73-9874-630cc73d4ac2

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
Revision history for this message
Matt Riedemann (mriedem) wrote :

Yeah apparently this doesn't exist in libvirt 1.2.1:

https://github.com/libvirt/libvirt/blob/v1.2.1/src/conf/nwfilter_conf.c#L2979

Which is our minimum supported version, and ubuntu 14.04 runs with 1.2.2.

In ubuntu 16.04 we're running with libvirt 1.3.1 which shows the check and error:

https://github.com/libvirt/libvirt/blob/v1.3.1/src/conf/nwfilter_conf.c#L3099-L3101

            virReportError(VIR_ERR_OPERATION_FAILED,
                           _("filter '%s' already exists with uuid %s"),
                           def->name, uuidstr);

It's unfortunate that there isn't a specific error code, we'll have to check for VIR_ERR_OPERATION_FAILED and the message.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/374975

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/374975
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2ce193e16d7ca5b93671d7f0d2e3b35761f5d386
Submitter: Jenkins
Branch: master

commit 2ce193e16d7ca5b93671d7f0d2e3b35761f5d386
Author: Matt Riedemann <email address hidden>
Date: Thu Sep 22 12:41:46 2016 -0400

    libvirt: ignore conflict when defining network filters

    We have a latent race in the libvirt firewall code when setting
    up static filters which is now an error with libvirt>=1.2.7,
    which is why we started seeing this in CI failures starting in
    newton which run on xenial nodes that have libvirt 1.3.1 (but
    didn't see it on trusty nodes with libvirt 1.2.2).

    Libvirt commit 46a811db0731cedaea0153fc223faa6096cee5b5 checks
    for an existing filter with the same name but a different uuid
    when defining network filters and raises an error if found. That
    was added in the 1.2.7 release.

    This change simply handles the error and ignores it so we don't
    fail to boot the instance.

    Unfortunately we don't have a specific error code from libvirt
    when this happens so the best we can do is compare the error
    message from the libvirt error which is only going to work for
    English locales because the error message from libvirt is
    translated.

    Change-Id: I161be26d605351f168e351d3ed3d308234346f6f
    Closes-Bug: #1612875

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/380645

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/380645
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c7a9998f9824e4e9508108429eefa86eadbc9f50
Submitter: Jenkins
Branch: stable/newton

commit c7a9998f9824e4e9508108429eefa86eadbc9f50
Author: Matt Riedemann <email address hidden>
Date: Thu Sep 22 12:41:46 2016 -0400

    libvirt: ignore conflict when defining network filters

    We have a latent race in the libvirt firewall code when setting
    up static filters which is now an error with libvirt>=1.2.7,
    which is why we started seeing this in CI failures starting in
    newton which run on xenial nodes that have libvirt 1.3.1 (but
    didn't see it on trusty nodes with libvirt 1.2.2).

    Libvirt commit 46a811db0731cedaea0153fc223faa6096cee5b5 checks
    for an existing filter with the same name but a different uuid
    when defining network filters and raises an error if found. That
    was added in the 1.2.7 release.

    This change simply handles the error and ignores it so we don't
    fail to boot the instance.

    Unfortunately we don't have a specific error code from libvirt
    when this happens so the best we can do is compare the error
    message from the libvirt error which is only going to work for
    English locales because the error message from libvirt is
    translated.

    Change-Id: I161be26d605351f168e351d3ed3d308234346f6f
    Closes-Bug: #1612875
    (cherry picked from commit 2ce193e16d7ca5b93671d7f0d2e3b35761f5d386)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.1

This issue was fixed in the openstack/nova 14.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.0.0b1

This issue was fixed in the openstack/nova 15.0.0.0b1 development milestone.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.