Cleaning up ovnmeta namespace fails, unexpected exception in notify_loop

Bug #1858662 reported by wes hayutin
This bug affects 1 person
Affects   Status        Importance  Assigned to  Milestone
tripleo   Fix Released  High        wes hayutin

Bug Description

wes hayutin (weshayutin) wrote :

{2} tempest.api.compute.admin.test_migrations.MigrationsAdminTest.test_list_migrations_in_flavor_resize_situation [306.507839s] ... FAILED

{2} tempest.api.compute.admin.test_migrations.MigrationsAdminTest.test_resize_server_revert_deleted_flavor [306.310031s] ... FAILED

{1} setUpClass (tempest.api.compute.servers.test_create_server.ServersTestBootFromVolume) [0.000000s] ... FAILED

tempest.api.compute.admin.test_migrations.MigrationsAdminTest.test_revert_cold_migration [306.110744s] ... FAILED

{1} tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_server_while_in_attached_volume [605.951803s] ... FAILED

{2} tempest.api.compute.admin.test_volumes_negative.VolumesAdminNegativeTest.test_update_attached_volume_with_nonexistent_volume_in_body [602.707313s] ... FAILED

{1} tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_server_while_in_verify_resize_state [305.738424s] ... FAILED

{2} tearDownClass (tempest.api.compute.admin.test_volumes_negative.VolumesAdminNegativeTest) [0.000000s] ... FAILED

{0} tempest.api.compute.volumes.test_attach_volume.AttachVolumeTestJSON.test_attach_detach_volume [465.673337s] ... FAILED

{1} tempest.api.compute.servers.test_server_rescue_negative.ServerRescueNegativeTestJSON.test_rescued_vm_detach_volume [509.927364s] ... FAILED
{2} tempest.api.compute.servers.test_device_tagging.TaggedBootDevicesTest.test_tagged_boot_devices [629.952744s] ... FAILED

{0} tempest.api.compute.volumes.test_attach_volume.AttachVolumeTestJSON.test_list_get_volume_attachments [919.333764s] ... FAILED

{2} tearDownClass (tempest.api.compute.servers.test_device_tagging.TaggedBootDevicesTest) [0.000000s] ... FAILED

{0} tearDownClass (tempest.api.compute.volumes.test_attach_volume.AttachVolumeTestJSON) [0.000000s] ... FAILED

{2} tempest.api.compute.servers.test_device_tagging.TaggedBootDevicesTest_v242.test_tagged_boot_devices [510.463829s] ... FAILED

{2} tearDownClass (tempest.api.compute.servers.test_device_tagging.TaggedBootDevicesTest_v242) [0.000000s] ... FAILED

{2} tempest.api.compute.servers.test_disk_config.ServerDiskConfigTestJSON.test_resize_server_from_auto_to_manual [306.839021s] ... FAILED

{2} tempest.api.compute.servers.test_disk_config.ServerDiskConfigTestJSON.test_resize_server_from_manual_to_auto [306.628151s] ... FAILED

{0} tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario [623.687822s] ... FAILED

{2} tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_rebuild_server_with_volume_attached [393.756537s] ... FAILED

{2} tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm [311.078329s] ... FAILED

{2} tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm_from_stopped [669.624568s] ... FAILED

{2} tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert [314.802816s] ... FAILED

{0} tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_volume_backed_instance [408.551256s] ... FAILED

{2} tempest.api.compute.servers.test_server_actions.ServerAc...

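Several of the durations above sit just past 300 s or 600 s (for example 305.7 s, 306.1 s, 602.7 s, 605.9 s), which fits the timeout pattern the bug's original title ("master tempest fs020 timeout tracker") referred to. A minimal Python sketch for bucketing such a summary, assuming the "{worker} test_id [N.NNNs] ... FAILED" line shape shown above; parse_failures and bucket are hypothetical helpers, not part of tempest or stestr. The setUpClass/tearDownClass rows report 0.000000s and land in bucket 0.

```python
import re
from collections import Counter

# One FAILED line from the summary, e.g.
# {2} tempest.api...MigrationsAdminTest.test_revert_cold_migration [306.110744s] ... FAILED
# The "{N}" worker prefix is optional because some lines in the report lack it.
LINE_RE = re.compile(
    r"^(?:\{(?P<worker>\d+)\}\s+)?(?P<name>.+?)\s+\[(?P<secs>[\d.]+)s\]\s+\.\.\.\s+FAILED$"
)


def parse_failures(text):
    """Return a (test_name, seconds) tuple for every FAILED line in text."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results.append((match.group("name"), float(match.group("secs"))))
    return results


def bucket(seconds, width=300):
    """Round a duration down to a multiple of width seconds."""
    return int(seconds // width) * width


SAMPLE = """\
{2} tempest.api.compute.admin.test_migrations.MigrationsAdminTest.test_resize_server_revert_deleted_flavor [306.310031s] ... FAILED
{1} setUpClass (tempest.api.compute.servers.test_create_server.ServersTestBootFromVolume) [0.000000s] ... FAILED
tempest.api.compute.admin.test_migrations.MigrationsAdminTest.test_revert_cold_migration [306.110744s] ... FAILED
{2} tempest.api.compute.admin.test_volumes_negative.VolumesAdminNegativeTest.test_update_attached_volume_with_nonexistent_volume_in_body [602.707313s] ... FAILED
"""

if __name__ == "__main__":
    failures = parse_failures(SAMPLE)
    # Count failures per 300-second bucket.
    print(Counter(bucket(secs) for _, secs in failures))
```

Bucketing rather than sorting keeps the signal visible even when the full list is hundreds of lines long.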

wes hayutin (weshayutin)
description: updated
wes hayutin (weshayutin) wrote :

I noticed an error when removing namespaces:

https://sf.hosted.upshift.rdu2.redhat.com/logs/13/188813/1/check/periodic-tripleo-ci-centos-7-bm_envD-1ctlr_2comp-featureset020-master/76caba0/logs/overcloud-novacompute-1/var/log/containers/neutron/ovn-metadata-agent.log.txt.gz

2020-01-07 07:47:12.722 48153 INFO networking_ovn.agent.metadata.agent [-] Cleaning up ovnmeta-54a0f865-963b-4787-bd9b-2c0aae572824 namespace which is not needed anymore
2020-01-07 07:47:13.602 48153 ERROR neutron.agent.linux.utils [-] Exit code: 1; Stdin: ; Stdout: ; Stderr: : ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr:
2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event [-] Unexpected exception in notify_loop: ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr:
2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event Traceback (most recent call last):
2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event File "/usr/lib/python2.7/site-packages/ovsdbapp/event.py", line 143, in notify_loop
2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event match.run(event, row, updates)
2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata/agent.py", line 97, in run
2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event self.agent.update_datapath(str(row.datapath.uuid))
2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata/agent.py", line 308, in update_datapath
2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event self.teardown_datapath(datapath)
2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata/agent.py", line 282, in teardown_datapath
2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event self._process_monitor, datapath, self.conf, namespace)
2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata/driver.py", line 209, in destroy_monitored_metadata_proxy
2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event pm.disable()
2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event File "/usr/lib/python2.7/site-packages/neutron/agent/linux/external_process.py", line 113, in disable
2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event utils.execute(cmd, run_as_root=self.run_as_root)
2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 147, in execute
2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event returncode=returncode)
2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr:
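The traceback shows a command's non-zero exit turning into ProcessExecutionError, which unwinds through destroy_monitored_metadata_proxy, teardown_datapath and update_datapath up to ovsdbapp's notify_loop, where it is logged as "Unexpected exception in notify_loop". A simplified Python sketch of that propagation, assuming a loop that catches and logs per-event exceptions (as the log output suggests). execute, teardown_datapath, notify_loop and the ProcessExecutionError class below are illustrative stand-ins, not the neutron, networking-ovn or ovsdbapp code; the 'false' command plays the part of a kill command that exits 1 with empty output.

```python
import logging
import subprocess

LOG = logging.getLogger("notify_loop_sketch")


class ProcessExecutionError(RuntimeError):
    """Illustrative stand-in for neutron's exception of the same name."""

    def __init__(self, returncode, stdout, stderr):
        super().__init__("Exit code: %d; Stdin: ; Stdout: %s; Stderr: %s"
                         % (returncode, stdout, stderr))
        self.returncode = returncode


def execute(cmd):
    """Run cmd and raise ProcessExecutionError on a non-zero exit status."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise ProcessExecutionError(proc.returncode, proc.stdout, proc.stderr)
    return proc.stdout


def teardown_datapath(kill_cmd):
    """Hypothetical teardown: stop the proxy, then do the remaining cleanup."""
    execute(kill_cmd)            # raises here, like pm.disable() in the traceback
    return "cleanup finished"    # hypothetical follow-up step, skipped on failure


def notify_loop(events):
    """Run each (handler, arg) pair; log and continue if a handler raises."""
    completed = 0
    for handler, arg in events:
        try:
            handler(arg)
            completed += 1
        except Exception:
            LOG.exception("Unexpected exception in notify_loop")
    return completed


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # 'false' exits 1 with no output; 'true' exits 0.
    done = notify_loop([(teardown_datapath, ["false"]),
                        (teardown_datapath, ["true"])])
    print("events completed:", done)
```

Because the exception escapes from inside the teardown call, whatever that function would have done after the failing command is skipped for that event, while the loop itself keeps processing later events.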

I don't see the same issue in this run: http://logs.rdoproject.org/20/701120/7/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/ccc1ca9/logs/overcloud-novacompute-0/var/log/containers/neutron/ovn-metadata-agent.log.txt.gz

tags: added: promotion-blocker
removed: alert
wes hayutin (weshayutin) wrote :

See a very similar trace here:

http://logs.rdoproject.org/39/24339/1/check/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-master/aebe746/logs/overcloud-novacompute-0/var/log/containers/neutron/ovn-metadata-agent.log.txt.gz

2020-01-06 13:28:09.712 28498 INFO networking_ovn.agent.metadata.agent [-] Cleaning up ovnmeta-1261c640-2b8e-43f2-ac3a-d633fb54bd5f namespace which is not needed anymore
2020-01-06 13:28:09.713 28498 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'haproxy-kill', '9', '34656'] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:87
2020-01-06 13:28:10.046 28498 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: PortBindingChassisEvent(events=('update',), table='Port_Binding', conditions=None, old_conditions=None) to row=Port_Binding(parent_port=[], virtual_parent=[], gateway_chassis=[], nat_addresses=[u'fa:16:3e:52:5c:70 10.0.0.104 is_chassis_resident("cr-lrp-4e03923a-96d3-4cfd-a668-f7a9e6fe5ebb")', u'fa:16:3e:52:5c:70 10.0.0.106 is_chassis_resident("cr-lrp-4e03923a-96d3-4cfd-a668-f7a9e6fe5ebb")'], ha_chassis_group=[], datapath=88e5cb3f-da14-467a-ba05-2785d000ad60, logical_port=4e03923a-96d3-4cfd-a668-f7a9e6fe5ebb, mac=[u'router'], chassis=[], encap=[], tunnel_key=4, external_ids={u'neutron:cidrs': u'10.0.0.104/24', u'neutron:revision_number': u'4', u'neutron:port_name': u'', u'neutron:network_name': u'neutron-b42e39c9-7570-4620-a936-d77a4e5af4d4', u'neutron:project_id': u'', u'neutron:security_group_ids': u'', u'neutron:device_id': u'a2f76863-f492-49fa-9c29-e18c32fb78da', u'neutron:device_owner': u'network:router_gateway'}, tag=[], type=patch, options={u'peer': u'lrp-4e03923a-96d3-4cfd-a668-f7a9e6fe5ebb'}) old=Port_Binding(nat_addresses=[u'fa:16:3e:52:5c:70 10.0.0.104 is_chassis_resident("cr-lrp-4e03923a-96d3-4cfd-a668-f7a9e6fe5ebb")', u'fa:16:3e:ff:9b:d7 10.0.0.106 is_chassis_resident("686dfa0a-d6f5-41ce-9528-0629e59b9a86")']) matches /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
2020-01-06 13:28:10.076 28498 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: PortBindingChassisEvent(events=('update',), table='Port_Binding', conditions=None, old_conditions=None) to row=Port_Binding(parent_port=[], virtual_parent=[], gateway_chassis=[], nat_addresses=[], ha_chassis_group=[], datapath=1261c640-2b8e-43f2-ac3a-d633fb54bd5f, logical_port=686dfa0a-d6f5-41ce-9528-0629e59b9a86, mac=[u'fa:16:3e:98:97:38 10.100.0.13'], chassis=[], encap=[], tunnel_key=3, external_ids={u'neutron:cidrs': u'10.100.0.13/28', u'neutron:revision_number': u'5', u'neutron:port_name': u'', u'neutron:network_name': u'neutron-222b4029-4994-49eb-8d5e-bfd63f7b8bf5', u'neutron:project_id': u'ade0025391c64541877be2812006d350', u'neutron:security_group_ids': u'78354527-2b7e-4428-b664-58103bac3c17', u'neutron:device_id': u'23694187-71ce-4f2d-a15a-4474f5b82e95', u'neutron:device_owner': u'compute:nova'}, tag=[], type=, options={u'requested-chassis': u'overcloud-novacompute-0.localdomain'}) old=Port_Binding(external_ids={u'neutron:cidrs': u'10.100.0.13/28', u'neutron:revision_number': u'4', u'neutron:port_name': u'', u'neutron:netw...

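Both excerpts begin with a "Cleaning up ovnmeta-<uuid> namespace which is not needed anymore" INFO line, so when triaging a full ovn-metadata-agent.log it can help to list which cleanups were followed by the notify_loop error. A rough Python sketch, assuming the log line formats shown above and that the agent handles these events one at a time; cleanup_outcomes is a hypothetical helper, and the sample lines are copied from the two excerpts and concatenated only for the example.

```python
import re

CLEANUP_RE = re.compile(r"Cleaning up (ovnmeta-[0-9a-f-]{36}) namespace")
FAILURE_MARK = "Unexpected exception in notify_loop"


def cleanup_outcomes(log_text):
    """Map each ovnmeta namespace being cleaned up to True if a notify_loop
    exception was logged before the next cleanup started, else False.
    Heuristic only: it assumes events are handled sequentially."""
    outcomes = {}
    current = None
    for line in log_text.splitlines():
        match = CLEANUP_RE.search(line)
        if match:
            current = match.group(1)
            outcomes[current] = False
        elif current is not None and FAILURE_MARK in line:
            outcomes[current] = True
    return outcomes


SAMPLE = "\n".join([
    "2020-01-07 07:47:12.722 48153 INFO networking_ovn.agent.metadata.agent [-] "
    "Cleaning up ovnmeta-54a0f865-963b-4787-bd9b-2c0aae572824 namespace which is not needed anymore",
    "2020-01-07 07:47:13.603 48153 ERROR ovsdbapp.event [-] "
    "Unexpected exception in notify_loop: ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr:",
    "2020-01-06 13:28:09.712 28498 INFO networking_ovn.agent.metadata.agent [-] "
    "Cleaning up ovnmeta-1261c640-2b8e-43f2-ac3a-d633fb54bd5f namespace which is not needed anymore",
])

if __name__ == "__main__":
    for namespace, failed in cleanup_outcomes(SAMPLE).items():
        print(namespace, "FAILED" if failed else "no error seen")
```

The second namespace shows "no error seen" only because the excerpt above is cut off right after the haproxy-kill call; run it against the complete log before drawing conclusions.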

summary: - master tempest fs020 timeout tracker
+ Cleaning up ovnmeta namespace fails, unexpected exception in notify_loop
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/701403
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=4a1526f8d48d5da7f812a70e7d2d407f7650380b
Submitter: Zuul
Branch: master

commit 4a1526f8d48d5da7f812a70e7d2d407f7650380b
Author: Wes Hayutin <email address hidden>
Date: Tue Jan 7 10:10:43 2020 -0700

    update master skip list, nova issues and timeout

    We'll need to update the bugs on this, but
    for now we need to get fs020 running w/o
    timing out. clearly there is an issue w/ nova

    Related-Bug: #1858662
    Change-Id: Ia30a3c07bbffc6d86539514b13aa258bd6eec8d8

Terry Wilson (otherwiseguy) wrote :

I notice in http://logs.rdoproject.org/39/24339/1/check/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-master/aebe746/logs/overcloud-novacompute-0/var/log/extra/network-netns.gz there are lots of errors like:

==== (id: (4)====
### IPv4 addresses
Cannot open network namespace "(id:": No such file or directory
### IPv4 routing
Cannot open network namespace "(id:": No such file or directory
### IPTables (IPv4)
Cannot open network namespace "(id:": No such file or directory
==== (id: (6)====
### IPv6 addresses
Cannot open network namespace "(id:": No such file or directory
### IPv6 routing
Cannot open network namespace "(id:": No such file or directory
### IPTables (IPv6)
Cannot open network namespace "(id:": No such file or directory
Cannot open network namespace: No such file or directory

==== 13) (4)====
### IPv4 addresses
Cannot open network namespace "13)": No such file or directory
### IPv4 routing
Cannot open network namespace "13)": No such file or directory
### IPTables (IPv4)
Cannot open network namespace "13)": No such file or directory
==== 13) (6)====
### IPv6 addresses
Cannot open network namespace "13)": No such file or directory
### IPv6 routing
Cannot open network namespace "13)": No such file or directory
### IPTables (IPv6)
Cannot open network namespace "13)": No such file or directory
Cannot open network namespace: No such file or directory

And given this 'ip netns' output:
[root@overcloud-novacompute-0 log]# ip netns
ovnmeta-532f3f45-9736-49eb-a639-69ffc102b844 (id: 28)
ovnmeta-00f1199e-e176-4c1f-a3b1-e1c447dcd18a (id: 27)
ovnmeta-f024f2d6-5225-4030-9b83-3f252b0da7d9 (id: 26)
ovnmeta-f5feb64b-b570-482a-bae1-306730be9e20 (id: 25)
ovnmeta-bfba06ea-7894-4711-9ef4-f44fbcbcedcf (id: 24)
ovnmeta-e11c7d50-1cc3-4fbf-9da7-5c43ad364a4f (id: 23)
ovnmeta-97672eea-c4b4-45f6-975a-7c456048087a (id: 22)
ovnmeta-fe89fb3f-9d21-40f7-8213-88b0088258a4 (id: 21)
ovnmeta-5ce826a6-ec38-4230-8315-601ec9600043 (id: 20)
ovnmeta-7b034bdb-33b7-4ccc-aa1e-4f9ff98dadf3 (id: 19)
ovnmeta-52a3234d-e42c-474d-bf33-cd7a348494a8 (id: 18)
ovnmeta-f91833b4-61e5-4eec-9698-f01449308945 (id: 17)
ovnmeta-cc202d9f-7bfa-4cab-94cc-b38f898da4f6 (id: 16)
ovnmeta-0cb77ba7-8cfe-4c79-b07e-e189a0e830e2 (id: 15)
ovnmeta-07a11d99-8833-4842-a0f2-f27c54df35c3 (id: 14)
ovnmeta-3cb1a989-9a73-404d-a7ab-bb0f9e26f8bf (id: 13)
ovnmeta-f367932a-4133-47f7-901f-e1219709a86a (id: 12)
ovnmeta-65ff0f88-7e7e-4578-bd69-85bbe579ebfa (id: 11)
ovnmeta-546bf3a9-9402-4358-abf4-65ac0b70f449 (id: 10)
ovnmeta-ffc03b2c-0fc6-4598-a7d3-726b328018a6 (id: 9)
ovnmeta-9323e798-6121-43f7-b144-9df959363bad (id: 8)
ovnmeta-ef27ec0b-2caa-4baf-8cd8-9a98b8aeac54 (id: 7)
ovnmeta-6f6f7eab-7929-4640-ae41-548ed9d97ace (id: 6)
ovnmeta-3a164299-ae87-4797-a865-494cf363b56f (id: 5)
ovnmeta-6f2e843a-340c-451e-8fd2-ef239dd0f5fa (id: 4)
ovnmeta-f605cc9d-3f1c-4dca-b8f4-da5ffa5d9324 (id: 3)
ovnmeta-4096de67-4661-456e-bbc7-8fb6eab305cf (id: 2)
ovnmeta-8cf48f2e-99ea-4154-9232-5a9a4b0e0c57 (id: 1)
ovnmeta-729007f4-c6d1-4414-b465-c2621b544163 (id: 0)

It looks like something is parsing the output of 'ip netns' incorrectly. I don't remember seeing the "(id: nn)" suffix previously; is this something that changed recently?
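The errors in network-netns are what you would get if a collector split the entire 'ip netns' output on whitespace: each "(id: N)" suffix turns into the bogus names "(id:" and "N)", which are exactly the names in the "Cannot open network namespace" lines. A minimal Python sketch contrasting that naive parse with one that keeps only the first field of each line; both functions are hypothetical, since this report does not identify which collector produced network-netns.

```python
def parse_netns_naive(output):
    """Buggy: treats every whitespace-separated token as a namespace name."""
    return output.split()


def parse_netns(output):
    """Keep only the first field of each non-empty line of 'ip netns' output."""
    names = []
    for line in output.splitlines():
        fields = line.split()
        if fields:
            names.append(fields[0])
    return names


SAMPLE = (
    "ovnmeta-3cb1a989-9a73-404d-a7ab-bb0f9e26f8bf (id: 13)\n"
    "ovnmeta-729007f4-c6d1-4414-b465-c2621b544163 (id: 0)\n"
)

if __name__ == "__main__":
    print(parse_netns_naive(SAMPLE))  # includes the bogus '(id:' and '13)' tokens
    print(parse_netns(SAMPLE))
```

Taking the first field assumes namespace names never contain whitespace, which holds for the ovnmeta-<uuid> names listed above, and it works whether or not the "(id: N)" suffix is present.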

Changed in tripleo:
milestone: ussuri-1 → ussuri-2
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/702269

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ansible (master)

Reviewed: https://review.opendev.org/702269
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=17d97f2618e56be606c9307551efdaadda8dff69
Submitter: Zuul
Branch: master

commit 17d97f2618e56be606c9307551efdaadda8dff69
Author: Brent Eagles <email address hidden>
Date: Mon Jan 13 13:56:08 2020 -0330

    Remove --rm=true from sidecar container sync

    Neutron uses kill-scripts which remove the container after stopping it.
    If the container is launched with docker and --rm=true, the container
    will automatically be cleaned up and the $(CLI) rm <container id> in the
    kill script will error out because the container can't be found.

    Related-Bug: #1858662
    Change-Id: I3d7940cb0816adce58e0fa778469dcec95302f67
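The ordering problem described in this commit can be modelled with a toy runtime: a container launched with auto-removal disappears as soon as it stops, so the kill script's later "rm <container id>" finds nothing to remove. A minimal Python sketch; FakeRuntime and kill_script are hypothetical, real kill scripts call the docker or podman CLI, and treating auto-removal as happening instantly on stop is a simplification.

```python
class FakeRuntime:
    """Toy container runtime that only tracks which container IDs exist."""

    def __init__(self):
        self.containers = {}  # container id -> auto_remove flag

    def run(self, cid, auto_remove=False):
        self.containers[cid] = auto_remove

    def stop(self, cid):
        # With auto-removal (like --rm=true) the container is deleted on stop.
        if self.containers[cid]:
            del self.containers[cid]

    def rm(self, cid):
        if cid not in self.containers:
            raise LookupError("No such container: %s" % cid)
        del self.containers[cid]


def kill_script(runtime, cid):
    """Mimic the kill-script order described in the commit: stop, then rm."""
    runtime.stop(cid)
    runtime.rm(cid)


if __name__ == "__main__":
    rt = FakeRuntime()
    rt.run("sidecar-a", auto_remove=True)   # launched with --rm=true
    try:
        kill_script(rt, "sidecar-a")
    except LookupError as exc:
        print("kill script failed:", exc)
    rt.run("sidecar-b")                     # launched without --rm
    kill_script(rt, "sidecar-b")
    print("remaining:", rt.containers)
```

In this model the commit's approach, launching without --rm=true, leaves the explicit rm in the kill script as the single owner of container removal.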

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/703936

wes hayutin (weshayutin) wrote :

Working from bug #1861296 now.

Changed in tripleo:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ansible (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/706379

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by Chandan Kumar (raukadah) (<email address hidden>) on branch: master
Review: https://review.opendev.org/703936
Reason: Got removed with this revert https://review.opendev.org/#/c/704962/

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ansible (stable/train)

Reviewed: https://review.opendev.org/706379
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=60342041a3eb2c091916e36e4bf66199937db335
Submitter: Zuul
Branch: stable/train

commit 60342041a3eb2c091916e36e4bf66199937db335
Author: Alex Schultz <email address hidden>
Date: Thu Nov 7 16:06:54 2019 -0700

    [TRAIN] Backport tripleo-systemd-wrapper (squash)

    This is a combination of 4 commits.

    This is the 1st commit message:

    Implement tripleo-systemd-wrapper role

    This patch adds a new role that will be used to manage side containers
    with systemd instead of docker.socket or nsenter. The main use case here
    is Neutron, although this role is designed to work with any service.

    This role will create a series of systemd files to monitor a file which
    gets mounted into a container. Additionally a wrapper script is
    generated which is mounted in the container that will provide the
    arguments that should be used to launch new containers.

    Blueprint: safe-side-containers
    Change-Id: I4821b7ca0260e4dfd1717ba976cef700d160f84f
    Co-Authored-By: Dan Prince <email address hidden>
    Co-Authored-By: Emilien Macchi <email address hidden>
    Co-Authored-By: Alex Schultz <email address hidden>
    (cherry picked from commit 699249f1790dd5646556173bf5331e7e71135ad4)

    This is the commit message #2:

    Remove --rm=true from sidecar container sync

    Neutron uses kill-scripts which remove the container after stopping it.
    If the container is launched with docker and --rm=true, the container
    will automatically be cleaned up and the $(CLI) rm <container id> in the
    kill script will error out because the container can't be found.

    Related-Bug: #1858662
    Change-Id: I3d7940cb0816adce58e0fa778469dcec95302f67
    (cherry picked from commit 17d97f2618e56be606c9307551efdaadda8dff69)

    This is the commit message #3:

    Fix substitution in kill-script

    In the kill-script there is a string "Unknown action ${SIG} for
    ${$CT_NAME} ${CT_ID}" which results in a "bad substitution" error, as
    there is no variable named with what the contents of the CT_NAME
    environment variable contains. Remove the extraneous '$'.

    Change-Id: I4c76071083bf5cb4f876d3b78c379822a8bd8db1
    Fixes-Bug: #1860155
    (cherry picked from commit b45d4c6d219e8e27219bca341acdfd634155d6f6)

    This is the commit message #4:

    Add handling of signal 15 in kill script

    The reason bug #1860155 was triggered was because the kill script did
    not have a stanza for handling the signal that was passed in, which is
    signal 15. Since signal 15 is unhandled, keepalived processes will
    still stick around. Add handling for signal 15.

    Change-Id: I632a3ef5ec137df10f647335f6354589c2316fd0
    Related-bug: #1860155
    (cherry picked from commit 06dc258a28784db98e44a3de488c204f01b97613)
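Commits #3 and #4 fit together: per commit #4, signal 15 had no stanza in the kill script, so it fell through to the "Unknown action" message, and per commit #3 that message itself failed with "bad substitution" because of the stray '$' in ${$CT_NAME}. A heavily reduced, hypothetical bash kill script, driven from Python, reproduces both symptoms; the case layout, the argument order and the container name and ID are assumptions, not the TripleO template.

```python
import subprocess

# Hypothetical, heavily reduced kill scripts. The real TripleO kill scripts
# also call the container CLI; only the signal dispatch is modelled here.
BROKEN = r'''
SIG="$1"; CT_NAME="$2"; CT_ID="$3"
case "$SIG" in
  9) echo "stopping ${CT_NAME} ${CT_ID}" ;;
  *) echo "Unknown action ${SIG} for ${$CT_NAME} ${CT_ID}" >&2; exit 1 ;;
esac
'''

# Same script with a stanza for signal 15 and the extra '$' removed.
FIXED = r'''
SIG="$1"; CT_NAME="$2"; CT_ID="$3"
case "$SIG" in
  9|15) echo "stopping ${CT_NAME} ${CT_ID}" ;;
  *) echo "Unknown action ${SIG} for ${CT_NAME} ${CT_ID}" >&2; exit 1 ;;
esac
'''


def run_kill_script(script, sig, name="neutron-haproxy-ovnmeta", cid="abc123"):
    """Run script under bash with positional args; return (rc, stdout, stderr)."""
    proc = subprocess.run(
        ["bash", "-c", script, "kill-script", str(sig), name, cid],
        capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    for label, script in (("broken", BROKEN), ("fixed", FIXED)):
        print(label, run_kill_script(script, 15))
```

With signal 15 handled, SIGTERM no longer reaches the fallback branch, and with the stray '$' removed the fallback prints its message for genuinely unknown signals instead of failing with "bad substitution".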

tags: added: in-stable-train