periodic fedora28 standalone job failing at test_volume_boot_pattern

Bug #1824315 reported by Quique Llorente
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Invalid
Undecided
Unassigned
tripleo
Invalid
Critical
Quique Llorente

Bug Description

From tempest

http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-fedora-28-standalone-master/04caef1/logs/tempest.html.gz
    raise value
  File "/usr/lib/python3.6/site-packages/tempest/common/compute.py", line 236, in create_test_server
    clients.servers_client, server['id'], wait_until)
  File "/usr/lib/python3.6/site-packages/tempest/common/waiters.py", line 76, in wait_for_server_status
    server_id=server_id)
tempest.exceptions.BuildErrorException: Server aa994612-f431-4e90-98ea-a993b5c1ab5c failed to build and is in ERROR status
Details: {'code': 500, 'created': '2019-04-11T07:29:27Z', 'message': 'Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance aa994612-f431-4e90-98ea-a993b5c1ab5c.'}

And from nova-compute

http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-fedora-28-standalone-master/04caef1/logs/undercloud/var/log/containers/nova/nova-compute.log.txt.gz
</domain>
: libvirt.libvirtError: internal error: Unable to add port tap02d12a34-c4 to OVS bridge br-int
2019-04-11 07:29:22.833 9 ERROR nova.virt.libvirt.driver [req-d91a391f-1040-4522-b6b2-9720e72fdfdb 492b60374c184ae794fc48e140e80da4 f3efacfd7e50440882a661dc5987d8f4 - default default] [instance: aa994612-f431-4e90-98ea-a993b5c1ab5c] Failed to start libvirt guest: libvirt.libvirtError: internal error: Unable to add port tap02d12a34-c4 to OVS bridge br-int
2019-04-11 07:29:22.834 9 DEBUG nova.virt.libvirt.vif [req-d91a391f-1040-4522-b6b2-9720e72fdfdb 492b60374c184ae794fc48e140e80da4 f3efacfd7e50440882a661dc5987d8f4 - default default] vif_type=ovs instance=Instance(access_ip_v4=None,access_ip_v6=None,architecture=None,auto_disk_config=False,availability_zone='nova',cell_name=None,cleaned=False,config_drive='',created_at=2019-04-11T07:29:05Z,default_ephemeral_device=None,default_swap_device=None,deleted=False,deleted_at=None,device_metadata=None,disable_terminate=False,display_description='tempest-TestVolumeBootPattern-server-2018261125',display_name='tempest-

at "oc" ovs

http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-fedora-28-standalone-master/04caef1/logs/undercloud/var/log/containers/openvswitch/ovn-controller.log.txt.gz

019-04-11T07:28:48.414Z|00153|pinctrl|INFO|DHCPOFFER fa:16:3e:27:14:2d 10.100.0.4
2019-04-11T07:28:48.421Z|00154|pinctrl|INFO|DHCPACK fa:16:3e:27:14:2d 10.100.0.4
2019-04-11T07:28:58.874Z|00155|binding|INFO|Releasing lport a9085a12-aa3d-4345-9ce2-ba774084b5aa from this chassis.
2019-04-11T07:29:02.342Z|00156|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connection closed by peer
2019-04-11T07:29:02.342Z|00157|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connection closed by peer
2019-04-11T07:29:03.197Z|00158|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2019-04-11T07:29:03.197Z|00159|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (No such file or directory)
2019-04-11T07:29:03.197Z|00160|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: waiting 2 seconds before reconnect
2019-04-11T07:29:03.197Z|00161|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2019-04-11T07:29:03.197Z|00162|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (No such file or directory)
2019-04-11T07:29:03.197Z|00163|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: waiting 2 seconds before reconnect
2019-04-11T07:29:05.198Z|00164|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
201

Changed in tripleo:
status: Triaged → In Progress
Revision history for this message
Quique Llorente (quiquell) wrote :
description: updated
description: updated
description: updated
description: updated
Revision history for this message
Quique Llorente (quiquell) wrote :

Looks like is working now and upstream is working too let's close this.

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
chandan kumar (chkumar246) wrote :

Is it possible to track specific tests in sova?

Revision history for this message
Ronelle Landy (rlandy) wrote :
Revision history for this message
Quique Llorente (quiquell) wrote :
Changed in tripleo:
status: Fix Released → In Progress
wes hayutin (weshayutin)
tags: added: tempest
Revision history for this message
Arx Cruz (arxcruz) wrote :

Found this on http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-fedora-28-standalone-master/36994ee/logs/undercloud/var/log/containers/nova/nova-api.log.txt.gz:

2019-04-16 07:16:09.070 20 INFO nova.compute.api [req-83e26ba8-f8ac-4b9b-9608-3c2aa7e9672d abaf4efd50ed493db1a49ce33d44717a 59ede4886038422f9d6c0cd324db4545 - default default] [instance: 3e7e8f96-9bb0-4d7f-b394-14bd3fb4ca17] Skipping quiescing instance: QEMU guest agent is not enabled
Traceback (most recent call last):

  File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 229, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 79, in wrapped
    function_name, call_dict, binary, tb)

  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()

  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value

  File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 69, in wrapped
    return f(self, context, *args, **kw)

  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 8491, in quiesce_instance
    self.driver.quiesce(context, instance, image_meta)

  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 2202, in quiesce
    self._set_quiesced(context, instance, image_meta, True)

  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 2181, in _set_quiesced
    self._can_quiesce(instance, image_meta)

  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 2175, in _can_quiesce
    raise exception.QemuGuestAgentNotEnabled()

nova.exception.QemuGuestAgentNotEnabled: QEMU guest agent is not enabled

Revision history for this message
Arx Cruz (arxcruz) wrote :

Here's what I found:

There's a qemu error in [1] with the following:

2019-04-16 07:16:09.070 20 INFO nova.compute.api [req-83e26ba8-f8ac-4b9b-9608-3c2aa7e9672d abaf4efd50ed493db1a49ce33d44717a 59ede4886038422f9d6c0cd324db4545 - default default] [instance: 3e7e8f96-9bb0-4d7f-b394-14bd3fb4ca17] Skipping quiescing instance: QEMU guest agent is not enabled
Traceback (most recent call last):

File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 229, in inner
return func(*args, **kwargs)

File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 79, in wrapped
function_name, call_dict, binary, tb)

File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in exit
self.force_reraise()

File "/usr/lib/python3.6/site-packages/osloutils/excutils.py", line 196, in force_reraise
six.reraise(self.type, self.value, self.tb)

File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value

File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 69, in wrapped
return f(self, context, *args, **kw)

File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 8491, in quiesce_instance
self.driver.quiesce(context, instance, image_meta)

File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 2202, in quiesce
self._set_quiesced(context, instance, image_meta, True)

File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 2181, in _set_quiesced
self._can_quiesce(instance, image_meta)

File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 2175, in _can_quiesce
raise exception.QemuGuestAgentNotEnabled()

nova.exception.QemuGuestAgentNotEnabled: QEMU guest agent is not enabled

There's no qemu-agent package installed on the undercloud as pointed in [2], so I believe it's installed in the container.

Checking nova driver code at [3] I notice this failure is due the lack of image meta property hw_qemu_guest_agent.

So, I believe we are going to need some nova expert here to help us because validate-tempest role that execute tempest, create the cirros image in the same way, independent of distro, so I don't think it's related to how tempest is uploading the cirros image to glance, and this is missing.
I also notice that on this particular job, we are running 3 other scenarions, that are creating vm's with the same cirros image, and tests are passing...

1 - http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-fedora-28-standalone-master/36994ee/logs/undercloud/var/log/containers/nova/nova-api.log.txt.gz
2 - http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-fedora-28-standalone-master/36994ee/logs/undercloud/var/log/extra/rpm-list.txt.gz
3 - https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L2175

no longer affects: neutron
Changed in tripleo:
milestone: stein-rc1 → train-1
Revision history for this message
wes hayutin (weshayutin) wrote :

Arx, I don't see that that qemu-guest-agent rpm is installed on the standalone deployment.
That is something we can do in the os-temepst role itself or via adding qemu-guest-agent as part of the deployment.

Revision history for this message
wes hayutin (weshayutin) wrote :
wes hayutin (weshayutin)
Changed in tripleo:
status: In Progress → Invalid
wes hayutin (weshayutin)
Changed in nova:
status: New → Invalid
Revision history for this message
melanie witt (melwitt) wrote :

Just to add info to this bug report, the traceback in the logs:

2019-04-16 07:16:09.070 20 INFO nova.compute.api [req-83e26ba8-f8ac-4b9b-9608-3c2aa7e9672d abaf4efd50ed493db1a49ce33d44717a 59ede4886038422f9d6c0cd324db4545 - default default] [instance: 3e7e8f96-9bb0-4d7f-b394-14bd3fb4ca17] Skipping quiescing instance: QEMU guest agent is not enabled
Traceback (most recent call last):

File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 229, in inner
return func(*args, **kwargs)

File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 79, in wrapped
function_name, call_dict, binary, tb)

File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in exit
self.force_reraise()

File "/usr/lib/python3.6/site-packages/osloutils/excutils.py", line 196, in force_reraise
six.reraise(self.type, self.value, self.tb)

File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value

File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 69, in wrapped
return f(self, context, *args, **kw)

File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 8491, in quiesce_instance
self.driver.quiesce(context, instance, image_meta)

File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 2202, in quiesce
self._set_quiesced(context, instance, image_meta, True)

File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 2181, in _set_quiesced
self._can_quiesce(instance, image_meta)

File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 2175, in _can_quiesce
raise exception.QemuGuestAgentNotEnabled()

nova.exception.QemuGuestAgentNotEnabled: QEMU guest agent is not enabled

is not actually related to the job failure. The nova code will quiesce the server if it can before taking the volume snapshot, but if the qemu quest agent is not available, it will skip it and move on.

We should not be logging the traceback in nova, but it's happening because the exception is coming to nova-api remotely over RPC and the oslo.messaging code will automatically append the traceback to the exception object 'message' attribute, and we're currently logging 'LOG.info(..., exp)' there and thus inadvertently logging the traceback as well.

I'll propose a nova patch to stop logging the traceback, to help prevent future confusion.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/659374

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/659374
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=66070416ab670daf4734f2c30ad2df5d96e615e6
Submitter: Zuul
Branch: master

commit 66070416ab670daf4734f2c30ad2df5d96e615e6
Author: melanie witt <email address hidden>
Date: Wed May 15 20:51:41 2019 +0000

    Stop logging traceback when skipping quiesce

    During snapshot of a volume-backed instance, we attempt to quiesce the
    instance before doing the snapshot. If quiesce is not supported or the
    qemu guest agent is not enabled, we will skip the quiesce and move on
    to the snapshot.

    Because quiesce is a call to nova-compute over RPC, when the libvirt
    driver raises QemuGuestAgentNotEnabled, oslo.messaging will append the
    full traceback to the exception message [1] for the remote caller. So,
    a LOG.info(..., exp) log of the exception object will result in a log
    of the full traceback. Logging of the full traceback causes confusion
    for those debugging CI failures.

    We would rather not log the full traceback in this case where we are
    catching the exception and emitting an INFO message, so we should use
    exp.format_message() instead of oslo.messaging's __str__ override.

    [1] https://github.com/openstack/oslo.messaging/blob/40c25c2/oslo_messaging/_drivers/common.py#L212

    Related-Bug: #1824315

    Change-Id: Ibfedcb8814437c53081f5a2993ab84b25d73e557

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/660325

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/660327

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/660328

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/660325
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2f7b1031a9b80d15052fcd9b22831f1db0486a4a
Submitter: Zuul
Branch: stable/stein

commit 2f7b1031a9b80d15052fcd9b22831f1db0486a4a
Author: melanie witt <email address hidden>
Date: Wed May 15 20:51:41 2019 +0000

    Stop logging traceback when skipping quiesce

    During snapshot of a volume-backed instance, we attempt to quiesce the
    instance before doing the snapshot. If quiesce is not supported or the
    qemu guest agent is not enabled, we will skip the quiesce and move on
    to the snapshot.

    Because quiesce is a call to nova-compute over RPC, when the libvirt
    driver raises QemuGuestAgentNotEnabled, oslo.messaging will append the
    full traceback to the exception message [1] for the remote caller. So,
    a LOG.info(..., exp) log of the exception object will result in a log
    of the full traceback. Logging of the full traceback causes confusion
    for those debugging CI failures.

    We would rather not log the full traceback in this case where we are
    catching the exception and emitting an INFO message, so we should use
    exp.format_message() instead of oslo.messaging's __str__ override.

    [1] https://github.com/openstack/oslo.messaging/blob/40c25c2/oslo_messaging/_drivers/common.py#L212

    Related-Bug: #1824315

    Change-Id: Ibfedcb8814437c53081f5a2993ab84b25d73e557
    (cherry picked from commit 66070416ab670daf4734f2c30ad2df5d96e615e6)

tags: added: in-stable-stein
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/660327
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b9caf7d9c7423a658c837aaedb7c4fb3580bda47
Submitter: Zuul
Branch: stable/rocky

commit b9caf7d9c7423a658c837aaedb7c4fb3580bda47
Author: melanie witt <email address hidden>
Date: Wed May 15 20:51:41 2019 +0000

    Stop logging traceback when skipping quiesce

    During snapshot of a volume-backed instance, we attempt to quiesce the
    instance before doing the snapshot. If quiesce is not supported or the
    qemu guest agent is not enabled, we will skip the quiesce and move on
    to the snapshot.

    Because quiesce is a call to nova-compute over RPC, when the libvirt
    driver raises QemuGuestAgentNotEnabled, oslo.messaging will append the
    full traceback to the exception message [1] for the remote caller. So,
    a LOG.info(..., exp) log of the exception object will result in a log
    of the full traceback. Logging of the full traceback causes confusion
    for those debugging CI failures.

    We would rather not log the full traceback in this case where we are
    catching the exception and emitting an INFO message, so we should use
    exp.format_message() instead of oslo.messaging's __str__ override.

    [1] https://github.com/openstack/oslo.messaging/blob/40c25c2/oslo_messaging/_drivers/common.py#L212

    Related-Bug: #1824315

    Change-Id: Ibfedcb8814437c53081f5a2993ab84b25d73e557
    (cherry picked from commit 66070416ab670daf4734f2c30ad2df5d96e615e6)
    (cherry picked from commit 2f7b1031a9b80d15052fcd9b22831f1db0486a4a)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/660328
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0cad3089d3e84e8e3436e171d00c8cce8717fd6d
Submitter: Zuul
Branch: stable/queens

commit 0cad3089d3e84e8e3436e171d00c8cce8717fd6d
Author: melanie witt <email address hidden>
Date: Wed May 15 20:51:41 2019 +0000

    Stop logging traceback when skipping quiesce

    During snapshot of a volume-backed instance, we attempt to quiesce the
    instance before doing the snapshot. If quiesce is not supported or the
    qemu guest agent is not enabled, we will skip the quiesce and move on
    to the snapshot.

    Because quiesce is a call to nova-compute over RPC, when the libvirt
    driver raises QemuGuestAgentNotEnabled, oslo.messaging will append the
    full traceback to the exception message [1] for the remote caller. So,
    a LOG.info(..., exp) log of the exception object will result in a log
    of the full traceback. Logging of the full traceback causes confusion
    for those debugging CI failures.

    We would rather not log the full traceback in this case where we are
    catching the exception and emitting an INFO message, so we should use
    exp.format_message() instead of oslo.messaging's __str__ override.

    [1] https://github.com/openstack/oslo.messaging/blob/40c25c2/oslo_messaging/_drivers/common.py#L212

    Related-Bug: #1824315

    Change-Id: Ibfedcb8814437c53081f5a2993ab84b25d73e557
    (cherry picked from commit 66070416ab670daf4734f2c30ad2df5d96e615e6)
    (cherry picked from commit 2f7b1031a9b80d15052fcd9b22831f1db0486a4a)
    (cherry picked from commit b9caf7d9c7423a658c837aaedb7c4fb3580bda47)

tags: added: in-stable-queens
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.