libvirt.libvirtError during rally and tempest test

Bug #1903564 reported by Marian Gasparovic
Affects: OpenStack Nova Compute Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

During a rally run, one nova.boot_servers test failed with a VM in an error state.

nova-compute.log shows:

2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [req-1d411d0a-901d-4da7-aeb4-13dc3564b00a ecbb2a627ba746498595647250bb7382 2b074ed301f64f3f8ce8a87f8a533559 - 168a475fb94a45b6ad7da21456c22ebe 168a475fb94a45b6ad7da21456c22ebe] [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] Instance failed to spawn: libvirt.libvirtError: internal error: process exited while connecting to monitor: 2020-11-07T05:02:46.474305Z qemu-system-x86_64: -chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on: Failed to create PTY: No such file or directory
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] Traceback (most recent call last):
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2614, in _build_resources
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] yield resources
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2374, in _build_and_run_instance
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] self.driver.spawn(context, instance, image_meta,
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 3657, in spawn
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] self._create_domain_and_network(
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6557, in _create_domain_and_network
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] self._cleanup_failed_start(
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] self.force_reraise()
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] six.reraise(self.type_, self.value, self.tb)
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] raise value
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6527, in _create_domain_and_network
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] guest = self._create_domain(
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6465, in _create_domain
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] guest.launch(pause=pause)
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/guest.py", line 140, in launch
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] LOG.error('Error launching a defined domain '
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] self.force_reraise()
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] six.reraise(self.type_, self.value, self.tb)
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] raise value
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/guest.py", line 137, in launch
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] return self._domain.createWithFlags(flags)
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 193, in doit
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 151, in proxy_call
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] rv = execute(f, *args, **kwargs)
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 132, in execute
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] six.reraise(c, e, tb)
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] raise value
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 86, in tworker
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] rv = meth(*args, **kwargs)
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] File "/usr/lib/python3/dist-packages/libvirt.py", line 1265, in createWithFlags
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] libvirt.libvirtError: internal error: process exited while connecting to monitor: 2020-11-07T05:02:46.474305Z qemu-system-x86_64: -chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on: Failed to create PTY: No such file or directory
2020-11-07 05:02:46.623 1406544 ERROR nova.compute.manager [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce]
2020-11-07 05:02:46.635 1406544 INFO nova.compute.manager [req-1d411d0a-901d-4da7-aeb4-13dc3564b00a ecbb2a627ba746498595647250bb7382 2b074ed301f64f3f8ce8a87f8a533559 - 168a475fb94a45b6ad7da21456c22ebe 168a475fb94a45b6ad7da21456c22ebe] [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] Terminating instance
2020-11-07 05:02:47.523 1406544 INFO nova.virt.libvirt.driver [-] [instance: 39bc0886-6e16-4bd9-879c-2c058d3838ce] Instance destroyed successfully.

logs: https://oil-jenkins.canonical.com/artifacts/d091d501-2a03-4a68-973c-136f44c665f7/index.html
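
The qemu error suggests that PTY allocation failed in whatever /dev the qemu process sees. As a minimal host-side sketch (assuming a Python interpreter on the affected compute node; purely illustrative, not a confirmed diagnosis), one can check that devpts is mounted and that a pseudo-terminal can be allocated at all:

```
import os

# Check that a devpts filesystem is mounted (normally on /dev/pts).
with open("/proc/mounts") as f:
    devpts_mounted = any(line.split()[2] == "devpts" for line in f)
print("devpts mounted:", devpts_mounted)

# Try to allocate a pseudo-terminal, roughly what qemu's "-chardev pty"
# does. An ENOENT here would match the "Failed to create PTY:
# No such file or directory" error above.
try:
    master, slave = os.openpty()
    print("openpty OK:", os.ttyname(slave))
    os.close(master)
    os.close(slave)
except OSError as e:
    print("openpty failed:", e)
```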

Michael Skalka (mskalka) wrote:

Saw this again during an octavia rally test; more info here: https://solutions.qa.canonical.com/testruns/testRun/abd5c68e-8383-496c-ae11-4ffeb8170598
Crashdump here: https://oil-jenkins.canonical.com/artifacts/abd5c68e-8383-496c-ae11-4ffeb8170598/generated/generated/openstack/juju-crashdump-openstack-2021-04-15-15.46.59.tar.gz

- amphora is launched
- nova-compute on machine 3 fails to launch the instance due to:
  2021-04-15T14:56:07.299919Z qemu-system-x86_64: -chardev pty,id=charserial0,logfile=/dev/fdset/3,logappend=on: Failed to create PTY: No such file or directory

- nova-conductor barfs and throws a stack trace at octavia-worker
- octavia-worker fails with a JSON decode error while trying to parse the stack trace
- amphora state is set to error and the test fails

There is some noise in syslog, but I don't know enough about the internals of qemu/libvirt to point the finger:

Apr 15 14:56:07 mudkip libvirtd[146579]: Unable to read from monitor: Connection reset by peer
Apr 15 14:56:07 mudkip libvirtd[146579]: internal error: qemu unexpectedly closed the monitor: 2021-04-15T14:56:07.299919Z qemu-system-x86_64: -chardev pty,id=charserial0,logfile=/dev/fdset/3,logappend=on: Failed to create PTY: No such file or directory
Apr 15 14:56:07 mudkip networkd-dispatcher[150298]: #033[0;1;31mInterface "tap88374789-e5" not found.#033[0m
Apr 15 14:56:07 mudkip systemd[1]: networkd-dispatcher.service: Got notification message from PID 150298, but reception only permitted for main PID 1202
Apr 15 14:56:07 mudkip dnsmasq[5319]: reading /etc/resolv.conf
Apr 15 14:56:07 mudkip dnsmasq[5319]: using local addresses only for domain lxd
Apr 15 14:56:07 mudkip libvirtd[146579]: internal error: process exited while connecting to monitor: 2021-04-15T14:56:07.299919Z qemu-system-x86_64: -chardev pty,id=charserial0,logfile=/dev/fdset/3,logappend=on: Failed to create PTY: No such file or directory
Apr 15 14:56:07 mudkip dnsmasq[5319]: using nameserver 127.0.0.53#53
Apr 15 14:56:07 mudkip networkd-dispatcher[1202]: ERROR:Failed to get interface "tap88374789-e5" status: Command '['/usr/bin/networkctl', 'status', '--no-pager', '--no-legend', '--', 'tap88374789-e5']' returned non-zero exit status 1.
Apr 15 14:56:07 mudkip libvirtd[146579]: cannot open /dev/null: Operation not permitted
Apr 15 14:56:07 mudkip libvirtd[146579]: internal error: could not remove profile for 'libvirt-cc54af1f-10d7-46b4-9646-9e0dab8c7e96'
Apr 15 14:56:07 mudkip libvirtd[146579]: cannot open /dev/null: Operation not permitted
Apr 15 14:56:07 mudkip libvirtd[146579]: internal error: Unable to delete port tap7d4f67b1-5c from OVS
Apr 15 14:56:07 mudkip systemd[1]: machine-qemu\x2d10\x2dinstance\x2d0000001a.scope: Succeeded.
Apr 15 14:56:07 mudkip libvirtd[146579]: Unable to remove /sys/fs/cgroup/cpu,cpuacct/system.slice/libvirtd.service/ (16)
Apr 15 14:56:07 mudkip libvirtd[146579]: Unable to remove /sys/fs/cgroup/cpu,cpuacct/system.slice/libvirtd.service/ (16)
Apr 15 14:56:07 mudkip libvirtd[146579]: Unable to remove /sys/fs/cgroup/memory/system.slice/libvirtd.service/ (16)
Apr 15 14:56:07 mudkip libvirtd[146579]: Unable to remove /sys/fs/cgroup/devices/system.slice/libvirtd.service/ (16)
Apr 15 14:56:07 mudki...
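
Since newer libvirt typically runs qemu inside a private mount namespace with a constructed /dev, a missing devpts or /dev/null inside that namespace would explain both the PTY failure and the "cannot open /dev/null" noise. That is only an assumption on my part, but a sketch along these lines (run as root) would show what each running qemu process actually sees:

```
import glob
import os

# For each running qemu process, look at the mounts visible inside its
# mount namespace and check whether devpts and /dev/null are present.
# Reading /proc/<pid>/mounts and /proc/<pid>/root requires root.
for comm_path in glob.glob("/proc/[0-9]*/comm"):
    pid = comm_path.split("/")[2]
    try:
        with open(comm_path) as f:
            comm = f.read().strip()
    except OSError:
        continue
    if not comm.startswith("qemu"):
        continue
    try:
        with open("/proc/%s/mounts" % pid) as f:
            mounts = f.read()
    except OSError:
        continue
    has_devpts = " devpts " in mounts
    has_dev_null = os.path.exists("/proc/%s/root/dev/null" % pid)
    print(pid, comm, "devpts:", has_devpts, "/dev/null:", has_dev_null)
```

Of course this only helps while an affected instance is actually being started; the failed domain is torn down almost immediately.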

Alex Kavanagh (ajkavanagh) wrote:

Do we know whether this was a config error or an infra/other error? It's just that the tests have since passed, so something may/must have changed? Thanks.

Changed in charm-nova-compute:
status: New → Incomplete

Michael Skalka (mskalka) wrote:

Alex,

Could you clarify what you're asking here? We don't believe this to be a config issue: we were able to launch instances successfully during the preceding rally and tempest tests, and this is not isolated to a single node in our lab.

For example we've seen this same behavior again here: https://solutions.qa.canonical.com/testruns/testRun/ef058cac-e956-4bb4-999c-5c136679eb77

Crashdump from that run is here: https://oil-jenkins.canonical.com/artifacts/ef058cac-e956-4bb4-999c-5c136679eb77/generated/generated/openstack/juju-crashdump-openstack-2021-05-11-23.06.24.tar.gz

And the failure itself is on node 0's libvirt:

0/baremetal/var/log/syslog:
...
May 11 23:00:43 azurill libvirtd[136141]: Unable to read from monitor: Connection reset by peer
May 11 23:00:43 azurill libvirtd[136141]: internal error: qemu unexpectedly closed the monitor: 2021-05-11T23:00:43.189709Z qemu-system-x86_64: -chardev pty,id=charserial0,logfile=/dev/fdset/3,logappend=on: Failed to create PTY: No such file or directory
May 11 23:00:43 azurill libvirtd[136141]: cannot open /dev/null: Operation not permitted
May 11 23:00:43 azurill libvirtd[136141]: internal error: could not remove profile for 'libvirt-ac16a13c-3364-4523-a2f7-59744ef89308'
May 11 23:00:43 azurill libvirtd[136141]: cannot open /dev/null: Operation not permitted
May 11 23:00:43 azurill libvirtd[136141]: internal error: Unable to delete port tapee69c0b3-a1 from OVS
...
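
The "cannot open /dev/null: Operation not permitted" and "could not remove profile" lines could also point at AppArmor getting in the way of the per-domain libvirt profile. Purely as a diagnostic idea (log path and message format assumed to be the Ubuntu defaults), any denials around the failure time should be visible with something like:

```
import re

# AppArmor denials are logged by the kernel with apparmor="DENIED";
# adjust the path if kernel messages are routed elsewhere.
pattern = re.compile(
    r'apparmor="DENIED".*profile="([^"]*libvirt[^"]*)".*name="([^"]*)"')

with open("/var/log/kern.log", errors="replace") as f:
    for line in f:
        m = pattern.search(line)
        if m:
            print(m.group(1), "denied access to", m.group(2))
```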

Changed in charm-nova-compute:
status: Incomplete → New

Alex Kavanagh (ajkavanagh) wrote:

Hi Michael

Thanks for clarifying. I thought we'd had some infra issues leading to odd failures, and wondered if it was something to do with that, but clearly this is something else. Thanks.

summary: - libvirt.libvirtError during rally test
+ libvirt.libvirtError during rally and tempest test
tags: added: cdo-tempest

Bas de Bruijne (basdbruijne) wrote:

This has been coming up again on cis-hardening runs:
https://solutions.qa.canonical.com/bugs/bugs/bug/1903564

E.g.:
https://oil-jenkins.canonical.com/artifacts/8d379bc3-e6aa-418f-b193-9db714cd5a70/generated/generated/openstack/tempest_result.html

Crashdumps and configs are here:
https://oil-jenkins.canonical.com/artifacts/8d379bc3-e6aa-418f-b193-9db714cd5a70/index.html

In the nova-compute logs we see:
```
 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
<very long traceback>
 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 151, in proxy_call
 ERROR oslo_messaging.rpc.server rv = execute(f, *args, **kwargs)
 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 132, in execute
 ERROR oslo_messaging.rpc.server six.reraise(c, e, tb)
 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
 ERROR oslo_messaging.rpc.server raise value
 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 86, in tworker
 ERROR oslo_messaging.rpc.server rv = meth(*args, **kwargs)
 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/libvirt.py", line 1265, in createWithFlags
 ERROR oslo_messaging.rpc.server if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
 ERROR oslo_messaging.rpc.server libvirt.libvirtError: Cannot access storage file '/var/lib/nova/instances/4b1de57e-f79b-45ce-bca3-2c8bb69b38bf/disk' (as uid:64055, gid:108): Permission denied
```

Neither of the SKUs that are running into this has had a passing run in the last 14 days, so I suppose it may be a bug in the cis-hardening rules. Let me know what you think.
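
Since the error reports qemu (uid 64055, gid 108) being unable to access the instance disk, one possibility is that the cis-hardening rules tightened the ownership or mode of a directory somewhere along that path. A rough check, using the path and uid/gid from the traceback above (everything beyond that is an assumption):

```
import os
import stat

# Path, uid, and gid taken from the traceback above: qemu runs as
# uid 64055 / gid 108 and gets "Permission denied" on this file.
disk = "/var/lib/nova/instances/4b1de57e-f79b-45ce-bca3-2c8bb69b38bf/disk"

# Walk each path component and report ownership and mode; a single
# missing execute bit (or a root:root 0700 directory) is enough to give
# the qemu user "Permission denied" on the disk file.
path = "/"
for part in disk.strip("/").split("/"):
    path = os.path.join(path, part)
    st = os.stat(path)
    print("%-70s uid=%-6d gid=%-6d %s"
          % (path, st.st_uid, st.st_gid, stat.filemode(st.st_mode)))
```

It may also be worth diffing /etc/libvirt/qemu.conf (user, group, dynamic_ownership) between a hardened node and a non-hardened one.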

Jeffrey Chang (modern911) wrote:

Seen again, with 10 occurrences in SQA test runs of the fcb-proposed-yoga-focal-cis_hardening SKU.

Corey Bryant (corey.bryant) wrote:

I think this is probably fixed in our zed package via https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1967956. We only provided hardening fixes in zed this past release.

Corey Bryant (corey.bryant) wrote:

Here are backports of the fixes in the following PPAs, if you would like to test with them. This isn't a guarantee that they will be kept updated or SRU'd to our official yoga packages. For the focal PPA below, you would need to install it along with the yoga cloud archive.

https://launchpad.net/~corey.bryant/+archive/ubuntu/focal-yoga-cis
https://launchpad.net/~corey.bryant/+archive/ubuntu/jammy-yoga-cis
