Activity log for bug #1766543

Date Who What changed Old value New value Message
2018-04-24 10:00:13 Junien F bug added bug
2018-04-24 10:01:11 Junien F bug added subscriber The Canonical Sysadmins
2018-07-26 10:55:54 James Page bug task added linux (Ubuntu)
2018-07-26 10:56:02 James Page nova (Ubuntu): status New Incomplete
2018-07-26 10:56:06 James Page nova (Ubuntu): status Incomplete New
2018-07-26 10:56:09 James Page nova (Ubuntu): importance Undecided Medium
2018-07-26 11:00:05 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2018-07-26 13:30:13 Junien F tags apport-collected uec-images xenial
2018-07-26 13:30:16 Junien F description Hi, I have a cloud running xenial/mitaka (with 18.02 charms). Sometimes, an instance will take minutes to delete. I tracked down the time taken to be file deletion : Apr 23 07:23:00 hostname nova-compute[54255]: 2018-04-23 07:23:00.920 54255 INFO nova.virt.libvirt.driver [req-35ccfe64-9280-4de6-ae88-045ca91bf90f bc0ab055427645aca4ed09266e85b1db 1cb457a8302543fea067e5f14b5241e7 - - -] [instance: 97731f51-63be-4056-869f-084b38580b9a] Deleting instance files /srv/nova/instances/97731f51-63be-4056-869f-084b38580b9a_del Apr 23 07:27:33 hostname nova-compute[54255]: 2018-04-23 07:27:33.767 54255 INFO nova.virt.libvirt.driver [req-35ccfe64-9280-4de6-ae88-045ca91bf90f bc0ab055427645aca4ed09266e85b1db 1cb457a8302543fea067e5f14b5241e7 - - -] [instance: 97731f51-63be-4056-869f-084b38580b9a] Deletion of /srv/nova/instances/97731f51-63be-4056-869f-084b38580b9a_del complete As you can see, 4 minutes and 33 seconds have elapsed between the 2 lines. nova-compute logs absolutely _nothing_ during this time. Periodic tasks are not run, etc... Generally, a deletion takes a few seconds top. The logs above are generally immediately followed by : Apr 23 07:27:33 hostname nova-compute[54255]: 2018-04-23 07:27:33.771 54255 DEBUG oslo.messaging._drivers.impl_rabbit [req-35ccfe64-9280-4de6-ae88-045ca91bf90f bc0ab055427645aca4ed09266e85b1db 1cb457a8302543fea067e5f14b5241e7 - - -] Received recoverable error from kombu: on_error /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py:683 (which is error: [Errno 104] Connection reset by peer) because nova-compute doesn't even maintain the rabbitmq connection (on the rabbitmq server I can see errors about "Missed heartbeats from client, timeout: 60s"). So nova-compute appears to be "frozen" during several minutes. This can cause problems because events can be missed, etc... We have telegraf on this host, and there's little to no CPU, disk, network or memory activity at that time. Nothing relevant in kern.log either. And this is happening on 3 different architectures, so this is all very puzzling. Is nova-compute supposed to be totally stuck while deleting instance files ? Have you ever seen something similar ? I'm going to try to repro on queens. Thanks Hi, I have a cloud running xenial/mitaka (with 18.02 charms). Sometimes, an instance will take minutes to delete. I tracked down the time taken to be file deletion : Apr 23 07:23:00 hostname nova-compute[54255]: 2018-04-23 07:23:00.920 54255 INFO nova.virt.libvirt.driver [req-35ccfe64-9280-4de6-ae88-045ca91bf90f bc0ab055427645aca4ed09266e85b1db 1cb457a8302543fea067e5f14b5241e7 - - -] [instance: 97731f51-63be-4056-869f-084b38580b9a] Deleting instance files /srv/nova/instances/97731f51-63be-4056-869f-084b38580b9a_del Apr 23 07:27:33 hostname nova-compute[54255]: 2018-04-23 07:27:33.767 54255 INFO nova.virt.libvirt.driver [req-35ccfe64-9280-4de6-ae88-045ca91bf90f bc0ab055427645aca4ed09266e85b1db 1cb457a8302543fea067e5f14b5241e7 - - -] [instance: 97731f51-63be-4056-869f-084b38580b9a] Deletion of /srv/nova/instances/97731f51-63be-4056-869f-084b38580b9a_del complete As you can see, 4 minutes and 33 seconds have elapsed between the 2 lines. nova-compute logs absolutely _nothing_ during this time. Periodic tasks are not run, etc... Generally, a deletion takes a few seconds top. The logs above are generally immediately followed by : Apr 23 07:27:33 hostname nova-compute[54255]: 2018-04-23 07:27:33.771 54255 DEBUG oslo.messaging._drivers.impl_rabbit [req-35ccfe64-9280-4de6-ae88-045ca91bf90f bc0ab055427645aca4ed09266e85b1db 1cb457a8302543fea067e5f14b5241e7 - - -] Received recoverable error from kombu: on_error /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py:683 (which is error: [Errno 104] Connection reset by peer) because nova-compute doesn't even maintain the rabbitmq connection (on the rabbitmq server I can see errors about "Missed heartbeats from client, timeout: 60s"). So nova-compute appears to be "frozen" during several minutes. This can cause problems because events can be missed, etc... We have telegraf on this host, and there's little to no CPU, disk, network or memory activity at that time. Nothing relevant in kern.log either. And this is happening on 3 different architectures, so this is all very puzzling. Is nova-compute supposed to be totally stuck while deleting instance files ? Have you ever seen something similar ? I'm going to try to repro on queens. Thanks --- AlsaDevices: total 0 crw-rw---- 1 root audio 116, 1 Jun 23 14:23 seq crw-rw---- 1 root audio 116, 33 Jun 23 14:23 timer AplayDevices: Error: [Errno 2] No such file or directory ApportVersion: 2.20.1-0ubuntu2.18 Architecture: ppc64el ArecordDevices: Error: [Errno 2] No such file or directory AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: DistroRelease: Ubuntu 16.04 IwConfig: Error: [Errno 2] No such file or directory Lsusb: Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Package: nova (not installed) PciMultimedia: ProcEnviron: TERM=screen-256color PATH=(custom, no user) LANG=en_US.UTF-8 SHELL=/bin/bash ProcFB: ProcKernelCmdLine: root=UUID=bf574d52-09d7-4c9d-beb8-ccc542d50654 ro console=tty0 console=ttyS1,115200 panic=30 raid=noautodetect ProcLoadAvg: 10.43 10.79 9.76 11/2237 15123 ProcSwaps: Filename Type Size Used Priority /swap.img file 8388544 0 -1 ProcVersion: Linux version 4.4.0-128-generic (buildd@bos02-ppc64el-001) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) ) #154-Ubuntu SMP Fri May 25 14:13:59 UTC 2018 ProcVersionSignature: Ubuntu 4.4.0-128.154-generic 4.4.131 RelatedPackageVersions: linux-restricted-modules-4.4.0-128-generic N/A linux-backports-modules-4.4.0-128-generic N/A linux-firmware 1.157.20 RfKill: Error: [Errno 2] No such file or directory Tags: xenial uec-images xenial uec-images xenial uec-images Uname: Linux 4.4.0-128-generic ppc64le UnreportableReason: The report belongs to a package that is not installed. UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: False cpu_cores: Number of cores present = 20 cpu_coreson: Number of cores online = 20 cpu_dscr: DSCR is 0 cpu_freq: min: 1.201 GHz (cpu 48) max: 3.710 GHz (cpu 112) avg: 2.615 GHz cpu_runmode: Could not retrieve current diagnostics mode, No kernel interface to firmware cpu_smt: SMT is off
2018-07-26 13:30:17 Junien F attachment added .sys.firmware.opal.msglog.txt https://bugs.launchpad.net/bugs/1766543/+attachment/5168114/+files/.sys.firmware.opal.msglog.txt
2018-07-26 13:30:19 Junien F attachment added CRDA.txt https://bugs.launchpad.net/bugs/1766543/+attachment/5168115/+files/CRDA.txt
2018-07-26 13:30:21 Junien F attachment added CurrentDmesg.txt https://bugs.launchpad.net/bugs/1766543/+attachment/5168116/+files/CurrentDmesg.txt
2018-07-26 13:30:23 Junien F attachment added DeviceTree.tar.gz https://bugs.launchpad.net/bugs/1766543/+attachment/5168117/+files/DeviceTree.tar.gz
2018-07-26 13:30:24 Junien F attachment added HookError_generic.txt https://bugs.launchpad.net/bugs/1766543/+attachment/5168118/+files/HookError_generic.txt
2018-07-26 13:30:25 Junien F attachment added HookError_powerpc.txt https://bugs.launchpad.net/bugs/1766543/+attachment/5168119/+files/HookError_powerpc.txt
2018-07-26 13:30:26 Junien F attachment added Lspci.txt https://bugs.launchpad.net/bugs/1766543/+attachment/5168120/+files/Lspci.txt
2018-07-26 13:30:28 Junien F attachment added OpalElog.tar.gz https://bugs.launchpad.net/bugs/1766543/+attachment/5168121/+files/OpalElog.tar.gz
2018-07-26 13:30:29 Junien F attachment added ProcCpuinfoMinimal.txt https://bugs.launchpad.net/bugs/1766543/+attachment/5168122/+files/ProcCpuinfoMinimal.txt
2018-07-26 13:30:30 Junien F attachment added ProcInterrupts.txt https://bugs.launchpad.net/bugs/1766543/+attachment/5168123/+files/ProcInterrupts.txt
2018-07-26 13:30:32 Junien F attachment added ProcLocks.txt https://bugs.launchpad.net/bugs/1766543/+attachment/5168124/+files/ProcLocks.txt
2018-07-26 13:30:34 Junien F attachment added ProcMisc.txt https://bugs.launchpad.net/bugs/1766543/+attachment/5168125/+files/ProcMisc.txt
2018-07-26 13:30:35 Junien F attachment added ProcModules.txt https://bugs.launchpad.net/bugs/1766543/+attachment/5168126/+files/ProcModules.txt
2018-07-26 13:30:38 Junien F attachment added ProcPpc64.tar.gz https://bugs.launchpad.net/bugs/1766543/+attachment/5168127/+files/ProcPpc64.tar.gz
2018-07-26 13:30:40 Junien F attachment added UdevDb.txt https://bugs.launchpad.net/bugs/1766543/+attachment/5168128/+files/UdevDb.txt
2018-07-26 13:30:44 Junien F attachment added WifiSyslog.txt https://bugs.launchpad.net/bugs/1766543/+attachment/5168129/+files/WifiSyslog.txt
2018-07-26 13:30:45 Junien F attachment added nvram.gz https://bugs.launchpad.net/bugs/1766543/+attachment/5168130/+files/nvram.gz
2018-07-26 13:32:47 Junien F linux (Ubuntu): status Incomplete Confirmed
2023-12-19 12:16:14 WangJing bug added subscriber WangJing