test_resize_volume_backed_server_confirm fails randomly with a kernel panic

Bug #2039940 reported by yatin
This bug affects 1 person
Affects   Status      Importance   Assigned to   Milestone
neutron   Confirmed   Medium       Unassigned
tempest   New         Undecided    Unassigned

Bug Description

tempest.api.compute.servers.test_server_actions.ServerActionsTestOtherA.test_resize_volume_backed_server_confirm fails randomly with:-

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 136, in _get_ssh_connection
    ssh.connect(self.host, port=self.port, username=self.username,
  File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/paramiko/client.py", line 386, in connect
    sock.connect(addr)
TimeoutError: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/lib/decorators.py", line 106, in wrapper
    raise exc
  File "/opt/stack/tempest/tempest/lib/decorators.py", line 98, in wrapper
    return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 70, in wrapper
    return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/api/compute/servers/test_server_actions.py", line 510, in test_resize_volume_backed_server_confirm
    linux_client.validate_authentication()
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 31, in wrapper
    return function(self, *args, **kwargs)
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 123, in validate_authentication
    self.ssh_client.test_connection_auth()
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 245, in test_connection_auth
    connection = self._get_ssh_connection()
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 155, in _get_ssh_connection
    raise exceptions.SSHTimeout(host=self.host,
tempest.lib.exceptions.SSHTimeout: Connection to the 172.24.5.65 via SSH timed out.
User: cirros, Password: None

The guest console log shows a kernel panic:-
info: initramfs: up at 7.36
[ 8.403290] virtio_blk virtio2: [vda] 2097152 512-byte logical blocks (1.07 GB/1.00 GiB)
[ 8.440219] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 8.440726] GPT:229375 != 2097151
[ 8.440967] GPT:Alternate GPT header not at the end of the disk.
[ 8.441252] GPT:229375 != 2097151
[ 8.441503] GPT: Use GNU Parted to correct GPT errors.
[ 8.974064] virtio_gpu virtio0: [drm] drm_plane_enable_fb_damage_clips() not called
[ 9.068224] random: crng init done
currently loaded modules: 8021q 8139cp 8390 9pnet 9pnet_virtio ahci cec drm drm_kms_helper e1000 e1000e failover fb_sys_fops garp hid hid_generic ip6_udp_tunnel ip_tables isofs libahci libcrc32c llc mii mrp ne2k_pci net_failover nls_ascii nls_iso8859_1 nls_utf8 pcnet32 qemu_fw_cfg rc_core sctp stp syscopyarea sysfillrect sysimgblt udp_tunnel usbhid virtio_blk virtio_dma_buf virtio_gpu virtio_input virtio_net virtio_rng virtio_scsi x_tables
info: initramfs loading root from /dev/vda1
/sbin/init: can't load library 'libtirpc.so.3'
[ 11.288963] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00001000
[ 11.290203] CPU: 0 PID: 1 Comm: init Not tainted 5.15.0-71-generic #78-Ubuntu
[ 11.290952] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.15.0-1 04/01/2014
[ 11.291870] Call Trace:
[ 11.292973] <TASK>
[ 11.293458] show_stack+0x52/0x5c
[ 11.294280] dump_stack_lvl+0x4a/0x63
[ 11.294720] dump_stack+0x10/0x16
[ 11.295179] panic+0x15c/0x334
[ 11.295587] ? exit_to_user_mode_prepare+0x37/0xb0
[ 11.296118] do_exit.cold+0x15/0xa0
[ 11.296460] __x64_sys_exit+0x1b/0x20
[ 11.296880] do_syscall_64+0x5c/0xc0
[ 11.297283] ? ksys_write+0x67/0xf0
[ 11.297672] ? exit_to_user_mode_prepare+0x37/0xb0
[ 11.298172] ? syscall_exit_to_user_mode+0x27/0x50
[ 11.298683] ? __x64_sys_write+0x19/0x20
[ 11.299151] ? do_syscall_64+0x69/0xc0
[ 11.299644] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 11.300611] RIP: 0033:0x7f147a37555e
[ 11.301938] Code: 05 d7 2a 00 00 4c 89 f9 bf 02 00 00 00 48 8d 35 fb 0d 00 00 48 8b 10 31 c0 e8 50 d2 ff ff bf 10 00 00 00 b8 3c 00 00 00 0f 05 <48> 8d 15 f3 2a 00 00 f7 d8 89 02 48 83 ec 20 49 8b 8c 24 b8 00 00
[ 11.312012] RSP: 002b:00007fff85488500 EFLAGS: 00000207 ORIG_RAX: 000000000000003c
[ 11.318360] RAX: ffffffffffffffda RBX: 00007fff854897b0 RCX: 00007f147a37555e
[ 11.324215] RDX: 0000000000000002 RSI: 0000000000001000 RDI: 0000000000000010
[ 11.331344] RBP: 00007fff85489790 R08: 00007f147a36e000 R09: 00007f147a36e01a
[ 11.338406] R10: 0000000000000001 R11: 0000000000000207 R12: 00007f147a36f040
[ 11.347090] R13: 00000000004bae50 R14: 0000000000000000 R15: 0000000000403d66
[ 11.354220] </TASK>
[ 11.362227] Kernel Offset: 0x36400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 11.369248] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00001000 ]---
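
For triage, a console log like the one above can be told apart from other SSH timeout causes by checking for the missing-library message alongside the panic line. This is a minimal sketch, assuming the console output is already available as text. The function name `classify_console_log` and the returned labels are hypothetical and not part of tempest:

```python
import re

# Patterns copied from the console log in this report.
MISSING_LIB_RE = re.compile(r"can't load library '([^']+)'")
PANIC_RE = re.compile(r"Kernel panic - not syncing: (.+)")


def classify_console_log(text):
    """Return a (label, detail) tuple for a guest console log.

    The labels are hypothetical and only meant to help triage.
    """
    missing = MISSING_LIB_RE.search(text)
    panic = PANIC_RE.search(text)
    if panic and missing:
        # init could not load a shared library, then the kernel panicked.
        return ("init-missing-library", missing.group(1))
    if panic:
        return ("kernel-panic", panic.group(1).strip())
    return ("no-panic", None)


sample = """\
info: initramfs loading root from /dev/vda1
/sbin/init: can't load library 'libtirpc.so.3'
[ 11.288963] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00001000
"""
print(classify_console_log(sample))
```

A log that matches both patterns points at the guest root filesystem rather than at networking, which is what the SSH timeout alone might suggest.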

According to OpenSearch [1], there were 16 hits in the last 12 days across multiple jobs on the master and stable/2023.2 branches.

Jobs:-
tempest-integrated-networking 31.3%
cinder-tempest-plugin-lvm-lio-barbican-fips 18.8%
cinder-tempest-plugin-lvm-lio-barbican 12.5%
tempest-integrated-storage 12.5%
nova-ceph-multistore 6.3%

Branches:-
master 93.8%
stable/2023.2 6.3%
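
The percentages above can be turned back into hit counts out of the 16 total hits. This is a rough sketch of the arithmetic, assuming the dashboard rounds each share to one decimal place (that rounding behaviour is an assumption, as is the helper name `to_hits`):

```python
TOTAL_HITS = 16  # total reported by the OpenSearch query

# Shares as listed in this report, in percent.
job_share = {
    "tempest-integrated-networking": 31.3,
    "cinder-tempest-plugin-lvm-lio-barbican-fips": 18.8,
    "cinder-tempest-plugin-lvm-lio-barbican": 12.5,
    "tempest-integrated-storage": 12.5,
    "nova-ceph-multistore": 6.3,
}
branch_share = {"master": 93.8, "stable/2023.2": 6.3}


def to_hits(share_percent, total=TOTAL_HITS):
    """Convert a rounded percentage back to the nearest whole hit count."""
    return round(share_percent * total / 100)


job_hits = {name: to_hits(pct) for name, pct in job_share.items()}
branch_hits = {name: to_hits(pct) for name, pct in branch_share.items()}

# Hits belonging to jobs that are not in the list above.
other_job_hits = TOTAL_HITS - sum(job_hits.values())

print(job_hits)
print(branch_hits)
print("hits in other jobs:", other_job_hits)
```

Under that assumption the listed jobs account for 5, 3, 2, 2 and 1 hits, leaving 3 hits in jobs not listed here, and the branches split 15 to 1.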

Found an older issue in a different test, https://bugs.launchpad.net/tempest/+bug/1888224, but that one was reported for arm64.

Example builds:-
- http://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ff8/898137/1/check/nova-ceph-multistore/ff880e4/testr_results.html
- https://d941f4f9ec28b33784a2-4d240979ebaa10fc274be1cd05b244a9.ssl.cf5.rackcdn.com/893025/3/check/tempest-integrated-networking/7ed5444/testr_results.html

[1] https://opensearch.logs.openstack.org/_dashboards/app/discover/?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30d,to:now))&_a=(columns:!(_source),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:filename,negate:!f,params:(query:job-output.txt),type:phrase),query:(match_phrase:(filename:job-output.txt)))),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:'message:%22load%20library%20!'libtirpc.so.3!'%22'),sort:!())

Brian Haley (brian-haley) wrote :

The last comment from the related bug is:

"It looks like rootfs image is corrupted."

Not sure there's anything neutron can do here.

Changed in neutron:
status: New → Confirmed
importance: Undecided → Medium