Horizon crashed after a few hours on an idle installation

Bug #1904973 reported by Lokendra
This bug affects 1 person
Affects    Status   Importance  Assigned to  Milestone
StarlingX  Invalid  Low         Austin Sun

Bug Description

Brief Description
-----------------
Horizon goes down a few hours after a fresh StarlingX AIO Duplex installation.
Details:

Severity
--------
Provide the severity of the defect.
Critical: System/Feature is not usable due to the defect

Steps to Reproduce
------------------

Reinstall the system using the AIO Duplex setup:
    https://docs.starlingx.io/deploy_install_guides/r4_release/bare_metal/aio_duplex.html

After the installation, do nothing and leave the setup idle for a couple of days. Horizon then goes down with the XFS errors shown in this report.

Expected Behavior
------------------
The XFS file system reports the following errors in kern.log:
XFS (dm-3): Unmount and run xfs_repair
9] XFS (dm-3): First 128 bytes of corrupted metadata buffer:
2] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
5] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
6] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
5] 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
6] 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
2] 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
5] 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
3] 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0] XFS (dm-3): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x77fc008 len 8 error 74
7] XFS (dm-3): xfs_dabuf_map: bno 8388608 dir: inode 122425344
3] XFS (dm-3): [00] br_startoff 8388608 br_startblock -2 br_blockcount 1 br_state 0
8] XFS (dm-3): Internal error xfs_da_do_buf(1) at line 2558 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_da_read_buf+0x6c/0x120 [xfs]
671] CPU: 1 PID: 83210 Comm: heat-manage Kdump: loaded Tainted: G O --------- -t - 4.18.0-147.3.1.el8_1.7.tis.x86_64 #1
672] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 04/18/2019

2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644673] Call Trace:
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644679] dump_stack+0x5a/0x73
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644708] xfs_dabuf_map.constprop.18+0x166/0x380 [xfs]
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644725] xfs_da_read_buf+0x6c/0x120 [xfs]
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644739] xfs_da3_node_read+0x1e/0x100 [xfs]
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644754] xfs_da3_node_lookup_int+0x6e/0x340 [xfs]
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644775] ? kmem_zone_alloc+0x95/0x100 [xfs]
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644791] xfs_dir2_node_removename+0x4e/0x610 [xfs]
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644806] ? xfs_bmap_last_extent+0x5c/0xa0 [xfs]
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644821] ? xfs_bmap_last_offset+0x54/0xc0 [xfs]
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644841] ? kmem_alloc+0x96/0x100 [xfs]
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644858] xfs_dir_removename+0x16d/0x180 [xfs]
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644879] xfs_remove+0x250/0x300 [xfs]
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644900] xfs_vn_unlink+0x55/0xa0 [xfs]
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644904] vfs_unlink+0xe1/0x1a0
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644909] ovl_do_remove+0x381/0x490 [overlay]
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644912] vfs_unlink+0xe1/0x1a0
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644914] do_unlinkat+0x25f/0x2b0
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644917] do_syscall_64+0x5b/0x1c0
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644919] entry_SYSCALL_64_after_hwframe+0x65/0xca
2020-11-11T11:55:16.824 controller-1 kernel: warning [419472.644921] RIP: 0033:0x7f7334c08147
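
For anyone triaging a similar kern.log, the sketch below shows one way to pull the XFS metadata I/O error lines out of the log and read off the device, block address, and error number. The function name `find_xfs_io_errors` and the regular expression are illustrative assumptions based only on the message format quoted above; they are not part of any StarlingX tooling. On Linux, errno 74 is EBADMSG; whether that points to a checksum failure on this system would still need to be confirmed with xfs_repair, as the kernel message suggests.

```python
import re

# Assumed message shape, taken from the kern.log excerpt in this report:
#   XFS (dm-3): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x77fc008 len 8 error 74
XFS_IO_ERROR = re.compile(
    r'XFS \((?P<dev>[^)]+)\): metadata I/O error in "(?P<func>[^"]+)" '
    r'at daddr (?P<daddr>0x[0-9a-fA-F]+) len (?P<len>\d+) error (?P<errno>\d+)'
)


def find_xfs_io_errors(lines):
    """Return one dict per XFS metadata I/O error line found in `lines`."""
    hits = []
    for line in lines:
        match = XFS_IO_ERROR.search(line)
        if match:
            hits.append({
                "device": match.group("dev"),
                "function": match.group("func"),
                "daddr": int(match.group("daddr"), 16),  # disk address, parsed from hex
                "length": int(match.group("len")),
                "errno": int(match.group("errno")),
            })
    return hits


# Two lines copied from the excerpt above; only the second one matches.
sample = [
    'XFS (dm-3): Unmount and run xfs_repair',
    '0] XFS (dm-3): metadata I/O error in "xfs_trans_read_buf_map" '
    'at daddr 0x77fc008 len 8 error 74',
]

for hit in find_xfs_io_errors(sample):
    print(hit)
```

To scan a real log, pass an open file object instead of `sample`, for example `find_xfs_io_errors(open(path))` with `path` pointing at the collected kern.log.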

Actual Behavior
----------------

Reproducibility
---------------
Reproducible.
Tried two times and the issue was seen again.

System Configuration
--------------------
Two-node system

Timestamp/Logs
--------------
openstack horizon pod logs:
 openstack_horizon-665b494c8d-4pxcz_40a15855-36cc-4b20-b20a-744e95acb833/horizon/0.log

 --

2020-11-11T11:55:18.573554412Z stdout F 2020-11-11 11:55:18.573494 AH00036: access to / failed (filesystem path '/var')
2020-11-11T11:55:18.573692524Z stdout F 10.41.16.57 - - [11/Nov/2020:11:55:18 +0000] "GET / HTTP/1.1" 403 202
2020-11-11T11:55:18.573707942Z stdout F 10.41.16.57 - - [11/Nov/2020:11:55:18 +0000] "GET / HTTP/1.1" 403 202 "-" "kube-probe/1.18"
2020-11-11T11:55:28.573607994Z stdout F 2020-11-11 11:55:28.573529 AH00036: access to / failed (filesystem path '/var')
2020-11-11T11:55:28.5739878Z stdout F 10.41.16.57 - - [11/Nov/2020:11:55:28 +0000] "GET / HTTP/1.1" 403 202
2020-11-11T11:55:28.573997319Z stdout F 10.41.16.57 - - [11/Nov/2020:11:55:28 +0000] "GET / HTTP/1.1" 403 202 "-" "kube-probe/1.18"
2020-11-11T11:55:34.150177144Z stdout F 2020-11-11 11:55:34.150099 AH00036: access to / failed (filesystem path '/var')
2020-11-11T11:55:34.150219383Z stdout F 10.41.16.57 - - [11/Nov/2020:11:55:34 +0000] "GET / HTTP/1.1" 403 202
2020-11-11T11:55:34.150224494Z stdout F 10.41.16.57 - - [11/Nov/2020:11:55:34 +0000] "GET / HTTP/1.1" 403 202 "-" "kube-probe/1.18"
2020-11-11T11:55:38.573563539Z stdout F 2020-11-11 11:55:38.573468 AH00036: access to / failed (filesystem path '/var')
2020-11-11T11:55:38.573918168Z stdout F 10.41.16.57 - - [11/Nov/2020:11:55:38 +0000] "GET / HTTP/1.1" 403 202
2020-11-11T11:55:38.573928838Z stdout F 10.41.16.57 - - [11/Nov/2020:11:55:38 +0000] "GET / HTTP/1.1" 403 202 "-" "kube-probe/1.18"
2020-11-11T11:55:48.573651224Z stdout F 2020-11-11 11:55:48.573579 AH00036: access to / failed (filesystem path '/var')
2020-11-11T11:55:48.573682234Z stdout F 10.41.16.57 - - [11/Nov/2020:11:55:48 +0000] "GET / HTTP/1.1" 403 202
2020-11-11T11:55:48.577606395Z stdout F 10.41.16.57 - - [11/Nov/2020:11:55:48 +0000] "GET / HTTP/1.1" 403 202 "-" "kube-probe/1.18"

2020-11-11T11:57:08.573552496Z stdout F 10.41.16.57 - - [11/Nov/2020:11:57:08 +0000] "GET / HTTP/1.1" 403 202 "-" "kube-probe/1.18"
2020-11-11T11:57:18.575382542Z stdout F 2020-11-11 11:57:18.573687 AH00036: access to / failed (filesystem path '/var')
2020-11-11T11:57:18.575392282Z stdout F 10.41.16.57 - - [11/Nov/2020:11:57:18 +0000] "GET / HTTP/1.1" 403 202
2020-11-11T11:57:18.575395798Z stdout F 10.41.16.57 - - [11/Nov/2020:11:57:18 +0000] "GET / HTTP/1.1" 403 202 "-" "kube-probe/1.18"
2020-11-11T11:57:28.573649787Z stdout F 2020-11-11 11:57:28.573574 AH00036: access to / failed (filesystem path '/var')
2020-11-11T11:57:28.573772808Z stdout F 10.41.16.57 - - [11/Nov/2020:11:57:28 +0000] "GET / HTTP/1.1" 403 202
2020-11-11T11:57:28.573826761Z stdout F 10.41.16.57 - - [11/Nov/2020:11:57:28 +0000] "GET / HTTP/1.1" 403 202 "-" "kube-probe/1.18"
2020-11-11T11:57:34.149704277Z stdout F 2020-11-11 11:57:34.149638 AH00036: access to / failed (filesystem path '/var')
2020-11-11T11:57:34.150178273Z stdout F 10.41.16.57 - - [11/Nov/2020:11:57:34 +0000] "GET / HTTP/1.1" 403 202
2020-11-11T11:57:34.150192562Z stdout F 10.41.16.57 - - [11/Nov/2020:11:57:34 +0000] "GET / HTTP/1.1" 403 202 "-" "kube-probe/1.18"
2020-11-11T11:57:34.505472237Z stdout F Exception ignored in: <bound method Session.__del__ of <keystoneauth1.session.Session object at 0x7f739e1e7a58>>
2020-11-11T11:57:34.505485488Z stdout F Traceback (most recent call last):
2020-11-11T11:57:34.505490006Z stdout F File "/var/lib/openstack/lib/python3.6/site-packages/keystoneauth1/session.py", line 396, in __del__
2020-11-11T11:57:34.505493388Z stdout F NameError: name 'Exception' is not defined
2020-11-11T11:57:34.505496871Z stdout F Exception ignored in: <bound method Session.__del__ of <keystoneauth1.session.Session object at 0x7f739e21aba8>>
2020-11-11T11:57:34.505499443Z stdout F Traceback (most recent call last):
2020-11-11T11:57:34.505501976Z stdout F File "/var/lib/openstack/lib/python3.6/site-packages/keystoneauth1/session.py", line 396, in __del__
2020-11-11T11:57:34.505504498Z stdout F NameError: name 'Exception' is not defined
2020-11-11T11:57:34.505507202Z stdout F Exception ignored in: <bound method Session.__del__ of <keystoneauth1.session.Session object at 0x7f739c5a03c8>>
2020-11-11T11:57:34.505509626Z stdou
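
The pod log above shows the kubelet probe (user agent kube-probe/1.18) receiving HTTP 403 on every request once /var became inaccessible. As a rough sketch, the probe responses can be tallied by status code as follows. The function name `probe_status_counts` and both regular expressions are assumptions derived from the lines quoted above (timestamp, stream, tag, message), not an official parser.

```python
import re
from collections import Counter

# Container log line as seen above: "<timestamp> <stdout|stderr> <F|P> <message>"
CRI_LINE = re.compile(r'^(?P<ts>\S+) (?P<stream>stdout|stderr) (?P<tag>[FP]) (?P<msg>.*)$')

# Apache access-log fragment: "GET / HTTP/1.1" 403 202
ACCESS = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')


def probe_status_counts(lines):
    """Count HTTP status codes returned to kube-probe requests."""
    counts = Counter()
    for line in lines:
        cri = CRI_LINE.match(line.rstrip("\n"))
        if not cri:
            continue  # skip truncated or non-conforming lines
        msg = cri.group("msg")
        if "kube-probe" not in msg:
            continue  # only count lines that carry the probe's user agent
        access = ACCESS.search(msg)
        if access:
            counts[access.group("status")] += 1
    return counts


# Three lines copied from the log above; only the last carries the user agent.
sample = [
    "2020-11-11T11:55:18.573554412Z stdout F 2020-11-11 11:55:18.573494 "
    "AH00036: access to / failed (filesystem path '/var')",
    '2020-11-11T11:55:18.573692524Z stdout F 10.41.16.57 - - '
    '[11/Nov/2020:11:55:18 +0000] "GET / HTTP/1.1" 403 202',
    '2020-11-11T11:55:18.573707942Z stdout F 10.41.16.57 - - '
    '[11/Nov/2020:11:55:18 +0000] "GET / HTTP/1.1" 403 202 "-" "kube-probe/1.18"',
]

print(probe_status_counts(sample))
```

Counting only the lines with the kube-probe user agent avoids double counting, since each probe request appears twice in this log (once without and once with the user agent).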

Further debugging of the collect log shows the same XFS errors and call trace in kern.log as quoted under Expected Behavior above.

Ghada Khalil (gkhalil)
tags: added: stx.distro.openstack
zhipeng liu (zhipengs) wrote :

Hi,

From your current log, it seems that the filesystem became corrupted and the repair failed.
The root cause might not be related to Horizon.
Could you retry it on another setup as well?

Thanks!
Zhipeng

zhipeng liu (zhipengs)
Changed in starlingx:
status: New → Incomplete
Austin Sun (sunausti) wrote :

Please update. If there is no feedback, we will close it.

Changed in starlingx:
importance: Undecided → Low
assignee: nobody → Austin Sun (sunausti)
Thales Elero Cervi (tcervi) wrote :

Closing it now due to inactivity.

Changed in starlingx:
status: Incomplete → Invalid