DC: rbd mounted devices become read only after enabling https on system controller

Bug #1901449 reported by Difu Hu on 2020-10-25
This bug affects 1 person
Affects: StarlingX
Importance: Medium
Assigned to: Mihnea Saracin

Bug Description

Brief Description
-----------------
On a DC system, rbd mounted devices become read only after https is enabled on the system controller.

Severity
--------
Major

Steps to Reproduce
------------------
1. pre-condition: https is not enabled on DC.
2. system --os-region-name RegionOne modify --https_enabled="true"

Expected Behavior
------------------
rbd mounted devices should be mounted with read and write permissions.

Actual Behavior
----------------
rbd mounted devices become read only after enabling https.
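
One way to check the expected versus actual state on a controller is to look for rbd devices whose mount options include `ro`. The following is a minimal sketch, not part of the original report: it assumes the standard `/proc/mounts` field layout (device, mount point, filesystem type, options), and the mount points in `SAMPLE` are hypothetical placeholders.

```python
def read_only_rbd_mounts(mounts_text):
    """Return (device, mount_point) pairs for rbd devices mounted read-only.

    mounts_text is the content of /proc/mounts (or text in the same format).
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # Match the exact "ro" option, not substrings such as "errors=remount-ro".
        if device.startswith("/dev/rbd") and "ro" in options.split(","):
            found.append((device, mount_point))
    return found


# Hypothetical sample in /proc/mounts format; the mount points are made up.
SAMPLE = """\
/dev/rbd0 /var/lib/example-a ext4 ro,relatime 0 0
/dev/rbd1 /var/lib/example-b ext4 rw,relatime 0 0
/dev/sda1 /boot ext4 rw,relatime 0 0
"""

if __name__ == "__main__":
    for device, mount_point in read_only_rbd_mounts(SAMPLE):
        print(device, mount_point)
```

On a live controller the same function could be called as `read_only_rbd_mounts(open("/proc/mounts").read())` before and after the `modify --https_enabled="true"` step.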

Reproducibility
---------------
Happened 1/1 time

System Configuration
--------------------
DC system
Lab-name: WCP_80_91 (DC-1)

Branch/Pull Time/Commit
-----------------------
2020-06-27_18-35-20 + some changes

Last Pass
---------
Not sure

Timestamp/Logs
--------------
controller-0
| 2020-10-21T12:45:59.492 | 326 | service-scn | haproxy | enabled-active | disabling | restart safe requested
| 2020-10-21T12:45:59.544 | 327 | service-scn | haproxy | disabling | disabled | disable success
| 2020-10-21T12:46:00.066 | 328 | service-scn | haproxy | disabled | enabling | enabled-active state requested
| 2020-10-21T12:46:00.085 | 329 | service-scn | haproxy | enabling | enabled-active | enable success
| 2020-10-21T12:46:00.786 | 330 | service-scn | horizon | enabled-active | disabling | restart safe requested
| 2020-10-21T12:46:00.795 | 331 | service-scn | horizon | disabling | disabled | disable success
| 2020-10-21T12:46:01.080 | 332 | service-scn | horizon | disabled | enabling | enabled-active state requested
| 2020-10-21T12:46:01.735 | 333 | service-scn | horizon | enabling | enabled-active | enable success
| 2020-10-21T12:46:02.100 | 334 | service-scn | lighttpd | enabled-active | disabling | restart safe requested

2020-10-21T12:47:48.236 controller-0 kernel: err [ 3127.702215] Aborting journal on device rbd0-8.
2020-10-21T12:47:48.236 controller-0 kernel: err [ 3127.707177] Buffer I/O error on dev rbd0, logical block 491520, lost sync page write
2020-10-21T12:47:48.236 controller-0 kernel: err [ 3127.715832] JBD2: Error -5 detected when updating journal superblock for rbd0-8.
2020-10-21T12:47:48.244 controller-0 kernel: warning [ 3127.724180] EXT4-fs (rbd0): discard request in group:17 block:20517 count:1 failed with -5
2020-10-21T12:47:49.237 controller-0 kernel: crit [ 3128.688532] EXT4-fs error (device rbd0): ext4_find_entry:1318: inode #131076: comm elasticsearch[m: reading directory lblock 0
2020-10-21T12:47:49.237 controller-0 kernel: crit [ 3128.701321] EXT4-fs error (device rbd0): ext4_read_inode_bitmap:163: comm elasticsearch[m: Cannot read inode bitmap - block_group = 16, inode_bitmap = 524304
2020-10-21T12:47:49.253 controller-0 kernel: crit [ 3128.717070] EXT4-fs error (device rbd0): ext4_journal_check_start:56: Detected aborted journal
2020-10-21T12:47:49.253 controller-0 kernel: crit [ 3128.724185] EXT4-fs (rbd0): Remounting filesystem read-only

2020-10-21T12:51:19.346 controller-0 kernel: err [ 3338.745010] Aborting journal on device rbd1-8.
2020-10-21T12:51:19.346 controller-0 kernel: err [ 3338.745013] Buffer I/O error on dev rbd1, logical block 19431424, lost sync page write
2020-10-21T12:51:19.346 controller-0 kernel: err [ 3338.745013] JBD2: Error -5 detected when updating journal superblock for rbd1-8.
2020-10-21T12:51:19.346 controller-0 kernel: crit [ 3338.745130] EXT4-fs error (device rbd1): ext4_journal_check_start:56: Detected aborted journal
2020-10-21T12:51:19.346 controller-0 kernel: crit [ 3338.745131] EXT4-fs (rbd1): Remounting filesystem read-only
2020-10-21T12:51:19.346 controller-0 kernel: crit [ 3338.745300] EXT4-fs error (device rbd1): ext4_wait_block_bitmap:516: comm elasticsearch[m: Cannot read block bitmap - block_group = 8, block_bitmap = 1052
2020-10-21T12:51:19.346 controller-0 kernel: crit [ 3338.745301] EXT4-fs error (device rbd1): ext4_discard_preallocations:4035: comm elasticsearch[m: Error reading block bitmap for 8
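
To find when each rbd device went read-only in a larger kern.log, the remount messages can be pulled out with a small parser. This is a sketch that assumes the line format shown in the excerpts above (timestamp, host, `kernel:`, then the message); the function and variable names are illustrative only.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2020-10-21T12:47:49.253 controller-0 kernel: crit [ 3128.724185] EXT4-fs (rbd0): Remounting filesystem read-only
REMOUNT_RE = re.compile(
    r"^(?P<ts>\S+) (?P<host>\S+) kernel: .*"
    r"EXT4-fs \((?P<dev>[^)]+)\): Remounting filesystem read-only"
)


def remount_events(log_text):
    """Return (timestamp, host, device) for each read-only remount message."""
    events = []
    for line in log_text.splitlines():
        match = REMOUNT_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
            events.append((ts, match.group("host"), match.group("dev")))
    return events


# Lines copied from the kern.log excerpts in this report.
KERN_LOG = """\
2020-10-21T12:47:49.253 controller-0 kernel: crit [ 3128.717070] EXT4-fs error (device rbd0): ext4_journal_check_start:56: Detected aborted journal
2020-10-21T12:47:49.253 controller-0 kernel: crit [ 3128.724185] EXT4-fs (rbd0): Remounting filesystem read-only
2020-10-21T12:51:19.346 controller-0 kernel: crit [ 3338.745131] EXT4-fs (rbd1): Remounting filesystem read-only
"""

if __name__ == "__main__":
    for ts, host, dev in remount_events(KERN_LOG):
        print(ts.isoformat(), host, dev)
```

The extracted timestamps can then be compared against the service-scn entries to see how long after the haproxy, horizon and lighttpd restarts each remount occurred.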

Test Activity
-------------
Regression Testing

Ghada Khalil (gkhalil) wrote :

This looks like filesystem corruption.

tags: added: stx.storage
description: updated
tags: added: stx.5.0
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
Ghada Khalil (gkhalil) wrote :

stx.5.0 / medium - one time occurrence, but should be investigated

Peng Peng (ppeng) wrote :

Issue was reproduced on
2020-06-27_18-35-20
DC-3

log:
[2020-11-13 01:15:51,605] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[fd01:11::2]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne modify --https_enabled="true"'

from kern.log
2020-11-13T01:18:39.667 controller-1 kernel: crit [ 3052.773012] EXT4-fs error (device rbd0): ext4_wait_block_bitmap:516: comm elasticsearch[m: Cannot read block bitmap - block_group = 7, block_bitmap = 1051

collect log added:
https://files.starlingx.kube.cengn.ca/launchpad/1901449

Frank Miller (sensfan22) on 2020-11-19
Changed in starlingx:
assignee: nobody → Mihnea Saracin (msaracin)