AIO-Duplex: After active server forced to shut down, it comes with availability state failed

Bug #1844765 reported by Yatindra on 2019-09-20
Affects:     StarlingX
Status:      Triaged
Importance:  Medium
Assigned to: Austin Sun

Bug Description

Brief Description
-----------------
While testing the Duplex HA mode scenario:
 I forcibly restart the active server (controller-0); it goes into a restart and the standby server (controller-1) takes over as the active server. Up to this point things work fine, but when controller-0 comes back up, it shows Operational State: Disabled and Availability: Failed. Please see the attached image.

Severity
--------

Major: System/Feature is usable but degraded

Steps to Reproduce
------------------
Test HA in an AIO-DX system by forcibly restarting the active server.
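
A minimal command-level sketch of the test, assuming a standard StarlingX AIO-DX deployment; the exact reboot method below is an assumption, since the report only says "forcibly restarting":

# On the active controller (controller-0): force an ungraceful restart.
sudo reboot -f

# From the surviving controller (controller-1): watch controller-0 recover.
source /etc/platform/openrc
system host-list
system host-show controller-0    # check the operational/availability fields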

Expected Behavior
------------------
Controller-0 should come back up as the standby server with Operational State: Enabled and Availability: Online.

Actual Behavior
----------------
Controller-0 comes up in the Failed state.

Reproducibility
---------------

Reproducible: after multiple restarts, controller-0 still comes up failed/degraded (see the comments below).

System Configuration
--------------------
AIO-Duplex

Branch/Pull Time/Commit
-----------------------
September 5 ISO pulled from the CENGN mirror repository.

Last Pass
---------
No; this test has not passed previously.

Timestamp/Logs
--------------
Collected logs are attached for debugging; the controller-0 puppet log is excerpted in the comments below.

Test Activity
-------------

Ghada Khalil (gkhalil) wrote :

@Yatindra, can you please add the info regarding the load you are using? If you built your own load, please provide the output of /etc/build.info. If you are using a CENGN pre-built load, please provide the link.
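
For reference, that output can be captured directly on the controller (the path is the one named above):

cat /etc/build.info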

Changed in starlingx:
status: New → Incomplete
Yatindra (yatindra) wrote :

I used the CENGN pre-built ISO image of September 5. It has since been removed from the link below.

http://mirror.starlingx.cengn.ca/mirror/starlingx/master/centos/

Yatindra (yatindra) wrote :

After multiple restarts of the current standby server (controller-0), it is still in a booting state and ends up failed/degraded after a long time.

Yatindra (yatindra) on 2019-09-25
summary: - AIO-Duplex: After active server forced to comes with availability state
- failed
+ AIO-Duplex: After active server forced to shut down, it comes with
+ availability state failed
description: updated
Ghada Khalil (gkhalil) on 2019-10-02
tags: added: stx.helpwanted stx.metal
Ghada Khalil (gkhalil) wrote :

Hi Cindy, can someone from your team help triage this issue? Thanks

Changed in starlingx:
assignee: nobody → Cindy Xie (xxie1)
Austin Sun (sunausti) on 2019-10-08
Changed in starlingx:
assignee: Cindy Xie (xxie1) → Austin Sun (sunausti)
Yatindra (yatindra) wrote :

Hi Austin,

Do you need any more input from me, or is there any update on this?

Austin Sun (sunausti) wrote :

Hi Yatindra:
In the controller-0 puppet log, the hard disk check failed on controller-0. Did you change partitions outside of the 'sysinv' commands before this?

2019-09-19T15:34:10.190 Notice: 2019-09-19 15:34:10 +0000 /Stage[main]/Platform::Partitions/Platform_manage_partition[check]/Exec[manage-partitions-check]/returns: sysinv 2019-09-19 15:34:09.964 77683 INFO manage-partitions [-] Executing command: 'parted -s /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0 unit mib mkpart primary 254998 279574'
2019-09-19T15:34:10.192 Notice: 2019-09-19 15:34:10 +0000 /Stage[main]/Platform::Partitions/Platform_manage_partition[check]/Exec[manage-partitions-check]/returns: sysinv 2019-09-19 15:34:10.078 77683 CRITICAL sysinv [-] Could not create partition 5 of 24576MiB on disk /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0: Error: You requested a partition from 254998MiB to 279574MiB (sectors 522235904..572567551).
2019-09-19T15:34:10.194 Notice: 2019-09-19 15:34:10 +0000 /Stage[main]/Platform::Partitions/Platform_manage_partition[check]/Exec[manage-partitions-check]/returns: The closest location we can manage is 255297MiB to 279574MiB (sectors 522848256..572567551).
2019-09-19T15:34:10.196 Notice: 2019-09-19 15:34:10 +0000 /Stage[main]/Platform::Partitions/Platform_manage_partition[check]/Exec[manage-partitions-check]/returns: 2019-09-19 15:34:10.078 77683 TRACE sysinv Traceback (most recent call last):
2019-09-19T15:34:10.198 Notice: 2019-09-19 15:34:10 +0000 /Stage[main]/Platform::Partitions/Platform_manage_partition[check]/Exec[manage-partitions-check]/returns: 2019-09-19 15:34:10.078 77683 TRACE sysinv File "/usr/bin/manage-partitions", line 893, in <module>
2019-09-19T15:34:10.201 Notice: 2019-09-19 15:34:10 +0000 /Stage[main]/Platform::Partitions/Platform_manage_partition[check]/Exec[manage-partitions-check]/returns: 2019-09-19 15:34:10.078 77683 TRACE sysinv main(sys.argv)
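
For readers decoding the parted error above: the figures are consistent with parted's 512-byte sectors, i.e. 2048 sectors per MiB. A quick arithmetic check in the shell, using only values copied from the log:

echo $(( 522235904 / 2048 ))         # 254998 -> requested start (MiB)
echo $(( (572567551 + 1) / 2048 ))   # 279574 -> requested end (MiB)
echo $(( 279574 - 254998 ))          # 24576  -> requested size, i.e. 24 GiB

So manage-partitions asked for exactly 24576 MiB starting at 254998 MiB, but parted could only offer a start at 255297 MiB: about 299 MiB at the start of the expected free region is no longer free, which is why the partition-layout question above matters.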

Cindy Xie (xxie1) on 2019-10-11
Changed in starlingx:
importance: Undecided → Medium
Yatindra (yatindra) wrote :

Hi Austin,

No, I had not changed any partitions.
It occurred when I forced off the active system.
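
One way to cross-check that the on-disk layout still matches sysinv's inventory, assuming the standard StarlingX CLI (the device path is copied from the puppet log above):

# sysinv's view of the disk and its partitions
system host-disk-list controller-0
system host-disk-partition-list controller-0

# On-disk view of the same device, including free regions
sudo parted -s /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0 unit mib print free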

Ghada Khalil (gkhalil) wrote :

Since this is marked as medium priority, it gates stx.3.0 for now

tags: added: stx.3.0
Changed in starlingx:
status: Incomplete → Triaged