Backup & Restore: AIO-DX+worker Controller failed to become active after restore

Bug #1854169 reported by Senthil Mukundakumar on 2019-11-27
Affects: StarlingX
Importance: High
Assigned to: Ovidiu Poncea

Bug Description

Brief Description
-----------------

In AIO-DX+worker configuration, the active controller failed to become active after restore and unlock.

/var/log/puppet/latest/puppet.log
error with running `drbdadm create-md drbd-pgsql -W--peer-max-bio-size=128k`
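
For triage, the failing command can be located in the puppet log with a small script. The following is only a minimal sketch: the function name and the regular expression are hypothetical helpers, not part of StarlingX, and it assumes the log is plain text at the path quoted above.

```python
import re
from pathlib import Path

# Hypothetical pattern: matches any log line that mentions "drbdadm create-md"
# and captures the DRBD resource name that follows (e.g. drbd-pgsql).
DRBD_PATTERN = re.compile(r"drbdadm\s+create-md\s+(\S+)")


def find_drbd_create_md_lines(log_text):
    """Return (line_number, resource, line) for each line mentioning drbdadm create-md."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = DRBD_PATTERN.search(line)
        if match:
            hits.append((number, match.group(1), line.strip()))
    return hits


if __name__ == "__main__":
    # Path taken from the bug description; adjust if the log lives elsewhere.
    log_path = Path("/var/log/puppet/latest/puppet.log")
    if log_path.exists():
        text = log_path.read_text(errors="replace")
        for number, resource, line in find_drbd_create_md_lines(text):
            print(f"{number}: [{resource}] {line}")
```

The script only reports where the command appears. It does not determine why `drbdadm` failed, so the surrounding log lines still need to be read by hand.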

Severity
--------
Critical: Unable to restore active controller

Steps to Reproduce
------------------
1. Bring up the AIO-DX+worker system
2. Backup the system using ansible locally
3. Re-install the controller with the same load
4. Restore the active controller
5. Unlock active controller

Expected Behavior
------------------
The active controller should be successfully restored and become active

Actual Behavior
----------------
Active controller failed to become active after unlock

Reproducibility
---------------
Reproducible

System Configuration
--------------------
AIO-DX+worker

Branch/Pull Time/Commit
-----------------------
 BUILD_ID="2019-11-25_20-00-00"

Test Activity
-------------
Feature Testing

Ghada Khalil (gkhalil) on 2019-11-27
tags: added: stx.update
Ovidiu Poncea (ovidiu.poncea) wrote :

Hi Senthil,
We need more details:
1. The lab the issue occurred on (thanks for the email, it is wp_8_12 and it had IPv6 support)
2. The collect after backup and before the reinstall of controller-0
3. The backup archive

We will do a test on an AIO-DX IPv6 to see if it works & will need a DX+Worker lab in IPv6 mode to retest.

Changed in starlingx:
status: New → Incomplete
Ghada Khalil (gkhalil) wrote :

stx.3.0 / high priority - issue w/ B&R feature which is an stx.3.0 deliverable

Changed in starlingx:
assignee: nobody → Senthil Mukundakumar (smukunda)
importance: Undecided → High
tags: added: stx.3.0
Senthil Mukundakumar (smukunda) wrote :

1. On wp_8_12, it is reproducible in the IPv6 configuration
2. Both the backup and collect files were copied to /folk/cgts_logs/logs/LP_1854169

Changed in starlingx:
status: Incomplete → New
Ghada Khalil (gkhalil) on 2019-12-02
Changed in starlingx:
status: New → Triaged
assignee: Senthil Mukundakumar (smukunda) → Ovidiu Poncea (ovidiu.poncea)
Ovidiu Poncea (ovidiu.poncea) wrote :

I can't find any issue with it in the logs. It should not be related to IPv6, since "drbdadm create-md drbd-pgsql -W--peer-max-bio-size=128k" is a drive initialization command, nor to the fact that we have workers added to a DX. The command also runs in the middle of a script that seems to have had all its prerequisites met.

We need to test this on a HW deployment. I have a DX IPv6 lab reserved for next week. If it doesn't reproduce we'll have to go on the same lab that had the issue.
