During setup controllers remain in Degraded status - postgres standalone

Bug #1850836 reported by Cristopher Lemus
Affects: StarlingX
Status: Fix Released
Importance: Critical
Assigned to: Don Penney

Bug Description

Brief Description
-----------------
During the setup stages of Duplex and Standard (2+2 and 2+2+2) configurations, both controllers remain in a Degraded state after unlocking controller-1.

Severity
--------
Critical

Steps to Reproduce
------------------
Follow the documented setup procedure for duplex, standard, or standard external storage configurations.

Expected Behavior
------------------
Controllers in Available status

Actual Behavior
----------------
Controllers remain in Degraded status

Reproducibility
---------------
100%

System Configuration
--------------------
Duplex, standard, and standard external storage configurations.

Branch/Pull Time/Commit
-----------------------
BUILD_ID="20191031T013000Z"

Last Pass
---------
Last pass on previous day: 20191030T013000Z

Timestamp/Logs
--------------
Full log attached from a Standard Local storage configuration. Here are general details from this config: http://paste.openstack.org/show/785692/

Test Activity
-------------
Sanity

Revision history for this message
Cristopher Lemus (cjlemusc) wrote :
Ghada Khalil (gkhalil)
summary: - During setup controllers remain on Degraded status
+ During setup controllers remain in Degraded status -postgres standalone
summary: - During setup controllers remain in Degraded status -postgres standalone
+ During setup controllers remain in Degraded status - postgres standalone
Revision history for this message
Don Penney (dpenney) wrote :

On controller-0:
Disk /dev/mapper/cgts--vg-pgsql--lv: 42.9 GB, 42949672960 bytes, 83886080 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

On controller-1:
Disk /dev/mapper/cgts--vg-pgsql--lv: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

So the first controller was set up with a 40G postgres LV, while the second controller got only 20G.
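
The mismatch can also be confirmed directly through LVM rather than fdisk. As a suggested check (assuming the standard cgts-vg volume group shown in the device names above):

$ sudo lvs --units g -o lv_name,lv_size cgts-vg/pgsql-lv

On controller-0 this should report roughly 40g for pgsql-lv and on controller-1 roughly 20g, matching the fdisk output above.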

Revision history for this message
Don Penney (dpenney) wrote :

The user.log shows ansible setting the size to 20G on controller-0, then setting it again to 40G immediately afterward:

$ grep pgsql-lv controller-0_20191031.100019/var/log/user.log
2019-10-30T22:57:42.000 localhost ansible-command: info Invoked with warn=True executable=None _uses_shell=False _raw_params=lvextend -L20G /dev/cgts-vg/pgsql-lv removes=None argv=None creates=None chdir=None stdin=None
2019-10-30T22:57:43.000 localhost ansible-command: info Invoked with warn=True executable=None _uses_shell=False _raw_params=lvextend -L40G /dev/cgts-vg/pgsql-lv removes=None argv=None creates=None chdir=None stdin=None

I don't know where this second ansible directive is coming from.
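
One quick check would be to run the same grep against the controller-1 collect bundle (the path pattern below assumes it mirrors the controller-0 layout above):

$ grep pgsql-lv controller-1_*/var/log/user.log

If controller-1 shows only the single lvextend -L20G invocation, that would line up with the 20G LV observed there.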

Revision history for this message
Don Penney (dpenney) wrote :

Can you provide the content of your localhost.yaml file?

Revision history for this message
Don Penney (dpenney) wrote :

From: https://opendev.org/starlingx/ansible-playbooks/src/branch/master/playbookconfig/src/playbooks/roles/bootstrap/persist-config/tasks/one_time_config_tasks.yml#L102

The LV is initially sized to 10G; then, if the disk is large enough, it gets resized to 20G. With the logs here showing 20G and 40G, it looks like the initial configuration of controller-0 somehow used ansible-playbooks from a previous load.
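
For reference, the logic described there amounts to something like the following sketch (the variable names are placeholders, not copied from the playbook):

# pgsql-lv starts at 10G after installation; the bootstrap playbook then grows it
# to 20G when the root disk has room (SMALL_DISK_THRESHOLD_GB is a placeholder)
if [ "$ROOT_DISK_SIZE_GB" -gt "$SMALL_DISK_THRESHOLD_GB" ]; then
    lvextend -L20G /dev/cgts-vg/pgsql-lv
fi

Under that logic a 40G lvextend should never appear in the log, which is what prompted the suspicion that controller-0 was bootstrapped with a stale copy of ansible-playbooks.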

Revision history for this message
Cristopher Lemus (cjlemusc) wrote :

Here's the content:

system_mode: duplex

dns_servers:
  - 192.168.100.60

docker_registries:
  defaults:
    url: 192.168.100.60

is_secure_registry: False

external_oam_subnet: 192.168.200.0/24
external_oam_gateway_address: 192.168.200.1
external_oam_floating_address: 192.168.200.207
external_oam_node_0_address: 192.168.200.82
external_oam_node_1_address: 192.168.200.83

management_subnet: 10.10.54.0/24
management_start_address: 10.10.54.10
management_end_address: 10.10.54.100
management_gateway_address: 10.10.54.1

ansible_become_pass: ANSIBLE_PASS
admin_password: ADMIN_PASS

Just the ANSIBLE_PASS/ADMIN_PASS are replaced by actual passwords.

Revision history for this message
Don Penney (dpenney) wrote :

Ok, you can ignore my previous comments. The problem is that the two updates to resize the filesystem were not properly coordinated:
https://review.opendev.org/692129
https://review.opendev.org/692125

One went in yesterday, the other this morning. So the next build should be ok.

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as Fix Released.
As per the above, this will be addressed in the 2019-11-01 load.

tags: added: stx.config
Changed in starlingx:
assignee: nobody → Don Penney (dpenney)
importance: Undecided → Critical
status: New → Fix Released