B&R operation failed when trying to run psql command

Bug #2004189 reported by Guilherme Schons
Affects    Status        Importance  Assigned to       Milestone
StarlingX  Fix Released  Medium      Guilherme Schons

Bug Description

Brief Description
-----------------
After patches were applied successfully, a backup and restore (B&R) procedure was attempted. The restore failed while running a psql command:

psql: error: could not connect to server: No such file or directory
            Is the server running locally and accepting
            connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

The log files suggest that the PostgreSQL cluster has problems using /etc/ssl/private/ssl-cert-snakeoil.key because its value does not match the expected one.
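
To confirm the symptom before re-running the playbook, a check along these lines can tell whether the Unix domain socket named in the psql error exists at all. This is a minimal sketch: the socket path is copied from the error message above, and the helper name socket_present is made up for illustration. A present socket does not prove that the server accepts connections, only that the "No such file or directory" condition is gone.

```python
import os
import stat

# Socket path taken from the psql error message in this report.
PG_SOCKET = "/var/run/postgresql/.s.PGSQL.5432"


def socket_present(path=PG_SOCKET):
    """Return True if `path` exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing file or unreadable directory: treat as "not present".
        return False
    return stat.S_ISSOCK(mode)


if __name__ == "__main__":
    print("socket present" if socket_present() else "socket missing")
```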

Severity
--------
standard

Steps to Reproduce
------------------
1. Upload, apply, and install RR patches.
2. Perform the backup operation with ansible-playbook and copy the .tgz off-box.
3. Reinstall the controller and run the restore Ansible playbook.

Expected Behavior
------------------
The restore completes successfully.

Actual Behavior
----------------
The restore fails while trying to run a psql command.

Reproducibility
---------------
100% reproducible.

System Configuration
--------------------
Tested on Standard and AIO-SX. The issue probably occurs in other configurations as well.

Branch/Pull Time/Commit
-----------------------
Master, 2022-12-18

Last Pass
---------
N/A

Timestamp/Logs
--------------
2022-12-21 20:34:44,669 p=2690 u=sysadmin n=ansible | TASK [restore-platform/restore-more-data : Set all the hosts, except storage nodes to locked/disabled/offline state] *******************
2022-12-21 20:34:44,669 p=2690 u=sysadmin n=ansible | Wednesday 21 December 2022 20:34:44 +0000 (0:00:00.020) 0:34:21.150 ****
2022-12-21 20:34:45,149 p=2690 u=sysadmin n=ansible | fatal: [localhost]: FAILED! => changed=true
  cmd: psql -c "update i_host set administrative='locked', operational='disabled', availability='offline' where personality!='storage'" sysinv
  delta: '0:00:00.049671'
  end: '2022-12-21 20:34:45.125470'
  msg: non-zero return code
  rc: 2
  start: '2022-12-21 20:34:45.075799'
  stderr: |-
    psql: error: could not connect to server: No such file or directory
            Is the server running locally and accepting
            connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
2022-12-21 20:34:45,150 p=2690 u=sysadmin n=ansible | PLAY RECAP

Test Activity
-------------
Developer testing

Workaround
----------
Remove /etc/ssl/private/ssl-cert-snakeoil.key from the backup file before running the restore.
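
One way to apply this workaround by hand is to rewrite the backup archive without the key. The sketch below is not the merged fix (which changes the backup playbook itself); it assumes the backup is a plain gzip-compressed tar file and that the key is stored under a member name ending in etc/ssl/private/ssl-cert-snakeoil.key. The function name strip_member and the file names in the usage note are hypothetical.

```python
import tarfile

# Assumed member suffix; the path prefix inside a real backup may differ.
SNAKEOIL_KEY = "etc/ssl/private/ssl-cert-snakeoil.key"


def strip_member(src_path, dst_path, unwanted=SNAKEOIL_KEY):
    """Copy a .tgz archive to dst_path, skipping the unwanted member.

    A member is skipped when its name equals `unwanted` or ends with
    "/" + `unwanted`. Returns the list of skipped member names.
    """
    skipped = []
    with tarfile.open(src_path, "r:gz") as src, \
            tarfile.open(dst_path, "w:gz") as dst:
        for member in src:
            if member.name == unwanted or member.name.endswith("/" + unwanted):
                skipped.append(member.name)
                continue
            # Only regular files carry data; directories and links are
            # copied from their header alone.
            fileobj = src.extractfile(member) if member.isreg() else None
            dst.addfile(member, fileobj)
    return skipped
```

For example, strip_member("platform_backup.tgz", "platform_backup_nokey.tgz") would write a second archive next to the original. Checking that the returned list is not empty confirms that the key was actually found and dropped.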

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/872208
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/e30abe3d17f3bdf2bf61946f691d97c2b45010e5
Submitter: "Zuul (22348)"
Branch: master

commit e30abe3d17f3bdf2bf61946f691d97c2b45010e5
Author: Guilherme Schons <email address hidden>
Date: Mon Jan 30 15:58:23 2023 -0300

    Remove ssl-cert-snakeoil.key from backup file

    An error was found trying to restore a backup from a system with
    committed patches. The error is about PostgreSQL trying to use the
    ssl-cert-snakeoil.key certificate, but the value from this certificate
    mismatches the expected value.

    This certificate is changed when a patch is applied.

    By removing this file from the backup, the restore process works fine,
    with the system generating a new certificate

    Test Plan:
    - PASS: Perform B&R AIO-SX with patches applied.

    - PASS: Perform B&R AIO-SX Without patches applied.

    - PASS: Perform B&R Standard with patches applied.

    - PASS: Perform B&R Standard without patches applied.

    Closes-Bug: 2004189
    Signed-off-by: Guilherme Schons <email address hidden>
    Change-Id: Ic1e2c18f3f7f68687531e1b471b4dfda4738ff2f

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.8.0 stx.update
Changed in starlingx:
assignee: nobody → Guilherme Schons (gdossant)