Private Registry automatically removed after unlock

Bug #1831145 reported by Cristopher Lemus
Affects: StarlingX
Status: Fix Released
Importance: Critical
Assigned to: Tee Ngo
Milestone: (none listed)

Bug Description

Brief Description
-----------------
With ISO 20190528T202529Z, deployed with Ansible on bare metal using a private registry, the private registry information is removed from /etc/docker/daemon.json after controller-0 is unlocked.

Severity
--------
Critical: Setup of StarlingX cannot be completed.

Steps to Reproduce
------------------
Follow the wiki instructions. The Ansible play completes successfully and /etc/docker/daemon.json contains the correct private registry information. Continue with the setup; after controller-0 is unlocked, the private registry IP is removed from daemon.json. This configuration is replicated to the secondary nodes, and the install fails because they cannot start Docker (Docker rejects daemon.json because the insecure registry entry is an empty host).

Expected Behavior
------------------
After controller-0 is unlocked, /etc/docker/daemon.json should still contain the private registry information: both the lab IP and the controller-0 IP.

Actual Behavior
----------------
After controller-0 unlock, /etc/docker/daemon.json is as follows:

controller-0:~$ sudo cat /etc/docker/daemon.json
Password:
{
    "insecure-registries" : [ "" ]
}

Reproducibility
---------------
100%

System Configuration
--------------------
Simplex, Duplex, Standard (2+2) and Standard (2+2+2) on Baremetal with Local/Mirror registry.

Branch/Pull Time/Commit
-----------------------
ISO 20190528T202529Z

Last Pass
---------
First sanity execution with Ansible. No previous pass.

Timestamp/Logs
--------------
A full collect from controller-0 is attached. The relevant findings are summarized here. First, we noticed that the unlock of all secondary nodes fails. This is the error shown on the console:

        Starting Network Time Service...
[ OK ] Started Network Time Service.
         Starting Docker Application Container Engine...
[FAILED] Failed to start Docker Application Container Engine.
See 'systemctl status docker.service' for details.
         Starting Name Service Cache Daemon...
[ OK ] Started Name Service Cache Daemon.
         Starting Naming services LDAP client daemon....
[ OK ] Started Naming services LDAP client daemon..

Because Docker fails to start, the node cannot pull images and its "install_state" goes to "failed", which sends the unlocked nodes into a loop.

This is the status of docker daemon:

controller-1:~$ systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─docker-stx-override.conf
        /usr/lib/systemd/system/docker.service.d
           └─starlingx-docker-override.conf
   Active: failed (Result: exit-code) since Thu 2019-05-30 12:04:36 UTC; 2min 54s ago
     Docs: https://docs.docker.com
 Main PID: 80463 (code=exited, status=1/FAILURE)

From /var/log/daemon.log:

2019-05-30T08:57:56.158 controller-1 dockerd[82857]: info insecure registry is not valid: invalid host ""

This is the content of /etc/docker/daemon.json, the exact same configuration as controller-0:

controller-1:~$ sudo cat /etc/docker/daemon.json
Password:
{
    "insecure-registries" : [ "" ]
}
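
The condition that Docker rejects here (an empty host in "insecure-registries") can be detected before the file is replicated to other nodes. The following is only a minimal sketch: the helper name empty_registry_entries is hypothetical, and it is not Docker's own validation logic; it only flags blank entries such as the one shown above.

```python
import json


def empty_registry_entries(daemon_json_text):
    """Return the indexes of blank entries in "insecure-registries".

    Hypothetical helper for illustration; it only checks for blank
    strings and does not replicate Docker's full host validation.
    """
    config = json.loads(daemon_json_text)
    registries = config.get("insecure-registries", [])
    return [
        index
        for index, host in enumerate(registries)
        if not isinstance(host, str) or not host.strip()
    ]


# Content reported on controller-1 after the unlock.
broken = '{ "insecure-registries" : [ "" ] }'
# Content with a populated registry address, as reported in this bug.
populated = '{ "insecure-registries" : [ "192.168.100.60" ] }'

print(empty_registry_entries(broken))     # [0]
print(empty_registry_entries(populated))  # []
```

A non-empty result means the file contains a blank registry entry, like the invalid host "" that dockerd rejects in the log above.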

Something that might help:

On controller-0 (simplex, duplex and both standard configurations), after the Ansible play and before the unlock, daemon.json is correctly populated with the IP address of the local (mirror) registry in our lab:
{
    "insecure-registries" : [ "192.168.100.60" ]
}

However, after the unlock, it is as follows:
{
    "insecure-registries" : [ "" ]
}
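
The two snapshots above can also be compared programmatically to list which registries disappeared during the unlock. This is a rough sketch under the assumption that both versions of daemon.json were saved as text; the function name dropped_registries is hypothetical and not part of StarlingX.

```python
import json


def dropped_registries(before_text, after_text):
    """List registries present before the unlock but missing afterwards.

    Hypothetical helper for illustration only.
    """
    before = set(json.loads(before_text).get("insecure-registries", []))
    after = set(json.loads(after_text).get("insecure-registries", []))
    return sorted(before - after)


# Snapshots taken from this report.
before_unlock = '{ "insecure-registries" : [ "192.168.100.60" ] }'
after_unlock = '{ "insecure-registries" : [ "" ] }'

print(dropped_registries(before_unlock, after_unlock))  # ['192.168.100.60']
```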

Test Activity
-------------
Sanity

Cristopher Lemus (cjlemusc) wrote :

Looks like this issue is already resolved with: https://review.opendev.org/#/c/661657/

I'll verify using the patches and ISO once it's merged.

Ghada Khalil (gkhalil) wrote :

Assigning to Tee to confirm if this will be fixed by the above commit

Changed in starlingx:
assignee: nobody → Tee Ngo (teewrs)
tags: added: stx.2.0 stx.config
Tee Ngo (teewrs) wrote :

Please retest with May 31st build.

Ghada Khalil (gkhalil) wrote :

Marking as Incomplete until testing is done with the latest load

Changed in starlingx:
importance: Undecided → Critical
status: New → Incomplete
Cristopher Lemus (cjlemusc) wrote :

Just to confirm, this is fixed on ISO: 20190531T013000Z

A sanity run is in progress; the majority of the checks have passed so far. The full sanity report will be sent.

Thanks.

Ghada Khalil (gkhalil) wrote :

Marking as Fix Released since it's confirmed that this issue is addressed by https://review.opendev.org/#/c/661657/ and is a duplicate of https://bugs.launchpad.net/starlingx/+bug/1830319

Changed in starlingx:
status: Incomplete → Fix Released