StarlingX

Private Registry automatically removed after unlock

Bug #1831145 reported by Cristopher Lemus on 2019-05-30

This bug report is a duplicate of: Bug #1830319: [ansible] cannot start armada service when deploy with private registry.

This bug affects 1 person

Affects     Status         Importance   Assigned to   Milestone
StarlingX   Fix Released   Critical     Tee Ngo

Bug Description

Brief Description
-----------------
With ISO 20190528T202529Z, deployed with Ansible on bare metal and using a private registry, the private registry information is removed from /etc/docker/daemon.json after controller-0 is unlocked.

Severity
--------
Critical: Setup of StarlingX cannot be completed.

Steps to Reproduce
------------------
Follow the wiki instructions. The Ansible play completes successfully and /etc/docker/daemon.json contains the correct private registry information. Continue with the setup; after controller-0 is unlocked, the private registry IP is removed from daemon.json. This configuration is replicated to the secondary nodes, and the install fails because they cannot start Docker (Docker complains that daemon.json is in an invalid JSON format).

Expected Behavior
------------------
After controller-0 unlock, /etc/docker/daemon.json should contain the private registry information: both the lab IP and the controller-0 IP.

Actual Behavior
----------------
After controller-0 unlock, /etc/docker/daemon.json is as follows:

controller-0:~$ sudo cat /etc/docker/daemon.json
Password:
{
"insecure-registries" : [ "" ]
}
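
A quick way to flag this state on any node is to look for blank entries in "insecure-registries". The following is only a minimal sketch; the function name empty_insecure_registries is hypothetical and is not part of StarlingX or Docker. It uses the two daemon.json contents quoted in this report as sample input.

```python
import json


def empty_insecure_registries(daemon_json_text):
    """Return the indexes of blank entries in "insecure-registries".

    Hypothetical helper for illustration; not a StarlingX or Docker API.
    """
    config = json.loads(daemon_json_text)
    registries = config.get("insecure-registries", [])
    # An entry is considered blank if it is not a string or is empty/whitespace.
    return [
        index
        for index, host in enumerate(registries)
        if not isinstance(host, str) or not host.strip()
    ]


# Contents quoted in this report (after and before the unlock).
after_unlock = '{ "insecure-registries" : [ "" ] }'
before_unlock = '{ "insecure-registries" : [ "192.168.100.60" ] }'

print(empty_insecure_registries(after_unlock))   # [0]
print(empty_insecure_registries(before_unlock))  # []
```

On a live node the same function could be fed the text read from /etc/docker/daemon.json, which needs the same privileges as the sudo cat above.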

Reproducibility
---------------
100%

System Configuration
--------------------
Simplex, Duplex, Standard (2+2) and Standard (2+2+2) on Baremetal with Local/Mirror registry.

Branch/Pull Time/Commit
-----------------------
ISO 20190528T202529Z

Last Pass
---------
First sanity execution with Ansible. No previous pass.

Timestamp/Logs
--------------
A full collect from controller-0 is attached. Here is what I found relevant. First, we noticed that the unlock of all secondary nodes fails. This is the error shown on the console:

        Starting Network Time Service...
[ OK ] Started Network Time Service.
         Starting Docker Application Container Engine...
[FAILED] Failed to start Docker Application Container Engine.
See 'systemctl status docker.service' for details.
         Starting Name Service Cache Daemon...
[ OK ] Started Name Service Cache Daemon.
         Starting Naming services LDAP client daemon....
[ OK ] Started Naming services LDAP client daemon..

Because Docker fails to start, the node cannot pull images and the "install_state" goes to "failed", sending the unlocked nodes into a loop.

This is the status of the Docker daemon:

controller-1:~$ systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─docker-stx-override.conf
        /usr/lib/systemd/system/docker.service.d
           └─starlingx-docker-override.conf
   Active: failed (Result: exit-code) since Thu 2019-05-30 12:04:36 UTC; 2min 54s ago
     Docs: https://docs.docker.com
Main PID: 80463 (code=exited, status=1/FAILURE)

From /var/log/daemon.log:

2019-05-30T08:57:56.158 controller-1 dockerd[82857]: info insecure registry is not valid: invalid host ""
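
To find every occurrence of this message in a collected daemon.log, something like the sketch below could be used. It assumes the log lines follow the format of the line quoted above (timestamp first, then host and process); the function name registry_errors is hypothetical.

```python
import re

# Matches the dockerd message quoted above and captures the rejected host.
PATTERN = re.compile(
    r'insecure registry is not valid: invalid host "(?P<host>[^"]*)"'
)


def registry_errors(lines):
    """Return (timestamp, rejected_host) pairs for matching log lines.

    Hypothetical helper; assumes the timestamp is the first
    space-separated field, as in the quoted log line.
    """
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            timestamp = line.split(" ", 1)[0]
            hits.append((timestamp, match.group("host")))
    return hits


sample = [
    '2019-05-30T08:57:56.158 controller-1 dockerd[82857]: info insecure '
    'registry is not valid: invalid host ""',
]
print(registry_errors(sample))  # [('2019-05-30T08:57:56.158', '')]
```

An empty string as the rejected host corresponds to the blank "insecure-registries" entry in daemon.json.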

This is the content of /etc/docker/daemon.json, the exact same configuration as controller-0:

controller-1:~$ sudo cat /etc/docker/daemon.json
Password:
{
"insecure-registries" : [ "" ]
}

Something that might help:

This is the behavior on controller-0 (simplex, duplex and both standard configurations). AFTER the Ansible play and BEFORE the unlock, daemon.json is properly filled with the local (mirror) registry IP address that we have in our lab:
{
"insecure-registries" : [ "192.168.100.60" ]
}

However, AFTER the unlock, it is as follows:
{
"insecure-registries" : [ "" ]
}

Test Activity
-------------
Sanity
