charm stays in "docker login failed, see juju debug-log" status even after later success

Bug #1915280 reported by David Coronel
Affects: Docker Subordinate Charm
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

UPDATE: After investigation, the problem seems to be that the charm is not doing a status-set after things are OK again. The docker-registry itself seems to be working. See comment #1 for details.
------

I want to add a private docker registry to my CDK deployment with the docker-registry charm (revision 173) from https://jaas.ai/u/containers/docker-registry. I'm using it as an extra registry for the docker subordinates on my k8s workers (I don't plan to upload all the rocks, k8s, and gcr images to it).

I use a relation from docker-registry to easyrsa to get SSL certificates as per https://ubuntu.com/kubernetes/docs/docker-registry#deploy

The docker-registry charm has its default binding on an oam space, but all the other bindings are set to another space (internal-space):

    bindings:
      "": *oam-space
      cert-provider: *internal-space
      docker-registry: *internal-space
      dockerhost: *internal-space
      nrpe-external-master: *internal-space
      sdn-plugin: *internal-space
      website: *internal-space

With the following options:

    options:
      auth-basic-user: "admin"
      auth-basic-password: "password"
      http_proxy: *http-proxy
      https_proxy: *http-proxy
      no_proxy: *no-proxy

I tried no_proxy with a set of CIDRs ("127.0.0.1,localhost,::1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16") and also with just the IP of my docker-registry plus the two local entries ("172.31.246.81,127.0.0.1,localhost").

After deploying docker-registry and relating it to docker, I can do a docker login from the docker-registry unit to its oam private IP with "docker login -u admin -p password 172.31.222.132:5000", but not with the internal-space IP with "docker login -u admin -p password 172.31.246.81:5000". I get "Error response from daemon: Get https://172.31.246.81:5000/v2/: x509: certificate is valid for 172.31.222.132, 172.31.222.132, not 172.31.246.81", which is expected because I didn't set http-host yet.

My k8s workers can't reach that private oam IP, so I need to add the internal-space IP. If I try to reach the private oam IP from the k8s workers I get the following error, as expected:

"Error response from daemon: Get https://172.31.222.132:5000/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"

From the workers, a docker login to the internal-space IP with "docker login -u admin -p password 172.31.246.81:5000" gives me the same error about the SSL cert not being valid for that IP, which is what I expect since I see the same thing locally on the docker-registry unit.

Then I set the http-host of docker-registry to my internal-space IP:

juju config -m kubernetes docker-registry http-host="https://172.31.246.81:5000/"
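
A quick way to confirm the certificate presented on that port now covers the internal-space IP is to inspect its SANs (plain openssl, nothing charm-specific):

    echo | openssl s_client -connect 172.31.246.81:5000 2>/dev/null \
        | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'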

Now "docker login -u admin -p password 172.31.246.81:5000" works both from the docker-registry unit and the k8s workers. I do hit the following bug, but I can work around it by installing the pass package (apt install pass):

https://github.com/docker/compose/issues/6023
https://bugs.launchpad.net/ubuntu/+source/golang-github-docker-docker-credential-helpers/+bug/1794307
https://bugs.launchpad.net/ubuntu/+source/docker-compose/+bug/1796119

But still, all docker units are blocked.

In the juju debug logs there are always the following two lines, but I doubt they are responsible for this problem:

2021-02-10 13:26:54 WARNING juju-log docker-registry:101: Passing NO_PROXY string that includes a cidr. This may not be compatible with software you are running in your shell.
2021-02-10 13:27:35 WARNING docker-registry-relation-changed WARNING: No swap limit support

I also see this error:

unit-docker-0: 14:06:00 ERROR juju.worker.metrics.sender could not remove batch "b4633618-cef5-4815-8bde-226855bc091e" from spool: remove /var/lib/juju/metricspool/b4633618-cef5-4815-8bde-226855bc091e: no such file or directory

Looking at the docker charm code, I see that the "docker login failed" error happens when ['docker', 'login', netloc, '-u', registry.basic_user, '-p', registry.basic_password] fails. But the login works when I run it manually on the workers (after installing the pass package, of course).

    # handle auth data
    if registry.has_auth_basic():
        hookenv.log('Logging into docker registry: {}.'.format(netloc))
        cmd = ['docker', 'login', netloc,
               '-u', registry.basic_user, '-p', registry.basic_password]
        try:
            check_output(cmd, stderr=subprocess.STDOUT)
        except CalledProcessError as e:
            if b'http response' in e.output.lower():
                # non-tls login with basic auth will error like this:
                # Error response ... server gave HTTP response to HTTPS client
                msg = 'docker login requires a TLS-enabled registry'
            elif b'unauthorized' in e.output.lower():
                # invalid creds will error like this:
                # Error response ... 401 Unauthorized
                msg = 'Incorrect credentials for docker registry'
            else:
                msg = 'docker login failed, see juju debug-log'
            status.blocked(msg)

I'm trying to find out what the value of netloc is here. If it's the oam private IP, it's going to fail because I have to use the internal-space IP.
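
One way to check what netloc the docker unit actually received is to dump the relation data via juju run (the relation id 101 is taken from the debug-log lines above; docker-registry/0 is an assumed unit name):

    juju run -m kubernetes --unit docker/0 'relation-get -r docker-registry:101 - docker-registry/0'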

I'm not sure if it's the http-host config or the private-address of the unit. From the docker-registry charm code it looks like netloc should come from http-host, but I'm not sure that really applies in my case:

def get_netloc():
    '''Get the network location (host:port) for this registry.

    If http-host config is present, return the netloc for that config.
    If related to a proxy, return the proxy netloc. Otherwise, return
    our private_address:port.
    '''
    charm_config = hookenv.config()
    netloc = None
    if charm_config.get('http-host'):
        netloc = urlparse(charm_config['http-host']).netloc
    else:
        ...
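
A sketch of what the elided branch plausibly does, per the docstring (my reading, not the charm's verbatim code; the registry-port option name is an assumption):

        # fall back to private_address:port per the docstring
        # ('registry-port' is an assumed option name, default 5000)
        port = charm_config.get('registry-port', 5000)
        netloc = '{}:{}'.format(hookenv.unit_private_ip(), port)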

Tags: cpe-onsite
David Coronel (davecore) wrote:

After investigation, I don't think there's any real problem with the docker registry itself. I think the problem is just that the charm is not doing a status-set after things are OK again. When I had the wrong IP (the oam private IP), it did do a status-set:

unit-docker-0: 17:02:19 INFO unit.docker/0.juju-log docker-registry:105: Logging into docker registry: 172.31.222.132:5000.
unit-docker-0: 17:02:49 INFO unit.docker/0.juju-log docker-registry:105: Invoking reactive handler: reactive/docker.py:679:docker_restart
unit-docker-0: 17:02:49 WARNING unit.docker/0.juju-log docker-registry:105: Passing NO_PROXY string that includes a cidr. This may not be compatible with software you are running in your shell.
unit-docker-0: 17:03:19 INFO unit.docker/0.juju-log docker-registry:105: Restarting docker service.
unit-docker-0: 17:03:19 INFO unit.docker/0.juju-log docker-registry:105: Setting runtime to {'apt': ['docker.io'], 'upstream': ['docker-ce'], 'nvidia': ['docker-ce', 'nvidia-docker2', 'nvidia-container-runtime', 'nvidia-container-runtime-hook']}
unit-docker-0: 17:03:19 INFO unit.docker/0.juju-log docker-registry:105: Reloading system daemons.
unit-docker-0: 17:03:31 WARNING unit.docker/0.docker-registry-relation-changed WARNING: No swap limit support
unit-docker-0: 17:03:31 INFO unit.docker/0.juju-log docker-registry:105: status-set: blocked: docker login failed, see juju debug-log

I think we just never re-set the status, which would also explain why my docker subs all stay in blocked state after I remove the docker-registry relation.

To push the investigation further, I ran juju debug-hooks on the docker unit and then, from another window, triggered the update-status hook in my debug-hooks session:

    juju debug-hooks -m kubernetes docker/0
    juju run -m kubernetes --unit docker/0 'hooks/update-status'

Then, in the debug-hooks session, I ran "charms.reactive -p get_flags" and saw the following:

['docker.available',
 'docker.ready',
 'docker.registry.configured',
 'endpoint.docker-registry.changed.basic_password',
 'endpoint.docker-registry.changed.basic_user',
 'endpoint.docker-registry.changed.egress-subnets',
 'endpoint.docker-registry.changed.ingress-address',
 'endpoint.docker-registry.changed.private-address',
 'endpoint.docker-registry.changed.registry_netloc',
 'endpoint.docker-registry.changed.registry_url',
 'endpoint.docker-registry.changed.tls_ca',
 'endpoint.docker-registry.joined',
 'endpoint.docker-registry.ready',
 'endpoint.docker.available',
 'endpoint.docker.changed',
 'endpoint.docker.changed.egress-subnets',
 'endpoint.docker.changed.ingress-address',
 'endpoint.docker.changed.private-address',
 'endpoint.docker.changed.sandbox_image',
 'endpoint.docker.joined',
 'endpoint.docker.reconfigure']

The only place in the docker charm code where status.active('Container runtime available.') and set_state('docker.available') are called is the signal_workloads_start function:

@when('docker.ready')
@when_not('docker.available')
def signal_workloads_start():

But right now, in my debug-hooks session during update-status, docker.available is set, so this handler won't run.
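
If that's right, a minimal fix could be to re-set the status (or clear the flag) in the registry handler's success path. A sketch against the charms.reactive API, not a tested patch:

    # sketch: after 'docker login' succeeds in the registry handler,
    # explicitly recover from any earlier blocked status
    from charms.reactive import clear_flag

    status.active('Container runtime available.')
    # or force signal_workloads_start to run again on a later dispatch:
    clear_flag('docker.available')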

Am I understanding this right?

tags: added: cpe-onsite
David Coronel (davecore) wrote:

Removing the docker.available flag on my docker units fixed my blocked status.

charms.reactive clear_flag docker.available
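
The same thing should be doable remotely (untested outside a debug-hooks session; the charms.reactive CLI has to be on the unit's PATH):

    juju run -m kubernetes --unit docker/0 'charms.reactive clear_flag docker.available'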

Note that I also have to install the "pass" package (apt install pass) on the docker subs because of https://bugs.launchpad.net/ubuntu/+source/golang-github-docker-docker-credential-helpers/+bug/1794307 (docker login fails with "Cannot autolaunch D-Bus without X11").

description: updated
George Kraft (cynerva) wrote:

Reproduction steps (a rough command sketch follows the list):

1. Deploy Charmed Kubernetes with docker instead of containerd
2. Deploy docker-registry as described in the docs[1]
3. Configure docker-registry with a bogus http-host e.g. https://127.0.0.1:12345
4. Relate docker-registry to docker
5. Wait for docker to enter blocked/idle status with "docker login failed, see juju debug-log"
6. Configure docker-registry with http-host=
7. Wait for docker-registry and docker charms to run their hooks and enter idle state

At this point the docker charm should still show blocked/idle, even though it has logged in successfully.

[1]: https://ubuntu.com/kubernetes/docs/docker-registry
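
For reference, steps 3 through 6 roughly correspond to the following commands (assuming default application names):

    juju config docker-registry http-host=https://127.0.0.1:12345
    juju add-relation docker-registry docker
    # wait for docker to reach blocked/idle, then clear http-host:
    juju config docker-registry http-host=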

summary: - docker login failed when relating to docker-registry with http-host
- configured
+ charm stays in "docker login failed, see juju debug-log" status even
+ after later success
Changed in charm-docker:
importance: Undecided → High
status: New → Triaged