image corruption with docker-registry charm

Bug #2049360 reported by Andrew Liaw
This bug affects 3 people
Affects: Docker Registry Charm
Status: Fix Released
Importance: High
Assigned to: Kevin W Monroe

Bug Description

We have an instance of the docker-registry charm configured as a pull-through cache for Docker Hub.
The Juju machine's disk is used as storage.

Recently, we encountered an issue where pulling a particular image always fails. Upon investigation, it seems one layer of the image was missing: when pulling that layer, docker-registry always returns an empty file.

We used `juju ssh` to get into the machine and `docker exec -it sh` to poke around the docker-registry container. Most of the image layers were found under directories named after the first two characters of their hash; however, one layer was missing from the disk.
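For anyone hitting the same symptom, a check along these lines can confirm whether a layer is missing, empty, or corrupted on disk. This is a minimal sketch, assuming the registry uses the filesystem storage driver with the upstream layout `<root>/docker/registry/v2/blobs/sha256/<first two hex chars>/<hex>/data`. The root directory (`/var/lib/registry` in the upstream image) may differ under the charm, and `blob_path`, `verify_blob` and `scan_blobs` are hypothetical helper names, not part of the charm or the registry.

```shell
#!/bin/sh
# Sketch: check cached blobs of a registry using the filesystem storage driver.
# ASSUMPTION: blobs live under <root>/docker/registry/v2/blobs/sha256/<xx>/<hex>/data
# (upstream distribution layout); the charm's root directory may differ.

# Print the on-disk path of a blob given its root dir and digest (sha256:<hex>).
blob_path() {
    root=$1
    hex=${2#sha256:}
    # Blob directories are grouped by the first two hex characters of the digest.
    prefix=$(printf '%s' "$hex" | cut -c1-2)
    printf '%s/docker/registry/v2/blobs/sha256/%s/%s/data\n' "$root" "$prefix" "$hex"
}

# Return 0 if the blob exists, is non-empty, and its sha256 matches the digest.
verify_blob() {
    path=$(blob_path "$1" "$2")
    if [ ! -f "$path" ]; then
        echo "missing: $path" >&2
        return 1
    fi
    if [ ! -s "$path" ]; then
        echo "empty: $path" >&2
        return 1
    fi
    actual=$(sha256sum "$path" | cut -d' ' -f1)
    if [ "$actual" != "${2#sha256:}" ]; then
        echo "digest mismatch: $path" >&2
        return 1
    fi
    return 0
}

# Verify every stored sha256 blob under a root dir; return 1 if any check fails.
scan_blobs() {
    status=0
    for blob in "$1"/docker/registry/v2/blobs/sha256/*/*/data; do
        # An unmatched glob stays literal, so skip it if nothing exists.
        [ -e "$blob" ] || continue
        digest_hex=$(basename "$(dirname "$blob")")
        verify_blob "$1" "sha256:$digest_hex" || status=1
    done
    return $status
}
```

For example, inside the container one could run `verify_blob /var/lib/registry sha256:<digest of the failing layer>`; a non-zero exit status means the blob is missing, empty, or does not match its digest. `scan_blobs` hashes every stored blob, so it may be slow on a large cache.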

This seems to be a problem with docker-registry itself rather than with the charm.

We ended up removing the docker-registry container with the `stop` Juju action (https://charmhub.io/docker-registry/actions#stop), then starting a new container with another Juju action.
This resolved our issue.

Are there any settings or configuration options in the charm to mitigate this issue?
For example, is there a time-to-live (TTL) for cached images, or is there a better solution?

Andrew Liaw (aliaw) wrote :

Moved to the docker-registry charm project; the ticket was previously filed under the wrong project.

affects: charm-aws-cloud-provider → layer-docker-registry
Andrew Liaw (aliaw) wrote :

Hi, are there any updates on this issue?

We have tried upgrading the `docker-registry` image to the latest version (2.8.3). The issue still persists.

Christopher Bartz (bartz) wrote :
Changed in layer-docker-registry:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Kevin W Monroe (kwmonroe)
milestone: none → 1.29+ck1
Kevin W Monroe (kwmonroe) wrote :

Great find @bartz! It does sound like cache expiration could be the culprit here. This comment was particularly useful in finding a good option that we could use to disable the descriptor cache:

https://github.com/distribution/distribution/issues/2367#issuecomment-1874449361

PR for review:

https://github.com/canonical/docker-registry-charm/pull/68

FWIW, another mitigation could be to adjust the TTL, as @aliaw suggested in the description. It looks like this is planned/committed upstream, but it won't land until v3 later this year; requests to backport it to v2.x were denied:

https://github.com/distribution/distribution/pull/3880
https://github.com/distribution/distribution/pull/4090

Kevin W Monroe (kwmonroe) wrote :

If you'd like to give this a try while it's being reviewed, it's available as rev 74 in the candidate channel:

https://charmhub.io/docker-registry?channel=latest/candidate

Specifically, I think the fix for you will be to set storage-cache="disabled", like this:

juju deploy docker-registry --channel candidate --config storage-cache="disabled"

Changed in layer-docker-registry:
status: In Progress → Fix Committed
Yanks (charlie4284) wrote :

We've had it running for 12 days now and have not detected any issues. Thank you for the fix!

Changed in layer-docker-registry:
status: Fix Committed → Fix Released