unable to download local: charm due to hash mismatch in multi-model deployment

Bug #1541482 reported by James Page
This bug affects 4 people
Affects                      Status         Importance   Assigned to          Milestone
Canonical Juju               Fix Released   High         Menno Finlay-Smits   -
OpenStack Charm Test Infra   Fix Released   High         Ryan Beisner         -
juju-core                    Fix Released   High         Menno Finlay-Smits   -

Bug Description

This may be related to bug 1541479; I had a subordinate service unit spinning with:

l:trusty/midonet-agent-1": failed to download charm "local:trusty/midonet-agent-1" from ["https://10.0.3.59:17070/environment/525a7728-251f-4cff-8eea-c81ef97ecf29/charms?file=%2A&url=local%3Atrusty%2Fmidonet-agent-1"]: expected sha256 "81b5a5312401857142328613b6c2889857dee9815626c3a20189c7231bf6575f", got "3c1e49f3989282b242efeebb2f8544b4dcede0c38b9c9680e3d1adfc62fb33f2"

I upgraded the charm in the model a few times, and this resolved itself. I had previously deployed midonet-agent (and changed it) in a different model on the same controller. It's a modified local copy of a charm, so maybe the namespacing of charms is not tied to the models they are deployed in?

That may be 2+2 = 5 but that's what it felt like.
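
For reference, the download URL in the error above carries both the environment (model) UUID and the percent-encoded charm URL. The following is a minimal sketch, not part of Juju, that splits such a URL apart so the two can be compared across models; the function name describe_charm_download is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def describe_charm_download(url):
    """Split a charm download URL into its scope, UUID and charm URL.

    Hypothetical helper for reading the error messages in this report.
    The logs here show both /environment/<uuid>/charms and
    /model/<uuid>/charms paths, so the first path segment is kept as-is.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    query = parse_qs(parts.query)  # parse_qs percent-decodes the values
    return {
        "scope": segments[0],
        "uuid": segments[1],
        "charm_url": query["url"][0],
        "file": query["file"][0],
    }


info = describe_charm_download(
    "https://10.0.3.59:17070/environment/525a7728-251f-4cff-8eea-c81ef97ecf29"
    "/charms?file=%2A&url=local%3Atrusty%2Fmidonet-agent-1"
)
print(info["uuid"], info["charm_url"])
```

Two models on the same controller can therefore request the same charm URL (local:trusty/midonet-agent-1) under different UUIDs, which is the situation described here.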

James Page (james-page)
summary: - unable to download charm due to hash mismatch
+ unable to download charm due to hash mismatch in multi-model deployment
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.0-beta1
tags: added: juju-release-support
James Page (james-page) wrote : Re: unable to download charm due to hash mismatch in multi-model deployment

Adding some more information - this only impacts charms deployed from local source, rather than the charm store.

cs:trusty/nova-cloud-controller-1 is always the same
local:trusty/nova-cloud-controller-1 might not be depending on the model

summary: - unable to download charm due to hash mismatch in multi-model deployment
+ unable to download local: charm due to hash mismatch in multi-model deployment
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta1 → 2.0-beta2
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta2 → 2.0-beta3
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta3 → 2.0-beta4
tags: added: 2.0-count
Changed in juju-core:
assignee: nobody → Menno Smits (menno.smits)
Changed in juju-core:
status: Triaged → In Progress
Menno Finlay-Smits (menno.smits) wrote :

I can reproduce this. Here's a simple repro:

juju bootstrap ...

juju deploy --series trusty ~/canonical/repository/trusty/ubuntu
# Wait for deploy to finish (perhaps not necessary)

juju create-model foo

# Make a trivial change to the charm but *don't* bump the charm
# revision. Changing the description text is enough.

juju deploy --series trusty ~/canonical/repository/trusty/ubuntu

# When the new unit agent comes up, the uniter will fail with the hash
# mismatch error.

Digging more...

Menno Finlay-Smits (menno.smits) wrote :

The problem was that the charm download API had a cache which didn't distinguish between charms across models. I have a fix for this, but dealing with that problem has led to the discovery of another, deeper problem with charm storage in multi-model scenarios. I'm looking at that now.
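
To illustrate the cache problem described above, here is a toy sketch. It is not Juju's actual code: the class CharmArchiveCache, the fetch callback and the model names are all assumptions made up for the example. It shows how a cache keyed only by charm URL can hand one model's archive to another model, and how adding the model UUID to the key avoids that.

```python
class CharmArchiveCache:
    """Toy in-memory cache of charm archives (illustrative only)."""

    def __init__(self, key_by_model):
        self.key_by_model = key_by_model
        self._entries = {}

    def _key(self, model_uuid, charm_url):
        # The buggy behaviour corresponds to ignoring model_uuid here.
        return (model_uuid, charm_url) if self.key_by_model else charm_url

    def get(self, model_uuid, charm_url, fetch):
        key = self._key(model_uuid, charm_url)
        if key not in self._entries:
            self._entries[key] = fetch(model_uuid, charm_url)
        return self._entries[key]


# Hypothetical per-model storage: same local charm URL, different content.
URL = "local:trusty/ubuntu-0"
storage = {
    ("model-a", URL): b"original charm",
    ("model-b", URL): b"edited charm, same revision",
}


def fetch(model_uuid, charm_url):
    return storage[(model_uuid, charm_url)]


shared = CharmArchiveCache(key_by_model=False)
shared.get("model-a", URL, fetch)
stale = shared.get("model-b", URL, fetch)  # served from model-a's entry

scoped = CharmArchiveCache(key_by_model=True)
scoped.get("model-a", URL, fetch)
fresh = scoped.get("model-b", URL, fetch)  # fetched for model-b

print(stale, fresh)
```

In the shared case the second model receives bytes whose hash cannot match the sha256 recorded for its own upload, which is consistent with the error seen by the uniter.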

Matt Bruzek (mbruzek) wrote :

I am also seeing this problem on 2.0-beta3-xenial-amd64 with AWS cloud provider.

For me the error message is repeated over and over again in the unit-log:

2016-04-11 20:43:53 ERROR juju.worker.dependency engine.go:509 "metric-collect" manifold worker returned unexpected error: failed to read charm from: /var/lib/juju/agents/unit-kubernetes-0/charm: stat /var/lib/juju/agents/unit-kubernetes-0/charm: no such file or directory
2016-04-11 20:43:55 ERROR juju.worker.dependency engine.go:509 "uniter" manifold worker returned unexpected error: preparing operation "install local:trusty/kubernetes-0": failed to download charm "local:trusty/kubernetes-0" from ["https://172.31.41.147:17070/model/32a6078b-a957-4192-8dd2-e60bfa4fc03c/charms?file=%2A&url=local%3Atrusty%2Fkubernetes-0"]: expected sha256 "bbe9d919ea6f7ccdd3a13e54a19c9f04682ff18cc74ead4ea34bda4a8f5a2441", got "243e74a58d288336dcee0843a300f290f73832d301a47b576bb3bf20115cbe37"

This is a locally built charm deployed to AWS; I don't know if that matters.
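
The check that produces the "expected sha256 ..., got ..." message can be pictured roughly as follows. This is a simplified sketch under the assumption that the agent hashes the downloaded archive and compares it with the digest recorded at upload time; HashMismatch and verify_sha256 are hypothetical names, not Juju identifiers.

```python
import hashlib


class HashMismatch(Exception):
    """Raised when downloaded bytes do not match the recorded digest."""


def verify_sha256(data, expected_hex):
    """Return the digest of data, or raise if it differs from expected_hex."""
    got = hashlib.sha256(data).hexdigest()
    if got != expected_hex:
        raise HashMismatch(f'expected sha256 "{expected_hex}", got "{got}"')
    return got


# Digest recorded when the charm was uploaded to this model (example bytes).
uploaded = b"charm archive as uploaded to this model"
recorded = hashlib.sha256(uploaded).hexdigest()

# Bytes actually served, e.g. a same-named charm from another model.
served = b"charm archive from a different model"

try:
    verify_sha256(served, recorded)
    outcome = "ok"
except HashMismatch as err:
    outcome = str(err)

print(outcome)
```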

Matt Bruzek (mbruzek) wrote :

A juju status shows all these units: Waiting for agent initialization to finish

It seems like juju is downloading them in a loop when the sha256 does not match, which may eventually fill up the disk. I have not waited long enough to see that happen, but it is a problem that also needs to be addressed.

Matt Bruzek (mbruzek) wrote :

Running the command "juju upgrade-charm kubernetes" seems to work around this error. Perhaps it refreshes the shasum?

Menno Finlay-Smits (menno.smits) wrote :

@mbruzek: as per my earlier comment, I have fixed this issue. Unfortunately, fixing the problem led to the discovery of another issue. The fixes for these problems need to be landed together.

See bug 1569054 for details on the other issue.

Menno Finlay-Smits (menno.smits) wrote :
Changed in juju-core:
milestone: 2.0-beta4 → 2.0-rc1
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta5 → none
milestone: none → 2.0-beta5
Ryan Beisner (1chb1n) wrote :

We've started to see a frequent and disruptive occurrence of this in OpenStack CI, using Juju 1.25.6.

The juju status shows the unit's agent-state as 'failed', and inspection of the unit's machine log shows a sha256 mismatch when retrieving the charm from the controller.

A few examples with artifacts and logs (we have a lot more available if needed):

https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline_amulet_full/openstack/charm-nova-cloud-controller/375028/11/146/index.html

https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline_amulet_full/openstack/charm-ceph/375009/14/189/index.html

https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline_amulet_full/openstack/charm-swift-proxy/375035/11/173/index.html

2016-10-07 23:11:48 INFO juju.worker.uniter modes.go:567 ModeContinue starting
2016-10-07 23:11:48 INFO juju.worker.uniter modes.go:59 resuming charm install
2016-10-07 23:11:48 INFO juju.worker.uniter modes.go:569 ModeContinue exiting
2016-10-07 23:11:48 INFO juju.worker.uniter modes.go:567 ModeInstalling local:trusty/nova-cloud-controller-501 starting
2016-10-07 23:11:48 INFO juju.worker.uniter.operation executor.go:69 running operation install local:trusty/nova-cloud-controller-501
2016-10-07 23:11:48 INFO juju.worker.uniter.operation executor.go:103 preparing operation "install local:trusty/nova-cloud-controller-501"
2016-10-07 23:11:48 INFO juju.worker.uniter.charm bundles.go:60 downloading local:trusty/nova-cloud-controller-501 from https://172.17.100.7:17070/environment/9dca5c31-16ed-437f-875e-e07c3452f4e5/charms?file=%2A&url=local%3Atrusty%2Fnova-cloud-controller-501
2016-10-07 23:11:48 DEBUG juju.worker.uniter.filter filter.go:461 got address change
2016-10-07 23:11:48 DEBUG juju.worker.uniter.filter filter.go:422 no config change seen yet, skipping config event
2016-10-07 23:11:48 DEBUG juju.worker.uniter.filter filter.go:502 got storage change
2016-10-07 23:11:48 DEBUG juju.worker.uniter.filter filter.go:438 got unit change
2016-10-07 23:11:48 DEBUG juju.worker.uniter.filter filter.go:446 got service change
2016-10-07 23:11:48 DEBUG juju.worker.uniter.filter filter.go:717 charm check skipped, not yet installed.
2016-10-07 23:11:48 INFO juju.worker.uniter.charm bundles.go:69 download complete
2016-10-07 23:11:48 DEBUG juju.worker.uniter modes.go:31 [AGENT-STATUS] failed: install local:trusty/nova-cloud-controller-501
2016-10-07 23:11:48 INFO juju.worker.uniter modes.go:569 ModeInstalling local:trusty/nova-cloud-controller-501 exiting
2016-10-07 23:11:48 INFO juju.worker.uniter uniter.go:203 unit "nova-cloud-controller/1" shutting down: ModeInstalling local:trusty/nova-cloud-controller-501: preparing operation "install local:trusty/nova-cloud-controller-501": failed to download charm "local:trusty/nova-cloud-controller-501" from ["https://172.17.100.7:17070/environment/9dca5c31-16ed-437f-875e-e07c3452f4e5/charms?file=%2A&url=local%3Atrusty%2Fnova-cloud-controller-501"]: expected sha256 "d1d568bab3374bb76a0026a61f860b96be257daf44cef8cf6c7f7e63f5509ca4", got "c810a99450727181f0f9384de313ec6e18567c1d3a0b0f1ec86c2abfbc076c2e"
2016-10-07 23:11:48 DEBUG juju.worker.uniter runlistener.go:109 juju...


Ryan Beisner (1chb1n) wrote :

FYI, in those CI artifact links:

First, under the 'test_charm_amulet_full' section, have a look at the juju-stat-tabular-collect.txt to determine which unit(s) are affected (FOO).

Then see FOO-var-log.tar.bz2, /etc/juju/unit-FOO logs, where the sha256 mismatch error will be.
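
If it helps when going through many unit logs, the mismatch lines can be picked out with a small script. This is only a sketch: the regular expression is an assumption based on the message format seen in this report, and find_hash_mismatches is a hypothetical helper name.

```python
import re

# Pattern modelled on the error text quoted in this bug report.
MISMATCH = re.compile(
    r'failed to download charm "(?P<charm>[^"]+)".*?'
    r'expected sha256 "(?P<expected>[0-9a-f]+)", got "(?P<got>[0-9a-f]+)"'
)


def find_hash_mismatches(log_text):
    """Return one dict (charm, expected, got) per matching log line."""
    hits = []
    for line in log_text.splitlines():
        match = MISMATCH.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


sample_log = "\n".join([
    "2016-10-07 23:11:48 INFO juju.worker.uniter.charm bundles.go:69 download complete",
    '2016-10-07 23:11:48 INFO juju.worker.uniter uniter.go:203 unit "nova-cloud-controller/1" '
    'shutting down: ModeInstalling local:trusty/nova-cloud-controller-501: preparing operation '
    '"install local:trusty/nova-cloud-controller-501": failed to download charm '
    '"local:trusty/nova-cloud-controller-501" from ["https://172.17.100.7:17070/environment/'
    '9dca5c31-16ed-437f-875e-e07c3452f4e5/charms?file=%2A&url=local%3Atrusty%2F'
    'nova-cloud-controller-501"]: expected sha256 '
    '"d1d568bab3374bb76a0026a61f860b96be257daf44cef8cf6c7f7e63f5509ca4", got '
    '"c810a99450727181f0f9384de313ec6e18567c1d3a0b0f1ec86c2abfbc076c2e"',
])

hits = find_hash_mismatches(sample_log)
for hit in hits:
    print(hit["charm"], hit["expected"][:8], hit["got"][:8])
```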

tags: added: uosci
Changed in juju-core:
milestone: none → 1.25.7
assignee: nobody → Menno Smits (menno.smits)
importance: Undecided → High
status: New → Triaged
no longer affects: juju-core/1.25
Menno Finlay-Smits (menno.smits) wrote :

It turns out this has likely already been fixed in the 1.25 branch (by me) but the fixes are awaiting a 1.25.7 release.

Keeping the ticket open to track the issue and ensure that it's definitely dealt with.

Changed in juju-core:
status: Triaged → In Progress
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
Ryan Beisner (1chb1n)
Changed in charm-test-infra:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Ryan Beisner (1chb1n)
Ryan Beisner (1chb1n)
Changed in charm-test-infra:
status: In Progress → Fix Released