juju 2.9.26 unable to deploy centos7

Bug #1964815 reported by Heitor
This bug affects 6 people
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Simon Richardson
Milestone: 2.9.29

Bug Description

On Friday, everything was working fine for me.

Today (Monday), Juju is broken:

```
$ juju deploy slurmd --series centos7 --debug
16:12:53 INFO juju.cmd supercommand.go:56 running juju [2.9.26 0 e7e941ad6e2c581fc4254417a944024fd2ad4aae gc go1.17.6]
16:12:53 DEBUG juju.cmd supercommand.go:57 args: []string{"/snap/juju/18433/bin/juju", "deploy", "slurmd", "--series", "centos7", "--debug"}
16:12:53 INFO juju.juju api.go:78 connecting to API addresses: [10.107.185.76:17070]
16:12:53 DEBUG juju.api apiclient.go:1153 successfully dialed "wss://10.107.185.76:17070/api"
16:12:53 INFO juju.api apiclient.go:688 connection established to "wss://10.107.185.76:17070/api"
16:12:53 INFO juju.juju api.go:78 connecting to API addresses: [10.107.185.76:17070]
16:12:53 DEBUG juju.api apiclient.go:1153 successfully dialed "wss://10.107.185.76:17070/model/1e24842c-ba0b-4161-8143-d40aba4d7370/api"
16:12:53 INFO juju.api apiclient.go:688 connection established to "wss://10.107.185.76:17070/model/1e24842c-ba0b-4161-8143-d40aba4d7370/api"
16:12:53 INFO juju.juju api.go:78 connecting to API addresses: [10.107.185.76:17070]
16:12:53 DEBUG juju.api apiclient.go:1153 successfully dialed "wss://10.107.185.76:17070/api"
16:12:53 INFO juju.api apiclient.go:688 connection established to "wss://10.107.185.76:17070/api"
16:12:53 DEBUG juju.cmd.juju.application.deployer deployer.go:387 cannot interpret as local charm: file does not exist
16:12:53 DEBUG juju.cmd.juju.application.deployer deployer.go:207 cannot interpret as a redeployment of a local charm from the controller
16:12:55 DEBUG juju.cmd.juju.application.store charmadapter.go:139 cannot interpret as charmstore bundle: panther (series) != "bundle"
16:12:55 INFO cmd charm.go:443 Preparing to deploy "slurmd" from the charmhub
16:12:57 DEBUG juju.api monitor.go:35 RPC connection died
ERROR series "centos7" not supported by charm, supported series are: panther,focal. Use --force to deploy the charm anyway.
16:12:57 DEBUG cmd supercommand.go:537 error stack:
/build/snapcraft-juju-35d6cf/parts/juju/src/cmd/juju/application/deployer/charm.go:497: series "centos7" not supported by charm, supported series are: panther,focal. Use --force to deploy the charm anyway.

$ juju info slurmd
name: slurmd
charm-id: PNY1jWPbggzG1NZvpftYqLgg86qKyNtd
summary: |
  Slurmd, the compute node daemon of Slurm.
publisher: Omnivector Solutions
supports: focal, centos7
subordinate: false
store-url: https://charmhub.io/slurmd
description: |
  This charm provides slurmd, munged, and the bindings to other utilities
  that make lifecycle operations a breeze.

  slurmd is the compute node daemon of SLURM. It monitors all tasks running
  on the compute node, accepts work (tasks), launches tasks, and kills
  running tasks upon request.
relations:
  provides:
    slurmd: slurmd
  requires:
    fluentbit: fluentbit
channels: |
  latest/stable: 0.8.5 2022-01-14 (29) 5MB
  latest/candidate: ↑
  latest/beta: ↑
  latest/edge: 0.8.5 2022-01-14 (29) 5MB
```

I can't deploy any centos7 charm on my LXD cloud anymore. I changed NOTHING on my system.

Since last Friday, two snaps auto-updated:
- juju
- lxd

Tags: snap
Heitor (heitorpbittencourt) wrote :

juju add-machine --series centos7 also errors out.

John A Meinel (jameinel) wrote :

It is very surprising that "supported series are: panther,focal" mentions "panther", as I'm pretty sure that is a Mac OS version (which your client might know about, but which shouldn't be known to the server).

Heitor (heitorpbittencourt) wrote :

Oh, I did not know Panther was a Mac OS release. This is very weird. Do you know of a workaround to fix this?

John A Meinel (jameinel) wrote :

I don't know of a workaround, but this line looks very wrong:

16:12:55 DEBUG juju.cmd.juju.application.store charmadapter.go:139 cannot interpret as charmstore bundle: panther (series) != "bundle"

You've specified a series, but for some reason we are still using your client series instead. And you aren't deploying from a local path ('slurmd' rather than './slurmd'), so there is no reason for us to assume anything associated with your client.

Heitor (heitorpbittencourt) wrote :

Do you need any more logs or info to better understand the situation? Were you able to reproduce this issue?

Harry Pidcock (hpidcock) wrote :

Are you using macOS on your client machine?

Heitor (heitorpbittencourt) wrote :

I am not. I'm using Ubuntu Server 20.04 LTS, with all snaps from latest/stable.

Heitor (heitorpbittencourt) wrote :

This problem still persists with the new 2.9.27 version.

Ian Booth (wallyworld) wrote :

Looking at the metadata associated with the published charm, the series is wrong.

    series:
    - focal
    - panther

That's why Juju complains that the only supported series are focal and panther.

This means that the charm was packed with an incorrect charmcraft manifest file.

$ juju info slurmd --format yaml
type: charm
id: PNY1jWPbggzG1NZvpftYqLgg86qKyNtd
name: slurmd
description: |
  This charm provides slurmd, munged, and the bindings to other utilities
  that make lifecycle operations a breeze.

  slurmd is the compute node daemon of SLURM. It monitors all tasks running
  on the compute node, accepts work (tasks), launches tasks, and kills
  running tasks upon request.
publisher: Omnivector Solutions
summary: |
  Slurmd, the compute node daemon of Slurm.
series:
- focal
- centos7
store-url: https://charmhub.io/slurmd
charm:
  config:
    options:
      custom-slurm-repo:
        type: string
        description: |
          Use a custom repository for Slurm installation.
          This can be set to the Organization's local mirror/cache of packages and supersedes the Omnivector repositories. Alternatively, it can be used to track a `testing` Slurm version, e.g. by setting to `ppa:omnivector/osd-testing` (on Ubuntu), or `https://omnivector-solutions.github.io/repo/centos7/stable/$basearch` (on CentOS).
          Note: The configuration `custom-slurm-repo` must be set *before* deploying the units. Changing this value after deploying the units will not reinstall Slurm.
        default: ""
      nhc-conf:
        type: string
        description: |
          Custom extra configuration to use for Node Health Check.
          These lines are appended to a basic `nhc.conf` provided by the charm.
        default: ""
      partition-config:
        type: string
        description: |
          Extra partition configuration, specified as a space separated `key=value` in a single line.
          Example usage: $ juju config slurmd partition-config="DefaultTime=45:00 MaxTime=1:00:00"
        default: ""
      partition-name:
        type: string
        description: |
          Name by which the partition may be referenced (e.g. `Interactive`).
          Note: the partition name should only contain letters, numbers, and hyphens. Spaces are not allowed.
      partition-state:
        type: string
        description: |
          State of partition or availability for use. Possible values are `UP`, `DOWN`, `DRAIN` and `INACTIVE`. The default value is `UP`. See also the related `Alternate` keyword.
        default: UP
  relations:
    provides:
      slurmd: slurmd
    requires:
      fluentbit: fluentbit
channel-map:
  latest/edge:
    released-at: "2022-01-14T14:16:26.076752+00:00"
    track: latest
    risk: edge
    revision: 29
    size: 5179454
    version: 0.8.5
    architectures:
    - amd64
    series:
    - focal
    - panther
  latest/stable:
    released-at: "2022-01-14T15:09:26.400335+00:00"
    track: latest
    risk: stable
    revision: 29
    size: 5179454
    version: 0.8.5
    architectures:
    - amd64
    series:
    - focal
    - panther
tracks:
- latest
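The output above carries two views of the same revision: the top-level `series` list (focal, centos7) and the per-channel `series` lists under `channel-map` (focal, panther). A minimal sketch, assuming the YAML has already been parsed into a dict (trimmed by hand below), of how one might flag that mismatch and reproduce the shape of the deploy-time error. The helpers `series_mismatches` and `check_deploy` are hypothetical; this is not Juju's code.

```python
# Hypothetical sketch: the dict is trimmed by hand from the
# `juju info slurmd --format yaml` output above; this is not Juju code.
INFO = {
    "name": "slurmd",
    "series": ["focal", "centos7"],
    "channel-map": {
        "latest/edge": {"revision": 29, "series": ["focal", "panther"]},
        "latest/stable": {"revision": 29, "series": ["focal", "panther"]},
    },
}


def series_mismatches(info):
    """Map each channel to (missing, unexpected) series compared with the
    top-level list; channels that agree are left out."""
    top = set(info["series"])
    result = {}
    for channel, entry in info["channel-map"].items():
        chan = set(entry["series"])
        if chan != top:
            result[channel] = (sorted(top - chan), sorted(chan - top))
    return result


def check_deploy(info, channel, series):
    """Return "ok", or an error string shaped like the one Juju printed,
    judged against the channel's series list (assumed here to be what the
    deploy check consults)."""
    supported = info["channel-map"][channel]["series"]
    if series not in supported:
        return (f'series "{series}" not supported by charm, '
                f'supported series are: {",".join(supported)}')
    return "ok"


print(series_mismatches(INFO))
print(check_deploy(INFO, "latest/stable", "centos7"))
```

The same revision (29) reports centos7 at the top level but panther per channel, which is consistent with Ian's reading that the channel data is what triggers the error.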

Changed in juju:
status: New → Invalid
Heitor (heitorpbittencourt) wrote :

Hello Ian,

This points to an even deeper problem in the Juju ecosystem.

There is not a single "panther" string in the source of this charm; you can check it yourself: https://github.com/omnivector-solutions/slurm-charms

The manifest.yaml file generated by charmcraft appears to be correct, as it mentions centos. But after uploading to Charmhub, this gets turned into a Mac OS series.

So how is it that the Charmhub page shows this charm works on CentOS 7, `juju info` shows it supports centos7, and the source defines it to work on centos7, yet Juju can no longer deploy it to centos7?

This charm is working on our production systems running centos7.

Can you help us understand what is going on and how to fix this situation?

Heitor (heitorpbittencourt) wrote :

Btw, if I pack the charm myself and deploy locally, it works:

$ juju deploy ./slurmd_ubuntu-20.04-amd64_centos-7-amd64.charm --series centos7 myslurmd
Located local charm "slurmd", revision 0
Deploying "myslurmd" from local charm "slurmd", revision 0

Ian Booth (wallyworld) wrote :

I'm not sure without trying it myself, but I think the channel for centos might be wrong; it should be "centos7". What you have makes more sense, but as I understand it, supporting those values requires work in Juju (and maybe in Charmhub itself). So see if channel="centos7" works.

type: charm
bases:
  - build-on:
      - name: ubuntu
        channel: "20.04"
    run-on:
      - name: ubuntu
        channel: "20.04"
        architectures: [amd64]
      - name: centos
        channel: "7"
        architectures: [amd64]
parts:
  charm:
    build-packages: [git]
    charm-python-packages: [setuptools]
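The packed file name Heitor deployed earlier in this thread, `slurmd_ubuntu-20.04-amd64_centos-7-amd64.charm`, lines up with these `run-on` entries, with the CentOS channel carried through as plain "7" rather than "centos7". A sketch of that correspondence follows; the naming rule is inferred from this one file name rather than taken from charmcraft's source, and `charm_filename` is a hypothetical helper.

```python
# Hypothetical helper: builds a .charm file name from the run-on bases of
# one build-on entry. The rule is inferred from the file name seen in this
# thread, not taken from charmcraft itself.
def charm_filename(charm_name, run_on):
    parts = []
    for base in run_on:
        archs = "-".join(base["architectures"])
        parts.append(f'{base["name"]}-{base["channel"]}-{archs}')
    return f'{charm_name}_{"_".join(parts)}.charm'


# The run-on bases from the charmcraft.yaml above.
RUN_ON = [
    {"name": "ubuntu", "channel": "20.04", "architectures": ["amd64"]},
    {"name": "centos", "channel": "7", "architectures": ["amd64"]},
]

print(charm_filename("slurmd", RUN_ON))
```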

Heitor (heitorpbittencourt) wrote :

$ juju deploy slurmd --channel latest/edge --series centos
ERROR unknown OS for series: "centos"

$ juju deploy slurmd --channel latest/edge --series centos7
ERROR series "centos7" not supported by charm, supported series are: focal,panther. Use --force to deploy the charm anyway.

$ juju deploy ./slurmd_ubuntu-20.04-amd64_centos-7-amd64.charm --series centos7
Located local charm "slurmd", revision 2
Deploying "slurmd" from local charm "slurmd", revision 2

Could this be a bug in the Juju/Charmhub server or backend? Where do we report bugs for those parts of Juju?

Joseph Phillips (manadart) wrote :

These are the current known name#channel#arch tuples across all of Charmhub excluding Ubuntu versions:

 centos#7#all
 centos#7#amd64
 windows#10#all
 windows#2012#all
 windows#2012hv#all
 windows#2012hvr2#all
 windows#2012r2#all
 windows#2016#all
 windows#2016hv#all
 windows#2016nano#all

We can see that there is consistency for the CentOS charms, so we should ensure Juju retrieves them correctly.
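Because every CentOS entry uses the same `centos#7` pair, resolving these tuples to Juju series names only needs the OS name and channel taken together. A minimal sketch that parses the `name#channel#arch` strings listed above and maps each `(name, channel)` pair to a series name. The `centos` to `centos7` rule follows this thread; the `win` + channel rule for Windows is an assumption about Juju's Windows series names, and `parse_bases` and `series_for` are hypothetical.

```python
from collections import defaultdict

# The non-Ubuntu base tuples listed above.
TUPLES = """
centos#7#all
centos#7#amd64
windows#10#all
windows#2012#all
windows#2012hv#all
windows#2012hvr2#all
windows#2012r2#all
windows#2016#all
windows#2016hv#all
windows#2016nano#all
"""


def parse_bases(text):
    """Group architectures by (os name, channel)."""
    grouped = defaultdict(set)
    for line in text.split():
        name, channel, arch = line.split("#")
        grouped[(name, channel)].add(arch)
    return dict(grouped)


def series_for(name, channel):
    """Hypothetical mapping from a (name, channel) base to a series name."""
    if name == "centos":
        return "centos" + channel  # ("centos", "7") -> "centos7"
    if name == "windows":
        return "win" + channel  # assumption, e.g. "2012r2" -> "win2012r2"
    raise ValueError(f"no series mapping for {name}/{channel}")


for (name, channel), archs in sorted(parse_bases(TUPLES).items()):
    print(name, channel, sorted(archs), "->", series_for(name, channel))
```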

Changed in juju:
status: Invalid → New
importance: Undecided → High
james beedy (jamesbeedy) wrote :

Hey @manadart, thanks for chiming in here. Do you have any further insight into how this happened? For example, what code change could have introduced this breakage? I'm wondering specifically whether this is the result of a snap auto-updating somewhere.

Joseph Phillips (manadart) wrote :

I haven't confirmed, but it looks like a change on the Charmhub side for these entries.

We were and still are querying like this:
https://pastebin.ubuntu.com/p/F5q3qhNS4f/

"centos7" being the channel that was working before.

I'm submitting a patch to fix this, but looking at the Slurm charms, I believe those will need to be re-uploaded without "series" in metadata.yaml.

There's a patch submitted to charmcraft to catch this in the linter:
https://github.com/canonical/charmcraft/pull/735
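The linter change is only mentioned in outline here. A minimal sketch of the kind of rule it could enforce, assuming `metadata.yaml` and `charmcraft.yaml` have already been parsed into dicts: flag a charm that declares `bases` in charmcraft.yaml while still listing `series` in metadata.yaml. `lint_series` is a hypothetical name, the example values are illustrative, and this is not the code from that pull request.

```python
def lint_series(metadata, charmcraft):
    """Return warnings for a charm that mixes the old and new schemes.

    Hypothetical check: when bases are declared in charmcraft.yaml, a
    "series" key in metadata.yaml is flagged so it can be removed before
    the charm is packed and uploaded.
    """
    warnings = []
    if "bases" in charmcraft and "series" in metadata:
        warnings.append(
            "metadata.yaml declares 'series' ("
            + ", ".join(metadata["series"])
            + ") but charmcraft.yaml uses 'bases'; remove 'series'"
        )
    return warnings


# Illustrative inputs, not copied from the slurm-charms repository.
METADATA = {"name": "slurmd", "series": ["focal", "centos7"]}
CHARMCRAFT = {
    "type": "charm",
    "bases": [{"run-on": [{"name": "centos", "channel": "7"}]}],
}

print(lint_series(METADATA, CHARMCRAFT))
print(lint_series({"name": "slurmd"}, CHARMCRAFT))  # -> []
```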

Heitor (heitorpbittencourt) wrote :

We can update the charms in the edge channel anytime. Candidate and Stable channels are a different story.

Is there a place where we can track this change being patched in Charmhub? Is it going live in the next few weeks?

Heitor (heitorpbittencourt) wrote :

We removed the series from the metadata and uploaded the new version of the charms to latest/edge.

`juju info slurmd --format yaml` still shows panther instead of centos7.

james beedy (jamesbeedy) wrote :

My gut feeling is that the charmstore is snapped software: someone pushed a new revision, the charmstore snap auto-updated, and it broke our things yet again.

I would love to be wrong here. Possibly someone else can provide some insight into how this happened.

Can we simply roll back charmhub to the version that works?

Thanks

Ian Booth (wallyworld) wrote :

How the charmstore is managed is not related to the issue, and there is nothing to roll back that has any bearing on it.

The issue here has been there from day one and is related to a change in the charm data model. Things worked before because CentOS charms had been published to the old store with the channel set to "centos7", not "7". The import from the old store would still have used "centos7", but when the charm was repacked and published, I think things got changed to "7". As per Joe's comment, things are now:
 centos#7#all
 centos#7#amd64

Juju cannot resolve "7" to anything meaningful whereas it could with "centos7".

There are several pieces that need fixing. One is fixing the charm to remove series from metadata.yaml (better documentation is needed here). Another is a Juju fix to properly interpret the (OS name, version) tuple from the charm data model.
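One way to picture the difference between the two interpretations Ian describes: John noticed that "panther" is a Mac OS name, and Mac OS X 10.3 Panther shipped with Darwin 7, so one plausible but unconfirmed explanation is a version-only lookup that ignores the OS name and matches "7" against a macOS table. The tables and function names below are hypothetical and only illustrate that hypothesis, not Juju's actual code; the name-aware lookup is the kind of (OS name, version) interpretation the fix calls for.

```python
# Hypothetical tables; NOT Juju's real data structures.
# Version-only table: release version -> series name, pooled across OSes.
# The macOS entry is keyed by Darwin major version (Darwin 7 = Panther).
VERSION_ONLY = {
    "20.04": "focal",
    "7": "panther",
}

# Name-aware table keyed on the (os name, channel) base tuple.
BY_BASE = {
    ("ubuntu", "20.04"): "focal",
    ("centos", "7"): "centos7",
}


def series_version_only(name, channel):
    """Ignores the OS name, so CentOS "7" collides with another OS."""
    return VERSION_ONLY.get(channel)


def series_by_base(name, channel):
    """Uses the full (name, channel) pair, so the OS cannot be confused."""
    return BY_BASE.get((name, channel))


bases = [("ubuntu", "20.04"), ("centos", "7")]
print([series_version_only(n, c) for n, c in bases])
print([series_by_base(n, c) for n, c in bases])
```

Under this hypothesis the version-only lookup yields focal and panther, matching the series list in the error, while the name-aware lookup yields focal and centos7.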

The juju fix is being worked on.

Ian Booth (wallyworld) wrote :

It looks like this PR is a quick tactical fix to get things working on the Juju side:

https://github.com/juju/juju/pull/13847

Changed in juju:
milestone: none → 2.9.29
assignee: nobody → Simon Richardson (simonrichardson)
status: New → Fix Committed
Heitor (heitorpbittencourt) wrote :

Thanks for the fix! Do you have an idea on when juju 2.9.29 will be released?

Heitor (heitorpbittencourt) wrote :

The published fix does not cover the new metadata scheme. I added some comments with my results to https://github.com/juju/juju/pull/13847.

Changed in juju:
status: Fix Committed → Fix Released