hook error on deployment on power8: mon-relation-changed

Bug #1581134 reported by Matt Rae
This bug affects 2 people
Affects: ceph-osd (Juju Charms Collection)    Status: Expired    Importance: Low    Assigned to: Unassigned    Milestone: none

Bug Description

Hi, when deploying Ceph on POWER8 we are consistently seeing a ceph-osd hook error, 'mon-relation-changed', on each ceph-osd node.

Here's the error from the ceph-osd agent log. It appears the charm is trying to prepare a disk that is already up and running as an OSD.

2016-05-11 14:39:47 INFO mon-relation-changed ceph-disk: Error: Device is mounted: /dev/sdl1
2016-05-11 14:39:47 INFO worker.uniter.jujuc server.go:172 running hook tool "juju-log" ["-l" "ERROR" "Unable to initialize device: /dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:0:0"]
2016-05-11 14:39:47 ERROR juju-log mon:49: Unable to initialize device: /dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:0:0
2016-05-11 14:39:47 INFO mon-relation-changed Traceback (most recent call last):
2016-05-11 14:39:47 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/mon-relation-changed", line 260, in <module>
2016-05-11 14:39:47 INFO mon-relation-changed hooks.execute(sys.argv)
2016-05-11 14:39:47 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/charmhelpers/core/hookenv.py", line 717, in execute
2016-05-11 14:39:47 INFO mon-relation-changed self._hooks[hook_name]()
2016-05-11 14:39:47 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/mon-relation-changed", line 201, in mon_relation
2016-05-11 14:39:47 INFO mon-relation-changed config('ignore-device-errors'))
2016-05-11 14:39:47 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/ceph.py", line 401, in osdize
2016-05-11 14:39:47 INFO mon-relation-changed osdize_dev(dev, osd_format, osd_journal, reformat_osd, ignore_errors)
2016-05-11 14:39:47 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/ceph.py", line 450, in osdize_dev
2016-05-11 14:39:47 INFO mon-relation-changed raise e
2016-05-11 14:39:47 INFO mon-relation-changed subprocess.CalledProcessError: Command '['ceph-disk', 'prepare', '--fs-type', u'xfs', '--zap-disk', u'/dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:0:0']' returned non-zero exit status 1
2016-05-11 14:39:47 INFO juju.worker.uniter.context context.go:579 handling reboot
2016-05-11 14:39:47 ERROR juju.worker.uniter.operation runhook.go:107 hook "mon-relation-changed" failed: exit status 1

Full ceph-osd agent log here: http://paste.ubuntu.com/16378059/
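
For anyone hitting the same trace: a quick way to confirm that the device ceph-disk refuses to prepare really is already in use is to inspect it on the affected node. This is only a diagnostic sketch for this environment; the device and unit names are taken from the log and status output above.

    # Run on the affected node, e.g. via: juju ssh ceph-osd/0
    sudo ceph-disk list          # lists devices already prepared/activated as OSDs
    mount | grep sdl             # shows the mounted partition ceph-disk complained about (/dev/sdl1)
    sudo parted /dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:0:0 print   # inspect the partition table on the by-path device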

 ceph-osd:
    charm: cs:trusty/ceph-osd-232
    can-upgrade-to: cs:trusty/ceph-osd-233
    exposed: false
    service-status:
      current: error
      message: 'hook failed: "mon-relation-changed"'
      since: 11 May 2016 09:39:48-05:00
    relations:
      mon:
      - ceph
    units:
      ceph-osd/0:
        workload-status:
          current: error
          message: 'hook failed: "mon-relation-changed" for ceph:osd'
          since: 11 May 2016 09:39:48-05:00
        agent-status:
          current: idle
          since: 11 May 2016 09:39:48-05:00
          version: 1.25.5
        agent-state: error
        agent-state-info: 'hook failed: "mon-relation-changed" for ceph:osd'
        agent-version: 1.25.5
        machine: "4"
        public-address: TCFTH32D0034.maas
      ceph-osd/1:
        workload-status:
          current: error
          message: 'hook failed: "mon-relation-changed" for ceph:osd'
          since: 29 Apr 2016 15:26:05-05:00
        agent-status:
          current: lost
          message: agent is not communicating with the server
          since: 29 Apr 2016 15:26:05-05:00
          version: 1.25.5
        agent-state: error
        agent-state-info: 'hook failed: "mon-relation-changed" for ceph:osd'
        agent-version: 1.25.5
        machine: "5"
        public-address: TCFTH32D0065.maas

Bundle sections:

ceph:
    annotations:
      gui-x: '750'
      gui-y: '500'
    charm: cs:trusty/ceph
    num_units: 3
    options:
      fsid: cbd8508e-d726-4785-bff9-2fbf4af2df61
      monitor-secret: AQDxeg9XUMRsKRAAyfClczi5hEV/3j0CuIN8dA==
      osd-devices: '/dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:0:0 /dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:1:0 /dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:2:0 /dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:3:0 /dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:4:0 /dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:5:0 /dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:6:0 /dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:7:0'
      osd-reformat: 'yes'
      source: cloud:trusty-liberty
    to:
    - '1'
    - '2'
    - '3'

ceph-osd:
    annotations:
      gui-x: '1000'
      gui-y: '500'
    charm: cs:trusty/ceph-osd
    num_units: 2
    options:
      osd-devices: '/dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:0:0 /dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:1:0 /dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:2:0 /dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:3:0 /dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:4:0 /dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:5:0 /dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:6:0 /dev/disk/by-path/pci-0002:01:00.0-scsi-0:1:7:0'
      osd-reformat: 'yes'
      source: cloud:trusty-liberty
    to:
    - '4'
    - '5'

Tags: cpec
Chris Holcombe (xfactor973) wrote :

Looks like the error that surfaced is that the disk was already mounted, so the mkfs failed. Which version of the ceph-osd charm are you using? The newest version has better detection around mounted drives.
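
(In case it helps: a hedged sketch of how to check and bump the charm revision with the juju 1.25 client used in this deployment; the unit name is simply the one from the status output above.)

    juju status ceph-osd | grep charm     # currently shows cs:trusty/ceph-osd-232
    juju upgrade-charm ceph-osd           # pull the latest revision from the charm store (233 at the time)
    juju resolved --retry ceph-osd/0      # then retry the failed mon-relation-changed hook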

James Page (james-page) wrote :

Marking 'Incomplete' pending a response to comment #1.

Set back to 'New' once provided.

Changed in ceph-osd (Juju Charms Collection):
status: New → Incomplete
importance: Undecided → Low
Launchpad Janitor (janitor) wrote :

[Expired for ceph-osd (Juju Charms Collection) because there has been no activity for 60 days.]

Changed in ceph-osd (Juju Charms Collection):
status: Incomplete → Expired