Juju error on HP cloud with Maximum number of attempts (3) reached sending request

Bug #1279879 reported by Matt Bruzek
This bug affects 2 people
Affects: juju-core
Status: Won't Fix
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

I am not sure if this is juju-core or juju-deployer.

I am writing some juju tests for the charm store, and when I deploy multiple instances of ceph and multiple instances of ceph-osd, an error is generated on one of the machines that is visible in juju status.

The error is on machine #7:
error: failed to get list of flavour details

      caused by: Maximum number of attempts (3) reached sending request to https://az-1.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/flavors/detail)'

This error is reproducible, but it is not always machine #7.
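
The failing call is a plain Nova API request, so it can be checked by hand outside of juju. The following is only a rough sketch of such a manual check (it assumes a valid Keystone token exported as KEYSTONE_TOKEN and reuses the tenant ID from the error URL; it is not how juju itself issues the request):

#!/usr/bin/python3
# Rough manual probe of the Nova endpoint that juju keeps retrying.
# Assumes a valid Keystone token is exported as KEYSTONE_TOKEN; this is
# only a hand-rolled check, not how juju/goose issues the request.

import os
import requests

flavors_url = ('https://az-1.region-a.geo-1.compute.hpcloudsvc.com'
               '/v1.1/17031369947864/flavors/detail')

response = requests.get(
    flavors_url,
    headers={'X-Auth-Token': os.environ['KEYSTONE_TOKEN'],
             'Accept': 'application/json'},
    timeout=30)
print(response.status_code)
print(response.text[:500])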

$ juju status
environment: hp-mbruzek
machines:
  "0":
    agent-state: started
    agent-version: 1.17.2
    dns-name: 15.185.107.170
    instance-id: "3548727"
    instance-state: ACTIVE
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=30720M
  "1":
    agent-state: started
    agent-version: 1.17.2
    dns-name: 15.185.119.227
    instance-id: "3548835"
    instance-state: ACTIVE
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=30720M
  "2":
    agent-state: started
    agent-version: 1.17.2
    dns-name: 15.185.127.215
    instance-id: "3548839"
    instance-state: ACTIVE
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=30720M
  "3":
    agent-state: started
    agent-version: 1.17.2
    dns-name: 15.185.89.123
    instance-id: "3548843"
    instance-state: ACTIVE
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=30720M
  "4":
    agent-state: started
    agent-version: 1.17.2
    dns-name: 15.185.90.252
    instance-id: "3548847"
    instance-state: ACTIVE
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=30720M
  "5":
    agent-state: started
    agent-version: 1.17.2
    dns-name: 15.185.100.131
    instance-id: "3548845"
    instance-state: ACTIVE
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=30720M
  "6":
    agent-state: started
    agent-version: 1.17.2
    dns-name: 15.185.115.246
    instance-id: "3548851"
    instance-state: ACTIVE
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=30720M
  "7":
    agent-state-info: '(error: failed to get list of flavour details

      caused by: Maximum number of attempts (3) reached sending request to https://az-1.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/flavors/detail)'
    instance-id: pending
    series: precise
  "8":
    agent-state: started
    agent-version: 1.17.2
    dns-name: 15.185.90.148
    instance-id: "3548853"
    instance-state: ACTIVE
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=30720M
services:
  ceph:
    charm: local:precise/ceph-104
    exposed: true
    relations:
      mon:
      - ceph
    units:
      ceph/0:
        agent-state: started
        agent-version: 1.17.2
        machine: "1"
        public-address: 15.185.119.227
      ceph/1:
        agent-state: started
        agent-version: 1.17.2
        machine: "2"
        public-address: 15.185.127.215
      ceph/2:
        agent-state: started
        agent-version: 1.17.2
        machine: "3"
        public-address: 15.185.89.123
  ceph-osd:
    charm: local:precise/ceph-osd-14
    exposed: true
    units:
      ceph-osd/0:
        agent-state: started
        agent-version: 1.17.2
        machine: "4"
        public-address: 15.185.90.252
      ceph-osd/1:
        agent-state: started
        agent-version: 1.17.2
        machine: "5"
        public-address: 15.185.100.131
      ceph-osd/2:
        agent-state: started
        agent-version: 1.17.2
        machine: "6"
        public-address: 15.185.115.246
  ceph-osd-sentry:
    charm: local:precise/ceph-osd-sentry-0
    exposed: true
  ceph-radosgw:
    charm: local:precise/ceph-radosgw-25
    exposed: true
    units:
      ceph-radosgw/0:
        agent-state: pending
        machine: "7"
  ceph-radosgw-sentry:
    charm: local:precise/ceph-radosgw-sentry-0
    exposed: true
  ceph-sentry:
    charm: local:precise/ceph-sentry-0
    exposed: true
  relation-sentry:
    charm: local:precise/relation-sentry-0
    exposed: true
    units:
      relation-sentry/0:
        agent-state: started
        agent-version: 1.17.2
        machine: "8"
        open-ports:
        - 9001/tcp
        public-address: 15.185.90.148

The test that I am running is an Amulet test to verify the ceph charm is working. I believe the following snippet will generate the error. To get amulet:
 sudo add-apt-repository -y ppa:juju/stable
 sudo apt-get update
 sudo apt-get install -y amulet

#!/usr/bin/python3

# This amulet code tests the ceph charm.

import amulet

# The ceph units should be an odd number of at least 3.
scale = 3
# The number of seconds to wait for the environment to set up.
seconds = 900
# Hardcode a uuid for the ceph cluster.
fsid = 'ecbb8960-0e21-11e2-b495-83a88f44db01'
# A ceph-authtool key pregenerated for this test.
cephAuthKey = 'AQA2zfJSUNjaJBAAmxH/PBRkORMkexRD+2eEHg=='
# The device (directory) to use for block storage for the ceph charms.
ceph_device = '/srv/ceph'
# The havana version of ceph supports directories as devices!
havana = 'cloud:precise-updates/havana'

# Create a dictionary of configuration values for the ceph charms.
ceph_configuration = {
    'auth-supported': 'cephx',
    'fsid': fsid,
    'monitor-count': 3,
    'monitor-secret': cephAuthKey,
    'osd-devices': ceph_device,
    'osd-journal': ceph_device,
    'osd-journal-size': 2048,
    'osd-format': 'ext4',
    'osd-reformat': 'yes', # Setting this value to anything will reformat.
    'source': havana
}
# The device (directory) to use for block storage for the ceph-osd charms.
osd_device = '/srv/osd'
# Create a configuration dictionary for ceph-osd charms.
osd_configuration = {
    'osd-devices': osd_device,
    'source': havana
}
rados_configuration = {
    'source': havana
}

d = amulet.Deployment()
# Add the number of units of ceph to the deployment.
d.add('ceph', units=scale)
# Add the number of ceph-osd units to the deployment
d.add('ceph-osd', units=scale)
# Add ceph-radosgw charm to the deployment.
d.add('ceph-radosgw')
# The ceph charm requires configuration to deploy successfully.
d.configure('ceph', ceph_configuration)
# The ceph-osd charm requires configuration to deploy correctly.
d.configure('ceph-osd', osd_configuration)
# Configure the ceph-radosgw charm with the same version of openstack
d.configure('ceph-radosgw', rados_configuration)
# Relate ceph and ceph-osd.
d.relate('ceph:osd', 'ceph-osd:mon')
# Relate ceph and ceph-radosgw
d.relate('ceph:radosgw', 'ceph-radosgw:mon')
# Expose ceph
d.expose('ceph')
# Expose ceph-osd
d.expose('ceph-osd')
# Expose ceph-radosgw
d.expose('ceph-radosgw')

# Perform deployment.
try:
    d.setup(timeout=seconds)
    d.sentry.wait(seconds)
except amulet.helpers.TimeoutError:
    message = 'The environment did not set up in %d seconds.' % seconds
    amulet.raise_status(amulet.SKIP, msg=message)
except:
    raise
print('The ceph units successfully deployed!')

The test times out because there was an error with one or more of the machines. I am unable to ssh to the machine in the error state because it never receives a public IP address, so getting the logs from that machine is not possible.
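
In the meantime the provisioning error can at least be captured from the client side. The sketch below assumes the JSON output of juju status mirrors the YAML layout shown above (machines keyed by id, with the error text in agent-state-info); it simply prints any machine-level errors so a test run can record them:

#!/usr/bin/python3
# Print provisioning errors for machines that never came up, using only
# client-side information. Assumes `juju status --format json` mirrors
# the YAML layout above (machines keyed by id, error in agent-state-info).

import json
import subprocess

output = subprocess.check_output(['juju', 'status', '--format', 'json'])
status = json.loads(output.decode('utf-8'))

for machine_id, machine in sorted(status.get('machines', {}).items()):
    info = machine.get('agent-state-info')
    if info:
        print('machine %s: %s' % (machine_id, info))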

Is there any more information that would be helpful to this bug?

Revision history for this message
Matt Bruzek (mbruzek) wrote :
Matt Bruzek (mbruzek)
description: updated
Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.18.0
Matt Bruzek (mbruzek)
tags: added: audit
Revision history for this message
Peter Petrakis (peter-petrakis) wrote :

I'm seeing this on EC2 when I deploy a handful of machines using juju-deployer.

machines:
  "0":
    agent-state: started
    agent-version: 1.17.4
    dns-name: ec2-50-18-247-146.us-west-1.compute.amazonaws.com
    instance-id: i-2091057c
    instance-state: running
    series: precise
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
  "1":
    agent-state-info: '(error: cannot set up groups: Request limit exceeded. (RequestLimitExceeded))'
    instance-id: pending
    series: precise
  "2":
    agent-state: started
    agent-version: 1.17.4
    dns-name: ec2-54-219-107-61.us-west-1.compute.amazonaws.com
    instance-id: i-819e0add
    instance-state: running
    series: precise
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
  "3":
    agent-state: started
    agent-version: 1.17.4
    dns-name: ec2-204-236-184-129.us-west-1.compute.amazonaws.com
    instance-id: i-ad9105f1
    instance-state: running
    series: precise
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
  "4":
    agent-state: started
    agent-version: 1.17.4
    dns-name: ec2-50-18-99-36.us-west-1.compute.amazonaws.com
    instance-id: i-ac9105f0
    instance-state: running
    series: precise
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
  "5":
    agent-state: started
    agent-version: 1.17.4
    dns-name: ec2-54-219-226-138.us-west-1.compute.amazonaws.com
    instance-id: i-58930704
    instance-state: running
    series: precise
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
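
The RequestLimitExceeded above is a provider-side rate limit, which has the same shape as the HP Cloud failure: the provider API stops answering and juju gives up after a fixed number of attempts. Purely as an illustration of the kind of client-side mitigation involved, and not a description of what juju/goose actually does, a retry with exponential backoff looks roughly like this:

#!/usr/bin/python3
# Illustration only: retry a rate-limited provider call with exponential
# backoff instead of a fixed number of immediate attempts. The request
# callable and the error handling are placeholders, not juju/goose APIs.

import time


def call_with_backoff(request, attempts=5, base_delay=1.0):
    """Call request(), sleeping base_delay * 2**n between failed attempts."""
    for attempt in range(attempts):
        try:
            return request()
        except Exception:  # placeholder for the provider's rate-limit error
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))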

Revision history for this message
Peter Petrakis (peter-petrakis) wrote :

Strike last, I'm actually impacted by bug #1227450

Changed in juju-core:
milestone: 1.20.0 → next-stable
Raghuram Kota (rkota)
tags: added: hs-arm64
tags: added: arm64
Curtis Hovey (sinzui)
Changed in juju-core:
importance: High → Medium
milestone: 1.21 → none
Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

The doors have closed on HP Cloud...

Curtis Hovey (sinzui)
Changed in juju-core:
status: Triaged → Won't Fix