Juju error on HP cloud with Maximum number of attempts (3) reached sending request

Bug #1279879 reported by Matt Bruzek
This bug affects 2 people
Affects: juju-core
Status: Won't Fix
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

I am not sure if this is juju-core or juju-deployer.

I am writing some juju tests for the charm store, and when I deploy multiple instances of ceph and multiple instances of ceph-osd, an error is generated on one of the machines that is visible in juju status.

The error is on machine #7:
error: failed to get list of flavour details

      caused by: Maximum number of attempts (3) reached sending request to https://az-1.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/flavors/detail)'

This error is reproducible, but it is not always machine #7.
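
The failing call is a plain Nova API request, so it can be checked by hand outside of juju. The following is only a rough sketch of such a manual check (it assumes a valid Keystone token exported as KEYSTONE_TOKEN and reuses the tenant ID from the error URL; it is not how juju itself issues the request):

#!/usr/bin/python3
# Rough manual probe of the Nova endpoint that juju keeps retrying.
# Assumes a valid Keystone token is exported as KEYSTONE_TOKEN; this is
# only a hand-rolled check, not how juju/goose issues the request.

import os
import requests

flavors_url = ('https://az-1.region-a.geo-1.compute.hpcloudsvc.com'
               '/v1.1/17031369947864/flavors/detail')

response = requests.get(
    flavors_url,
    headers={'X-Auth-Token': os.environ['KEYSTONE_TOKEN'],
             'Accept': 'application/json'},
    timeout=30)
print(response.status_code)
print(response.text[:500])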

$ juju status
environment: hp-mbruzek
machines:
  "0":
    agent-state: started
    agent-version: 1.17.2
    dns-name: 15.185.107.170
    instance-id: "3548727"
    instance-state: ACTIVE
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=30720M
  "1":
    agent-state: started
    agent-version: 1.17.2
    dns-name: 15.185.119.227
    instance-id: "3548835"
    instance-state: ACTIVE
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=30720M
  "2":
    agent-state: started
    agent-version: 1.17.2
    dns-name: 15.185.127.215
    instance-id: "3548839"
    instance-state: ACTIVE
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=30720M
  "3":
    agent-state: started
    agent-version: 1.17.2
    dns-name: 15.185.89.123
    instance-id: "3548843"
    instance-state: ACTIVE
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=30720M
  "4":
    agent-state: started
    agent-version: 1.17.2
    dns-name: 15.185.90.252
    instance-id: "3548847"
    instance-state: ACTIVE
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=30720M
  "5":
    agent-state: started
    agent-version: 1.17.2
    dns-name: 15.185.100.131
    instance-id: "3548845"
    instance-state: ACTIVE
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=30720M
  "6":
    agent-state: started
    agent-version: 1.17.2
    dns-name: 15.185.115.246
    instance-id: "3548851"
    instance-state: ACTIVE
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=30720M
  "7":
    agent-state-info: '(error: failed to get list of flavour details

      caused by: Maximum number of attempts (3) reached sending request to https://az-1.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/flavors/detail)'
    instance-id: pending
    series: precise
  "8":
    agent-state: started
    agent-version: 1.17.2
    dns-name: 15.185.90.148
    instance-id: "3548853"
    instance-state: ACTIVE
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=30720M
services:
  ceph:
    charm: local:precise/ceph-104
    exposed: true
    relations:
      mon:
      - ceph
    units:
      ceph/0:
        agent-state: started
        agent-version: 1.17.2
        machine: "1"
        public-address: 15.185.119.227
      ceph/1:
        agent-state: started
        agent-version: 1.17.2
        machine: "2"
        public-address: 15.185.127.215
      ceph/2:
        agent-state: started
        agent-version: 1.17.2
        machine: "3"
        public-address: 15.185.89.123
  ceph-osd:
    charm: local:precise/ceph-osd-14
    exposed: true
    units:
      ceph-osd/0:
        agent-state: started
        agent-version: 1.17.2
        machine: "4"
        public-address: 15.185.90.252
      ceph-osd/1:
        agent-state: started
        agent-version: 1.17.2
        machine: "5"
        public-address: 15.185.100.131
      ceph-osd/2:
        agent-state: started
        agent-version: 1.17.2
        machine: "6"
        public-address: 15.185.115.246
  ceph-osd-sentry:
    charm: local:precise/ceph-osd-sentry-0
    exposed: true
  ceph-radosgw:
    charm: local:precise/ceph-radosgw-25
    exposed: true
    units:
      ceph-radosgw/0:
        agent-state: pending
        machine: "7"
  ceph-radosgw-sentry:
    charm: local:precise/ceph-radosgw-sentry-0
    exposed: true
  ceph-sentry:
    charm: local:precise/ceph-sentry-0
    exposed: true
  relation-sentry:
    charm: local:precise/relation-sentry-0
    exposed: true
    units:
      relation-sentry/0:
        agent-state: started
        agent-version: 1.17.2
        machine: "8"
        open-ports:
        - 9001/tcp
        public-address: 15.185.90.148

The test that I am running is an Amulet test to verify the ceph charm is working. I believe the following snippet will generate the error. To get amulet:
 sudo add-apt-repository -y ppa:juju/stable
 sudo apt-get update
 sudo apt-get install -y amulet

#!/usr/bin/python3

# This amulet code tests the ceph charm.

import amulet

# The ceph units should be an odd number of at least 3.
scale = 3
# The number of seconds to wait for the environment to set up.
seconds = 900
# Hardcode a uuid for the ceph cluster.
fsid = 'ecbb8960-0e21-11e2-b495-83a88f44db01'
# A ceph-authtool key pregenerated for this test.
cephAuthKey = 'AQA2zfJSUNjaJBAAmxH/PBRkORMkexRD+2eEHg=='
# The device (directory) to use for block storage for the ceph charms.
ceph_device = '/srv/ceph'
# The havana version of ceph supports directories as devices!
havana = 'cloud:precise-updates/havana'

# Create a dictionary of configuration values for the ceph charms.
ceph_configuration = {
    'auth-supported': 'cephx',
    'fsid': fsid,
    'monitor-count': 3,
    'monitor-secret': cephAuthKey,
    'osd-devices': ceph_device,
    'osd-journal': ceph_device,
    'osd-journal-size': 2048,
    'osd-format': 'ext4',
    'osd-reformat': 'yes', # Setting this value to anything will reformat.
    'source': havana
}
# The device (directory) to use for block storage for the ceph-osd charms.
osd_device = '/srv/osd'
# Create a configuration dictionary for ceph-osd charms.
osd_configuration = {
    'osd-devices': osd_device,
    'source': havana
}
rados_configuration = {
    'source': havana
}

d = amulet.Deployment()
# Add the number of units of ceph to the deployment.
d.add('ceph', units=scale)
# Add the number of ceph-osd units to the deployment
d.add('ceph-osd', units=scale)
# Add ceph-radosgw charm to the deployment.
d.add('ceph-radosgw')
# The ceph charm requires configuration to deploy successfully.
d.configure('ceph', ceph_configuration)
# The ceph-osd charm requires configuration to deploy correctly.
d.configure('ceph-osd', osd_configuration)
# Configure the ceph-radosgw charm with the same version of openstack
d.configure('ceph-radosgw', rados_configuration)
# Relate ceph and ceph-osd.
d.relate('ceph:osd', 'ceph-osd:mon')
# Relate ceph and ceph-radosgw
d.relate('ceph:radosgw', 'ceph-radosgw:mon')
# Expose ceph
d.expose('ceph')
# Expose ceph-osd
d.expose('ceph-osd')
# Expose ceph-radosgw
d.expose('ceph-radosgw')

# Perform deployment.
try:
    d.setup(timeout=seconds)
    d.sentry.wait(seconds)
except amulet.helpers.TimeoutError:
    message = 'The environment did not set up in %d seconds.' % seconds
    amulet.raise_status(amulet.SKIP, msg=message)
except:
    raise
print('The ceph units successfully deployed!')

The test times out because there was an error with one or more of the machines. I am unable to ssh to the machine in the error state because it never receives a public IP address, so getting the logs from that machine is not possible.
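
In the meantime the provisioning error can at least be captured from the client side. The sketch below assumes the JSON output of juju status mirrors the YAML layout shown above (machines keyed by id, with the error text in agent-state-info); it simply prints any machine-level errors so a test run can record them:

#!/usr/bin/python3
# Print provisioning errors for machines that never came up, using only
# client-side information. Assumes `juju status --format json` mirrors
# the YAML layout above (machines keyed by id, error in agent-state-info).

import json
import subprocess

output = subprocess.check_output(['juju', 'status', '--format', 'json'])
status = json.loads(output.decode('utf-8'))

for machine_id, machine in sorted(status.get('machines', {}).items()):
    info = machine.get('agent-state-info')
    if info:
        print('machine %s: %s' % (machine_id, info))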

Is there any more information that would be helpful to this bug?

Revision history for this message
Matt Bruzek (mbruzek) wrote :
Matt Bruzek (mbruzek)
description: updated
Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.18.0
Matt Bruzek (mbruzek)
tags: added: audit
Revision history for this message
Peter Petrakis (peter-petrakis) wrote :

I'm seeing this on EC2 when I deploy a handful of machines using juju-deployer.

machines:
  "0":
    agent-state: started
    agent-version: 1.17.4
    dns-name: ec2-50-18-247-146.us-west-1.compute.amazonaws.com
    instance-id: i-2091057c
    instance-state: running
    series: precise
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
  "1":
    agent-state-info: '(error: cannot set up groups: Request limit exceeded. (RequestLimitExceeded))'
    instance-id: pending
    series: precise
  "2":
    agent-state: started
    agent-version: 1.17.4
    dns-name: ec2-54-219-107-61.us-west-1.compute.amazonaws.com
    instance-id: i-819e0add
    instance-state: running
    series: precise
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
  "3":
    agent-state: started
    agent-version: 1.17.4
    dns-name: ec2-204-236-184-129.us-west-1.compute.amazonaws.com
    instance-id: i-ad9105f1
    instance-state: running
    series: precise
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
  "4":
    agent-state: started
    agent-version: 1.17.4
    dns-name: ec2-50-18-99-36.us-west-1.compute.amazonaws.com
    instance-id: i-ac9105f0
    instance-state: running
    series: precise
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
  "5":
    agent-state: started
    agent-version: 1.17.4
    dns-name: ec2-54-219-226-138.us-west-1.compute.amazonaws.com
    instance-id: i-58930704
    instance-state: running
    series: precise
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M
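
The RequestLimitExceeded above is a provider-side rate limit, which has the same shape as the HP Cloud failure: the provider API stops answering and juju gives up after a fixed number of attempts. Purely as an illustration of the kind of client-side mitigation involved, and not a description of what juju/goose actually does, a retry with exponential backoff looks roughly like this:

#!/usr/bin/python3
# Illustration only: retry a rate-limited provider call with exponential
# backoff instead of a fixed number of immediate attempts. The request
# callable and the error handling are placeholders, not juju/goose APIs.

import time


def call_with_backoff(request, attempts=5, base_delay=1.0):
    """Call request(), sleeping base_delay * 2**n between failed attempts."""
    for attempt in range(attempts):
        try:
            return request()
        except Exception:  # placeholder for the provider's rate-limit error
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))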

Revision history for this message
Peter Petrakis (peter-petrakis) wrote :

Strike last, I'm actually impacted by bug #1227450

Changed in juju-core:
milestone: 1.20.0 → next-stable
Raghuram Kota (rkota)
tags: added: hs-arm64
tags: added: arm64
Curtis Hovey (sinzui)
Changed in juju-core:
importance: High → Medium
milestone: 1.21 → none
Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

The doors have closed on HP Cloud...

Curtis Hovey (sinzui)
Changed in juju-core:
status: Triaged → Won't Fix