mojo spec for capomastro currently failing

Bug #1486615 reported by Tom Haddon
This bug affects 1 person
Affects      Status         Importance   Assigned to   Milestone
Capomastro   Fix Released   High         Unassigned

Bug Description

As of 2015-08-19 we have a consistent failure on the mojo spec which seems to be in rabbitmq - https://ci.admin.canonical.com/view/live-ols-jobs/job/live-pes-capomastro/lastFailedBuild/console

Can someone take a look and fix as appropriate? We're now running daily builds of "live" mojo specs, including capomastro.

Tags: spec
Daniel Manrique (roadmr)
tags: added: spec
Changed in capomastro:
status: New → Confirmed
importance: Undecided → High
milestone: none → 2015-08
Daniel Manrique (roadmr) wrote :

Rabbitmq can't start because it has a bogus RABBITMQ_NODENAME:

2015-08-20 01:55:53 INFO config-changed - unable to connect to epmd on ps45-10-25-62-20: nxdomain (non-existing domain)

The nodename is set earlier in the process, apparently via a juju set:

2015-08-20 01:55:46 INFO juju-log local nodename: ps45-10-25-62-20
2015-08-20 01:55:46 INFO juju-log configuring nodename
2015-08-20 01:55:46 INFO juju-log forcing nodename=ps45-10-25-62-20
2015-08-20 01:55:46 INFO juju-log Stopping rabbitmq-server.
2015-08-20 01:55:46 INFO config-changed * Stopping message broker rabbitmq-server
2015-08-20 01:55:49 INFO config-changed ...done.
2015-08-20 01:55:49 INFO juju-log Updating /etc/rabbitmq/rabbitmq-env.conf, RABBITMQ_NODENAME=rabbit@ps45-10-25-62-20
2015-08-20 01:55:49 INFO juju-log Starting rabbitmq-server.
2015-08-20 01:55:49 INFO config-changed * Restarting message broker rabbitmq-server
2015-08-20 01:55:51 INFO config-changed ...fail!

Note that initially, the value of RABBITMQ_NODENAME is something like rabbit@$HOSTNAME, and $HOSTNAME usually resolves correctly.

Daniel Manrique (roadmr) wrote :

This looks like some evil dance between prodstack host naming and the rabbitmq charm.

See in the log how it does this:

2015-08-20 01:55:46 INFO juju-log local nodename: ps45-10-25-62-20
2015-08-20 01:55:46 INFO juju-log configuring nodename
2015-08-20 01:55:46 INFO juju-log forcing nodename=ps45-10-25-62-20
2015-08-20 01:55:46 INFO juju-log Stopping rabbitmq-server.
2015-08-20 01:55:46 INFO config-changed * Stopping message broker rabbitmq-server
2015-08-20 01:55:49 INFO config-changed ...done.
2015-08-20 01:55:49 INFO juju-log Updating /etc/rabbitmq/rabbitmq-env.conf, RABBITMQ_NODENAME=rabbit@ps45-10-25-62-20
2015-08-20 01:55:49 INFO juju-log Starting rabbitmq-server.
2015-08-20 01:55:49 INFO config-changed * Restarting message broker rabbitmq-server
2015-08-20 01:55:51 INFO config-changed ...fail!

The first three lines come from the rabbitmq charm's configure_nodename method:

def configure_nodename():
    '''Set RABBITMQ_NODENAME to something that's resolvable by my peers'''
    nodename = get_local_nodename()
    log('configuring nodename', level=INFO)
    if (nodename and
            rabbit.get_node_name() != 'rabbit@%s' % nodename):
        log('forcing nodename=%s' % nodename, level=INFO)
        # would like to have used the restart_on_change decorator, but
        # need to stop it under current nodename prior to updating env
        log('Stopping rabbitmq-server.')
        service_stop('rabbitmq-server')
        rabbit.update_rmq_env_conf(hostname='rabbit@%s' % nodename,
                                   ipv6=config('prefer-ipv6'))
        log('Starting rabbitmq-server.')
        service_restart('rabbitmq-server')

get_local_nodename has this:

def get_local_nodename():
    '''Resolve local nodename into something that's universally addressable'''
    ip_addr = get_host_ip(unit_get('private-address'))
    log('getting local nodename for ip address: %s' % ip_addr, level=INFO)
    try:
        nodename = get_hostname(ip_addr, fqdn=False)
    except:
        log('Cannot resolve hostname for %s using DNS servers' % ip_addr,
            level='WARNING')
        log('Falling back to use socket.gethostname()',
            level='WARNING')
        # If the private-address is not resolvable using DNS
        # then use the current hostname
        nodename = socket.gethostname()
    log('local nodename: %s' % nodename, level=INFO)
    return nodename
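
To illustrate the failure mode in the log, here is a minimal sketch (not the charm's code, and not necessarily how the charm was later fixed) of a nodename lookup that only accepts a reverse-DNS name if that name also resolves forward. The function names, the injected lookup callables, and the example address and domain are all hypothetical; it assumes the bad nodename came from a reverse lookup whose short name is not resolvable.

```python
import socket


def pick_nodename(ip_addr, reverse_lookup, forward_lookup, fallback):
    """Return the short reverse-DNS name for ip_addr if it resolves forward.

    reverse_lookup, forward_lookup and fallback are hypothetical injection
    points so the logic can be exercised without real DNS.
    """
    try:
        # Drop the domain part, as fqdn=False does in the charm snippet.
        candidate = reverse_lookup(ip_addr).split('.')[0]
        # Raises socket.gaierror if the short name is unknown (nxdomain).
        forward_lookup(candidate)
        return candidate
    except (socket.herror, socket.gaierror):
        # Same fallback idea as the charm: use the machine's own hostname.
        return fallback()


def system_pick_nodename(ip_addr):
    """Wire pick_nodename to the real resolver functions in `socket`."""
    return pick_nodename(
        ip_addr,
        reverse_lookup=lambda ip: socket.gethostbyaddr(ip)[0],
        forward_lookup=socket.gethostbyname,
        fallback=socket.gethostname,
    )
```

With a check like this, a name such as ps45-10-25-62-20 would be rejected when it cannot be resolved, and the unit would keep using its own hostname instead.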

These changes that broke nodename configuration were introduced in revno 100 of the rabbitmq charm. Coincidentally, our runs started failing the same day that revno appeared, and our spec doesn't pin a specific revno, so:

1- We got revno 100 19 days ago
2- Our runs broke

So the quick fix for this is to pin revno 99 in our spec. I'll post a link to the bug on the charm once I file it.

Changed in capomastro:
status: Confirmed → Triaged
Daniel Manrique (roadmr) wrote :

See https://bugs.launchpad.net/charms/+source/rabbitmq-server/+bug/1487217 for the rabbitmq charm.

I'll fix our spec by pinning to revno 99 for the time being.

Changed in capomastro:
status: Triaged → In Progress
Daniel Manrique (roadmr) wrote :

Once I pinned the revno, I got another error:

https://ci.admin.canonical.com/view/live-ols-jobs/job/live-pes-capomastro/142/console

2015-08-20 22:25:25 ERROR juju-log block-storage:3: Error: Multiple volumes are associated with postgresql/0 prodstack-zone-1 volume. Cannot get_volume_id.
2015-08-20 22:25:25 ERROR juju.worker.uniter.operation runhook.go:86 hook "block-storage-relation-changed" failed: exit status 1

This is the table of volumes in that environment:

https://pastebin.canonical.com/138112/

The get_volume_id method in the block_storage_broker charm tries to determine which volume needs to be associated, but does so with a simple "A in B" statement. In the table, there are two entries which contain what we want ("postgresql/0 prodstack-zone-1 volume"): the other one is "spi-postgresql/0 prodstack-zone-1 volume".

I notice that other volumes are prefixed (spi-, grafana-, blah-). I'll try to figure out how to do something similar with capomastro so the volume matching works.
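
The matching problem described above can be reproduced with a short sketch. This is not the block-storage-broker charm's code; the function names are hypothetical and only the two volume labels come from this report. It shows why a substring ("A in B") test finds two volumes, while an exact comparison finds one.

```python
def find_volumes_substring(labels, wanted):
    """Mimic an "A in B" test: return every label that contains `wanted`."""
    return [label for label in labels if wanted in label]


def find_volumes_exact(labels, wanted):
    """Return only labels that are exactly equal to `wanted`."""
    return [label for label in labels if label == wanted]


# The two labels mentioned in this report.
labels = [
    "postgresql/0 prodstack-zone-1 volume",
    "spi-postgresql/0 prodstack-zone-1 volume",
]
wanted = "postgresql/0 prodstack-zone-1 volume"

substring_hits = find_volumes_substring(labels, wanted)  # both labels match
exact_hits = find_volumes_exact(labels, wanted)          # only the first label
```

Giving each deployment a distinct prefix avoids the collision only when no label is a substring of another; the unprefixed "postgresql/0 ..." label would still match inside any prefixed one.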

Daniel Manrique (roadmr) wrote :

The block-storage problem seems to be a bug in the block-storage-broker charm:

https://bugs.launchpad.net/charms/+source/block-storage-broker/+bug/1487636

Tracking that one to see if the charm provides a fix or we need to change something on our side.

Daniel Manrique (roadmr) wrote :

OK, capomastro CI run is now passing.

The volume naming issue with storage/block-storage-broker was fixed by adding a volume_name option to the storage charm and using that in the capomastro spec. Now there should be no conflict in volume names.

Changed in capomastro:
status: In Progress → Fix Released