nova-cloud-controller db sync sometimes fails in HA mode

Bug #1335139 reported by Alexander List
This bug affects 5 people
Affects                                          Status        Importance  Assigned to
OpenStack Nova Cloud Controller Charm            Fix Released  High        Unassigned
nova-cloud-controller (Juju Charms Collection)   Invalid       High        Unassigned

Bug Description

We are deploying OpenStack infrastructure components (trusty/icehouse) to LXC containers on three physical machines provided by MAAS (full smoosh).

When deploying nova-cloud-controller, I got an error from the shared-db-relation-changed hook, caused by nova-manage db sync trying to create a table that already exists.

My suspicion is that another instance of nova-cloud-controller had already created the "instances" table, so the subsequent CREATE TABLE fails.

This will have to be fixed upstream, either by changing the statement to CREATE TABLE IF NOT EXISTS or, with more HA awareness, by skipping the DB generation entirely if the schema already exists.
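
For illustration only (hypothetical, not charm or nova code), a guard of roughly this shape would tolerate the schema already having been created by another unit:

# Hypothetical sketch, not actual charm code: skip the schema sync when
# a peer unit has already created the nova tables, rather than letting
# a plain CREATE TABLE fail on the duplicate.
import subprocess

def table_exists(cursor, name):
    # 'cursor' is assumed to be a DB-API cursor on the nova database.
    cursor.execute("SHOW TABLES LIKE %s", (name,))
    return cursor.fetchone() is not None

def migrate_database_if_needed(cursor):
    if table_exists(cursor, 'instances'):
        # Another nova-cloud-controller unit already ran the sync.
        return
    subprocess.check_output(['nova-manage', 'db', 'sync'])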

Revision history for this message
Alexander List (alexlist) wrote :
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

I think we are hitting this bug too. Attached is my log.

tags: added: landscape
tags: added: cloud-installer
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

I filed https://bugs.launchpad.net/charms/+source/nova-cloud-controller/+bug/1347245 with another backtrace I had, this might be a different bug.

Revision history for this message
James Page (james-page) wrote :

The db-changed hook is gated to ensure that only one unit runs the db-sync task:

@hooks.hook('shared-db-relation-changed')
@restart_on_change(restart_map())
def db_changed():
    if 'shared-db' not in CONFIGS.complete_contexts():
        log('shared-db relation incomplete. Peer not ready?')
        return
    CONFIGS.write_all()

    if eligible_leader(CLUSTER_RES):
        migrate_database()
        log('Triggering remote cloud-compute restarts.')
        [compute_joined(rid=rid, remote_restart=True)
         for rid in relation_ids('cloud-compute')]

The 'eligible_leader' function does a few checks

1) if fully clustered, then the owner of the VIP is the leader
2) if not fully clustered, the oldest service unit in the 'cluster' relation is declared the leader

If, in case 2), the cluster relation is not fully formed with all service units prior to the shared-db relation being made, I could see how two nova-cc units might both think they are the leader and you hit a race; do you see the db sync being run on multiple service units?
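
In rough pseudocode, the checks and the race described above amount to the following; the helper names approximate charmhelpers.contrib.hahelpers.cluster and are not verbatim:

# Simplified paraphrase of the eligible_leader logic described above;
# the helper names follow charmhelpers' cluster helpers loosely.
from charmhelpers.contrib.hahelpers.cluster import (
    is_clustered,
    is_crm_leader,
    oldest_peer,
    peer_units,
)

def eligible_leader(resource):
    if is_clustered():
        # Check 1: fully clustered, so the unit that owns the VIP (the
        # CRM resource) is the leader.
        return is_crm_leader(resource)
    # Check 2: not (yet) clustered, so the oldest unit seen on the
    # 'cluster' peer relation acts as leader. If that relation has not
    # fully formed when shared-db fires, two units can each see
    # themselves as the oldest peer and both run the db sync.
    return oldest_peer(peer_units())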

Revision history for this message
Adam Collard (adam-collard) wrote : Re: [Bug 1335139] Re: nova-cloud-controller db sync fails in HA mode

On 23 July 2014 14:01, James Page <email address hidden> wrote:

> The 'eligible_leader' function does a few checks
>
> 1) if fully clustered, then the owner of the VIP is the leader
> 2) if not fully clustered, the oldest service unit in the 'cluster'
> relation is declared the leader
>
> If, in case 2), the cluster relation is not fully formed with all
> service units prior to the shared-db relation being made, I could see
> how two nova-cc units might both think they are the leader and you hit
> a race; do you see the db sync being run on multiple service units?
>

Yes.

Given it's not possible for a user of the Juju API to know when a relation
is "fully formed", I'm not sure if/how we can work around this.

Revision history for this message
James Page (james-page) wrote : Re: nova-cloud-controller db sync fails in HA mode

We definitely hit this problem during the original HA implementation and testing; two things were done to work around this (pending any sort of leader election function from Juju itself):

1) juju deployer deploys service units and waits for them all to start before adding relations

2) peer relations should always fire before other relations when you introduce a new unit into an existing service

Are you waiting for all service units to fully start prior to adding relations between services?

Revision history for this message
James Page (james-page) wrote :

I raised bug 1228243 a while back to request this feature.

Changed in nova-cloud-controller (Juju Charms Collection):
status: New → Confirmed
James Page (james-page)
Changed in nova-cloud-controller (Juju Charms Collection):
importance: Undecided → High
JuanJo Ciarlante (jjo)
tags: added: canonical-bootstack canonical-is
tags: added: openstack
James Page (james-page)
Changed in nova-cloud-controller (Juju Charms Collection):
status: Confirmed → Triaged
summary: - nova-cloud-controller db sync fails in HA mode
+ nova-cloud-controller db sync sometimes fails in HA mode
Revision history for this message
Edward Hope-Morley (hopem) wrote :

Just hit this with current /next charms (rev 154) - http://paste.ubuntu.com/10827860/

Changed in nova-cloud-controller (Juju Charms Collection):
milestone: none → 15.04
James Page (james-page)
Changed in nova-cloud-controller (Juju Charms Collection):
milestone: 15.04 → 15.07
James Page (james-page)
Changed in nova-cloud-controller (Juju Charms Collection):
milestone: 15.07 → 15.10
Revision history for this message
Chad Smith (chad.smith) wrote :

Just hit this one more time at Cisco. Could attach a ton of logs if necessary, but it's the same failure mode.

2015-08-28 20:22:38 INFO shared-db-relation-changed 2015-08-28 20:22:38.017 56639 CRITICAL nova [-] OperationalError: (OperationalError) (1050, "Table 'instances' already exists") "\nCREATE TABLE instances (\n\tcrea
....

2015-08-28 20:22:38 INFO shared-db-relation-changed subprocess.check_output(cmd)
2015-08-28 20:22:38 INFO shared-db-relation-changed File "/usr/lib/python2.7/subprocess.py", line 573, in check_output
2015-08-28 20:22:38 INFO shared-db-relation-changed raise CalledProcessError(retcode, cmd, output=output)
2015-08-28 20:22:38 INFO shared-db-relation-changed subprocess.CalledProcessError: Command '['nova-manage', 'db', 'sync']' returned non-zero exit status 1

Revision history for this message
Chad Smith (chad.smith) wrote :

        u"cs:trusty/nova-cloud-controller-60",
        config={"openstack-origin": "cloud:trusty-icehouse",

Our juju charm settings from the above failed deployment

James Page (james-page)
Changed in nova-cloud-controller (Juju Charms Collection):
milestone: 15.10 → 16.01
James Page (james-page)
Changed in nova-cloud-controller (Juju Charms Collection):
milestone: 16.01 → 16.04
James Page (james-page)
Changed in nova-cloud-controller (Juju Charms Collection):
milestone: 16.04 → 16.07
Liam Young (gnuoy)
Changed in nova-cloud-controller (Juju Charms Collection):
milestone: 16.07 → 16.10
James Page (james-page)
Changed in nova-cloud-controller (Juju Charms Collection):
milestone: 16.10 → 17.01
James Page (james-page)
Changed in charm-nova-cloud-controller:
importance: Undecided → High
status: New → Triaged
Changed in nova-cloud-controller (Juju Charms Collection):
status: Triaged → Invalid
Revision history for this message
Corey Bryant (corey.bryant) wrote :

I'm going to mark this as Fix Released, since leader election support landed in nova-cloud-controller on June 9, 2015, and that should fix the issue reported in this bug. If that's not the case, please feel free to open the bug back up.

commit eaaaec38ddd6b49ce8be530529b6ffaf165ba6e1
Merge: b651ebf d4b768f
Author: James Page <email address hidden>
Date: Tue Jun 9 10:59:06 2015 +0100

    Add support for leader-election
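
With native Juju leader election in place, the db sync gate no longer has to infer a leader from the cluster relation; it can ask Juju directly. A rough sketch of what that looks like (not the charm's exact code), assuming charmhelpers' is_leader() wrapper around the is-leader hook tool:

# Sketch only, not the charm's actual implementation: gate the db sync
# on Juju's native leader election instead of inferring a leader from
# the cluster relation state.
from charmhelpers.core.hookenv import is_leader, log

def db_changed():
    if not is_leader():
        # Juju guarantees at most one unit holds leadership at a time,
        # so non-leader units simply skip the schema sync.
        log('Not the Juju leader; skipping nova-manage db sync.')
        return
    migrate_database()  # runs 'nova-manage db sync' (charm helper shown earlier)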

Changed in charm-nova-cloud-controller:
status: Triaged → Fix Released