Deployment of a large bundle fails or hogs the system

Bug #1658100 reported by Fairbanks.
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: 2.1.2

Bug Description

Hello there,

When I try to deploy a large bundle with several LXD containers, the deployment takes very long and seems to hog/stall for some reason I couldn't find in the logs.

My base bundle is the openstack-base bundle file, and I added HA via hacluster with 3 units.
In this setup there are a total of 9 machines: 3 neutron-gateway / services nodes and 6 compute nodes.

I install all the services like cinder, glance etc. in containers on the neutron-gateway nodes. The nova-compute nodes also function as ceph-osd nodes, but the ceph-mons are installed on the neutron-gateway nodes.

Now if I tell the bundle to deploy 3 units of all these services at once, using hacluster as a subordinate charm, the installation fails. It seems like the juju/bootstrap node isn't responding or isn't telling the clients what to do, or the clients can't connect to the bootstrap node; I couldn't figure that out.

If I tell the bundle to first deploy just one unit of each of those services, and after a successful deployment tell all those services to add 2 more units, it works fine. But not all at the same time.
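
The two-step workaround described above can be sketched roughly as follows. This is only an illustration, not the tooling actually used here: the bundle is represented as a plain Python dict, the `applications` key (2.0-era bundles used `services`), the charm URLs, and the helper name `split_deployment` are all assumptions, and placement of the extra units is left out.

```python
import copy


def split_deployment(bundle, apps_key="applications"):
    """Return (phase_one_bundle, follow_up_commands).

    phase_one_bundle deploys at most one unit per application;
    follow_up_commands are the `juju add-unit` calls to run once
    the first deployment has settled.
    """
    phase_one = copy.deepcopy(bundle)  # leave the caller's bundle untouched
    commands = []
    for name, spec in phase_one.get(apps_key, {}).items():
        units = spec.get("num_units", 0)  # subordinates have no num_units
        if units > 1:
            spec["num_units"] = 1
            if isinstance(spec.get("to"), list):
                # keep only the placement of the first unit
                spec["to"] = spec["to"][:1]
            commands.append(f"juju add-unit {name} -n {units - 1}")
    return phase_one, commands


# Hypothetical fragment of an HA bundle, for illustration only.
bundle = {
    "applications": {
        "keystone": {"charm": "cs:keystone", "num_units": 3,
                     "to": ["lxd:0", "lxd:1", "lxd:2"]},
        "glance": {"charm": "cs:glance", "num_units": 3,
                   "to": ["lxd:0", "lxd:1", "lxd:2"]},
        "keystone-hacluster": {"charm": "cs:hacluster"},
    }
}

phase_one, commands = split_deployment(bundle)
for command in commands:
    print(command)
```

The phase-one dict would be written back out as the bundle file to deploy first; the printed commands are run by hand afterwards, adding `--to` placement if the extra units should land in specific containers.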

Would it be an option to let the juju deploy function hold back all extra units of a specific app and deploy them once the first one is running?
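
To make the request concrete, here is a rough sketch of such a "hold back until the first unit is running" gate, done outside of Juju. It assumes the parsed output of `juju status --format=json` has the shape `applications -> <app> -> units -> <unit> -> workload-status -> current`; that layout, the helper names, and the sample status are assumptions to check against the Juju version in use.

```python
def first_unit_active(status, app):
    """True if at least one unit of `app` reports workload status 'active'."""
    units = status.get("applications", {}).get(app, {}).get("units", {})
    return any(
        unit.get("workload-status", {}).get("current") == "active"
        for unit in units.values()
    )


def ready_to_scale(status, pending):
    """Return `juju add-unit` commands for apps whose first unit is active.

    `pending` maps application name -> number of held-back units.
    """
    return [
        f"juju add-unit {app} -n {extra}"
        for app, extra in pending.items()
        if extra > 0 and first_unit_active(status, app)
    ]


# Hypothetical status snapshot: keystone is up, glance is still installing.
sample_status = {
    "applications": {
        "keystone": {"units": {"keystone/0": {
            "workload-status": {"current": "active"}}}},
        "glance": {"units": {"glance/0": {
            "workload-status": {"current": "maintenance"}}}},
    }
}

for command in ready_to_scale(sample_status, {"keystone": 2, "glance": 2}):
    print(command)
```

A wrapper script could call this in a loop, refreshing the status each time and removing an application from `pending` once its command has been issued.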

If I'm correct, the juju-deployer for 1.2x did something like this?

P.S. I'm not able to test this much (or even at all any more) on these systems, since they are in use at the moment.

Any questions just ask :).

Thx.

Anastasia (anastasia-macmood) wrote :

@Fairbanks,

Could you please clarify which Juju 2.x and OpenStack versions you were using?

Also, we track Juju 2.x issues in the "juju" project in Launchpad.
I'll re-target :)

Anastasia (anastasia-macmood) wrote :

LXD version would be interesting too :)

no longer affects: juju-core
Changed in juju:
status: New → Incomplete
Fairbanks. (fairbanks) wrote :

Hello,

The versions are as follows:

- MAAS Version 2.1.3+bzr5573-0ubuntu1 (16.04.1)
- Juju 2.0.2-xenial-amd64
- LXD/LXC 2.0.8

Also, all the bare-metal machines use the images synced by MAAS during deployment.
LXD is using the image provided by the default Juju install.

Fairbanks. (fairbanks) wrote :

Also, the OpenStack that has been deployed is:
ceilometer 7.0.0
ceph 10.2.3
cinder 9.0.0
glance 13.0.0
heat 7.0.0
keystone 10.0.0
mysql 5.6.21-25.8
neutron 9.0.0
nova 14.0.1
openstack-dashboard 10.0.0
rabbitmq-server 3.5.7
swift 2.10.0

Anastasia (anastasia-macmood) wrote :

Thank you for the update. We have improved Juju performance in Juju 2.1. It would be great to know whether the newer version behaves :)

Jason Hobbs (jason-hobbs) wrote :

Hi Fairbanks,

Can you please post the bundle you're using?

Fairbanks. (fairbanks) wrote :

Sorry for the late reply.
It seems that using Juju 2.1.2 fixed the issues with deploying.

The only thing I sometimes see is that the memory usage is very high on the bootstrap node, and rebooting this node makes it faster again.

Anastasia (anastasia-macmood) wrote :

@Fairbanks. (fairbanks),

Thank you for verifying \o/

Yes, there may be some performance glitches in 2.1.2. We have improved 2.2 even further.

I will mark this bug as Fix Released in 2.1.2, although we probably landed the fix in an earlier 2.1.x release.

Changed in juju:
milestone: none → 2.1.2
status: Incomplete → Fix Released