It should be possible to deploy multiple units to a machine (unit placement)

Bug #806241 reported by Adam Gandelman
This bug affects 28 people
Affects      Status         Importance   Assigned to   Milestone
juju-core    Fix Released   High         Unassigned
pyjuju       Fix Released   High         Unassigned

Bug Description

If this is being tracked on another bug, I apologize in advance...

It should be possible to co-locate services on the same machine and optionally relate them to one another. This has a very valid use case with regard to OpenStack. Smaller deployments of only a handful of machines will likely have several (or potentially all) services running on the same machine, while larger deployments will typically isolate services on their own servers and relate them to other services like load balancers, caches, etc.

Currently, the alternative is to develop formulas that treat a specific set of services as one large service unit and take care of everything within hook scripts. This is obviously not ideal and makes it difficult to scale.

For example, I'm working on openstack formulas that currently require a minimum of 5 machines:

- nova-compute
- nova-cloud-controller
- glance
- rabbitmq
- mysql

It would be ideal if I could combine rabbitmq and mysql on the same machine as separate services.
nova-cloud-controller is a workaround to this problem, as it installs and configures nova-api, nova-schedule, nova-network and nova-objectstore. These should all be handled by separate formulas that relate to one another, but currently that would require another three machines.
glance is similar in that it sets up the glance-api and glance-registry servers as one unit.

WORKAROUND: use "jitsu deploy-to" and manually specify the machine to deploy to. This probably requires that the charms are compatible with one another and that they do not try to use the same ports (e.g. port 80).
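
For illustration, a rough sketch of that workaround (the charm names and machine id are only examples, and the exact jitsu invocation may differ between jitsu versions):

  juju deploy mysql                  # lands on a fresh machine, say machine 1
  jitsu deploy-to 1 rabbitmq-server  # force a second service unit onto machine 1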

Tags: production
Revision history for this message
Kapil Thangavelu (hazmat) wrote :

This is more about multi-unit machines than co-location per se. Ensemble is looking to integrate lxc for multi-unit machine deployments. We're looking at co-location separately as a notion of multiple units within an lxc-container.

Changed in ensemble:
status: New → Confirmed
summary: - It should be possible to collocate service units
+ It should be possible to deploy multiple units to a machine
Changed in ensemble:
importance: Undecided → High
milestone: none → dublin
Revision history for this message
Adam Gandelman (gandelman-a) wrote : Re: It should be possible to deploy multiple units to a machine

So, in order to have a database and a web service hosted on the same machine, they must both be isolated in separate LXC containers? I see the value of being able to do this, but for such a simple use case I think it adds an enormous amount of complexity, especially for admins who have no knowledge of containers and wish to see their services interacting locally as they have been pre-ensemble. In terms of the OpenStack example mentioned above, this would create a scenario where configuration needs to be maintained in 4 or 5 separate containers on a single machine where it could really exist in one place.

If it is in theory possible to co-locate multiple units within an lxc container, should it not also be possible to co-locate multiple units on the same machine without the use of containers? (Obviously brainstorming as an end-user here with no technical knowledge of ensemble internals :)

Revision history for this message
Nick Barcet (nijaba) wrote :

How would you handle the deployment of the collectd client in this case, on the same machine as, say, a compute-node? I am not sure that collectd within an LXC container can collect info on what is happening on the metal...

Revision history for this message
Kapil Thangavelu (hazmat) wrote : Re: [Bug 806241] Re: It should be possible to deploy multiple units to a machine

Excerpts from Nick Barcet's message of Fri Jul 08 08:57:02 UTC 2011:
> How would you handle the deployment of collectd client in this case, on
> the same machine as, say, a compute-node? I am not sure that collectd
> within an LXC container can collect info of what is happening on the
> metal...
>

A co-located collectd formula can capture the condition of the container, which represents the other co-located service unit.

We talked about notions of machine-level formulas that could do things like set up machine-level monitoring.
I think they're doable, but they really do need a significant restriction to work within the overall goals of
Ensemble: they can't have relations to the units on the machine. So, for example, using collectd on the machine
to collect data on a service unit wouldn't be an option, else we start getting a tight coupling of machine and service unit. If a machine
fails, its service units will move, and then the stat-collection storage needs migration of the relevant data for the unit.
It really depends on the level of addressing a formula can get when setting up the collection policies. I'd rather
see that same data collected by a co-located formula that directly associates the metric data with the service unit.

To me the same applies in the context of traditional management systems, a la Landscape and others. We really want
to shift the focus to service-level management, and support logical aggregation of a service's metric/stat/monitoring
data across its units. This gives the metrics a holistic view of the service and better informs scaling
decisions.

There is the caveat of running multiple monitoring agents, but most of them are fairly lightweight; they already run on fully loaded,
mission-critical machines. If we need more capacity, we add more machines.

In future, I see the LXC containers for each service unit being backed by a separate EBS volume or SAN/iSCSI storage,
such that they can be migrated (allowing for some vertical scaling): we can stop a container, snapshot its storage and move
it to a separate machine. At the moment we have a whole-machine EBS volume, so the machine-to-unit coupling would be
preserved and vertical scaling would be rather arbitrary. There are lots of options to explore around machine assignment
of units to enable some form of vertical scaling. It's definitely pie in the sky at the moment; we'd need some sort of volume-management
API, and it will be interesting to see what comes out of the physical-machine sprint. Ideally we'd just
have a machine-level formula set up an iSCSI target, with physical-machine selection based on some cross-provider
machine-size abstraction.

cheers,

Kapil

Revision history for this message
Adam Gandelman (gandelman-a) wrote : Re: It should be possible to deploy multiple units to a machine

I would be hesitant to make iSCSI-backed container storage another prerequisite for co-located service units on physical hardware. It makes sense in the context of EBS+EC2, but network-attached storage is a four-letter word for many admins (especially those migrating away from EC2 after being bitten by recent EBS problems). Not to mention the network hardware and administrative overhead required to "do it right" at scale.

It seems all of the abstraction (LXC and now iSCSI) that would be required to deploy services onto hardware defeats the purpose of deploying to hardware to begin with. It then begins to look more like deploying services into an ensemble cloud, which may be hosted on our hardware, or EC2, etc. This sounds reasonable and attractive for traditionally elastic workloads like web services, map/reduce, etc. However, when we think of using ensemble to deploy cloud infrastructure or other services that need to be tightly coupled with the hardware that hosts them, the level of abstraction begins to become a show-stopper.

Revision history for this message
Clint Byrum (clint-fewbar) wrote : Re: [Bug 806241] Re: It should be possible to deploy multiple units to a machine

Excerpts from Adam Gandelman's message of Thu Jul 07 21:35:32 UTC 2011:
> So, in order to have a database and a web service hosted on the same
> machine, they must both be isolated in seperate LXC containers? I see
> the value of being able to do this, but for such a simple use case I
> think it adds an enormous amount of complexity, especially for admins
> who have no knowledge of containers and wish to see their services
> interacting locally as they have been pre-ensemble. In terms of the
> openstack example mentioned above, this would create a scenario where
> configuration needs to be maintained in 4 or 5 separate containers on a
> single machine where it could really exist in one place.
>
> If it is in theory possible to co-locate multiple units within an lxc
> container, should it not also be possible to co-locate multiple units on
> the same machine without the use of containers? (Obviously
> brainstorming as an end-user here with no technical knowledge of
> ensemble internals :)
>

First, I agree, Adam. I have often wondered why we can't have the first
pair of the mysql/wordpress services on the same machine, and then scale
onto more machines when necessary. This fits the ensemble model really
nicely, where it considers machines as resources. In this case, the resource
is not totally consumed by one service.

I believe Ben's work on co-located formulas will allow for what you are
suggesting, Adam. While the main reason is to be able to co-locate things
like mysql and munin-node, it should still be perfectly logical to
co-locate certain things like mysql and nova-compute. If I understand the
way we proposed it, you would say

ensemble deploy nova-compute --with mysql

The only difference is that we also had discussed coupling the add/remove
units so

ensemble add-unit nova-compute

would also add a mysql service unit, which is *not* what you want.

A really easy way around this would be to add this to deploy and add-unit:

--machine machine_id|service_name

So you'd still deploy onto another service's machines, but the
adding/removing of units would not remain coupled.

In fact I've pushed this up to launchpad

lp:~clint-fewbar/ensemble/reuse-machines

Very small diff, *VERY* useful for testing simple relations without spawning
many machines. It only fails when one tries to put two units of the same
formula on one machine.

Revision history for this message
Adam Gandelman (gandelman-a) wrote : Re: It should be possible to deploy multiple units to a machine

Thanks for the input, Clint. It seems something like the latter would work well for deployment, if I understand what you're suggesting:

("cloud controller" node = machine 1)

ensemble deploy --machine 1|rabbitmq
ensemble deploy --machine 1|mysql
ensemble deploy --machine 1|nova-cloud-controller
ensemble deploy --machine 1|nova-compute

You didn't address adding relations, though maybe this is handled by add-unit (I've not used add-unit yet, doh!). Something like this is what seems logical to me:

ensemble add-relation --machine 1|nova-cloud-controller:shared-db --machine 1|mysql:shared-db
ensemble add-relation --machine 1|nova-cloud-controller:amqp --machine 1|rabbitmq:amqp
ensemble add-relation --machine 1|nova-compute:shared-db --machine 1|mysql:shared-db
ensemble add-relation --machine 1|nova-compute:amqp --machine 1|rabbitmq:amqp
ensemble add-relation --machine 1|nova-compute:cloud-controller --machine 1|nova-cloud-controller:cloud-controller

With this model, I could later keep adding nova-compute services on other machines, to be related to the individual services on the single node as if they were on dedicated machines. It may be out of scope, but the ability to later deploy mysql to a separate machine and remove/add the relations to its new location would be great.
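
For illustration, a hypothetical sketch of that later migration using the juju command names (this was not possible at the time of this comment; note that relations are between services rather than units, so they would not need to be removed and re-added):

  juju add-unit mysql        # the new unit lands on a fresh machine
  juju remove-unit mysql/0   # retire the unit on the shared "cloud controller" node
  # caveat: this moves the service unit, not the database contents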

I want to whip up some .dot or .dia diagrams later today to illustrate what a complex beast the OpenStack deployment is becoming, so that you guys might take a look this week and see if it helps in understanding the challenges here.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

With the branch I pushed up, add-relation still works the same as in multi-machine cases. Only commands that need to put a service unit on a machine need to have --machine ..., which is add-unit and deploy.

Changed in ensemble:
assignee: nobody → Clint Byrum (clint-fewbar)
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

You can at least run basic tests using the new

--placement=local

argument to deploy and add-unit. This doesn't allow scaling beyond one machine, but it does let you say "just put the units on the bootstrap machine". This allows deploying anything that doesn't conflict directly on the same box (like the example wordpress/mysql formulas).

Note that these are not contained in any way, so the bootstrap machine may get messy really fast.
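
For illustration, a minimal sketch of that flow with the example wordpress/mysql formulas (charm names assumed; every unit ends up on the bootstrap machine, machine 0):

  ensemble bootstrap
  ensemble deploy --placement=local mysql
  ensemble deploy --placement=local wordpress
  ensemble add-relation wordpress mysql
  # both service units run directly on machine 0, with no container isolation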

Changed in ensemble:
assignee: Clint Byrum (clint-fewbar) → nobody
Changed in ensemble:
importance: High → Medium
Changed in ensemble:
milestone: dublin → none
Revision history for this message
Kapil Thangavelu (hazmat) wrote :

The biggest problem preventing multi-unit machine usage at the moment is that, with dynamic port usage, juju has no way of detecting conflicts with pre-deployment analysis, so no multi-unit deployment can safely be done. I'd like to revisit the dynamic port usage, as I think it causes more problems for this use case than it solves. Alternatively, I'd like to hear someone articulate a clear vision of how advanced networking can work around this and still work on EC2.

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

It would be nice if we didn't block fixing this bug on comment #10. Document that JuJu doesn't detect port conflicts, so services shouldn't be expected to work if ports are used in duplicate. As users, we can be careful and conscious of this, avoid those situations, and happily co-locate services in situations where they don't conflict.

For example, we need a MySQL server and a RabbitMQ server for a particular charm we're working on. We'd like to reuse the two existing, generic formulas for these two services, but we *really* don't want to dedicate two physical machines (in Orchestra) to MySQL and RabbitMQ, as we have a very limited amount of hardware. We want/need to put MySQL and RabbitMQ on the same physical hardware.

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

We're currently sprinting on deploying OpenStack with Orchestra + JuJu, and this is absolutely the most critical bug to our current deployment.

We're trying to develop reusable charms for the ~10-ish moving parts of the OpenStack infrastructure. We'd like to develop fundamental units that represent each of the core components, use the power of JuJu to scale them out, and develop real High Availability for OpenStack.

To do this, we'd use individual charms for MySQL, RabbitMQ, Nova-Compute, Swift, Swift-Proxy, Nova-API, Glance, Keystone, Dash, Scheduler, Network, etc. Some of these are meant to be co-located, or should be to optimize hardware usage. Unlike EC2, where you have as many systems as you're willing to pay for, our physical network has a hard limit on the number of physical systems. We *must* be more careful and conservative about what gets deployed where. As noted above, we need to co-locate MySQL, RabbitMQ, and a couple of other services, as these physical systems are quite capable of running multiple services.

At this point, we're having to develop custom charms for each and every topology or deployment combination that we need. It's really quite unfortunate to have to snapshot/steal the charm bits and combine them into new charms every time we need a combination of services on a given system.

Would it be possible to raise the priority of this bug? Co-located charms are essential to using JuJu to deploy complex systems -- specifically, OpenStack, in our current use case. Thanks.

Revision history for this message
Robbie Williamson (robbiew) wrote :

Based on the recent comments, raising this to High.

Changed in juju:
importance: Medium → High
tags: added: production
summary: - It should be possible to deploy multiple units to a machine
+ It should be possible to deploy multiple units to a machine (service
+ colocation)
Revision history for this message
Kapil Thangavelu (hazmat) wrote : Re: It should be possible to deploy multiple units to a machine (service colocation)

A few comments,

1. I've come around to the simplicity and flexibility of what Clint originally proposed here, i.e. just let the user do manual placement by specifying a machine at deploy time.

2. Juju should do things safely, i.e. prevent the user from doing things that are just broken and will break services, which is what I'm saying in #10. That's a larger topic for the list, though.

Changed in juju:
milestone: none → florence
Revision history for this message
Dustin Kirkland  (kirkland) wrote : Re: [Bug 806241] Re: It should be possible to deploy multiple units to a machine (service colocation)

On Fri, Oct 14, 2011 at 5:33 AM, Kapil Thangavelu
<email address hidden> wrote:
> 1. I've come around to the simplicity and flexibility of what clint originally proposed here. ie just let the user do manual placement, by specifying a machine at deploy time.

Hell yeah! +100!

> 2. Juju should do things safely, ie. prevent the user from doing things that are just broken, and will break services.
> Which is what i'm saying in #10. That's a larger topic for the list though.

Meh, sure, if you can, that would be great. But if you can't, please
don't block this feature on that :-( Users can just DFDT until you
get this knowledge and protection in place.

e.g. I know for a *fact* that I want my MySQL and RabbitMQ servers on
the same physical system when I do my OpenStack Juju deployment.
Whether or not Juju understands that that's possible, I know that it
is. And it is very much, in fact, what I need/want.

Thanks for the activity on this, Kapil.

summary: - It should be possible to deploy multiple units to a machine (service
- colocation)
+ It should be possible to deploy multiple units to a machine (unit
+ placement)
Jim Baker (jimbaker)
Changed in juju:
assignee: nobody → Jim Baker (jimbaker)
Jim Baker (jimbaker)
Changed in juju:
assignee: Jim Baker (jimbaker) → nobody
Revision history for this message
Kapil Thangavelu (hazmat) wrote :

This might be interesting as a deploy with a force-subordinate flag.

Changed in juju:
milestone: florence → galapagos
Changed in juju:
milestone: galapagos → honolulu
Revision history for this message
Gustavo Niemeyer (niemeyer) wrote : Re: [Bug 806241] Re: It should be possible to deploy multiple units to a machine (unit placement)

This is something to be handled in the Go port.

Also, it should not be possible to force charms to be subordinate. Being
unable to deploy multiple units on a machine is a well-known limitation
that will be fixed in due time. Subordinates are something else.

gustavo @ http://niemeyer.net

Changed in juju:
milestone: 0.6 → 0.7
milestone: 0.7 → none
Changed in juju-core:
status: New → Confirmed
importance: Undecided → High
Changed in juju-core:
milestone: none → 2.0
Revision history for this message
Kapil Thangavelu (hazmat) wrote :

It should be noted that an implementation of this exists, though lacking isolation between units, in the form of jitsu deploy-to.

description: updated
Changed in juju:
milestone: none → 0.8
Changed in juju-core:
status: Confirmed → Triaged
Revision history for this message
Logan McNaughton (loganmc10) wrote :

This is a fairly big problem; it is holding a lot of people back from using MAAS/Juju to deploy OpenStack. I really hope this makes it into juju by the time Havana is released. Without this support, Mirantis' Fuel looks like a much better option.

Revision history for this message
Kapil Thangavelu (hazmat) wrote :

This feature has been built into juju-core for a few months now, via juju deploy
--force-machine=$machine_id $charm_spec
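
For example (the machine id and charm name are placeholders):

  juju deploy --force-machine=1 rabbitmq-server  # deploy onto existing machine 1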

On Fri, Jul 12, 2013 at 1:19 PM, Logan McNaughton <<email address hidden>
> wrote:

> This is a fairly big problem, it is holding a lot of people back from
> using MAAS/Juju to deploy OpenStack. I really hope this makes it into
> juju by the time Havana is released. Without this support, Mirantis'
> Fuel looks like a much better option.

Revision history for this message
Tim Penhey (thumper) wrote :

This is done with --to in juju-core.
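
For illustration, a rough sketch of the juju-core placement syntax (service names and machine ids are only examples):

  juju deploy mysql --to 0            # put mysql on the bootstrap machine
  juju deploy rabbitmq-server --to 0  # co-locate rabbitmq on the same machine
  juju add-unit nova-compute --to 3   # place an additional unit on machine 3
  juju deploy glance --to lxc:0       # or isolate a unit in an LXC container on machine 0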

Changed in juju-core:
status: Triaged → Fix Released
milestone: 2.0 → none
Curtis Hovey (sinzui)
Changed in juju:
status: Confirmed → Triaged
Changed in juju:
status: Triaged → Fix Released