Mirantis OpenStack

Sometimes a controller runs more than one L3-agent/DHCP-agent/Metadata-agent.

Bug #1652748 reported by siyingchun on 2016-12-27

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Mirantis OpenStack	Fix Committed	Wishlist	Andrii Kroshchenko	Mirantis OpenStack 10.0

Bug Description

We run large-scale OpenStack clusters. After the whole environment has been working correctly for several days, a controller node sometimes ends up running more than one L3-agent/DHCP-agent/Metadata-agent.

Our environment is mainly based on Mirantis Fuel 7.0, and many services are monitored and managed by Pacemaker, a very powerful and automatic tool. Four of the services controlled by Pacemaker are the L3-agent, DHCP-agent, ovs-agent and neutron metadata agent, so administrators and other users have no need to manage or operate them. So what happened? The key reason is that an administrator or another user, for some unknown reason, unexpectedly fails to find one of these processes, for instance when running "ps -ef | grep L3-agent" in a Linux bash shell, and then uses some other tool to restart the service, e.g. "service *** start", "systemctl start ***" ......

As a result, everything looks OK and the crashed service works again. However, that instance is managed only from the Linux shell. Pacemaker does not know what happened and, as usual, starts the crashed service itself, so TWO instances of the same service are running and they work independently of each other.

The same situation could also arise from causes other than human action, so a check should be made whenever the system is about to start a new *-agent.
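
The kind of check asked for here can be approximated from outside the agents by looking at the process table. The following is only a sketch under stated assumptions: the agent binary names are the usual Neutron ones, the function name is made up for illustration, and the input is the text printed by `ps -eo pid,ppid,args`. Forked worker processes (whose parent runs the same agent) are not counted as duplicates.

```python
import os
from collections import defaultdict

# Binary names of the Neutron agents discussed in this report (assumed).
AGENT_BINARIES = (
    "neutron-l3-agent",
    "neutron-dhcp-agent",
    "neutron-metadata-agent",
    "neutron-openvswitch-agent",
)


def _agent_of(args):
    """Return the agent binary named in a command line, or None.

    Only the first two tokens are inspected (interpreter and script),
    which is a heuristic, not an exact match on the executable.
    """
    for token in args.split()[:2]:
        name = os.path.basename(token)
        if name in AGENT_BINARIES:
            return name
    return None


def find_duplicate_agents(ps_output):
    """Map agent name -> sorted PIDs when more than one top-level copy runs."""
    procs = {}  # pid -> (ppid, agent)
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit() or not fields[1].isdigit():
            continue  # header or malformed line
        agent = _agent_of(fields[2])
        if agent:
            procs[int(fields[0])] = (int(fields[1]), agent)

    top_level = defaultdict(list)
    for pid, (ppid, agent) in procs.items():
        parent = procs.get(ppid)
        # A worker's parent runs the same agent; skip such children.
        if parent is None or parent[1] != agent:
            top_level[agent].append(pid)

    return {a: sorted(p) for a, p in top_level.items() if len(p) > 1}
```

A non-empty result means at least one agent has two independent instances on the node, which is the situation described above.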

* Pre-condition:
You have a large-scale environment, or a small test one, that has been running for several days.

* Step-by-step:
On a controller, run commands such as "neutron agent-list" and check the *-agents.

* Expected result:
Only one L3-agent/DHCP-agent/Metadata-agent exists

* Actual result:
Two L3-agents/DHCP-agents/Metadata-agents exist

* Version:
OpenStack Newton, deployed with Fuel 10.0
Ubuntu 16.04.1 LTS, running kernel 4.4.0-57-generic
Neutron version 5.1.0

Tags:

siyingchun (wintersi) on 2016-12-27

Changed in neutron:
assignee:	nobody → siyingchun (wintersi)

Armando Migliaccio (armando-migliaccio) wrote on 2016-12-27:

This issue as reported seems to be related to the specific (downstream) distribution used to deploy and operate OpenStack.

Changed in neutron:
status:	New → Invalid
assignee:	siyingchun (wintersi) → nobody

siyingchun (wintersi) wrote on 2016-12-27:

Yes, I think so, but I also think it would be better if the *-agent process were guarded by a status flag in code, regardless of when or by whom it is started: Pacemaker, users or administrators.
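
One possible reading of the "status flag" idea is a per-agent lock file that the process must hold while it runs. The sketch below illustrates that reading only; it is not Neutron code, and the function name, exception name and lock path are hypothetical. It relies on `flock`, so the kernel releases the lock when the process exits and a crashed agent cannot leave a stale flag behind.

```python
import fcntl
import os


class AlreadyRunning(Exception):
    """Raised when another process already holds the instance lock."""


def acquire_instance_lock(path):
    """Take an exclusive, non-blocking lock on `path` and return the open fd.

    Keep the returned fd open for the lifetime of the agent; closing it
    (or exiting) releases the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise AlreadyRunning(path)
    # Record the owner PID for operators; the lock itself is the flag.
    os.ftruncate(fd, 0)
    os.write(fd, str(os.getpid()).encode())
    return fd
```

With this in place, whichever of Pacemaker or a manual start comes second would fail immediately instead of running alongside the first instance.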

Changed in neutron:
status:	Invalid → Opinion

Armando Migliaccio (armando-migliaccio) wrote on 2016-12-27:

I am not sure what you're proposing, but the use of a flag seems problematic.

Roman Podoliaka (rpodolyaka) wrote on 2016-12-28:

@siyingchun

I think it's not a problem with neutron itself, but rather with how neutron services are deployed: if pacemaker is used to manage the services, then they must be explicitly disabled in the init system to prevent possible problems with manual restarts. E.g. in systemd the services could be "masked" (https://fedoramagazine.org/systemd-masking-units/).
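
As a rough illustration of the masking approach, the sketch below builds one `systemctl mask` command per agent unit and runs them only when asked to. The unit names and function names are assumptions and should be checked on the node (for example with `systemctl list-unit-files 'neutron-*'`). It also assumes Pacemaker starts the agents through its own resource agents rather than through these systemd units, since masked units cannot be started by anyone.

```python
import subprocess

# Assumed unit names; verify them on the controller before use.
AGENT_UNITS = (
    "neutron-l3-agent.service",
    "neutron-dhcp-agent.service",
    "neutron-metadata-agent.service",
    "neutron-openvswitch-agent.service",
)


def mask_commands(units=AGENT_UNITS):
    """Build one `systemctl mask` command per unit."""
    return [["systemctl", "mask", unit] for unit in units]


def mask_agents(units=AGENT_UNITS, dry_run=True):
    """Print the mask commands, and run them (as root) unless dry_run is set."""
    for cmd in mask_commands(units):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `mask_agents()` with the default `dry_run=True` only prints the commands, which makes it easy to review them before applying. Once a unit is masked, a manual `systemctl start` on it is refused.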

Changed in mos:
status:	New → Confirmed
Changed in neutron:
status:	Opinion → Invalid
Changed in mos:
importance:	Undecided → Wishlist
milestone:	none → 10.0
assignee:	nobody → MOS Packaging Team (mos-packaging)
tags:	added: area-linux

Roman Podoliaka (rpodolyaka) wrote on 2016-12-28:

I consider this not a bug, but rather a request for improvement of user experience, thus Wishlist.

Ma Liang Liang (mall2) wrote on 2016-12-28:

I think this is a typical reliability issue. An operator may sometimes enter commands by mistake.
So it's better to do a check in code.
I read the code: when a neutron *-agent starts, it sends a report_state message to neutron-server to register itself. Every agent adds a "start_flag" to its first message and removes the flag from the following report messages. Neutron server can check the DB and the "start_flag" to judge whether a newly arriving agent is a duplicate on a specific host.

So in my opinion, it's a small effort for a big gain.
This is just my suggestion.
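
The server-side check suggested in this comment could look roughly like the sketch below. It is a toy in-memory model, not Neutron's implementation: the class, method signature and the 75-second liveness window are illustrative assumptions. A report carrying `start_flag` is treated as suspicious when the same (host, agent type) pair has reported recently enough to still count as alive.

```python
import time


class AgentRegistry:
    """Toy stand-in for the server-side agent table (hypothetical)."""

    def __init__(self, agent_down_time=75):
        # Seconds without a report after which an agent counts as down
        # (illustrative value).
        self.agent_down_time = agent_down_time
        self._last_seen = {}  # (host, agent_type) -> last report timestamp

    def report_state(self, host, agent_type, start_flag=False, now=None):
        """Record a report; return True if it looks like a duplicate start."""
        now = time.time() if now is None else now
        key = (host, agent_type)
        previous = self._last_seen.get(key)
        suspicious = (
            start_flag
            and previous is not None
            and now - previous < self.agent_down_time
        )
        self._last_seen[key] = now
        return suspicious
```

Note that a legitimate quick restart inside the liveness window looks exactly the same to the server, so a check like this can only raise a warning; it cannot tell a restart from a second instance.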

Eugene Nikanorov (enikanorov) wrote on 2016-12-28:

The bug has nothing to do with the upstream neutron project, hence removing it.

no longer affects:	neutron

Ivan Udovichenko (iudovichenko) on 2016-12-29

Changed in mos:
assignee:	MOS Packaging Team (mos-packaging) → Andrii Kroshchenko (akroshchenko)

Andrii Kroshchenko (akroshchenko) on 2017-01-11

Changed in mos:
assignee:	Andrii Kroshchenko (akroshchenko) → nobody
assignee:	nobody → MOS Packaging Team (mos-packaging)

Andrii Kroshchenko (akroshchenko) wrote on 2017-01-11:

I think the right way to avoid this issue is, as @rpodolyaka advised, to mask the service.

Andrii Kroshchenko (akroshchenko) on 2017-01-11

Changed in mos:
assignee:	MOS Packaging Team (mos-packaging) → Andrii Kroshchenko (akroshchenko)

Andrii Kroshchenko (akroshchenko) wrote on 2017-03-06:

Related changes were made: the patch was merged [1].

[1] https://review.openstack.org/#/c/424099/

Changed in mos:
status:	Confirmed → Fix Committed
