Sometimes the controller may run more than one L3-agent/DHCP-agent/Metadata-agent.

Bug #1652748 reported by siyingchun
This bug affects 1 person
Affects: Mirantis OpenStack
Status: Fix Committed
Importance: Wishlist
Assigned to: Andrii Kroshchenko

Bug Description

We run large-scale OpenStack clusters. After the whole environment has worked correctly for several days, a controller node sometimes ends up running more than one L3-agent/DHCP-agent/Metadata-agent.

Our environment is mainly based on Mirantis Fuel 7.0, and many services are monitored and managed by Pacemaker, a powerful automation tool. Among them are four agents controlled by Pacemaker: L3-agent, DHCP-agent, ovs-agent and the neutron metadata agent. Administrators and other users therefore have no need to manage or operate them by hand. So what goes wrong? The key reason is that an administrator or another user unexpectedly fails to find one of these agents running, for whatever reason, for instance when checking with "ps -ef | grep L3-agent", and then restarts the service with one of the usual tools, e.g. "service *** start", "systemctl start ***" ...

As a result, everything looks fine and the crashed service works again. However, that instance is managed only from the shell; Pacemaker does not know what happened and regularly starts the crashed service itself, so TWO copies of the same service end up running, independently of each other.

Other causes that do not involve human action are also possible, so a check should be made whenever the system is about to start a new *-agent.
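A check of this kind could be sketched roughly as follows. This is only an illustration, not code from Neutron or Fuel: the binary names are assumed to match the usual package names, and the helper names (parse_ps, count_instances, duplicated_agents, scan_host) are hypothetical. Child processes forked by an agent (for example worker processes) are not counted as separate instances, since their parent is the same agent.

```python
import subprocess

# Assumed agent binary names; adjust them to the deployment being checked.
AGENT_BINARIES = (
    "neutron-l3-agent",
    "neutron-dhcp-agent",
    "neutron-metadata-agent",
)


def parse_ps(ps_output):
    """Parse `ps -eo pid,ppid,args` output into (pid, ppid, args) tuples."""
    procs = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # skip the header line and anything malformed
        procs.append((int(parts[0]), int(parts[1]), parts[2]))
    return procs


def count_instances(ps_output, binaries=AGENT_BINARIES):
    """Count independent instances of each agent.

    A process whose parent is the same agent is treated as a forked child,
    not as a second instance.
    """
    procs = parse_ps(ps_output)
    counts = {}
    for name in binaries:
        pids = {pid for pid, _ppid, args in procs if name in args}
        roots = [pid for pid, ppid, args in procs
                 if name in args and ppid not in pids]
        counts[name] = len(roots)
    return counts


def duplicated_agents(ps_output, binaries=AGENT_BINARIES):
    """Return the sorted names of agents that run more than once."""
    counts = count_instances(ps_output, binaries)
    return sorted(name for name, n in counts.items() if n > 1)


def scan_host():
    """Run ps on the local host and report duplicated agents."""
    result = subprocess.run(["ps", "-eo", "pid,ppid,args"],
                            capture_output=True, text=True, check=True)
    return duplicated_agents(result.stdout)
```

Calling scan_host() on a controller returns an empty list when every agent runs at most once.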

* Pre-condition:
You have a large-scale environment, or a small test one, that has been running for several days.

* Step-by-step:
On the controller, run a command such as "neutron agent-list" and check the *-agents.

* Expected result:
Only one L3-agent/DHCP-agent/Metadata-agent exists

* Actual result:
Two L3-agent/DHCP-agent/Metadata-agent exist

* Version:
OpenStack Newton, deployed with Fuel 10.0
Ubuntu 16.04.1 LTS, running kernel 4.4.0-57-generic
Neutron version 5.1.0

Tags: area-linux
siyingchun (wintersi)
Changed in neutron:
assignee: nobody → siyingchun (wintersi)
Armando Migliaccio (armando-migliaccio) wrote :

This issue as reported seems to be related to the specific (downstream) distribution used to deploy and operate OpenStack.

Changed in neutron:
status: New → Invalid
assignee: siyingchun (wintersi) → nobody
siyingchun (wintersi) wrote :

Yes, I think so, but I also think it would be better if the *-agent process could be controlled by a status flag in code, regardless of when it is started and whether it is started by Pacemaker or by users and administrators.
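One possible reading of such a "status flag" is a per-host lock file that an agent takes at startup, so that a second copy refuses to run no matter who launched it. The sketch below only illustrates that idea under this assumption; it is not something Neutron does, and the names acquire_single_instance_lock and AlreadyRunning, as well as the lock path, are hypothetical. It relies on flock(2) and therefore applies to Linux/Unix hosts only.

```python
import fcntl
import os


class AlreadyRunning(RuntimeError):
    """Raised when another process already holds the agent lock."""


def acquire_single_instance_lock(lock_path):
    """Take an exclusive, non-blocking lock on lock_path.

    Returns the open file descriptor, which must stay open for the
    lifetime of the process; the kernel drops the lock when it is closed
    or when the process exits.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise AlreadyRunning("lock %s is already held" % lock_path)
    # Record the owner's PID to help an operator debug the situation.
    os.ftruncate(fd, 0)
    os.write(fd, str(os.getpid()).encode())
    return fd
```

An agent would call this once at startup, for example with a hypothetical path such as /var/lock/neutron-l3-agent.lock, and exit if AlreadyRunning is raised.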

Changed in neutron:
status: Invalid → Opinion
Armando Migliaccio (armando-migliaccio) wrote :

I am not sure what you're proposing, but the use of the flag seems problematic.

Roman Podoliaka (rpodolyaka) wrote :

@siyingchun

I think it's not a problem of neutron itself, but rather of how neutron services are deployed, e.g. if pacemaker is used to manage the services, then those must be explicitly disabled in the init system to prevent possible problems with manual restarts. E.g. in systemd the services could be "masked" (https://fedoramagazine.org/systemd-masking-units/).
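The masking step could be scripted along these lines. This is a minimal sketch: the unit names are assumed to follow the usual Ubuntu package naming, and the helpers mask_commands and run_commands are hypothetical. It also assumes Pacemaker launches the agents through its own resource agents rather than through systemd units, because a masked unit cannot be started through systemd at all.

```python
import subprocess

# Assumed unit names for the Pacemaker-managed agents.
PACEMAKER_MANAGED_UNITS = (
    "neutron-l3-agent.service",
    "neutron-dhcp-agent.service",
    "neutron-metadata-agent.service",
)


def mask_commands(units=PACEMAKER_MANAGED_UNITS):
    """Build one `systemctl mask <unit>` command per unit."""
    return [["systemctl", "mask", unit] for unit in units]


def run_commands(commands, dry_run=True):
    """Return the commands as strings; execute them unless dry_run is set."""
    rendered = [" ".join(argv) for argv in commands]
    if not dry_run:
        for argv in commands:
            subprocess.run(argv, check=True)
    return rendered
```

With dry_run=True nothing is changed on the host, so the list of commands can be reviewed first. Once a unit is masked, a manual "systemctl start" on it fails instead of launching a second agent.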

Changed in mos:
status: New → Confirmed
Changed in neutron:
status: Opinion → Invalid
Changed in mos:
importance: Undecided → Wishlist
milestone: none → 10.0
assignee: nobody → MOS Packaging Team (mos-packaging)
tags: added: area-linux
Roman Podoliaka (rpodolyaka) wrote :

I consider this not a bug, but rather a request for improvement of user experience, thus Wishlist.

Ma Liang Liang (mall2) wrote :

I think this is a typical reliability issue. An operator may sometimes enter a command by mistake.
So it's better to do a check in code.
I read the code: when a neutron *-agent starts, it sends a report_status message to neutron-server to register itself. Every agent adds a "start_flag" to its first message and removes the flag from the following report messages. Neutron server can check the DB and the "start_flag" to judge whether a newly arriving agent is a duplicate on a specific host.

So in my opinion, it's a small-effort, big-gain issue.
This is just my advice.
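The server-side check proposed here could look roughly like the sketch below. It is an in-memory illustration only, not Neutron's actual agent handling: the AgentRegistry class, its report method and the liveness window value are assumptions made for the example. Note that a legitimate quick restart of an agent would look the same as a duplicate start to this check.

```python
import time

# Assumed liveness window in seconds: an agent heard from more recently
# than this is considered still alive.
AGENT_DOWN_TIME = 75


class AgentRegistry:
    """Tracks the last heartbeat per (host, agent_type)."""

    def __init__(self, down_time=AGENT_DOWN_TIME, clock=time.monotonic):
        self._last_seen = {}
        self._down_time = down_time
        self._clock = clock

    def report(self, host, agent_type, start_flag=False):
        """Record a state report.

        Returns True when the report carries start_flag although an agent
        of the same type on the same host was heard from within the
        liveness window, i.e. it may be a duplicate.
        """
        key = (host, agent_type)
        now = self._clock()
        previous = self._last_seen.get(key)
        suspicious = (
            start_flag
            and previous is not None
            and now - previous < self._down_time
        )
        self._last_seen[key] = now
        return suspicious
```

A server using such a check could log a warning or reject the registration when report() returns True.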

Eugene Nikanorov (enikanorov) wrote :

The bug has nothing to do with upstream neutron project, hence removing it.

no longer affects: neutron
Changed in mos:
assignee: MOS Packaging Team (mos-packaging) → Andrii Kroshchenko (akroshchenko)
Changed in mos:
assignee: Andrii Kroshchenko (akroshchenko) → nobody
assignee: nobody → MOS Packaging Team (mos-packaging)
Andrii Kroshchenko (akroshchenko) wrote :

I think the right way to avoid this issue is, as @rpodolyaka advised, to mask the service.

Changed in mos:
assignee: MOS Packaging Team (mos-packaging) → Andrii Kroshchenko (akroshchenko)
Andrii Kroshchenko (akroshchenko) wrote :

Related changes were made; the patch was merged [1].

[1] https://review.openstack.org/#/c/424099/

Changed in mos:
status: Confirmed → Fix Committed