[L2] update the port DB status directly in agent-side

Bug #1840979 reported by LIU Yulong
This bug affects 2 people
Affects: neutron
Status: Won't Fix
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none)

Bug Description

When the ovs-agent is done processing a port, it calls neutron-server to make some DB updates.
In particular, when the ovs-agent is restarted, every port on that agent goes through such an RPC call and DB update again to make the port status consistent. When a large number of agents restart concurrently, neutron-server may not work fine.
So how about making the following DB updates locally on the neutron agent side directly? There may be some mechanism driver notifications involved; IMO, these can also be done on the agent side.

    def update_device_down(self, context, device, agent_id, host=None):
        cctxt = self.client.prepare()
        return cctxt.call(context, 'update_device_down', device=device,
                          agent_id=agent_id, host=host)

    def update_device_up(self, context, device, agent_id, host=None):
        cctxt = self.client.prepare()
        return cctxt.call(context, 'update_device_up', device=device,
                          agent_id=agent_id, host=host)

    def update_device_list(self, context, devices_up, devices_down,
                           agent_id, host=None):
        cctxt = self.client.prepare()
        ret = cctxt.call(context, 'update_device_list',
                         devices_up=devices_up, devices_down=devices_down,
                         agent_id=agent_id, host=host)
        return ret
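For illustration only, a minimal sketch of what "updating the port status locally" could look like if the agent were given its own DB connection; plain SQLAlchemy against neutron's ports table, with the helper name and session handling being hypothetical rather than part of the proposed patch:

    # Hypothetical sketch only: a direct DB write from the agent, instead of
    # the RPC calls above. Requires a [database] connection URL in the agent
    # config; table/column names follow neutron's "ports" table.
    import sqlalchemy as sa

    def update_port_status_locally(engine, port_id, status):
        ports = sa.table('ports', sa.column('id'), sa.column('status'))
        with engine.begin() as conn:
            conn.execute(ports.update()
                         .where(ports.c.id == port_id)
                         .values(status=status))

    # Usage (hypothetical):
    #   engine = sa.create_engine(db_connection_url)
    #   update_port_status_locally(engine, port_id, 'ACTIVE')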

Changed in neutron:
status: New → Opinion
importance: Undecided → Wishlist
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/678366

Changed in neutron:
assignee: nobody → LIU Yulong (dragon889)
status: Opinion → In Progress
Revision history for this message
LIU Yulong (dragon889) wrote :

Nova and Cinder can do some DB updates on the nova-compute and cinder-volume side, so neutron could do something similar on the agent side. Let's see what the CI results are.

summary: - [L2] [opinion] update the port DB status directly in agent-side
+ [L2] update the port DB status directly in agent-side
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

For me this looks like an RFE which should be discussed with the team, as it seems to be a very big design change in Neutron.
Personally I don't think we should allow e.g. agents to have direct db access. IMO it should be done by the neutron server process only.

Speaking about Cinder and Nova, I have no idea how Cinder is doing that, but IIRC in Nova only the conductor (and maybe nova-api) has access to the db. I don't think nova-compute is doing any modifications in the db.

tags: added: db rfe
Revision history for this message
LIU Yulong (dragon889) wrote :

@Slawek,
Please see instance.save() and volume.save() in nova-compute and cinder-volume for details.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Are You sure that nova-compute has direct access to the DB? I'm looking at Nova's docs: https://docs.openstack.org/nova/stein/install/compute-install-ubuntu.html#install-and-configure-components and I don't see any info that a db connection should be configured on compute nodes to make nova-compute work.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

I was thinking about this a bit more, and I really don't think it's a good idea to let all neutron-ovs-agents connect directly to the db.
First of all, it might be considered a security issue to store db access credentials on all compute nodes.
Secondly, it may be an issue in some edge deployments where compute nodes may not have access to db nodes at all.
And lastly, please check the fullstack job logs from Your PoC patch. It seems that it e.g. tries to initialize some tables during startup of neutron-ovs-agent: https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/logs_66/678366/8/check/neutron-fullstack/0611560/controller/logs/dsvm-fullstack-logs/TestAgentBandwidthReport.test_agent_configurations_Open-vSwitch-agent_/neutron-openvswitch-agent--2019-08-30--14-59-58-144574_log.txt.gz

Revision history for this message
Miguel Lavalle (minsel) wrote :

Yes, this is a big design change, and in principle I don't support it, for the same reasons Slawek mentions above.

Revision history for this message
LIU Yulong (dragon889) wrote :

Those volume.save() calls have the ability to access the DB directly. This is a node from my test env.
[root@compute1 ~]# netstat -natpl|grep 3306
tcp 0 0 172.28.8.42:34626 172.28.8.37:3306 ESTABLISHED 953313/python2
tcp 0 0 172.28.8.42:34148 172.28.8.37:3306 ESTABLISHED 953313/python2
tcp 0 0 172.28.8.42:34588 172.28.8.37:3306 ESTABLISHED 953412/python2
tcp 0 0 172.28.8.42:34656 172.28.8.37:3306 ESTABLISHED 939760/python2
tcp 0 0 172.28.8.42:34590 172.28.8.37:3306 ESTABLISHED 953412/python2
[root@compute1 ~]# ps -ef|grep 939760
cinder 939760 1 1 Aug28 ? 04:58:57 /usr/bin/python2 /usr/bin/cinder-backup --config-file /usr/share/cinder/cinder-dist.conf --config-file /etc/cinder/cinder.conf --logfile /var/log/cinder/backup.log
root 2407559 2407535 0 07:55 pts/0 00:00:00 grep --color=auto 939760
[root@compute1 ~]# ps -ef|grep 953313
cinder 953313 953299 2 Aug28 ? 05:27:55 /usr/bin/python2 /usr/bin/cinder-volume --config-file /usr/share/cinder/cinder-dist.conf --config-file /etc/cinder/cinder.conf --logfile /var/log/cinder/volume.log
root 2407571 2407535 0 07:55 pts/0 00:00:00 grep --color=auto 953313
Projects that have no such 'conductor' process do DB updates locally.

This approach does not require all deployments to have direct DB access, but it gives large-scale clouds a chance to narrow down their scale issues. A new config option will be added on the agent side.
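For illustration, such an opt-in option might be registered with oslo.config roughly as below; the option names, defaults, and group are hypothetical and not taken from the PoC patch:

    # Hypothetical agent-side options; names and defaults are illustrative only.
    from oslo_config import cfg

    local_db_opts = [
        cfg.BoolOpt('enable_local_port_status_update',
                    default=False,
                    help='Update port status in the Neutron DB directly from '
                         'the agent instead of calling neutron-server.'),
        cfg.StrOpt('connection',
                   secret=True,
                   help='Database connection URL used by the agent when the '
                        'local status update is enabled.'),
    ]

    cfg.CONF.register_opts(local_db_opts, group='agent_database')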

@Slawek, yes, I noticed the fullstack failure and tried to enable some config for the test. It seems the QoS extension_driver was not added in the config. I have no idea for now...

@Miguel, we once discussed the scale of neutron; this approach may help break the current limit on the number of nodes in a deployment.

All right, there seems to be some opposition, so how should we solve the problem mentioned in the description?

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

@LIU Yulong: Maybe we can somehow smartly improve this RPC communication between server and agent? IIRC, the agent first asks the server for details about a port, and then the server switches that port to the BUILD state. Maybe for ports which are already ACTIVE that will not be necessary? And then, if the agent has a port which is already UP and wants to report that, maybe it doesn't need to be sent to the server?
I'm not sure about that and I didn't test it, so it is very likely that such an approach will not work at all, but maybe it's worth exploring and testing. What do You think?
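A rough sketch of that idea, for illustration only; the cache and helper names are hypothetical and this is not how the current agent is structured:

    # Hypothetical: only report devices whose up/down state actually changed
    # since the last report, so unchanged ACTIVE ports cost no RPC round trip.
    def report_changed_devices(plugin_rpc, context, agent_id, host,
                               current_states, last_reported):
        devices_up, devices_down = [], []
        for device, is_up in current_states.items():
            if last_reported.get(device) == is_up:
                continue  # state unchanged: skip the RPC
            (devices_up if is_up else devices_down).append(device)
            last_reported[device] = is_up
        if devices_up or devices_down:
            plugin_rpc.update_device_list(context, devices_up, devices_down,
                                          agent_id, host)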

Revision history for this message
YAMAMOTO Takashi (yamamoto) wrote :

it's rather common practice in distributed systems to let agents access the db directly for performance reasons.
i guess the security concern is valid though, especially given that our db doesn't provide fine-grained ACLs.
(midonet had a plan to use zookeeper ACLs to mitigate similar concerns.)

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Let's discuss this at the drivers meeting to see what others think about this idea.

tags: added: rfe-triaged
removed: rfe
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Hi,

We will try to discuss this RFE at the next drivers meeting, on 25.10.2019.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

We discussed this idea during the drivers meeting on 25.10.2019. The summary of the discussion is that we don't want to make such a radical change in neutron's architecture. The main reasons are:

1. it can make upgrades much harder, as when a DB schema change happens, neutron agents cannot talk to the DB until the agents are upgraded,

2. there is a plan to converge with OVN, so let's focus on that instead of making such radical changes to the existing agents,

3. it may impact the security of the whole deployment.

So I'm closing this RFE as not approved.

tags: added: rfe
removed: rfe-triaged
Changed in neutron:
status: In Progress → Won't Fix
Revision history for this message
Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in neutron:
assignee: LIU Yulong (dragon889) → nobody
tags: added: timeout-abandon
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.opendev.org/678366
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

LIU Yulong (dragon889)
Changed in neutron:
assignee: nobody → LIU Yulong (dragon889)
status: Won't Fix → New
Changed in neutron:
status: New → In Progress
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

@LIU: why did You change the status of this RFE to "New" again?
According to my comment #13 this RFE wasn't approved, so we don't want to have such a change in Neutron.

Changed in neutron:
status: In Progress → Won't Fix
Revision history for this message
Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in neutron:
assignee: LIU Yulong (dragon889) → nobody
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.opendev.org/678366
Reason: This review is > 4 weeks without comment and currently blocked by a core reviewer with a -2. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and contacting the reviewer with the -2 on this review to ensure you address their concerns.
