[RFE] introduce distributed locks to ipam

Bug #1836834 reported by qinhaizhong on 2019-07-17
This bug affects 2 people

Bug Description

When virtual machines are created in batches, nova calls the neutron API to create ports concurrently. When an IP allocation conflict causes the database commit to fail, a ``DB ERROR`` exception is thrown. ``create_port`` catches this exception, sleeps for ``retry_interval=0.1``, and calls ``create_port`` again until ``max_retries=10`` is exceeded, at which point ``create_port`` fails. Bulk port creation has a similar problem.
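
The retry behaviour described above can be sketched roughly as follows. This is an illustration only, not the neutron code: ``DBError`` and ``call_with_retry`` are hypothetical names, the interval is assumed to be in seconds, and "exceeds ``max_retries``" is read here as one initial attempt plus up to ten retries.

```python
import time


class DBError(Exception):
    """Hypothetical stand-in for the DB error raised on an IP allocation conflict."""


def call_with_retry(func, max_retries=10, retry_interval=0.1, sleep=time.sleep):
    """Call func(); on DBError, sleep and call again, up to max_retries retries."""
    retries = 0
    while True:
        try:
            return func()
        except DBError:
            if retries >= max_retries:
                # Retry budget exhausted: the failure reaches the API caller.
                raise
            retries += 1
            sleep(retry_interval)
```

Under these assumptions, a request that keeps conflicting hits the database eleven times before it fails, and every concurrent request does the same, which is where the extra load comes from.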

The current retry-based design causes the load on neutron to spike. While this happens, callers of the neutron API cannot get a response within 60 seconds. The caller then concludes that the neutron API call has failed, which causes various problems.

For details, see https://bugs.launchpad.net/neutron/+bug/1777968. Bulk port creation has the same problem.

We will implement a new IPAM driver that uses a distributed lock to completely solve the problem of IP address allocation conflicts leading to failures. For the distributed lock we will use OpenStack tooz, which supports many backend drivers, such as Zookeeper, Memcached, Redis, MySQL, etc. tooz is an OpenStack native project.

Changed in neutron:
assignee: nobody → qinhaizhong (qinhaizhong)
Changed in neutron:
status: New → In Progress
description: updated
Changed in neutron:
status: In Progress → New
description: updated
Brin Zhang (zhangbailin) on 2019-07-17
tags: added: rfe-confirmed
removed: rfe
tags: added: rfe
Hongbin Lu (hongbin.lu) on 2019-07-18
Changed in neutron:
status: New → Confirmed
importance: Undecided → Wishlist
Miguel Lavalle (minsel) wrote :

Can you give a little more detail as to how you propose to implement this? We need to approve this in the Drivers team, so we would like more detail.

Also, you state that "And it is an openstack native project.". Does this mean you are going to create a new project to solve this problem?

qinhaizhong (qinhaizhong) wrote :

Our core logic is as follows:
We will reimplement the original _generate_ips algorithm. For each IP that _generate_ips computes, we take a distributed lock using "ip+subnet_id" as the key. A failure to acquire the lock means a conflict, in which case we generate IPs again during allocation. Once the IP has been committed to the database, the lock is released. This guarantees concurrent performance and completely resolves the conflicts.
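
A minimal sketch of this idea, under stated assumptions: ``InProcessLocks`` is a hypothetical in-process stand-in for the distributed lock backend, ``allocate_ip`` is an illustrative name, and the ``allocated`` set stands in for the database. With tooz, the equivalent calls would presumably be ``coordinator.get_lock(key)`` followed by ``lock.acquire(blocking=False)`` and ``lock.release()``. The sketch also simply moves on to the next candidate on a conflict rather than regenerating the candidate list.

```python
import threading


class InProcessLocks:
    """Hypothetical stand-in for a distributed lock backend such as tooz."""

    def __init__(self):
        self._guard = threading.Lock()
        self._held = set()

    def try_acquire(self, key):
        """Non-blocking acquire: return False if the key is already locked."""
        with self._guard:
            if key in self._held:
                return False
            self._held.add(key)
            return True

    def release(self, key):
        with self._guard:
            self._held.discard(key)


def allocate_ip(locks, subnet_id, candidates, allocated):
    """Pick the first candidate IP that can be locked and is not yet allocated."""
    for ip in candidates:
        key = f"{ip}+{subnet_id}"  # "ip+subnet_id" lock key, as in the proposal
        if not locks.try_acquire(key):
            continue  # another worker is allocating this IP: try the next one
        try:
            if ip in allocated:
                continue  # already committed by someone else
            allocated.add(ip)  # stands in for the database commit
            return ip
        finally:
            locks.release(key)  # unlock only after the "commit"
    raise RuntimeError("no free IP address in subnet %s" % subnet_id)
```

Because a conflicting worker skips to another address instead of failing the whole transaction, no sleep-and-retry loop is needed in this sketch.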

Regarding "And it is an openstack native project.": I mean that tooz is an OpenStack native project. I have updated the description. Thank you.

description: updated
LIU Yulong (dragon889) wrote :
Miguel Lavalle (minsel) on 2019-07-26
tags: added: rfe-triaged
removed: rfe rfe-confirmed
Miguel Lavalle (minsel) on 2019-08-09
tags: added: rfe-approved
removed: rfe-triaged
Miguel Lavalle (minsel) wrote :

This RFE was approved on July 26th during the drivers meeting on the following assumptions:

1) A more detailed spec will be proposed
2) We will see the code as a PoC. We will use the experience from the PoC to feed back into the spec if needed
3) The code will come with Rally tests, so we can measure improvement
