[RFE] QoS Explicit Congestion Notification (ECN) Support

Bug #1505627 reported by vikram.choudhary
This bug affects 1 person
Affects: neutron
Importance: Wishlist
Assigned to: Reedip

Bug Description

[Existing problem]
Network congestion is common in large data centers where many hosts generate heavy traffic. Each host can implement explicit congestion notification itself by using the ECN bits of the IP header TOS field [1]_, but doing this separately on every host is redundant effort.

[Proposal]
This proposal is to perform ECN on behalf of each host. Doing so centralizes the solution and allows it to be applied at the per-tenant level. In addition, traffic can be classified for ECN handling through specific filtering rules, if required. Almost all the leading vendors support this option for better QoS [2]_.

The existing QoS framework is limited to bandwidth rate limiting and can be extended to support explicit congestion notification (RFC 3168 [3]_).
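
For reference, RFC 3168 places the ECN field in the two least-significant bits of the IP TOS byte, with the upper six bits carrying the DSCP. A minimal Python sketch of decoding that field follows; the names `Ecn` and `split_tos` are illustrative only and are not part of Neutron or this proposal.

```python
from enum import IntEnum
from typing import Tuple


class Ecn(IntEnum):
    """ECN codepoints as defined in RFC 3168."""
    NOT_ECT = 0b00  # transport is not ECN-capable
    ECT_1 = 0b01    # ECN-capable transport, ECT(1)
    ECT_0 = 0b10    # ECN-capable transport, ECT(0)
    CE = 0b11       # congestion experienced


def split_tos(tos: int) -> Tuple[int, Ecn]:
    """Split a TOS byte into (DSCP, ECN codepoint)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS must fit in one byte")
    # Upper six bits are the DSCP, lower two bits are the ECN field.
    return tos >> 2, Ecn(tos & 0b11)
```

For example, a TOS byte of 0xBA decodes to DSCP 46 with the ECT(0) codepoint.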

[Benefits]
- Enhancement to the existing QoS functionality.

[What is the enhancement?]
- Add ECN support to the QoS extension.
- Add additional command lines for realizing ECN functionality.
- Add OVS support.
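
As an illustration only of what OVS support might build on: ovs-ofctl(8) [6] documents an `nw_ecn` match field and a `mod_nw_ecn` action. The Python sketch below merely formats flow strings that rewrite ECN-capable packets to CE on one port. The helper name, the port number, and the priority are assumptions, and when such flows would be installed (that is, how congestion is detected) is not defined by this RFE.

```python
from typing import List


def build_ce_mark_flows(in_port: int, priority: int = 100) -> List[str]:
    """Return ovs-ofctl flow strings that set CE on ECN-capable IP packets.

    Hypothetical helper: only formats strings, it does not talk to OVS.
    """
    flows = []
    # nw_ecn=1 is ECT(1) and nw_ecn=2 is ECT(0); Not-ECT (0) is left alone.
    for ect in (1, 2):
        flows.append(
            f"priority={priority},ip,in_port={in_port},nw_ecn={ect},"
            "actions=mod_nw_ecn:3,NORMAL"
        )
    return flows
```

Each returned string could be passed to `ovs-ofctl add-flow <bridge> "<flow>"`; the bridge name is left open here.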

[Related information]
[1] ECN Wiki: http://en.wikipedia.org/wiki/Explicit_Congestion_Notification
[2] QoS: https://review.openstack.org/#/c/88599/
[3] RFC 3168: https://tools.ietf.org/html/rfc3168
[4] Specification: https://blueprints.launchpad.net/neutron/+spec/explicit-congestion-notification
[5] Specification Discussion: https://etherpad.openstack.org/p/QoS_ECN
[6] OpenVSwitch support for ECN: http://openvswitch.org/support/dist-docs/ovs-ofctl.8.txt
[7] Etherpad Link: https://etherpad.openstack.org/p/QoS_ECN

Changed in neutron:
assignee: nobody → vikram.choudhary (vikschw)
summary: - qos-ecn-support
+ QoS ECN Support
tags: added: ovs qos
Changed in neutron:
importance: Undecided → Wishlist
status: New → Confirmed
Henry Gessau (gessau)
summary: - QoS ECN Support
+ QoS Explicit Congestion Notification (ECN) Support
Miguel Angel Ajo (mangelajo) wrote : Re: QoS Explicit Congestion Notification (ECN) Support

I'm not sure I get the full picture. Could you describe the flow: how an admin would configure this, what kinds of congestion it would detect, how the admin would see them, and what would happen when the ECN bit is on?

vikram.choudhary (vikschw) wrote :

Will do an initial write up and upload a specification soon.

Armando Migliaccio (armando-migliaccio) wrote :

This has never happened, unless the tracking bot failed.

Changed in neutron:
status: Confirmed → Incomplete
assignee: vikram.choudhary (vikschw) → nobody
Armando Migliaccio (armando-migliaccio) wrote :

Please provide more details.

Reedip (reedip-banerjee) wrote :

@Armando: Will provide a writeup for the same soon.

Changed in neutron:
assignee: nobody → Reedip (reedip-banerjee)
vikram.choudhary (vikschw) wrote :

Information captured @ https://beta.etherpad.org/p/Notepad

Thanks to Reedip for taking this up!

Reedip (reedip-banerjee)
description: updated
Reedip (reedip-banerjee)
description: updated
Changed in neutron:
status: Incomplete → New
Reedip (reedip-banerjee) wrote :
summary: - QoS Explicit Congestion Notification (ECN) Support
+ [RFE] QoS Explicit Congestion Notification (ECN) Support
Henry Gessau (gessau)
Changed in neutron:
status: New → Confirmed
Armando Migliaccio (armando-migliaccio) wrote :

Ajo, Ihar: have you had the chance to have a first pass at the details provided?

Reedip (reedip-banerjee)
description: updated
Miguel Angel Ajo (mangelajo) wrote :

Sorry for the delay, I've added some questions to the etherpad.

It seems like it could be a reasonable enhancement to simple packet policing (note that ingress policing is still not supported, only egress).

My question on the etherpad:

#) Without a bandwidth limit rule, what's the source of congestion? Do we inspect the external interfaces to check whether the rate is too high? Do we inspect system load?
   System load could force ECN on low traffic instances
   System traffic rate could also force ECN on low traffic instances
   System traffic could be combined with the port packet counters to determine which ports are receiving more traffic, and act upon that.
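
The last option above (host traffic combined with per-port counters) could be sketched roughly as follows in Python. This is only one reading of the idea: the function name, the counter dictionaries, and the share threshold are all hypothetical, and nothing here reflects an agreed design.

```python
from typing import Dict, List


def ports_to_mark(prev: Dict[str, int], curr: Dict[str, int],
                  interval_s: float, host_limit_bps: float,
                  share_threshold: float = 0.25) -> List[str]:
    """Pick ports that dominate traffic while the host as a whole is congested.

    prev/curr map a port id to its cumulative byte counter at two sample times.
    """
    # Per-port rate in bits per second over the sampling interval.
    rates = {port: (curr[port] - prev.get(port, 0)) * 8 / interval_s
             for port in curr}
    total = sum(rates.values())
    if total <= host_limit_bps:
        return []  # host not congested: no port is selected
    # Only ports carrying a large share are selected, so low-traffic
    # instances are not forced into ECN marking.
    return sorted(p for p, r in rates.items() if r / total >= share_threshold)
```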

I find this proposal interesting, but I think it's necessary to define:
1) how "congestion" is identified (there could be several forms, or it could be tunable)
2) I think a reasonable way could be to create an ECN rule that identifies those parameters (a bandwidth_limit_rule_id -which has to be on the same policy-, or "host_load" / "host_traffic" .. etc - ? )
3) A high-level description of how it would be implemented at the low level (I know, I contradict myself).
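
To make point 2 concrete, here is a rough Python sketch of an ECN rule that names its congestion source, either a bandwidth limit rule on the same policy or a host-level trigger. The classes and field names are hypothetical stand-ins, not Neutron's actual QoS objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional

HOST_TRIGGERS = ("host_load", "host_traffic")


@dataclass
class BandwidthLimitRule:
    id: str
    max_kbps: int


@dataclass
class EcnRule:
    id: str
    bandwidth_limit_rule_id: Optional[str] = None
    trigger: Optional[str] = None  # one of HOST_TRIGGERS


@dataclass
class QosPolicy:
    id: str
    bandwidth_limit_rules: List[BandwidthLimitRule] = field(default_factory=list)
    ecn_rules: List[EcnRule] = field(default_factory=list)

    def add_ecn_rule(self, rule: EcnRule) -> None:
        # Exactly one congestion source must be given.
        if (rule.bandwidth_limit_rule_id is None) == (rule.trigger is None):
            raise ValueError("specify exactly one congestion source")
        if rule.bandwidth_limit_rule_id is not None:
            # The referenced bandwidth limit rule has to be on the same policy.
            known = {r.id for r in self.bandwidth_limit_rules}
            if rule.bandwidth_limit_rule_id not in known:
                raise ValueError("bandwidth limit rule is not on this policy")
        elif rule.trigger not in HOST_TRIGGERS:
            raise ValueError("unknown trigger")
        self.ecn_rules.append(rule)
```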

Miguel Angel Ajo (mangelajo) wrote :

@vikram, @reedip, can you answer the questions on the etherpad?

I propose we discuss this RFE at the next QoS meeting to clarify.

Miguel Angel Ajo (mangelajo) wrote :

Ping, asked in qos-meeting, when can we talk about it?

Reedip (reedip-banerjee) wrote :

Dear Ajo,
Just saw the logs from yesterday's meeting.
Sorry for the delay, we will put up our responses on the etherpad sheet itself

Miguel Angel Ajo (mangelajo) wrote :

One of the thoughts from the last meeting was that the high level seems clear now, but we may want to move this forward and have a spec to clarify the details of the implementation.

In particular, we're not sure how you would implement it at the low level.

@armax, if we could move this forward in the form of a spec, that'd be great.

Akihiro Motoki (amotoki) wrote :

I am not sure whether we really need to allow users to control whether ECN is enabled via the API.
There are several levels of ECN support.
- virtual switches like OVS on compute nodes keep ECN fields (i.e., do not reset ECN fields)
- set ECN fields based on the congestion level of virtual switches on compute nodes

I wonder what kind of things are exposed by ECN support.
Isn't it enough to support ECN in OVS on compute nodes based on the configuration?
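
The two levels listed above can be contrasted in a small Python sketch. The function name, the mode strings, and the `congested` flag are assumptions made for illustration; how a virtual switch would actually determine congestion is the open question in this thread.

```python
ECN_MASK = 0b11     # ECN occupies the low two bits of the TOS byte
ECN_NOT_ECT = 0b00  # transport is not ECN-capable
ECN_CE = 0b11       # congestion experienced


def forward_tos(tos: int, mode: str, congested: bool = False) -> int:
    """Return the TOS byte a switch would emit under each support level."""
    if mode == "preserve":
        # Level 1: pass the ECN field through without resetting it.
        return tos
    if mode == "mark":
        # Level 2: set CE on ECN-capable packets while congested.
        # Not-ECT packets are left unchanged; per RFC 3168 they would be
        # dropped under congestion rather than marked.
        if congested and (tos & ECN_MASK) != ECN_NOT_ECT:
            return tos | ECN_CE
        return tos
    raise ValueError(f"unknown mode: {mode}")
```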

(I added the same questions to the etherpad too.)

I tried to follow the discussion from the QoS meeting on 3/23, but it seems the meeting bot did not log the conversation.
According to Miguel's comment, the QoS team now has a clear high-level view.
It would be great to share the high-level view.

Reedip (reedip-banerjee) wrote :

Akihiro:
The missing logs have been fished out and pasted here ( Lucky I have logging in my IRC :) )
http://paste.openstack.org/show/494363/

Miguel Angel Ajo (mangelajo) wrote :

We are moving this to incomplete until we have more feedback.

I had a talk with reedip during the summit, and it was clear that he wanted to leverage ECN, and how ECN works, but there was no clear picture of how the network node / the hypervisor would "detect" congestion, (by bw limit rules, by other means... etc).

Changed in neutron:
status: Confirmed → Incomplete
Reedip (reedip-banerjee)
description: updated
Reedip (reedip-banerjee) wrote :

Hi Miguel,
I have updated the https://etherpad.openstack.org/p/QoS_ECN Etherpad link for further discussion.
I would like your opinion on it.

Reedip (reedip-banerjee) wrote :
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-specs (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/445762

Changed in neutron:
status: Incomplete → New
Changed in neutron:
status: New → Triaged
Ihar Hrachyshka (ihar-hrachyshka) wrote :

My belief is that the QoS subteam should make a judgement call here on whether they are ready to support this initiative. It seems to me the QoS plate is already quite full.

Kevin Benton (kevinbenton) wrote :

Tentatively approved. Please write up a high-level spec because there are a lot of questions about implementation specifics.

tags: added: rfe-approved
removed: rfe
tags: added: rfe-postponed
removed: rfe-approved
Kevin Benton (kevinbenton) wrote :

I think it's just a matter of implementation details and reviewer resources

Reedip (reedip-banerjee) wrote :
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron-specs (master)

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/445762
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Reedip (<email address hidden>) on branch: master
Review: https://review.opendev.org/445762
Reason: Don't have sufficient time and bandwidth for this right now. If someone else wants to take it up, that would be very helpful.
