Flannel provides the networking at the container level for k8s and optionally for swarm. It does this with an overlay network: each message is encapsulated with an additional header and sent over the underlying network. This encapsulation introduces overhead and can cause significant performance degradation.
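To illustrate the encapsulation cost, here is a rough sketch of the per-packet overhead of a VXLAN-style overlay. The byte counts are the standard VXLAN encapsulation header sizes; the 1500-byte underlay MTU is an assumption for illustration:

```shell
# Per-packet VXLAN encapsulation overhead (standard header sizes):
# outer Ethernet (14) + outer IPv4 (20) + outer UDP (8) + VXLAN (8)
OVERHEAD=$((14 + 20 + 8 + 8))

# Effective MTU left for the inner (pod-to-pod) frame, assuming a
# 1500-byte MTU on the underlying network:
INNER_MTU=$((1500 - OVERHEAD))

echo "overhead=${OVERHEAD} bytes, inner MTU=${INNER_MTU}"   # overhead=50 bytes, inner MTU=1450
```

Every packet pays this header cost plus the CPU cost of encapsulating and decapsulating it, which is where much of the bandwidth loss below comes from.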
In a k8s cluster as provisioned by Magnum, a performance study was done on a 10 Gbit/s network, and the following bandwidths were observed:
- Direct server to server (base scenario for comparison): 9.39 Gbits/sec
- Direct VM to VM (base scenario for comparison): 7.74 Gbits/sec
- Flannel with host-gw backend (between pods on different hosts): 6.0 Gbits/sec
- Flannel with VxLAN backend (between pods on different hosts): 1.71 Gbits/sec
- Flannel with UDP backend (between pods on different hosts): 0.385 Gbits/sec
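The backend is selected in flannel's network configuration, which flannel reads from etcd. A minimal sketch, assuming flannel's default etcd key and an illustrative subnet (both are assumptions, not taken from the study above):

```shell
# Store flannel's network config in etcd; flannel reads it at startup.
# The key path is flannel's conventional default and the subnet is a
# placeholder. "Backend.Type" selects udp, vxlan, or host-gw.
etcdctl set /coreos.com/network/config '{
  "Network": "10.100.0.0/16",
  "Backend": {
    "Type": "host-gw"
  }
}'
```

Switching the measured scenarios above amounts to changing only the `Type` field.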
A large part of the degradation is due to encapsulating and decapsulating the extra header on every packet.
In a shared public cloud, the bandwidth degradation may not be a big issue, since the provider may offer no network performance guarantee and the actual network bandwidth is shared among the users.
However, there are scenarios where the bandwidth degradation is not acceptable, such as a private cloud, or a public cloud where a high performance network is offered.
This data was presented in a talk at the OpenStack Summit in Tokyo, 2015.
Angus Lees proposed the host-gw backend as an improvement. This backend works in our case because all the minions of the k8s cluster are on a single Neutron network, so each host can route pod traffic directly to the other hosts without encapsulation.
Following the discussion on the patch, we will rename the label flannel_use_vxlan to something like flannel_backend, which accepts three options: udp, vxlan, and host-gw. The default will be host-gw.
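With the proposed label in place, a user would select the backend at bay creation time. A hypothetical invocation sketch; the image, keypair, and network values are placeholders, and the exact flag names follow the Magnum CLI of that era:

```shell
# Create a baymodel whose k8s bays use the proposed flannel_backend label.
# All argument values here are illustrative placeholders.
magnum baymodel-create --name k8s-model \
  --image-id fedora-atomic \
  --keypair-id mykey \
  --external-network-id public \
  --coe kubernetes \
  --labels flannel_backend=host-gw
```

Omitting the label would give the host-gw default; passing flannel_backend=vxlan or flannel_backend=udp would restore the encapsulating backends.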