Comment 7 for bug 1817941

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/657087
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=4802f1d96a1217124e39a057fd7a05e22177b81c
Submitter: Zuul
Branch: master

commit 4802f1d96a1217124e39a057fd7a05e22177b81c
Author: Al Bailey <email address hidden>
Date: Fri May 3 15:11:57 2019 -0500

    Changing tiller pod networking settings to improve swact time

    Based on investigation by Matt, the tiller-deploy pod was running
    in the cluster network namespace and therefore not inheriting host
    TCP keepalive parameters.

    During a swact, when the floating IP is taken down, the tiller keepalive
    is so large that the kube-apiserver timeout is detected only after
    15 minutes (5 probes * 180 seconds).

    The cluster namespace values are 9 probes at 75 second intervals.
    The host TCP values are 5 consecutive probes at 1 second intervals.

    Deploying the tiller pod on the host network makes it inherit the
    host sysctl values, so it detects the timeout much more quickly
    (10 seconds).

    Adding an override setting during helm init for tiller:
        helm init <params> --override spec.template.spec.hostNetwork=true

    These changes were added to the ansible playbook.

    Change-Id: I218e4ef37100950c8ac5a0cb9759d9df50d9e368
    Closes-Bug: 1817941
    Partial-Bug: 1818123
    Co-Authored-By: Matt Peters <email address hidden>
    Signed-off-by: Al Bailey <email address hidden>
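
The timeout figures in the commit message follow from simple keepalive arithmetic: a dead peer is declared after the idle time before the first probe plus (number of probes * probe interval). The sketch below only reproduces that arithmetic with the numbers quoted above. The function name and the idle_s parameter are illustrative assumptions, not part of the change, and the commit message does not state the idle time, so it defaults to 0 here.

```python
def keepalive_detection_seconds(probes: int, interval_s: int, idle_s: int = 0) -> int:
    """Worst-case time to declare a peer dead via TCP keepalive.

    idle_s is the quiet period before the first probe is sent. The commit
    message does not give it, so it is an assumption and defaults to 0.
    """
    return idle_s + probes * interval_s


# Figure quoted for tiller before the fix: 5 probes * 180 seconds.
tiller_before_s = keepalive_detection_seconds(probes=5, interval_s=180)

# Cluster namespace values quoted in the commit message: 9 probes at 75 seconds.
cluster_namespace_s = keepalive_detection_seconds(probes=9, interval_s=75)

# Host values quoted in the commit message: 5 probes at 1 second.
host_probe_window_s = keepalive_detection_seconds(probes=5, interval_s=1)

print(f"tiller before fix: {tiller_before_s} s ({tiller_before_s // 60} min)")
print(f"cluster namespace: {cluster_namespace_s} s")
print(f"host probe window: {host_probe_window_s} s")
```

With the host values the probe window alone comes to 5 seconds; the 10 seconds quoted in the commit message would then also include the idle time before the first probe, which is not stated there. On a Linux host the live settings can be checked with sysctl net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_intvl and net.ipv4.tcp_keepalive_probes.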