[2.9] pod/node affinity for sidecar charms not implemented

Bug #1993716 reported by Syed Mohammad Adnan Karim
Affects: Canonical Juju
Status: Fix Released
Importance: Wishlist
Assigned to: Ian Booth
Milestone: 2.9.38

Bug Description

Juju version: 2.9.35-ubuntu-amd64
Kubernetes version: v1.24.6

My k8s cluster has some nodes with GPUs and some without. I am trying to deploy kubeflow 1.6/stable only on the nodes without a GPU. The nodes without a GPU have a specific label, "mldatanode: true". I have tried to follow the examples in the following threads:

- https://discourse.charmhub.io/t/pod-priority-and-affinity-in-juju-charms/4091
  In my kubeflow bundle I have the following constraints for all the applications:
    constraints: tags=mldatanode=true,^mlgpunode=true
  I also tried to deploy a single application with the cli:
    juju deploy istio-pilot --channel 1.11 --constraints="tags=mldatanode=true,^mlgpunode=true"
  Neither of these worked; pods were still placed on GPU nodes.

- https://discourse.charmhub.io/t/mapping-juju-concepts-to-kubernetes/2627
  I also tried to use the hostname of the nodes in the bundle:
    to: [kubernetes.io/hostname=gpu-less-node-1]
  This also did not work as expected and placed pods on GPU nodes.

Furthermore, when creating a model, Juju does not support placement directives for the modeloperator pod. I think it should, as that pod also ends up on a GPU node.

I will try to work around this by cordoning/uncordoning the GPU nodes as I deploy.
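
For reference, that workaround is roughly the following (the mlgpunode=true label is the one on the GPU nodes referred to above, and kubectl cordon/uncordon accept a label selector; the exact deploy flags may differ for your setup):

    # make the GPU nodes unschedulable before deploying
    kubectl cordon -l mlgpunode=true
    juju deploy kubeflow --channel 1.6/stable --trust
    # re-enable scheduling on the GPU nodes afterwards
    kubectl uncordon -l mlgpunode=true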

Thomas Miller (tlmiller)
Changed in juju:
assignee: nobody → Thomas Miller (tlmiller)
Thomas Miller (tlmiller)
Changed in juju:
assignee: Thomas Miller (tlmiller) → Ian Booth (wallyworld)
Revision history for this message
Ian Booth (wallyworld) wrote :

Juju translates constraint tags prefixed with "pod.", "anti-pod." or "node." into pod/node affinity selectors.
Your constraints are: --constraints="tags=mldatanode=true,^mlgpunode=true"
The tag keys don't have the required prefixes, so they are not translated into affinity selection expressions.

Here's a contrived example:

juju deploy somecharm --constraints="tags=node.foo=a|b|c,^bar=d|e|f,^foo=g|h,pod.foo=1|2|3,^pod.bar=4|5|6,anti-pod.afoo=x|y|z,^anti-pod.abar=7|8|9"

would result in

kubectl get -o json statefulset.apps/somecharm | jq .spec.template.spec.affinity
{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [
        {
          "matchExpressions": [
            {
              "key": "bar",
              "operator": "NotIn",
              "values": [
                "d",
                "e",
                "f"
              ]
            },
            {
              "key": "foo",
              "operator": "NotIn",
              "values": [
                "g",
                "h"
              ]
            },
            {
              "key": "foo",
              "operator": "In",
              "values": [
                "a",
                "b",
                "c"
              ]
            }
          ]
        }
      ]
    }
  },
  "podAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": [
      {
        "labelSelector": {
          "matchExpressions": [
            {
              "key": "bar",
              "operator": "NotIn",
              "values": [
                "4",
                "5",
                "6"
              ]
            },
            {
              "key": "foo",
              "operator": "In",
              "values": [
                "1",
                "2",
                "3"
              ]
            }
          ]
        },
        "topologyKey": ""
      }
    ]
  },
  "podAntiAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": [
      {
        "labelSelector": {
          "matchExpressions": [
            {
              "key": "abar",
              "operator": "NotIn",
              "values": [
                "7",
                "8",
                "9"
              ]
            },
            {
              "key": "afoo",
              "operator": "In",
              "values": [
                "x",
                "y",
                "z"
              ]
            }
          ]
        },
        "topologyKey": ""
      }
    ]
  }
}

Changed in juju:
assignee: Ian Booth (wallyworld) → nobody
status: New → Incomplete
Revision history for this message
Ian Booth (wallyworld) wrote :

Can you try with the required constraint syntax and see if it works?

Revision history for this message
Syed Mohammad Adnan Karim (karimsye) wrote :

Unfortunately it did not work for me yet.
I updated my kubeflow bundle to contain constraints for all applications in the following forms:

    constraints: tags=node.mldatanode=true,^mlgpunode=true
    constraints: tags="node.mldatanode=true,^mlgpunode=true"
    constraints: tags="node.mldatanode=true,^node.mlgpunode=true"

and redeployed multiple times, but the pods still land on the GPU nodes labelled mlgpunode=true.

Revision history for this message
Ian Booth (wallyworld) wrote :

To help understand what is happening, we need the k8s statefulset config info, as shown in comment #1, plus the full node config info with the labels/tags etc.
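
Something along these lines should capture both (namespace and application name are placeholders to fill in; --show-labels is standard kubectl):

    kubectl get statefulset -n <model-namespace> <application> -o json | jq .spec.template.spec.affinity
    kubectl get nodes --show-labels
    kubectl get nodes -o yaml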

Revision history for this message
Syed Mohammad Adnan Karim (karimsye) wrote :

Here is the example of the kubeflow-dashboard-operator that ends up on a GPU node (las2-mlgpu43).
The application is specified in the bundle as follows:
  kubeflow-dashboard:
    charm: kubeflow-dashboard
    channel: 1.6/stable
    scale: 1
    _github_repo_name: kubeflow-dashboard-operator
    constraints: tags=node.mldatanode=true,^node.mlgpunode=true

$ kubectl get statefulsets.apps -n kubeflow kubeflow-dashboard-operator -o json | jq .spec.template.spec.affinity
null

Here is the full YAML for the kubeflow-dashboard-operator statefulset: https://pastebin.canonical.com/p/RGb2YyzRr3/
Here is the full JSON for the kubeflow-dashboard-operator statefulset: https://pastebin.canonical.com/p/Y6pYgHxQzD/

The cluster has the following nodes:
NAME STATUS ROLES AGE VERSION
las2-mlgpu41 Ready <none> 29d v1.24.3
las2-mlgpu43 Ready <none> 28d v1.24.3
lv01-mlkfwapp-l01 Ready <none> 34d v1.24.6
lv01-mlkfwapp-l02 Ready <none> 34d v1.24.6
lv01-mlkfwapp-l03 Ready <none> 34d v1.24.6
lv01-mlkfwapp-l04 Ready <none> 34d v1.24.6
lv01-mlkfwapp-l05 Ready <none> 34d v1.24.6
lv1-mlksapp-l01 Ready control-plane,master 35d v1.24.6
lv1-mlksapp-l02 Ready control-plane,master 35d v1.24.6
lv1-mlksapp-l03 Ready control-plane,master 35d v1.24.6
lv1-mlksapp-l04 Ready control-plane,master 35d v1.24.6
lv1-mlksapp-l05 Ready control-plane,master 35d v1.24.6

Here is the full YAML for the nodes in the cluster: https://pastebin.canonical.com/p/gjqR2hnjYh/
Here is the full JSON for the nodes in the cluster: https://pastebin.canonical.com/p/R8PKD3DBPq/

Revision history for this message
Ian Booth (wallyworld) wrote :

This

$ kubectl get statefulsets.apps -n kubeflow kubeflow-dashboard-operator -o json | jq .spec.template.spec.affinity
null

seems to show that the constraints aren't being applied. Maybe it's a bundle processing bug - I wonder what happens if the charm is deployed on its own, outside of any bundle.
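
That would be something along the lines of the following (channel and constraints taken from the bundle entry above; the namespace matches the Juju model name):

    juju deploy kubeflow-dashboard --channel 1.6/stable --constraints="tags=node.mldatanode=true,^node.mlgpunode=true"
    kubectl get statefulsets -n kubeflow -o json | jq '.items[].spec.template.spec.affinity'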

Revision history for this message
Ian Booth (wallyworld) wrote (last edit ):

I tested with a sidecar charm and it worked as expected. This is on Juju 3.0; I suspect it will be the same on 2.9.

The issue is that kubeflow-dashboard is an older "podspec" charm, which is deprecated. It seems that somewhere along the way, some of the work to implement sidecar charms broke affinity for podspec charms.

Revision history for this message
Ian Booth (wallyworld) wrote :

Ah, I just noticed you are looking at the operator statefulset. This is not where the node/pod affinity is applied. Podspec charms deploy 2 statefulsets:

1. one for the operator itself (the charm)
2. one for the workload

The workload statefulset is created when the charm sets the podspec; this is where the affinity selectors are applied. There are no affinity rules applied to the operator pod.
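
In practice that means checking the workload's resource rather than the operator's, roughly as follows (resource names are illustrative; depending on the charm the workload may be a Deployment rather than a StatefulSet):

    # operator statefulset (runs the charm) - affinity is not applied here
    kubectl get statefulset -n kubeflow kubeflow-dashboard-operator -o json | jq .spec.template.spec.affinity
    # workload created when the charm sets the podspec - affinity should appear here
    kubectl get deployments,statefulsets -n kubeflow -o json | jq '.items[] | select(.metadata.name=="kubeflow-dashboard") | .spec.template.spec.affinity'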

With the transition to sidecar charms, the charm container and workload containers all run in the same pod, so the affinity rules are applied to that single pod; that's why the sidecar example I tried worked.

We're not doing any more work on podspec charms, so we will not be adding affinity support to the podspec charm operator pod.

Changed in juju:
status: Incomplete → Won't Fix
Revision history for this message
Camille Rodriguez (camille.rodriguez) wrote :

@Ian Not supporting that feature on podspec charms means that charmed kubeflow will not support placement directives for another 1-1.5 years. This is critical functionality for any charmed app deployment on Kubernetes. What would be the level of effort required to make this backward compatible with podspec charms?

Revision history for this message
Syed Mohammad Adnan Karim (karimsye) wrote :

I just tried this in both a bundle and via the CLI (constraints="tags=mldatanode=true,^mlgpunode=true") with a sidecar charm (training-operator), and it did not respect the node affinity (it did not show up in the statefulset):
https://pastebin.canonical.com/p/mp7v5BVYT4/

Revision history for this message
Ian Booth (wallyworld) wrote :

Your tag names are missing the required "node." and/or "pod." prefixes :-)

Revision history for this message
Syed Mohammad Adnan Karim (karimsye) wrote :

Sorry, that was a typo, but just to be sure I tried again with:
juju deploy training-operator --constraints="tags=node.mldatanode=true,^node.mlgpunode=true"
and it still ended up on a node labelled with mlgpunode=true. Here is the deployed statefulset YAML again:
https://pastebin.canonical.com/p/X7h3wq9hP9/

Revision history for this message
Ian Booth (wallyworld) wrote :

I guess you're using Juju 2.9.

I checked the code, and it seems affinity support for sidecar charms was only added in Juju 3.x. I had thought both podspec and sidecar charms supported it even in 2.9, but it seems 2.9 doesn't support affinity for sidecar charms. We can look at adding this support.

@Camille to clarify, 2.9 does support affinity for podspec charms, but only for the workload pod, not the operator pod that runs the charm. The original thinking, way back when this was done, was that it's the workload that needs access to GPUs etc. Because podspec charms spin up 2 statefulsets, one for the operator and one for the workload, there's no good way to use the constraint syntax to provide a different set of affinity rules for the operator vs workload pods. But you can use the same approach already used in sidecar charms for changing the cluster in ways Juju doesn't support: use the k8s API client from the charm to update the operator's statefulset pod template.
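
Purely as an illustration of the shape of that change (Ian's suggestion is to apply the equivalent from the charm via a k8s API client; this is the same patch made by hand with kubectl, using the node labels from this cluster, and Juju may well overwrite it):

    kubectl -n kubeflow patch statefulset kubeflow-dashboard-operator --type merge -p '
    {"spec": {"template": {"spec": {"affinity": {"nodeAffinity": {
      "requiredDuringSchedulingIgnoredDuringExecution": {"nodeSelectorTerms": [
        {"matchExpressions": [
          {"key": "mldatanode", "operator": "In", "values": ["true"]},
          {"key": "mlgpunode", "operator": "NotIn", "values": ["true"]}
        ]}
      ]}
    }}}}}}'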

summary: - placement directives for k8s cloud not working
+ [2.9] pod/node affinity for sidecar charms not implemented
Changed in juju:
milestone: none → 2.9.38
status: Won't Fix → Triaged
importance: Undecided → Wishlist
Revision history for this message
Ian Booth (wallyworld) wrote :

I backported support for affinity for sidecar charms from Juju 3:
https://github.com/juju/juju/pull/14897

Changed in juju:
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress
Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Revision history for this message
Camille Rodriguez (camille.rodriguez) wrote :

Hi Ian - can you provide a timeline for the backport to be packaged and available to use?

Revision history for this message
Ian Booth (wallyworld) wrote :

We hope to have a 2.9.38 candidate out next week (3.0.2 is currently being tested).
Until then you can try with the edge snap.
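
For reference, that would be something like the following (assuming the snap-installed client and a 2.9 track on the snap; channel names may differ):

    sudo snap refresh juju --channel=2.9/edge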

Changed in juju:
status: Fix Committed → Fix Released