Magnum default images used for kubectl should be upgraded

Bug #1434468 reported by Kai Qiang Wu(Kennan)
This bug affects 1 person
Affects: Magnum
Status: Fix Released
Importance: Critical
Assigned to: Steven Dake

Bug Description

The default fedora-21-atomic image ships Kubernetes server version 0.6.0,
and the Kubernetes community says that is too old.

If we use a newer kubectl version, there are compatibility issues between the two sides.

See https://github.com/GoogleCloudPlatform/kubernetes/issues/5590 for details.
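
A quick way to see the skew (a hedged sketch; the exact output of these early releases varies):

$ kubectl version
# Prints both the client version and the API server version; a 0.11.x
# client against the image's 0.6.0 server shows the mismatch directly.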

What's the proper way to upgrade our images?

Revision history for this message
Kai Qiang Wu(Kennan) (wkqwu) wrote :

Adding @sdake, as he is familiar with this. Thanks

Revision history for this message
Steven Dake (sdake) wrote :

The process is a bit involved.

I think what I did last time was boot a fedora-atomic image via libvirt, run rpm-ostree upgrade, then shut down the VM. Next I found the VM's storage and used qemu-img to compress the image into a new qcow2.

You would think there is a better way, but there isn't one that I've found.
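
Roughly, the workflow looks like this (names and paths here are illustrative, not the exact ones I used):

$ virt-install --name atomic-upgrade --memory 2048 --import \
      --disk /var/lib/libvirt/images/fedora-21-atomic.qcow2 --noautoconsole
# Inside the guest: pull the latest ostree deployment, then power off.
$ sudo rpm-ostree upgrade && sudo systemctl poweroff
# Back on the host: recompress the upgraded disk into a fresh qcow2.
$ qemu-img convert -c -O qcow2 \
      /var/lib/libvirt/images/fedora-21-atomic.qcow2 fedora-21-atomic-2.qcow2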

The fedora community is figuring out how to do two-week release cycles, but they haven't delivered *any* update yet.

Given our current images are busted, I'll update them. Thanks for letting me know.

Steven Dake (sdake)
Changed in magnum:
status: New → Triaged
Steven Dake (sdake)
Changed in magnum:
importance: Undecided → Critical
assignee: nobody → Steven Dake (sdake)
Revision history for this message
Vilobh Meshram (vilobhmm) wrote :

As discussed with Steven, I will test the fix on Monday 03/23.

Revision history for this message
hongbin (hongbin034) wrote :

It didn't seem to work. Details below:

# First, the bay creation completes, but status of each minion is NotReady
$ magnum bay-list
+--------------------------------------+---------+------------+-----------------+
| uuid                                 | name    | node_count | status          |
+--------------------------------------+---------+------------+-----------------+
| 538de8d9-c789-43a7-a7e2-dae0ffc75ac4 | testbay | 2          | CREATE_COMPLETE |
+--------------------------------------+---------+------------+-----------------+

$ kubectl get minions
NAME       LABELS   STATUS
10.0.0.4   <none>   NotReady
10.0.0.5   <none>   NotReady

# Second, created pods are unassigned and pending forever
$ magnum pod-create --manifest ./redis-master.yaml --bay-id $BAY_UUID
...

$ kubectl get pod
POD            IP   CONTAINER(S)   IMAGE(S)              HOST           LABELS                                       STATUS
redis-master        master         kubernetes/redis:v1   <unassigned>   name=redis,redis-sentinel=true,role=master   Pending
                    sentinel       kubernetes/redis:v1
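
One hedged way to dig into why the pod stays Pending (assuming the events resource behaves in this early release as it does in later ones):

$ kubectl get events
# Scheduler/kubelet events usually say why a pod is still unassigned.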

Revision history for this message
hongbin (hongbin034) wrote :

More details about the minions: "Node health check failed: kubelet /healthz endpoint returns not ok"

$ kubectl get minions --output=json
{
    "kind": "List",
    "creationTimestamp": null,
    "resourceVersion": 12,
    "apiVersion": "v1beta1",
    "items": [
        {
            "kind": "Minion",
            "id": "10.0.0.4",
            "uid": "1ab32324-d045-11e4-9d3d-fa163efbaeda",
            "creationTimestamp": "2015-03-22T03:39:54Z",
            "selfLink": "/api/v1beta1/minions/10.0.0.4",
            "resourceVersion": 11,
            "apiVersion": "v1beta1",
            "hostIP": "10.0.0.4",
            "resources": {
                "capacity": {
                    "cpu": "1",
                    "memory": 3221225472
                }
            },
            "status": {
                "conditions": [
                    {
                        "kind": "Ready",
                        "status": "None",
                        "lastTransitionTime": "2015-03-22T03:39:54Z",
                        "reason": "Node health check failed: kubelet /healthz endpoint returns not ok"
                    }
                ]
            }
        },
        {
            "kind": "Minion",
            "id": "10.0.0.5",
            "uid": "1aae6eca-d045-11e4-9d3d-fa163efbaeda",
            "creationTimestamp": "2015-03-22T03:39:54Z",
            "selfLink": "/api/v1beta1/minions/10.0.0.5",
            "resourceVersion": 12,
            "apiVersion": "v1beta1",
            "hostIP": "10.0.0.5",
            "resources": {
                "capacity": {
                    "cpu": "1",
                    "memory": 3221225472
                }
            },
            "status": {
                "conditions": [
                    {
                        "kind": "Ready",
                        "status": "None",
                        "lastTransitionTime": "2015-03-22T03:39:54Z",
                        "reason": "Node health check failed: kubelet /healthz endpoint returns not ok"
                    }
                ]
            }
        }
    ]
}
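
The failing check can be probed from a minion directly (a hedged sketch; the healthz port depends on the kubelet flags in the unit file, so 10248 here is only an assumption):

$ curl -s http://localhost:10248/healthz
# A healthy kubelet answers "ok"; anything else matches the
# "kubelet /healthz endpoint returns not ok" condition above.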

Revision history for this message
Steven Dake (sdake) wrote :

Well, it looks like the minions are registering with etcd. My suspicion is the template is busted. Could you examine the master node via systemctl | grep kube and see if any services have failed? Then check the logs?

My guess is that the kubernetes shipped in fedora has had a configuration change on the master that no longer works with our template.

Regards
-steve

Revision history for this message
Steven Dake (sdake) wrote :

Are you using a master version of kubectl?

Revision history for this message
hongbin (hongbin034) wrote :

master:

$ sudo systemctl | grep kube
kube-apiserver.service            loaded active running   Kubernetes API Server
kube-controller-manager.service   loaded active running   Kubernetes Controller Manager
kube-scheduler.service            loaded active running   Kubernetes Scheduler Plugin

$ sudo systemctl | grep failed
#No output

minion:

$ sudo systemctl | grep kube
  kube-proxy.service loaded active running Kubernetes Kube-Proxy Server

$ sudo systemctl | grep failed
● cloud-final.service loaded failed failed Execute cloud user/final scripts
● flanneld.service loaded failed failed Flanneld overlay address etcd agent

Revision history for this message
hongbin (hongbin034) wrote :

I worked with sdake on trying the new image as well as the old image. It looks like the flanneld service didn't start correctly in the new image, but it works fine in the old image.

Related logs/configs were attached.
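
For anyone reproducing this, the failed unit can be inspected like so (a hedged sketch; the etcd key is flannel's conventional default and may differ in our templates):

$ sudo systemctl status flanneld
$ sudo journalctl -u flanneld --no-pager
# flanneld reads its subnet configuration from etcd; verify the key:
$ etcdctl get /coreos.com/network/config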

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to magnum (master)

Fix proposed to branch: master
Review: https://review.openstack.org/166661

Changed in magnum:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/166662

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/166687

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/166688

Revision history for this message
Kai Qiang Wu(Kennan) (wkqwu) wrote :

Hi @sdake, I am still not sure what the correct way to upgrade images is.
As I understand it, the upgraded image (and the commit changes above, like using dockerroot instead of docker, etc.) was built by ourselves, and the official fedora-built image seems different from ours.

So in the future, do we upgrade to images we build ourselves, or follow the official fedora community image?

Could you help clarify the path forward? (Maybe I misunderstood.)

Thanks

Revision history for this message
Steven Dake (sdake) wrote :

The fedora 22 image will closely match the fedora 21 image I created manually. The issue is that fedora-21-atomic.qcow2 had kubernetes 0.6.0, and fedora-21-atomic-2.qcow2 has kubernetes 0.11.0. Lots of things changed between these two versions of kubernetes. Both RHT and GOOG are beating on Kubernetes and getting it into usable form, but it's not there yet.

When we have real deployments, we can worry about backwards compatibility. For now, only developers are working on Magnum, so a little pain here and there is to be expected.

I would like to get to the point where we follow the official fedora community image, but that requires them to stop changing how kubernetes is configured every couple of weeks ;) Still, this may be suboptimal because kubernetes changes within a particular release of fedora, and we really need to test against the latest kubernetes, at least in master.

Regards
-steve

Revision history for this message
Kai Qiang Wu(Kennan) (wkqwu) wrote :

Yes, @sdake, we need to test against the latest kubernetes as much as possible.
Just curious how to make such images in an easy way. You must have some tips on that :)

I have tested your image with fedora 20; it works.

Revision history for this message
Kai Qiang Wu(Kennan) (wkqwu) wrote :

Also, our image creation seems to involve some manual steps (not the same as the official Fedora image? e.g. they use docker as the group, we use dockerroot).

So a link or guide for this image creation process would help a lot as an easy reference.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to magnum (master)

Reviewed: https://review.openstack.org/166661
Committed: https://git.openstack.org/cgit/stackforge/magnum/commit/?id=e6e86f71631d7280fd8dcf9ac472f338acc01a61
Submitter: Jenkins
Branch: master

commit e6e86f71631d7280fd8dcf9ac472f338acc01a61
Author: Steven Dake <email address hidden>
Date: Sun Mar 22 16:41:03 2015 -0700

    Merge heat-kubernetes pull request 14

    Pull request 14 changes the start order of Kubernetes. The master
    now starts first, and the minion starts after the master completes.

    This prevents flanneld in the minions from timing out and failing to
    start the kubernetes services on the minions.

    See:
    https://github.com/larsks/heat-kubernetes/pull/14/

    Change-Id: Ia823fd3593dc7c5d9d9c6327e009b833ad586a5c
    Closes-bug: #1434468

Changed in magnum:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/166687
Committed: https://git.openstack.org/cgit/stackforge/magnum/commit/?id=20bf6ed1cfafbd644cc7495aa5084d8c31d2d8ab
Submitter: Jenkins
Branch: master

commit 20bf6ed1cfafbd644cc7495aa5084d8c31d2d8ab
Author: Steven Dake <email address hidden>
Date: Sun Mar 22 20:59:23 2015 -0700

    Merge heat-kubernetes pull request 15

    This changes the adding of the docker user to dockerroot to match
    the changes to fedora.

    See:
    https://github.com/larsks/heat-kubernetes/pull/15/

    Change-Id: I54f2567d487ffdb4108439438ffd64c0334d1f8b
    Closes-bug: #1434468

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/166688
Committed: https://git.openstack.org/cgit/stackforge/magnum/commit/?id=d8093029fbf870783561ac1db38ce58b94e41151
Submitter: Jenkins
Branch: master

commit d8093029fbf870783561ac1db38ce58b94e41151
Author: Steven Dake <email address hidden>
Date: Sun Mar 22 21:02:05 2015 -0700

    Merge heat-kubernetes pull request 16

    Fix an incorrect option for starting kubelet which was stopping
    kubernetes from operating with new fedora atomic images.

    See:
    https://github.com/larsks/heat-kubernetes/pull/16/

    Change-Id: Idfc276b39eb761727a4ecb87991e9866fab6692a
    Closes-bug: #1434468

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/166662
Committed: https://git.openstack.org/cgit/stackforge/magnum/commit/?id=d915e4e063b7ea69483088c7c6bb5122262f0e64
Submitter: Jenkins
Branch: master

commit d915e4e063b7ea69483088c7c6bb5122262f0e64
Author: Steven Dake <email address hidden>
Date: Sun Mar 22 16:47:55 2015 -0700

    Modify documentation to point to kubernetes-0.11 atomic image

    The current fedora atomic image uses kubernetes 0.6.0. The latest
    atomic image uses 0.11.0. Unfortunately the Fedora community doesn't
    release updates to atomic until a major release.

    Change-Id: I6ab93469b7e57e1eeeb730189bcc63f1cd5da0bb
    Closes-bug: #1434468

Adrian Otto (aotto)
Changed in magnum:
status: Fix Committed → Fix Released