Timed out error on pod-create

Bug #1469748 reported by Nikunj Aggarwal
This bug affects 3 people
Affects: Magnum
Status: Invalid
Importance: Undecided
Assigned to: Eli Qiao

Bug Description

* The command that failed

magnum pod-create --manifest ./redis-master.yaml --bay k8sbay

DEBUG (shell:577) Timed out waiting for a reply to message ID 6fbfa9c03dae430f96b6fed0d55ef7cc (HTTP 500)
Traceback (most recent call last):
  File "/opt/stack/python-magnumclient/magnumclient/shell.py", line 574, in main
    OpenStackMagnumShell().main(map(encodeutils.safe_decode, sys.argv[1:]))
  File "/opt/stack/python-magnumclient/magnumclient/shell.py", line 519, in main
    args.func(self.cs, args)
  File "/opt/stack/python-magnumclient/magnumclient/v1/shell.py", line 293, in do_pod_create
    node = cs.pods.create(**opts)
  File "/opt/stack/python-magnumclient/magnumclient/v1/pods.py", line 93, in create
    return self._create(self._path(), new)
  File "/opt/stack/python-magnumclient/magnumclient/common/base.py", line 49, in _create
    resp, body = self.api.json_request('POST', url, body=body)
  File "/opt/stack/python-magnumclient/magnumclient/common/httpclient.py", line 196, in json_request
    resp, body_iter = self._http_request(url, method, **kwargs)
  File "/opt/stack/python-magnumclient/magnumclient/common/httpclient.py", line 179, in _http_request
    error_json.get('debuginfo'), method, url)
InternalServerError: Timed out waiting for a reply to message ID 6fbfa9c03dae430f96b6fed0d55ef7cc (HTTP 500)
ERROR: Timed out waiting for a reply to message ID 6fbfa9c03dae430f96b6fed0d55ef7cc (HTTP 500)

* The manifest I used

stack@nikunj:~/kubernetes/examples/redis/v1beta3$ cat redis-master.yaml
apiVersion: v1beta3
kind: Pod
metadata:
  labels:
    name: redis
    redis-sentinel: "true"
    role: master
  name: redis-master
spec:
  containers:
    - name: master
      image: kubernetes/redis:v1
      env:
        - name: MASTER
          value: "true"
      ports:
        - containerPort: 6379
      resources:
        limits:
          cpu: "1"
      volumeMounts:
        - mountPath: /redis-master-data
          name: data
    - name: sentinel
      image: kubernetes/redis:v1
      env:
        - name: SENTINEL
          value: "true"
      ports:
        - containerPort: 26379
  volumes:
    - name: data
      emptyDir: {}

* Trace in server

2015-06-29 20:08:58.092 ERROR wsme.api [req-3d9ca7bd-96f9-4ae1-bc7b-718004d28d6f admin demo] Server-side error: "Timed out waiting for a reply to message ID 6fbfa9c03dae430f96b6fed0d55ef7cc". Detail:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/wsmeext/pecan.py", line 84, in callfunction
    result = f(self, *args, **kwargs)
  File "/opt/stack/magnum/magnum/api/controllers/v1/pod.py", line 267, in post
    new_pod = pecan.request.rpcapi.pod_create(pod_obj)
  File "/opt/stack/magnum/magnum/conductor/api.py", line 86, in pod_create
    return self._call('pod_create', pod=pod)
  File "/opt/stack/magnum/magnum/common/rpc_service.py", line 88, in _call
    return self._client.call(self._context, method, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 393, in call
    return self.prepare().call(ctxt, method, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 156, in call
    retry=self.retry)
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
    timeout=timeout, retry=retry)
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 361, in send
    retry=retry)
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 350, in _send
    result = self._waiter.wait(msg_id, timeout)
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 248, in wait
    message = self.waiters.get(msg_id, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 153, in get
    'to message ID %s' % msg_id)
MessagingTimeout: Timed out waiting for a reply to message ID 6fbfa9c03dae430f96b6fed0d55ef7cc

192.168.136.131 - - [29/Jun/2015 20:08:58] "POST /v1/pods HTTP/1.1" 500 168

Eli Qiao (taget-9)
Changed in magnum:
assignee: nobody → Eli Qiao (taget-9)
Adrian Otto (aotto)
Changed in magnum:
milestone: none → mitaka-1
Sreekumar S (sreesiv) wrote :

The same issue still exists. Is someone working on this?

Eli Qiao (taget-9) wrote :

This issue only happens when something goes wrong with the k8s cluster and we fail to communicate with the k8s API.

There is a similar blueprint that addresses this by using an async method for container operations, and I think the same approach can be applied to k8s operations:
https://blueprints.launchpad.net/magnum/+spec/async-container-operations

Please take a look at this blueprint and add anything you think would be useful.

thanks
Eli.

Sreekumar S (sreesiv) wrote :

Confirmed that it happens only with k8s; I tried with Swarm and it worked.
But is it due to an issue with k8s or with magnum? As I understand it, the BP is about making these calls to k8s async, polling to get back the status of the operation, and blocking any dependent operations until it is over.

I believe this should be checked and fixed within the existing sync call architecture, because it fails for the trivial example suggested in the dev quick start guide: http://docs.openstack.org/developer/magnum/dev/dev-quickstart.html#dev-quickstart

Swarm bay example works fine.
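
For anyone following the thread, the async-then-poll idea discussed above can be sketched roughly as follows. This is only an illustration of the general pattern, not Magnum code and not the blueprint's design: `AsyncOperations`, `wait_for`, `slow_backend_create` and the status strings are all hypothetical names, and a thread pool stands in for the conductor.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


def slow_backend_create(manifest, delay=0.2):
    """Hypothetical stand-in for a slow call to the k8s API."""
    time.sleep(delay)
    return {"name": manifest["name"], "status": "Running"}


class AsyncOperations:
    """Accept a request immediately, run it in the background,
    and expose its status by operation ID (hypothetical helper)."""

    def __init__(self, max_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}

    def submit(self, func, *args, **kwargs):
        # Return an ID right away instead of blocking on the backend.
        op_id = uuid.uuid4().hex
        self._futures[op_id] = self._pool.submit(func, *args, **kwargs)
        return op_id

    def status(self, op_id):
        future = self._futures[op_id]
        if not future.done():
            return "IN_PROGRESS"
        return "FAILED" if future.exception() else "COMPLETE"

    def result(self, op_id):
        # Re-raises the backend exception if the operation failed.
        return self._futures[op_id].result()


def wait_for(ops, op_id, timeout=5.0, interval=0.05):
    """Poll the operation status until it finishes or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = ops.status(op_id)
        if state != "IN_PROGRESS":
            return state
        time.sleep(interval)
    return "TIMED_OUT"


if __name__ == "__main__":
    ops = AsyncOperations()
    op_id = ops.submit(slow_backend_create, {"name": "redis-master"})
    print(op_id, wait_for(ops, op_id))
```

With this shape the caller gets an operation ID back immediately, so a slow or unreachable backend shows up as a status the client can poll for rather than as a reply timeout on the original request.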

Changed in magnum:
status: New → Invalid