2015-05-13 01:06:02 |
Larry Michel |
description |
We had 8 nodes transition to failed state within 2 minutes. I couldn't tell from the logs (attached) what could have caused that to happen. It hit that rash of failures then things went relatively quiet for the past day. This resulted in pending state and a few of these servers came from a singled OpenStack deployment:
'1':
agent-state: pending
containers:
1/lxc/0:
instance-id: pending
series: trusty
1/lxc/1:
instance-id: pending
series: trusty
dns-name: hayward-43.oil
hardware: arch=amd64 cpu-cores=8 mem=16384M tags=hw-ok,oil,hardware-sm15k,hw-glance-sm15k
instance-id: /MAAS/api/1.0/nodes/node-9df8a42a-c4cd-11e3-824b-00163efc5068/
series: trusty
'2':
agent-state: pending
containers:
2/lxc/0:
instance-id: pending
series: trusty
dns-name: hayward-21.oil
hardware: arch=amd64 cpu-cores=8 mem=16384M tags=hw-ok,oil-slave-4,hardware-sm15k,hw-glance-sm15k
instance-id: /MAAS/api/1.0/nodes/node-a38685ec-c4cd-11e3-8102-00163efc5068/
series: trusty
'3':
agent-state: pending
containers:
3/lxc/0:
instance-id: pending
series: trusty
dns-name: apsaras.oil
hardware: arch=amd64 cpu-cores=4 mem=32768M tags=hw-ok,oil-slave-1,hardware-dell-poweredge-R210,console-com2
instance-id: /MAAS/api/1.0/nodes/node-5f9c14e6-ae98-11e3-b194-00163efc5068/
series: trusty
...
'5':
agent-state: pending
containers:
5/lxc/0:
instance-id: pending
series: trusty
5/lxc/1:
instance-id: pending
series: trusty
dns-name: hayward-49.oil
hardware: arch=amd64 cpu-cores=8 mem=32768M tags=hw-ok,oil-slave-1,hardware-sm15k,hw-glance-sm15k
instance-id: /MAAS/api/1.0/nodes/node-9ff3a32e-c4cd-11e3-824b-00163efc5068/
series: trusty
From maas logs:
Node failed to deploy, but we're looking at a number of 8 failed deployments during a ~2 minute window:
May 11 13:16:29 maas-trusty-back-may22 maas.node: [INFO] hayward-43: Status transition from DEPLOYING to FAILED_DEPLOYMENT
May 11 13:16:35 maas-trusty-back-may22 maas.node: [INFO] hayward-36: Status transition from DEPLOYING to FAILED_DEPLOYMENT
May 11 13:16:53 maas-trusty-back-may22 maas.node: [INFO] hayward-21: Status transition from DEPLOYING to FAILED_DEPLOYMENT
May 11 13:17:18 maas-trusty-back-may22 maas.node: [INFO] apsaras.local: Status transition from DEPLOYING to FAILED_DEPLOYMENT
May 11 13:18:08 maas-trusty-back-may22 maas.node: [INFO] reading.local: Status transition from DEPLOYING to FAILED_DEPLOYMENT
May 11 13:18:12 maas-trusty-back-may22 maas.node: [INFO] glover.local: Status transition from DEPLOYING to FAILED_DEPLOYMENT
May 11 13:18:19 maas-trusty-back-may22 maas.node: [INFO] hayward-55: Status transition from DEPLOYING to FAILED_DEPLOYMENT
May 11 13:18:19 maas-trusty-back-may22 maas.node: [INFO] hayward-49: Status transition from DEPLOYING to FAILED_DEPLOYMENT
I have attached some logs from maas server. |
We had 8 nodes transition to failed state within 2 minutes. I couldn't tell from the logs (attached) what could have caused that to happen. It hit that rash of failures then things went relatively quiet for the past day. This resulted in the servers being in pending state in juju_status.yaml with a few of these servers came from one OpenStack deployment build.
See below from juju_status.yaml:
'1':
agent-state: pending
containers:
1/lxc/0:
instance-id: pending
series: trusty
1/lxc/1:
instance-id: pending
series: trusty
dns-name: hayward-43.oil
hardware: arch=amd64 cpu-cores=8 mem=16384M tags=hw-ok,oil,hardware-sm15k,hw-glance-sm15k
instance-id: /MAAS/api/1.0/nodes/node-9df8a42a-c4cd-11e3-824b-00163efc5068/
series: trusty
'2':
agent-state: pending
containers:
2/lxc/0:
instance-id: pending
series: trusty
dns-name: hayward-21.oil
hardware: arch=amd64 cpu-cores=8 mem=16384M tags=hw-ok,oil-slave-4,hardware-sm15k,hw-glance-sm15k
instance-id: /MAAS/api/1.0/nodes/node-a38685ec-c4cd-11e3-8102-00163efc5068/
series: trusty
'3':
agent-state: pending
containers:
3/lxc/0:
instance-id: pending
series: trusty
dns-name: apsaras.oil
hardware: arch=amd64 cpu-cores=4 mem=32768M tags=hw-ok,oil-slave-1,hardware-dell-poweredge-R210,console-com2
instance-id: /MAAS/api/1.0/nodes/node-5f9c14e6-ae98-11e3-b194-00163efc5068/
series: trusty
...
'5':
agent-state: pending
containers:
5/lxc/0:
instance-id: pending
series: trusty
5/lxc/1:
instance-id: pending
series: trusty
dns-name: hayward-49.oil
hardware: arch=amd64 cpu-cores=8 mem=32768M tags=hw-ok,oil-slave-1,hardware-sm15k,hw-glance-sm15k
instance-id: /MAAS/api/1.0/nodes/node-9ff3a32e-c4cd-11e3-824b-00163efc5068/
series: trusty
From maas logs:
Node failed to deploy, but we're looking at a number of 8 failed deployments during a ~2 minute window:
May 11 13:16:29 maas-trusty-back-may22 maas.node: [INFO] hayward-43: Status transition from DEPLOYING to FAILED_DEPLOYMENT
May 11 13:16:35 maas-trusty-back-may22 maas.node: [INFO] hayward-36: Status transition from DEPLOYING to FAILED_DEPLOYMENT
May 11 13:16:53 maas-trusty-back-may22 maas.node: [INFO] hayward-21: Status transition from DEPLOYING to FAILED_DEPLOYMENT
May 11 13:17:18 maas-trusty-back-may22 maas.node: [INFO] apsaras.local: Status transition from DEPLOYING to FAILED_DEPLOYMENT
May 11 13:18:08 maas-trusty-back-may22 maas.node: [INFO] reading.local: Status transition from DEPLOYING to FAILED_DEPLOYMENT
May 11 13:18:12 maas-trusty-back-may22 maas.node: [INFO] glover.local: Status transition from DEPLOYING to FAILED_DEPLOYMENT
May 11 13:18:19 maas-trusty-back-may22 maas.node: [INFO] hayward-55: Status transition from DEPLOYING to FAILED_DEPLOYMENT
May 11 13:18:19 maas-trusty-back-may22 maas.node: [INFO] hayward-49: Status transition from DEPLOYING to FAILED_DEPLOYMENT
I have attached some logs from maas server. |
|