8 nodes transition to failed state within very short period of time
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
MAAS |
Expired
|
Undecided
|
Unassigned |
Bug Description
We had 8 nodes transition to failed state within 2 minutes. I couldn't tell from the logs (attached) what could have caused that to happen. It hit that rash of failures then things went relatively quiet for the past day. This resulted in the servers being in pending state in juju_status.yaml with a few of these servers came from one OpenStack deployment build.
See below from juju_status.yaml:
'1':
agent-state: pending
containers:
1/lxc/0:
series: trusty
1/lxc/1:
series: trusty
dns-name: hayward-43.oil
hardware: arch=amd64 cpu-cores=8 mem=16384M tags=hw-
instance-id: /MAAS/api/
series: trusty
'2':
agent-state: pending
containers:
2/lxc/0:
series: trusty
dns-name: hayward-21.oil
hardware: arch=amd64 cpu-cores=8 mem=16384M tags=hw-
instance-id: /MAAS/api/
series: trusty
'3':
agent-state: pending
containers:
3/lxc/0:
series: trusty
dns-name: apsaras.oil
hardware: arch=amd64 cpu-cores=4 mem=32768M tags=hw-
instance-id: /MAAS/api/
series: trusty
...
'5':
agent-state: pending
containers:
5/lxc/0:
series: trusty
5/lxc/1:
series: trusty
dns-name: hayward-49.oil
hardware: arch=amd64 cpu-cores=8 mem=32768M tags=hw-
instance-id: /MAAS/api/
series: trusty
From maas logs:
Node failed to deploy, but we're looking at a number of 8 failed deployments during a ~2 minute window:
May 11 13:16:29 maas-trusty-
May 11 13:16:35 maas-trusty-
May 11 13:16:53 maas-trusty-
May 11 13:17:18 maas-trusty-
May 11 13:18:08 maas-trusty-
May 11 13:18:12 maas-trusty-
May 11 13:18:19 maas-trusty-
May 11 13:18:19 maas-trusty-
I have attached some logs from maas server.
The attached regiond log only goes to May 08, and the attached maas.log only goes to Apr 25. Are you sure you attached the correct logs?
Going to mark this as incomplete until correct logs are attached.