cannot add unit for application due to EOF from MAAS

Bug #1717301 reported by Jason Hobbs
This bug affects 1 person
Affects: Canonical Juju
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

maas_2.2.2-6099-g8751f91-0ubuntu1~16.04.1
juju_1:2.2.3-0ubuntu1~16.04.1~juju1
ocata
xenial

A run of 'juju deploy' failed with this error:
ERROR cannot deploy bundle: cannot add unit for application "keystone": cannot add unit 1/1 to application "keystone": cannot add unit to application "keystone": unexpected: Get http://10.245.208.33/MAAS/api/2.0/boot-resources/: EOF

Here is the full output:
http://pastebin.ubuntu.com/25534631/

Looking at the MAAS logs, MAAS was up and servicing requests the whole time. I'm not sure where the EOF came from.

Expected behavior: this looks like a temporary communication problem between juju and MAAS. I don't see why it should affect the 'deploy' run at the client at all; asking about boot resources seems like something that should happen on the back end. Either way, juju should retry this read operation.

Chris Gregan (cgregan)
description: updated
Revision history for this message
Anastasia (anastasia-macmood) wrote :

That is weird. Do the logs have any further details? Does this happen with just this bundle, or do you see the error reliably with other bundle deploys? And only with bundle deploys? Is there a way for you to identify the smallest repro scenario?

Changed in juju:
status: New → Incomplete
Jason Hobbs (jason-hobbs) wrote :

It doesn't reproduce reliably; I've seen it happen only once so far. I don't have logs from that run. If it reproduces again I will get logs.

Jason Hobbs (jason-hobbs) wrote :

I opened bug 1718016 against MAAS. The cause of the EOF could be the same in both cases. I think juju should still retry a read-only operation that results in EOF though.

Jason Hobbs (jason-hobbs) wrote :

I hit this again today:

http://paste.ubuntu.com/25588243/

controller model's machine-0.log shows EOF happening at other times too:
http://paste.ubuntu.com/25588250/

Sometimes it's retried. It doesn't appear to be retried in this case, where the error is exposed to the CLI.

Jason Hobbs (jason-hobbs) wrote :

/var/log/juju from the controller machine 0

Changed in juju:
status: Incomplete → New
John A Meinel (jameinel) wrote :

If you grep the logs, you'll find that you're actually getting quite a few EOF messages from various places.
2017-09-21 19:57:05 DEBUG juju.apiserver request_notifier.go:171 -> [D82] machine-12 31.851599396s {"request-id":90,"error":"cannot get network interfaces of \"wptehh\": getting instance \"wptehh\": unexpected: Get http://10.245.208.33/MAAS/api/2.0/machines/?agent_name=e19034fd-016f-48af-8128-f8aad8ce3de4\u0026id=wptehh: EOF","response":"'body redacted'"} Provisioner[""].SetHostMachineNetworkConfig
2017-09-21 20:04:00 WARNING juju.provider.maas devices.go:462 looking up static routes generated IsUnexpectedError, but didn't match: "unexpected: Get http://10.245.208.33/MAAS/api/2.0/static-routes/: EOF" &errors.Err{message:"", cause:(*gomaasapi.UnexpectedError)(0xc427b83b30), previous:(*errors.Err)(0xc427b83a90), file:"github.com/juju/gomaasapi/errors.go", line:41}
2017-09-21 20:04:00 WARNING juju.provider.maas devices.go:468 error looking up static-routes: unexpected: Get http://10.245.208.33/MAAS/api/2.0/static-routes/: EOF
2017-09-21 20:04:00 DEBUG juju.apiserver.provisioner provisioner.go:963 SetInstanceStatus called with: params.EntityStatusArgs{Tag:"machine-1-lxd-6", Status:"allocating", Info:"failed to start instance (unable to look up static-routes: unexpected: Get http://10.245.208.33/MAAS/api/2.0/static-routes/: EOF), retrying in 10s (10 more attempts)", Data:map[string]interface {}(nil)}

I'm not sure why you're getting EOF for static-routes instead of a 404 if it isn't implemented, or something useful if it is.
Note that we do clearly retry that one, though the retry is at the level of doing all the provisioning steps, AFAICT.

2017-09-21 20:04:41 DEBUG juju.apiserver request_notifier.go:171 -> [DA3] machine-1 30.005683052s {"request-id":183,"error":"Get http://10.245.208.33/MAAS/api/2.0/version/: EOF","response":"'body redacted'"} Provisioner[""].PrepareContainerInterfaceInfo
2017-09-21 20:04:41 DEBUG juju.apiserver request_notifier.go:145 <- [DA3] machine-1 {"request-id":192,"type":"Provisioner","version":4,"request":"SetInstanceStatus","params":"'params redacted'"}
2017-09-21 20:04:41 DEBUG juju.apiserver.provisioner provisioner.go:963 SetInstanceStatus called with: params.EntityStatusArgs{Tag:"machine-1-lxd-6", Status:"allocating", Info:"failed to start instance (Get http://10.245.208.33/MAAS/api/2.0/version/: EOF), retrying in 10s (9 more attempts)", Data:map[string]interface {}(nil)}
2017-09-21 20:05:13 DEBUG juju.apiserver request_notifier.go:171 -> [DAF] machine-18 30.035373959s {"request-id":148,"error":"cannot get provider network config: failed to construct a model from config: unexpected: Get http://10.245.208.33/MAAS/api/2.0/users/?op=whoami: EOF","response":"'body redacted'"} Provisioner[""].SetHostMachineNetworkConfig
2017-09-21 20:05:13 DEBUG juju.apiserver request_notifier.go:145 <- [DAF] machine-18 {"request-id":178,"type":"Provisioner","version":4,"request":"SetInstanceStatus","params":"'params redacted'"}
2017-09-21 20:05:13 DEBUG juju.apiserver.provisioner provisioner.go:963 SetInstanceStatus called with: params.EntityStatusArgs{Tag:"machine-18-lxd-5", Status:"allocating", Info:"failed to start instance (c...


Changed in juju:
status: New → Triaged
John A Meinel (jameinel) wrote :

Was this deploying a bundle that was mostly already deployed? Or do we just have a major communication failure where every "Deploying ..." is immediately followed by a "reusing"?

It does feel very unrelated that trying to change the constraints on an existing application would then fail because of not being able to detect the version of MAAS on the underlying provider. The two don't seem like they should be coupled at all.

Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1717301] Re: cannot add unit for application due to EOF from MAAS

John,

There is definitely something wrong in MAAS where it's giving EOFs.
That's filed under bug 1718016.

This bug is about the EOFs that surface during "juju deploy" and cause a
failure, without any apparent retries, which I've seen both on a fresh
deploy and a redeploy of the same bundle.

Jason


Jason Hobbs (jason-hobbs) wrote :

We figured out the MAAS problem - it was an issue with a short timeout in haproxy. We haven't hit this since fixing it.
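
To show why a dropped connection looks like this from the client side, here is a small Go sketch. It does not model haproxy itself: `droppingHandler` and `fetchFromDroppingServer` are hypothetical names, and the handler simply closes the TCP connection without writing a response, as a stand-in for an intermediary that gives up on a request early. With Go's standard HTTP client this usually surfaces as a bare EOF on the request, much like the errors in the logs above, though the exact error can vary by platform and Go version.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// droppingHandler takes over the TCP connection and closes it without
// sending any response bytes.
func droppingHandler(w http.ResponseWriter, r *http.Request) {
	hj, ok := w.(http.Hijacker)
	if !ok {
		http.Error(w, "hijacking unsupported", http.StatusInternalServerError)
		return
	}
	conn, _, err := hj.Hijack()
	if err != nil {
		return
	}
	conn.Close()
}

// fetchFromDroppingServer starts a throwaway local server and returns
// the error from a single GET against it.
func fetchFromDroppingServer() error {
	srv := httptest.NewServer(http.HandlerFunc(droppingHandler))
	defer srv.Close()

	resp, err := srv.Client().Get(srv.URL + "/MAAS/api/2.0/version/")
	if err == nil {
		resp.Body.Close()
	}
	return err
}

func main() {
	err := fetchFromDroppingServer()
	fmt.Println("error:", err)
	fmt.Println("unwraps to io.EOF:", errors.Is(err, io.EOF))
}
```

The server process stays up and healthy throughout, which fits the earlier observation that MAAS looked fine in its own logs while juju saw EOFs.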

Changed in juju:
status: Triaged → Invalid