Comment 8 for bug 1717301

Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1717301] Re: cannot add unit for application due to EOF from MAAS

John,

There is definitely something wrong in MAAS that makes it return EOFs;
that's filed as bug 1718016.

This bug is about the EOFs that surface during "juju deploy" and make it
fail without any apparent retries. I've seen this both on a fresh deploy
and on a redeploy of the same bundle.

Jason

On Tue, Oct 3, 2017 at 5:25 AM, John A Meinel <email address hidden> wrote:

> If you grep the logs, you'll find that you're actually getting quite a few
> EOF messages from various places.
> 2017-09-21 19:57:05 DEBUG juju.apiserver request_notifier.go:171 -> [D82] machine-12 31.851599396s {"request-id":90,"error":"cannot get network interfaces of \"wptehh\": getting instance \"wptehh\": unexpected: Get http://10.245.208.33/MAAS/api/2.0/machines/?agent_name=e19034fd-016f-48af-8128-f8aad8ce3de4\u0026id=wptehh: EOF","response":"'body redacted'"} Provisioner[""].SetHostMachineNetworkConfig
> 2017-09-21 20:04:00 WARNING juju.provider.maas devices.go:462 looking up static routes generated IsUnexpectedError, but didn't match: "unexpected: Get http://10.245.208.33/MAAS/api/2.0/static-routes/: EOF" &errors.Err{message:"", cause:(*gomaasapi.UnexpectedError)(0xc427b83b30), previous:(*errors.Err)(0xc427b83a90), file:"github.com/juju/gomaasapi/errors.go", line:41}
> 2017-09-21 20:04:00 WARNING juju.provider.maas devices.go:468 error looking up static-routes: unexpected: Get http://10.245.208.33/MAAS/api/2.0/static-routes/: EOF
> 2017-09-21 20:04:00 DEBUG juju.apiserver.provisioner provisioner.go:963 SetInstanceStatus called with: params.EntityStatusArgs{Tag:"machine-1-lxd-6", Status:"allocating", Info:"failed to start instance (unable to look up static-routes: unexpected: Get http://10.245.208.33/MAAS/api/2.0/static-routes/: EOF), retrying in 10s (10 more attempts)", Data:map[string]interface {}(nil)}
>
> I'm not sure why you're getting EOF for static-routes instead of a 404 if
> it isn't implemented, or something useful if it is.
> Note that we do clearly retry that one, though the retry is at the level
> of doing all the provisioning steps, AFAICT.
>
> 2017-09-21 20:04:41 DEBUG juju.apiserver request_notifier.go:171 -> [DA3] machine-1 30.005683052s {"request-id":183,"error":"Get http://10.245.208.33/MAAS/api/2.0/version/: EOF","response":"'body redacted'"} Provisioner[""].PrepareContainerInterfaceInfo
> 2017-09-21 20:04:41 DEBUG juju.apiserver request_notifier.go:145 <- [DA3] machine-1 {"request-id":192,"type":"Provisioner","version":4,"request":"SetInstanceStatus","params":"'params redacted'"}
> 2017-09-21 20:04:41 DEBUG juju.apiserver.provisioner provisioner.go:963 SetInstanceStatus called with: params.EntityStatusArgs{Tag:"machine-1-lxd-6", Status:"allocating", Info:"failed to start instance (Get http://10.245.208.33/MAAS/api/2.0/version/: EOF), retrying in 10s (9 more attempts)", Data:map[string]interface {}(nil)}
> 2017-09-21 20:05:13 DEBUG juju.apiserver request_notifier.go:171 -> [DAF] machine-18 30.035373959s {"request-id":148,"error":"cannot get provider network config: failed to construct a model from config: unexpected: Get http://10.245.208.33/MAAS/api/2.0/users/?op=whoami: EOF","response":"'body redacted'"} Provisioner[""].SetHostMachineNetworkConfig
> 2017-09-21 20:05:13 DEBUG juju.apiserver request_notifier.go:145 <- [DAF] machine-18 {"request-id":178,"type":"Provisioner","version":4,"request":"SetInstanceStatus","params":"'params redacted'"}
> 2017-09-21 20:05:13 DEBUG juju.apiserver.provisioner provisioner.go:963 SetInstanceStatus called with: params.EntityStatusArgs{Tag:"machine-18-lxd-5", Status:"allocating", Info:"failed to start instance (cannot get provider network config: failed to construct a model from config: unexpected: Get http://10.245.208.33/MAAS/api/2.0/users/?op=whoami: EOF), retrying in 10s (10 more attempts)", Data:map[string]interface {}(nil)}
> 2017-09-21 20:05:22 DEBUG juju.apiserver request_notifier.go:171 -> [DA3] machine-1 30.025484296s {"request-id":195,"error":"unexpected: Get http://10.245.208.33/MAAS/api/2.0/users/?op=whoami: EOF","response":"'body redacted'"} Provisioner[""].PrepareContainerInterfaceInfo
> 2017-09-21 20:05:22 DEBUG juju.apiserver request_notifier.go:145 <- [DA3] machine-1 {"request-id":196,"type":"Provisioner","version":4,"request":"SetInstanceStatus","params":"'params redacted'"}
> 2017-09-21 20:05:22 DEBUG juju.apiserver.provisioner provisioner.go:963 SetInstanceStatus called with: params.EntityStatusArgs{Tag:"machine-1-lxd-6", Status:"allocating", Info:"failed to start instance (unexpected: Get http://10.245.208.33/MAAS/api/2.0/users/?op=whoami: EOF), retrying in 10s (8 more attempts)", Data:map[string]interface {}(nil)}
>
>
> Something seems very unhappy about wptehh in all of that, as it keeps
> failing over and over (maybe it's all just machine-1 and different
> containers on it, or something).
>
> Do you have any logs from the MAAS controller from the same time? It is
> pretty unexpected for MAAS to return EOF; that hints that something is
> going wrong internally, causing MAAS to segfault instead of providing
> proper HTTP error responses.
>
> Now, all of the log entries that I see appear to be wrapped in a retry,
> but it's entirely possible that one code path is missing a retry, which
> is why the bundle deployment failed.
> That said, I'm pretty surprised to see at least 5 separate EOFs.
>
> Anyway, looking through the logs I don't see the smoking gun that
> actually triggered the deploy failure, which is a bit of a shame,
> but I do see a Juju and MAAS that aren't particularly happy
> together.
>
>
> ** Changed in: juju
> Status: New => Triaged
>
> https://bugs.launchpad.net/bugs/1717301
>
> Title:
> cannot add unit for application due to EOF from MAAS
>
> Status in juju:
> Triaged
>
> Bug description:
> maas_2.2.2-6099-g8751f91-0ubuntu1~16.04.1
> juju_1:2.2.3-0ubuntu1~16.04.1~juju1
> ocata
> xenial
>
> A run of 'juju deploy' failed with this error:
> ERROR cannot deploy bundle: cannot add unit for application "keystone": cannot add unit 1/1 to application "keystone": cannot add unit to application "keystone": unexpected: Get http://10.245.208.33/MAAS/api/2.0/boot-resources/: EOF
>
> Here is the full output:
> http://pastebin.ubuntu.com/25534631/
>
> Looking at the MAAS logs, MAAS was up and servicing requests the whole
> time. I'm not sure where the EOF came from.
>
> Expected behavior: this seems to be a temporary communication problem
> between Juju and MAAS. I don't know why it would affect the 'deploy'
> run at the client at all; asking about boot resources seems like
> something that should happen on the back end. Either way, Juju
> should retry this read operation.
>
>