Juju 1.24-beta6.1 unit commands in debug-hooks hang indefinitely

Bug #1463117 reported by Charles Butler on 2015-06-08
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
juju-core
High
Jesse Meek
1.24
High
Jesse Meek

Bug Description

Symptom: Attach to any unit via juju debug-hooks, when in a hook context none of the unit agent commands work, such as:

- config-get
- unit-get
- juju-log

These commands appear to just hang, as if they are attempting to speak to the socket but are unable to authenticate(?)

as an illustration:

$ time config-get
^C
real 0m13.632s
user 0m0.025s
sys 0m0.055s

I'm going to dump a rather large subset of environmental information in hopes that I capture what would be causing the error. There is no indicator in the logs that something has gone awry - if i were to detach from the debug-hooks session and re-execute the failing hook it completes as intended. This only appears to effect active debug sessions, on any substrate, in 1.24-beta6.1

dump of env:

JUJU_API_ADDRESSES=10.151.22.207:17070
JUJU_UNIT_NAME=registrator/0
TERM=screen
SHELL=/bin/bash
CHARM_DIR=/var/lib/juju/agents/unit-registrator-0/charm
JUJU_REMOTE_UNIT=consul/0
JUJU_METER_INFO=
JUJU_RELATION=consul-api
JUJU_CHARM_DIR=/var/lib/juju/agents/unit-registrator-0/charm
USER=root
JUJU_ENV_NAME=amazon
SUDO_USER=ubuntu
SUDO_UID=1000
JUJU_ENV_UUID=8993534d-1442-43fc-8eeb-d573e8b1ed3f
USERNAME=root
TMUX=/tmp/tmux-0/default,13791,1
BYOBU_BACKEND=tmux
PATH=/var/lib/juju/tools/unit-registrator-0:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin
MAIL=/var/mail/root
JUJU_MACHINE_ID=3
PWD=/var/lib/juju/agents/unit-registrator-0/charm
LANG=en_US.UTF-8
TMUX_PANE=%2
JUJU_CONTEXT_ID=registrator/0-consul-api-relation-changed-730251586198999342
PS1=${debian_chroot:+($debian_chroot)}\u@\h:\w\$
SUDO_COMMAND=/bin/bash -c F=$(mktemp); echo IyEvYmluL2Jhc2gKKApjbGVhbnVwX29uX2V4aXQoKSAKeyAKCWVjaG8gIkNsZWFuaW5nIHVwIHRoZSBkZWJ1ZyBzZXNzaW9uIgoJdG11eCBraWxsLXNlc3Npb24gLXQgcmVnaXN0cmF0b3IvMDsgCn0KdHJhcCBjbGVhbnVwX29uX2V4aXQgRVhJVAoKIyBMb2NrIHRoZSBqdWp1LTx1bml0Pi1kZWJ1ZyBsb2NrZmlsZS4KZmxvY2sgLW4gOCB8fCAoCgllY2hvICJGb3VuZCBleGlzdGluZyBkZWJ1ZyBzZXNzaW9ucywgYXR0ZW1wdGluZyB0byByZWNvbm5lY3QiIDI+JjEKCWV4ZWMgdG11eCBhdHRhY2gtc2Vzc2lvbiAtdCByZWdpc3RyYXRvci8wCglleGl0ICQ/CgkpCigKIyBDbG9zZSB0aGUgaW5oZXJpdGVkIGxvY2sgRkQsIG9yIHRtdXggd2lsbCBrZWVwIGl0IG9wZW4uCmV4ZWMgOD4mLQoKIyBXcml0ZSBvdXQgdGhlIGRlYnVnLWhvb2tzIGFyZ3MuCmVjaG8gImUzMEsiIHwgYmFzZTY0IC1kID4gL3RtcC9qdWp1LXVuaXQtcmVnaXN0cmF0b3ItMC1kZWJ1Zy1ob29rcwoKIyBMb2NrIHRoZSBqdWp1LTx1bml0Pi1kZWJ1Zy1leGl0IGxvY2tmaWxlLgpmbG9jayAtbiA5IHx8IGV4aXQgMQoKIyBXYWl0IGZvciB0bXV4IHRvIGJlIGluc3RhbGxlZC4Kd2hpbGUgWyAhIC1mIC91c3IvYmluL3RtdXggXTsgZG8KICAgIHNsZWVwIDEKZG9uZQoKaWYgWyAhIC1mIH4vLnRtdXguY29uZiBdOyB0aGVuCiAgICAgICAgaWYgWyAtZiAvdXNyL3NoYXJlL2J5b2J1L3Byb2ZpbGVzL3RtdXggXTsgdGhlbgogICAgICAgICAgICAgICAgIyBVc2UgYnlvYnUvdG11eCBwcm9maWxlIGZvciBmYW1pbGlhciBrZXliaW5kaW5ncyBhbmQgYnJhbmRpbmcKICAgICAgICAgICAgICAgIGVjaG8gInNvdXJjZS1maWxlIC91c3Ivc2hhcmUvYnlvYnUvcHJvZmlsZXMvdG11eCIgPiB+Ly50bXV4LmNvbmYKICAgICAgICBlbHNlCiAgICAgICAgICAgICAgICAjIE90aGVyd2lzZSwgdXNlIHRoZSBsZWdhY3kganVqdS90bXV4IGNvbmZpZ3VyYXRpb24KICAgICAgICAgICAgICAgIGNhdCA+IH4vLnRtdXguY29uZiA8PEVORAogICAgICAgICAgICAgICAgCiMgU3RhdHVzIGJhcgpzZXQtb3B0aW9uIC1nIHN0YXR1cy1iZyBibGFjawpzZXQtb3B0aW9uIC1nIHN0YXR1cy1mZyB3aGl0ZQoKc2V0LXdpbmRvdy1vcHRpb24gLWcgd2luZG93LXN0YXR1cy1jdXJyZW50LWJnIHJlZApzZXQtd2luZG93LW9wdGlvbiAtZyB3aW5kb3ctc3RhdHVzLWN1cnJlbnQtYXR0ciBicmlnaHQKCnNldC1vcHRpb24gLWcgc3RhdHVzLXJpZ2h0ICcnCgojIFBhbmVzCnNldC1vcHRpb24gLWcgcGFuZS1ib3JkZXItZmcgd2hpdGUKc2V0LW9wdGlvbiAtZyBwYW5lLWFjdGl2ZS1ib3JkZXItZmcgd2hpdGUKCiMgTW9uaXRvciBhY3Rpdml0eSBvbiB3aW5kb3dzCnNldC13aW5kb3ctb3B0aW9uIC1nIG1vbml0b3ItYWN0aXZpdHkgb24KCiMgU2NyZWVuIGJpbmRpbmdzLCBzaW5jZSBwZW9wbGUgYXJlIG1vcmUgZmFtaWxpYXIgd2l0aCB0aGF0LgpzZXQtb3B0aW9uIC1nIHByZWZpeCBDLWEKYmluZCBDLWEgbGFzdC13aW5kb3cKYmluZCBhIHNlbmQta2V5IEMtYQoKYmluZCB8IHNwbGl0LXdpbmRvdyAtaApiaW5kIC0gc3BsaXQtd2luZG93IC12CgojIEZpeCBDVFJMLVBHVVAvUEdET1dOIGZvciB2aW0Kc2V0LXdpbmRvdy1vcHRpb24gLWcgeHRlcm0ta2V5cyBvbgoKIyBQcmV2ZW50IEVTQyBrZXkgZnJvbSBhZGRpbmcgZGVsYXkgYW5kIGJyZWFraW5nIFZpbSdzIEVTQyA+IGFycm93IGtleQpzZXQtb3B0aW9uIC1zIGVzY2FwZS10aW1lIDAKCkVORAogICAgICAgIGZpCmZpCgooCiAgICAjIENsb3NlIHRoZSBpbmhlcml0ZWQgbG9jayBGRCwgb3IgdG11eCB3aWxsIGtlZXAgaXQgb3Blbi4KICAgIGV4ZWMgOT4mLQogICAgaWYgISB0bXV4IGhhcy1zZXNzaW9uIC10IHJlZ2lzdHJhdG9yLzA7IHRoZW4KCQl0bXV4IG5ldy1zZXNzaW9uIC1kIC1zIHJlZ2lzdHJhdG9yLzAKCWZpCgljbGllbnRfY291bnQ9JCh0bXV4IGxpc3QtY2xpZW50cyB8IHdjIC1sKQoJaWYgWyAkY2xpZW50X2NvdW50IC1nZSAxIF07IHRoZW4KCQlzZXNzaW9uX25hbWU9cmVnaXN0cmF0b3IvMCItIiRjbGllbnRfY250CgkJZXhlYyB0bXV4IG5ldy1zZXNzaW9uIC1kIC10IHJlZ2lzdHJhdG9yLzAgLXMgJHNlc3Npb25fbmFtZQoJCWV4ZWMgdG11eCBhdHRhY2gtc2Vzc2lvbiAtdCAkc2Vzc2lvbl9uYW1lIFw7IHNldC1vcHRpb24gZGVzdHJveS11bmF0dGFjaGVkCgllbHNlCgkgICAgZXhlYyB0bXV4IGF0dGFjaC1zZXNzaW9uIC10IHJlZ2lzdHJhdG9yLzAKCWZpCikKKSA5Pi90bXAvanVqdS11bml0LXJlZ2lzdHJhdG9yLTAtZGVidWctaG9va3MtZXhpdAopIDg+L3RtcC9qdWp1LXVuaXQtcmVnaXN0cmF0b3ItMC1kZWJ1Zy1ob29rcwpleGl0ICQ/Cg== | base64 -d > $F; . $F
SHLVL=1
HOME=/home/ubuntu
JUJU_AVAILABILITY_ZONE=
LOGNAME=root
APT_LISTCHANGES_FRONTEND=none
DEBIAN_FRONTEND=noninteractive
JUJU_METER_STATUS=NOT SET
JUJU_HOOK_NAME=consul-api-relation-changed
JUJU_RELATION_ID=consul-api:8
SUDO_GID=1000
JUJU_AGENT_SOCKET=@/var/lib/juju/agents/unit-registrator-0/agent.socket
JUJU_DEBUG=/tmp/tmp.Y5R3djOWzm
_=/usr/bin/env

Networking output - full environment connectivity is working. eg: i'm able to ping the state server.

docker0 Link encap:Ethernet HWaddr 56:84:7a:fe:97:99
          inet addr:172.17.42.1 Bcast:0.0.0.0 Mask:255.255.0.0
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

eth0 Link encap:Ethernet HWaddr 22:00:0a:b9:86:fe
          inet addr:10.185.134.254 Bcast:10.185.134.255 Mask:255.255.255.192
          inet6 addr: fe80::2000:aff:feb9:86fe/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:86160 errors:0 dropped:0 overruns:0 frame:0
          TX packets:18324 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:119611778 (119.6 MB) TX bytes:2025178 (2.0 MB)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

Let me know if there is any additional information i can provide about this.

Curtis Hovey (sinzui) on 2015-06-08
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.25.0
Charles Butler (lazypower) wrote :

Attaching all unit logs that seem relevant:

machine-3.log
http://paste.ubuntu.com/11653320/

unit-registrator-0.log
http://paste.ubuntu.com/11653329/

machine-0.log
http://paste.ubuntu.com/11653353/

all-machines.log
http://paste.ubuntu.com/11653354/

Nate Finch (natefinch) on 2015-06-08
summary: - Juju 1.24-beta6.1 unit commands hang indefinately
+ Juju 1.24-beta6.1 unit commands hang indefinitely

This also appears to hang commands in an action context, i.e., action-get, action-set, etc.

Tim Penhey (thumper) wrote :

It seems that 1.24 beta 5 was ok, beta 6 no.

Need to check what changed between those two releases.

Jesse Meek (waigani) wrote :

The bug was introduced in commit f35a8c6, which aimed to fix bug #1454678. The problem is at cmd/jujud/main.go:87

stdin, err := ioutil.ReadAll(ctx.Stdin)

As opposed to jujuc commands run by agents without a terminal, debug-hooks is run via the terminal - which means ioutil.ReadAll waits indefinitely for stdin, but never gets any.

I explored several solutions and discussed with axw. In summary:

A possible solution is to pass through the file descriptor so the alias command can be executed directly from the terminal, though this may not work on Windows. Another solution could be to ditch net/rpc for juju/rpc, so we can have the server request data from stdin on demand. In the short term, reverting f35a8c6 will fix this bug and the underlying problem should be tackled in a proper fix to 1454678, which I'll reopen.

 PR to revert it: http://reviews.vapour.ws/r/1895

Jesse Meek (waigani) on 2015-06-09
Changed in juju-core:
status: Triaged → In Progress
summary: - Juju 1.24-beta6.1 unit commands hang indefinitely
+ Juju 1.24-beta6.1 unit commands in debug-hooks hang indefinitely
Jesse Meek (waigani) on 2015-06-09
Changed in juju-core:
assignee: nobody → Jesse Meek (waigani)
Jesse Meek (waigani) on 2015-06-11
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui) on 2015-06-11
Changed in juju-core:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers