debug-hooks needs to end hooks more robustly

Bug #791841 reported by Kapil Thangavelu
Affects: pyjuju
Status: Fix Released
Importance: High
Assigned to: Gustavo Niemeyer

Bug Description

While debugging a hook, it's possible for the user to close a screen window while the hook script (debug variant) keeps running, waiting for the interactive process to end. The IRC conversation inlined below provides context and an alternative mechanism that should be more robust.

<kim0> fired debug-hooks drupal/0 .. works
<kim0> add-relation drupal mysql
<kim0> I am not getting any new windows in the debug-hooks screen
<niemeyer> Hmm
<niemeyer> kim0: Thinking
<kim0> sure
<kim0> I think I saw this yesterday
<kim0> when I closed the debug-hooks screen ..
<kim0> hooks suddenly started firing
<kim0> it's like it was stuck
<niemeyer> kim0: That's normal
<kim0> but I always blame myself :)
<niemeyer> kim0: Hooks are serially executed
<kim0> well there were not opening new windows in screen session
<kim0> they should right ?
<niemeyer> kim0: You won't get another hook window until you stop the existing one
<kim0> there was no existing one .. I was waiting for it
<kim0> just like now .. there's only window 0 in screen
<niemeyer> kim0: and what's 0?
<hazmat> kim0, it was disabled (the ssh fingerprint confirm prompt)
<kim0> niemeyer: just a shell
<hazmat> but it leaves things open to man in the middle
<hazmat> we should pull it down though automatically
<niemeyer> hazmat: Do you have any ideas of what might be going on for kim0?
<kim0> hazmat: yeah .. check my wish list :) we could ec2-get-console-output and verify it :)
<niemeyer> kim0: WE can do better than that
<niemeyer> kim0: We should inject the host key
<niemeyer> kim0: That's in our wishlist already :)
<hazmat> niemeyer, man in the middle was the primary reason fingerprint checking was renabled yes?
* hazmat reads through log
<niemeyer> hazmat: That's right
<kim0> niemeyer: cool !
<kim0> niemeyer: cloud-init can inject host key already indeed .. that's even better
<hazmat> indeed, we should probably make use of that, but we need to store in zk for multi-client access
<hazmat> kim0, okay.. so you've got a debug hook session on drupal or mysql?
<hazmat> when doing the add relation
<kim0> drupal
<kim0> hazmat: debug-hooks drupal/0
<kim0> hazmat: add-relation drupal mysql
<kim0> that's it .. no new window in screen
<hazmat> kim0, okay.. so you do debugs for install & start?
<hazmat> or are you debugging after start?
<kim0> the sequence was
<kim0> deploy mysql
<kim0> deploy drupal
<kim0> debug-hooks drupal/0
<kim0> debug-log
<kim0> add-relation mysql drupal
<hazmat> kim0, could you paste your debug-log
<kim0> I got a "install" or "start" hook here can't remember .. which I closed
<kim0> I expected to get the db-relation-changed one after it .. but didn't
<kim0> sure
<hazmat> kim0, by closing are you exiting the shell or just closing the window?
<hazmat> hmm
<hazmat> i don't think i've tested closing the window instead of exiting the shell
<kim0> hazmat: ctrl + d
<kim0> exit shell
<kim0> hazmat: log http://paste.ubuntu.com/616692/
<hazmat> hmm that should be fine
<niemeyer> hazmat: Should be equivalent
<kim0> hazmat: status → http://paste.ubuntu.com/616694/
<hazmat> niemeyer, yeah.. but on the close window case, there is still a callback to screen to close the window after the process exit
* zaid_h (~zaid@64.34.151.178) has joined #ubuntu-ensemble
<hazmat> but ctrl +d vs. exit is equiv
<niemeyer> hazmat: I don't understand that distinction
<hazmat> kim0, odd it seems like the unit hasn't picked up the relation
<kim0> :s
<kim0> can u connect to the env ?
<niemeyer> hazmat: The callback will execute after the shell process exits, right? Either option would kill it
<niemeyer> hazmat: IOW, closing the window also terminates the shell
<niemeyer> hazmat: This would happen if the hook wasn't executed
<hazmat> niemeyer, right, but we have another process checking on the shell and then instructing screen to kill the window, which is probably just a noop at that point
<hazmat> niemeyer, its unrelated to what kim0 is seeing
<niemeyer> hazmat: I mean, the relation not showing up
<niemeyer> kim0: Can you please paste ps auxw for that machine?
<niemeyer> kim0: The drupal one
<kim0> niemeyer: from the debug-hooks screen is ok right ?
<niemeyer> hazmat: Hmm.. unless we're running the shell script with -e, and screen exits with 1 because the window wasn't there?
* niemeyer doing guess work
<niemeyer> kim0: Yeah
<kim0> http://paste.ubuntu.com/616696/
<niemeyer> hazmat: "install".. there's an old hook running still
<kim0> I hope I didn't do something stupid at the end :)
<niemeyer> kim0: I suspect your window 0 has the install hook running
<niemeyer> kim0: Can you please paste "env" from that window
<kim0> http://paste.ubuntu.com/616698/
<hazmat> niemeyer, window 0 is never used for hooks its .. its always a shell
<kim0> window 0 is always there
<kim0> yeah
<niemeyer> hazmat: It's trivial to shift windows around
<hazmat> niemeyer, but the names are distinct on the windows
<kim0> http://paste.ubuntu.com/616699/ is the install hook itself
<niemeyer> Ok, but that's not the case either way
<niemeyer> Still, we have a hook running
<niemeyer> hazmat: Ok
<hazmat> the debug stuff names the windows by hook , except window 0 which is named 'shell' afaicr
* kim0 nods
<niemeyer> kim0: What's in /tmp/tmpLjxVDG-install
<kim0> niemeyer: http://paste.ubuntu.com/616700/
<kim0> scary script
<hazmat> so it seems somehow the debug window was ended but the underlying debug process is still alive.
<niemeyer> hazmat: Yeah, it's still in the sleep loop
<niemeyer> hazmat: Which confirms your initial theory
<kim0> I probably closed the window too fast, if you think it needs time to do anything
<hazmat> it might be a different signal gets sent besides HUP that needs to be caught here
<hazmat> kim0, it shouldn't matter
<hazmat> we should never rely on user timing
<kim0> yeah I know
<niemeyer> hazmat: TERM, KILL
<niemeyer> hazmat: Wait.. the HUP is catching the outside signal
<niemeyer> hazmat: That's not the problem.. that script is still running
<hazmat> yeah.. its not in the screen process
<niemeyer> kim0: One more: /proc/1585/environ
<kim0> http://paste.ubuntu.com/616708/
<kim0> niemeyer: not sure why it has no newlines
<kim0> doh
<niemeyer> hazmat: We should monitor it from outside instead of expecting it to do stuff before it dies
<kim0> sorry .. pastebinit error
<niemeyer> kim0: That's the file format indeed
<kim0> niemeyer: http://paste.ubuntu.com/616709/
<kim0> this is complete
<niemeyer> hazmat: e.g. writing to hook.pid when the process starts
* kim0 probably just uncovered a pastebinit bug
<hazmat> niemeyer, yeah.. and then just doing something like kill -0 `cat hook.pid` for the sleep condition
<niemeyer> hazmat: RIght
* hazmat files a bug
<niemeyer> hazmat: Another handy issue for a brain breaker.. will paste that conversation in a bug.
<niemeyer> hazmat: Oh, ok :)
<niemeyer> hazmat: Please paste the log for context
<niemeyer> hazmat: Thanks
<niemeyer> kim0: Alright.. we know what's wrong
<kim0> great :)
<niemeyer> kim0: For fixing your problem right now,
<niemeyer> kim0: kill 1585
<kim0> got it
<kim0> thanks
<niemeyer> kim0: np
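
The mechanism proposed in the conversation (record the pid when the hook process starts, then use `kill -0` as the sleep condition) could look roughly like the sketch below. It is only an illustration of the idea, not the change that was actually committed: the function name `wait_for_hook` and the one-second poll interval are assumptions, and the caller is assumed to have written the pid file already.

```shell
#!/bin/sh
# Hypothetical helper (name and poll interval are assumptions, not from
# the codebase): block until the process recorded in a pid file is gone.
wait_for_hook() {
    pid_file="$1"

    # Read the pid that was written when the hook process started.
    pid=$(cat "$pid_file") || return 1
    [ -n "$pid" ] || return 1

    # kill -0 delivers no signal; it only reports whether the pid can
    # still be signalled, i.e. whether the process is still around.
    while kill -0 "$pid" 2>/dev/null; do
        sleep 1
    done
}
```

With this shape the waiting side no longer depends on the debug shell doing any cleanup before it dies; something like `wait_for_hook hook.pid` returns once the pid is gone, however the window was closed.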

Changed in ensemble:
milestone: none → dublin
importance: Undecided → High
Changed in ensemble:
status: New → In Progress
assignee: nobody → Gustavo Niemeyer (niemeyer)
Changed in ensemble:
status: In Progress → Fix Released