Adding a new tag with empty string as the definition parameter causes maas-cli to hang

Bug #1064737 reported by Diogo Matsubara
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
Fix Released
Critical
Jeroen T. Vermeulen
maas (Ubuntu)
Confirmed
Undecided
Unassigned

Bug Description

$ maas-cli maas tags new name=foobar23 comment='foo bar' definition=' '

Note the two whitespace in the definition argument. This command hangs and doesn't return

Tags: api cli

Related branches

Changed in maas:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 12.10
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

I can reproduce this in a packaged installation, but not in a dev branch.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

The region controller's Apache logs the API call, but then the operation gets stuck on the server side before the call even gets to TagsHandler. It's not the space that does it, or even the “new”: same thing happens for “maas-cli maas tags list” for example.

Apache's error.log shows this for each API call:

«
[error] [client 10.0.0.99] Premature end of script headers: wsgi.py
[alert] (2)No such file or directory: mod_wsgi (pid=12345): Unable to change working directory to '/home/maas'.
»

Indeed there is no /home/maas. Creating it (owned by maas:maas of course) fixes the “maas-cli maas tags list” but the tag-creation command still hangs — as long as the space is there. If I leave out the space, the API considers that a missing argument and errors out immediately.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

A note of caution. The wsgi process eventually times out and dies, to get replaced by a fresh one. But while you've got each of the wsgi processes hanging, further experiments may be misleading because you get timeouts caused by earlier calls, not by whatever you're doing in the experiment. Avoid interrupting hanging calls with ctrl-C; let it time out first so you don't fall into this trap.

Each call that dies in this way logs “Premature end of script headers: wsgi.py” in the Apache error log.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

And so it was still only the whitespace definition that got stuck, and the call did reach TagsHandler after all.

With that lesson learned, I now see the call get stuck in TagForm.clean_definition, at:

    etree.XPath(definition)

—even though the exact same call run in a bare python interpreter instantly produces the expected lxml.etree.XPathSyntaxError. (The “cleaned” version of the definition is still the original string of whitespace).

Execution goes into machine code at this point. When I run just this line in a python interpreter, strace shows no system calls at all before the call starts generating a backtrace.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

I get the same problem when I submit other malformed tag definitions. Looking through the lxml source I find myself suspecting the logging code.

Changed in maas:
assignee: nobody → Jeroen T. Vermeulen (jtv)
status: Triaged → In Progress
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

There is no busy-waiting: the lxml call is properly waiting. Here's what I get when I strace the application at this point:

31418 futex(0xb93c0248, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
31417 <... select resumed> ) = 0 (Timeout)
31417 futex(0xb93c0248, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
31416 <... select resumed> ) = 0 (Timeout)

(The last line, and the <unfinished> before it, only seem to appear when the process times out).

The lxml code has some locking that only happens when threading is enabled, so that may explain why I didn't see this when I straced just the call in a python interpreter. The wsgi process does indeed run with threads, and if what we see here is multiple threads vying for the same lock, there may be a deadlock condition.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Yup, that seems to be the problem. Applying one of the suggested workarounds from the article stops the lockups.

Changed in maas:
status: In Progress → Fix Committed
Changed in maas (Ubuntu):
status: New → Confirmed
Changed in maas:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.