Client queues up lshw calls if talking to old server
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Landscape Client | Fix Released | Critical | Thomas Herve | | |
landscape-client (Ubuntu) | Fix Released | Undecided | Unassigned | | |
Lucid | Fix Released | Undecided | Unassigned | | |
Natty | Won't Fix | Undecided | Unassigned | | |
Oneiric | Fix Released | Undecided | Unassigned | | |
Precise | Fix Released | Undecided | Unassigned | | |
Bug Description
This is an SRU request for landscape-client in lucid, natty, oneiric and precise.
Landscape is *NOT* invoking the SRU exception this time
(https:/
Explanation
===========
If a 12.04.X or 12.05 client is talking to a server which does not support the
hardware-info messages (like LDS 11.07.X), it will queue calls to /usr/bin/lshw in
memory, one per day. That is not a big problem in itself.
But if the server the client is talking to is then upgraded to a version that
does support hardware-info (like LDS 12.09), all those lshw calls will happen
at once, creating a storm of one lshw process per day it was talking to the old
server. This could be just a few processes, or hundreds, depending on how long
landscape-client was running and talking to the old server.
A restart will wipe that queue and is the recommended workaround: restart the
clients just before upgrading LDS.
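The accumulation described above can be illustrated with a minimal sketch. This is not the client's actual code: FakeRegistry, BuggyPlugin, FixedPlugin and simulate are hypothetical names, and the mechanism shown (one callback registered per daily run, all fired when the server starts accepting hardware-info) is an assumption made for illustration.

```python
class FakeRegistry:
    """Hypothetical stand-in for the client's handler registry."""

    def __init__(self):
        self.handlers = []

    def call_on_accepted(self, message_type, callback):
        # Remember a callback to fire once the server accepts message_type.
        self.handlers.append((message_type, callback))

    def fire_accepted(self, message_type):
        # The server now accepts message_type: run every matching callback.
        for registered_type, callback in list(self.handlers):
            if registered_type == message_type:
                callback()


class BuggyPlugin:
    message_type = "hardware-info"

    def __init__(self, registry):
        self.registry = registry
        self.lshw_calls = 0

    def run(self):
        # Runs once per interval and queues one more callback every time.
        self.registry.call_on_accepted(self.message_type, self.send_message)

    def send_message(self):
        # Stands in for spawning /usr/bin/lshw.
        self.lshw_calls += 1


class FixedPlugin(BuggyPlugin):
    def __init__(self, registry):
        super().__init__(registry)
        # Queue the callback exactly once, at registration time.
        registry.call_on_accepted(self.message_type, self.send_message)

    def run(self):
        # Nothing is queued on the periodic run any more.
        pass


def simulate(plugin_class, days):
    """Run the plugin for `days` intervals, then upgrade the server."""
    registry = FakeRegistry()
    plugin = plugin_class(registry)
    for _ in range(days):
        plugin.run()
    registry.fire_accepted("hardware-info")
    return plugin.lshw_calls


print(simulate(BuggyPlugin, 30))  # one simulated lshw call per day queued
print(simulate(FixedPlugin, 30))  # a single simulated lshw call
```

In this model a restart corresponds to creating a fresh registry, which is why restarting the clients just before the upgrade empties the queue.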
Test case
=========
You will need one lucid machine and access to both LDS 11.07.X and 12.09. The test itself is simple, but the preparation requires some work.
- deploy LDS 11.07.X in quickstart mode
- install landscape-client 12.05 (can be on the same machine or another one with
network access to the LDS one)
- tweak the client code to invoke the hardware-info plugin every minute instead of
once per day. In the file /usr/share/
change run_interval to 60
- copy /usr/bin/lshw to /usr/bin/lshw.orig and change /usr/bin/lshw to this script:
"""
#!/bin/bash
date >> /tmp/lshw.log
/usr/bin/lshw.orig "$@"
"""
- run "lshw -xml" as root once, confirm that you get XML output and that /tmp/lshw.log is updated
- register the client with LDS. AFTER THIS, DO NOT RESTART THE CLIENT AGAIN FOR ANY REASON.
- in this scenario, lshw is not being called because the server does not support that information, but the client is accumulating the calls in memory.
- leave the client talking to the server for about 10 minutes. Create a simple script activity to make sure it's working (for example, send "ifconfig" to the client)
- after 10 minutes, adjust the sources.list line to grab LDS 12.09 and issue apt-get update followed by apt-get dist-upgrade. The upgrade will take a few minutes. Answer any dpkg conf file question with "N", keeping the original file installed.
- keep tailing -f /tmp/lshw.log and /var/log/
- the broker log will get errors while LDS is down upgrading, that's normal
- once the client is able to talk to the server again, it will notice the
server version switch and log something like this:
"""
2012-09-19 19:41:43,139 INFO [MainThread] Accepted types changed: +hardware-info computer-info operation-result memory-info mount-info text-message network-activity mount-activity custom-graph active-process-info reboot-required change-
"""
- this is the moment where the *buggy* client will spawn a lot of lshw calls, like this:
"""
root@ubuntu:
tail: cannot open `/tmp/lshw-run.log' for reading: No such file or directory
tail: `/tmp/lshw-run.log' has appeared; following end of new file
Wed Sep 19 19:41:43 UTC 2012
Wed Sep 19 19:41:43 UTC 2012
Wed Sep 19 19:41:43 UTC 2012
Wed Sep 19 19:41:43 UTC 2012
Wed Sep 19 19:41:43 UTC 2012
Wed Sep 19 19:41:43 UTC 2012
Wed Sep 19 19:41:43 UTC 2012
Wed Sep 19 19:41:43 UTC 2012
Wed Sep 19 19:41:43 UTC 2012
Wed Sep 19 19:41:43 UTC 2012
Wed Sep 19 19:41:43 UTC 2012
Wed Sep 19 19:41:44 UTC 2012
Wed Sep 19 19:41:44 UTC 2012
Wed Sep 19 19:41:44 UTC 2012
Wed Sep 19 19:41:44 UTC 2012
Wed Sep 19 19:41:44 UTC 2012
Wed Sep 19 19:41:44 UTC 2012
Wed Sep 19 19:41:45 UTC 2012
Wed Sep 19 19:41:45 UTC 2012
"""
- the *fixed* client will be better behaved, with no such storm of lshw calls (note: log taken from a different test run, so timestamps won't match):
"""
root@ubuntu:~# tail -F /tmp/lshw-run.log
Wed Sep 19 20:19:23 UTC 2012
Wed Sep 19 20:20:21 UTC 2012
(and one per minute from here on)
"""
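To judge the result without reading the log by eye, the wrapper's timestamps can be counted per second. The sketch below is only an illustration: calls_per_second, looks_like_storm and the threshold of 3 are hypothetical choices, and the sample data reproduces the two logs shown above.

```python
from collections import Counter


def calls_per_second(log_lines):
    """Count wrapper invocations per timestamp line.

    `date` prints with one-second resolution, so identical lines mean
    lshw was started more than once within the same second.
    """
    return Counter(line.strip() for line in log_lines if line.strip())


def looks_like_storm(log_lines, threshold=3):
    """Return True if any single second saw `threshold` or more calls.

    The threshold is an arbitrary assumption; with run_interval set to 60
    a well-behaved client should log about one line per minute.
    """
    counts = calls_per_second(log_lines)
    return any(count >= threshold for count in counts.values())


# Sample data copied from the two logs in the test case.
buggy = (
    ["Wed Sep 19 19:41:43 UTC 2012"] * 11
    + ["Wed Sep 19 19:41:44 UTC 2012"] * 6
    + ["Wed Sep 19 19:41:45 UTC 2012"] * 2
)
fixed = ["Wed Sep 19 20:19:23 UTC 2012", "Wed Sep 19 20:20:21 UTC 2012"]

print(looks_like_storm(buggy))
print(looks_like_storm(fixed))
```

To check a real run, pass the lines of the wrapper's log file instead of the sample lists.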
Regression potential
====================
The fix itself is unit tested, and I tested it in all Ubuntu releases with the
attached debdiff and the test case above, confirming that the issue is gone.
The patch is very specific and affects hardware reporting only.
Related branches
- Chris Glass (community): Approve
- Björn Tillenius (community): Approve
- Diff: 45 lines (+23/-2), 2 files modified
  landscape/manager/hardwareinfo.py (+5/-2)
  landscape/manager/tests/test_hardwareinfo.py (+18/-0)
description: | updated |
Changed in landscape-client: | |
assignee: | nobody → Thomas Herve (therve) |
Changed in landscape-client: | |
status: | New → In Progress |
Changed in landscape-client: | |
status: | In Progress → Fix Committed |
tags: | added: rls-q-incoming |
description: | updated |
tags: | added: verification-done-precise |
tags: | removed: verification-needed |
tags: | added: verification-done removed: verification-done-precise |
Changed in landscape-client: | |
milestone: | 12.09.1 → 12.11 |
Changed in landscape-client: | |
status: | Fix Committed → Fix Released |
Patch from therve confirmed working:
=== modified file 'landscape/manager/hardwareinfo.py'
--- landscape/manager/hardwareinfo.py 2012-03-15 11:13:13 +0000
+++ landscape/manager/hardwareinfo.py 2012-09-19 19:13:33 +0000
@@ -13,10 +13,13 @@
     run_immediately = True
     command = "/usr/bin/lshw"
 
+    def register(self, registry):
+        super(HardwareInfo, self).register(registry)
+        self.call_on_accepted(self.message_type, self.send_message)
+
     def run(self):
-        self.call_on_accepted(self.message_type, self.send_message)
         return self.registry.broker.call_if_accepted(
-                self.message_type, self.send_message)
+            self.message_type, self.send_message)
 
     def send_message(self):
         result = getProcessOutput(