When the broker is restarted by the watchdog, the state of the client is inconsistent

Bug #380633 reported by Thomas Herve
4
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Landscape Client
Fix Released
High
Thomas Herve

Bug Description

The watchdog automatically restarts the broker if it dies for some reason. In this case, the client appears to work (some exchange happens, it pings the server correctly), but in fact the state of the system is pretty wrong:
 * manager mostly doesn't work
 * monitoring doesn't work
 * there is no package information upload.

What happens is that some calls (register_message, register_plugin) are bound the current running broker and are only done are start up: if the broker is restarted, they are not done again.

We need to tell the monitor and manager plugins that the broker has restarted, so that they registered themselves and the message types again.

Thomas Herve (therve)
Changed in landscape-client:
importance: Undecided → High
milestone: none → 1.0.30
Thomas Herve (therve)
Changed in landscape-client:
assignee: nobody → Thomas Herve (therve)
status: New → In Progress
tags: added: review
Changed in landscape-client:
milestone: 1.0.30 → 1.3.2
Revision history for this message
Andreas Hasenack (ahasenack) wrote :
Download full text (3.3 KiB)

I can't get this client to transmit messages to the server apparently:
==> /var/log/landscape/broker.log <==
2009-06-02 17:46:08,159 INFO [MainThread] Switching to normal exchange mode.
2009-06-02 17:46:08,171 INFO [MainThread] Accepted types changed: +computer-info +operation-result +memory-info +mount-info +mount-activity +custom-graph +active-process-info +client-uptime +change-packages-result +temperature +hardware-inventory +users +processor-info +load-average +packages +unknown-package-hashes +add-packages +computer-uptime +free-space +action-info +resynchronize +text-message +distribution-info test -register -register-cloud-vm
2009-06-02 17:46:08,200 INFO [MainThread] Message exchange completed in 6.79s.
2009-06-02 17:46:40,367 INFO [MainThread] Ping indicates message available. Scheduling an urgent exchange.
2009-06-02 17:47:08,303 INFO [MainThread] Starting urgent message exchange with https://landscape.canonical.com/message-system.
2009-06-02 17:47:10,529 INFO [MainThread] Ping indicates message available. Scheduling an urgent exchange.
2009-06-02 17:47:21,664 INFO [Dummy-2 ] Sent 147434 bytes and received 149 bytes in 13.33s.
2009-06-02 17:47:21,669 INFO [MainThread] Switching to normal exchange mode.
2009-06-02 17:47:21,671 INFO [MainThread] Pending messages remain after the last exchange.
2009-06-02 17:47:21,671 INFO [MainThread] Message exchange completed in 13.37s.

Notice the "Pending messages remain after the last exchange." message. It keeps showing in the logs, and nothing seems to be uploaded to the server. No users, processes or packages.

The client queue is just getting bigger:
root@nsn2:~# l /var/lib/landscape/client/messages/0/
total 212K
drwx------ 2 landscape nogroup 4.0K 2009-06-02 17:55 .
drwxr-xr-x 3 landscape root 14 2009-06-02 17:44 ..
-rw------- 1 landscape nogroup 166 2009-06-02 17:44 0
-rw------- 1 landscape nogroup 143 2009-06-02 17:46 1
-rw------- 1 landscape nogroup 1.4K 2009-06-02 17:55 10
-rw------- 1 landscape nogroup 197 2009-06-02 17:55 11
-rw------- 1 landscape nogroup 150 2009-06-02 17:55 12
-rw------- 1 landscape nogroup 129 2009-06-02 17:55 13
-rw------- 1 landscape nogroup 852 2009-06-02 17:55 14
-rw------- 1 landscape nogroup 241 2009-06-02 17:55 15
-rw------- 1 landscape nogroup 134 2009-06-02 17:55 16
-rw------- 1 landscape nogroup 135 2009-06-02 17:55 17
-rw------- 1 landscape nogroup 329 2009-06-02 17:46 2
-rw------- 1 landscape nogroup 163 2009-06-02 17:46 3
-rw------- 1 landscape nogroup 24K 2009-06-02 17:46 4
-rw------- 1 landscape nogroup 99K 2009-06-02 17:46 5
-rw------- 1 landscape nogroup 7.9K 2009-06-02 17:46 6
-rw------- 1 landscape nogroup 12K 2009-06-02 17:46 7
-rw------- 1 landscape nogroup 961 2009-06-02 17:46 8
-rw------- 1 landscape nogroup 12K 2009-06-02 17:54 9

This is the last exchange that happened:
==> /var/log/landscape/broker.log <==
2009-06-02 17:55:28,864 INFO [MainThread] Starting urgent message exchange with https://landscape.canonical.com/message-system.
2009-06-02 17:55:42,073 INFO [Dummy-2 ] Sent 162799 bytes and received 149 bytes in 13.17s.
2009-06-02 17:55:42,089 INFO [MainThread] S...

Read more...

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Oh, scrap that. This client branch is too new for production which is where I was pointing it at.

Thomas Herve (therve)
description: updated
Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Only one minor, and +1! Thanks for the nice fix.

[1]

+ def broker_started(self):

This method in BrokerClientPluginRegistry is missing a unittest I think.

Revision history for this message
Thomas Herve (therve) wrote :

Gustavo: thanks, test added!

Revision history for this message
Free Ekanayaka (free.ekanayaka) wrote : Re: [Bug 380633] Re: When the broker is restarted by the watchdog, the state of the client is inconsistent

It looks nice indeed :) +1

#1

+ makes the Manager plugin register all registered message again.
+ makes the Monitor plugin register all registered message again.

This "message" should probably be "messages" (plural).

Thanks!

Free

 tag -review

Revision history for this message
Thomas Herve (therve) wrote :

Thanks, merged in r112!

Changed in landscape-client:
status: In Progress → Fix Committed
Jamu Kakar (jkakar)
tags: added: needs-testing
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

qa + 1

Tested by killing the broker with -9 and sending a script activity afterwards. The script would fail with a previous version. Also tested by "hanging" the broker with gdb and waiting for landscape itself to kill it:

==> /var/log/landscape/watchdog.log <==
2009-06-24 21:52:33,454 WARNING [MainThread] landscape-broker failed to respond to a ping.
2009-06-24 21:53:03,454 WARNING [MainThread] landscape-broker failed to respond to a ping.
2009-06-24 21:53:33,455 WARNING [MainThread] landscape-broker failed to respond to a ping.
2009-06-24 21:54:03,455 WARNING [MainThread] landscape-broker failed to respond to a ping.
2009-06-24 21:54:33,456 WARNING [MainThread] landscape-broker failed to respond to a ping.
2009-06-24 21:54:33,456 WARNING [MainThread] landscape-broker died! Restarting.

The script activity also worked fine afterwards.

tags: removed: needs-testing
Changed in landscape-client:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.