Comment 3 for bug 960107

Chris Rossi (chris-archimedeanco) wrote:

This is implemented, though it looks a little different from what we had before.

Unlike the old system, which only tried to capture errors, this captures all logging statements. Log entries expire automatically after 7 days (this is configurable), so we don't have to worry about the log growing without bound. Currently there aren't many logging statements in Karl, so this will mostly wind up capturing errors, but it gives us the flexibility to start dumping performance and other data in here, and opens up the possibility of writing tools to pull that data back out and analyze it. When an error is logged, it triggers an alarm, which puts the log into an alarm state that must be cleared by a human. This is very similar to how the old system worked, except that here you can clear the alarm state without erasing the log.
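The capture, expire, and alarm behavior described above can be sketched with Python's standard `logging` module. This is a minimal sketch, not the actual redislog code: it keeps entries in an in-memory list instead of Redis, assumes the logger name serves as the category, and assumes ERROR and above are what trigger an alarm. The class name `ExpiringLogHandler` and its methods are hypothetical.

```python
import logging
import time

# Hypothetical default expiry: 7 days in seconds (configurable, per the comment).
DEFAULT_EXPIRES = 7 * 24 * 60 * 60


class ExpiringLogHandler(logging.Handler):
    """Hypothetical in-memory stand-in for a Redis-backed log.

    Captures every record, drops entries older than `expires` seconds,
    and sets a per-category alarm when an ERROR (or worse) is logged.
    """

    def __init__(self, expires=DEFAULT_EXPIRES, clock=time.time):
        super().__init__(level=logging.DEBUG)
        self.expires = expires
        self.clock = clock   # injectable so expiry can be exercised in tests
        self.entries = []    # (timestamp, category, level name, message)
        self.alarms = {}     # category -> timestamp of the first error that set it

    def emit(self, record):
        now = self.clock()
        category = record.name  # assumption: logger name doubles as category
        self.entries.append((now, category, record.levelname, record.getMessage()))
        if record.levelno >= logging.ERROR:
            self.alarms.setdefault(category, now)
        self.prune(now)

    def prune(self, now=None):
        """Drop entries older than the expiry window; alarms are untouched."""
        if now is None:
            now = self.clock()
        cutoff = now - self.expires
        self.entries = [e for e in self.entries if e[0] >= cutoff]

    def clear_alarm(self, category):
        """Clear a category's alarm without erasing any log entries."""
        self.alarms.pop(category, None)


if __name__ == "__main__":
    fake_now = [0.0]
    handler = ExpiringLogHandler(clock=lambda: fake_now[0])
    log = logging.getLogger("mailin")
    log.propagate = False
    log.setLevel(logging.DEBUG)
    log.addHandler(handler)

    log.info("processed 3 messages")
    log.error("could not deliver message")
    print(handler.alarms)  # {'mailin': 0.0}

    fake_now[0] = 8 * 24 * 60 * 60  # eight days later
    handler.prune()
    print(handler.entries)  # [] (entries expired)
    print(handler.alarms)   # {'mailin': 0.0} (alarm persists until cleared)
    handler.clear_alarm("mailin")
```

Keeping the alarm state separate from the entry list is what lets the alarm be cleared without touching the log, and lets old entries expire while an alarm is still outstanding.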

In the admin interface, there is now a 'System Log' link in the sidebar which takes you to the new view. This view currently kind of sucks, but at least it is functional. The view displays the 100 most recent entries on a single page. Each entry has a 'Details' link that shows or hides more detailed info, such as the stack trace for errors. Entries are divided into levels and categories. Levels are things like INFO, WARNING, and ERROR. Categories are things like karl, mailin, gsa_sync, etc. There are links at the top to narrow the display to a particular level and/or a particular category. These links show only the levels and categories that the redislog has seen before: the redislog does not know in advance which levels and categories it might see, so it just keeps track of the ones it has encountered.
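The view's filtering, and the way the log learns its levels and categories as it goes, might look roughly like the following. This is a hypothetical sketch: the `Entry` shape, `SeenIndex`, and `recent_entries` are illustrative names, and a real implementation would presumably read from the Redis-backed log rather than a Python list.

```python
from collections import namedtuple

# Hypothetical entry shape; real entries would also carry details such as tracebacks.
Entry = namedtuple("Entry", "timestamp level category message")


class SeenIndex:
    """Hypothetical tracker for the levels and categories observed so far.

    Nothing is declared up front; each level or category is recorded the
    first time an entry carrying it is logged.
    """

    def __init__(self):
        self.levels = set()
        self.categories = set()

    def record(self, entry):
        self.levels.add(entry.level)
        self.categories.add(entry.category)


def recent_entries(entries, level=None, category=None, limit=100):
    """Return the `limit` newest entries, optionally narrowed by level and/or category."""
    matching = [
        e for e in entries
        if (level is None or e.level == level)
        and (category is None or e.category == category)
    ]
    # Newest first, as on the System Log page.
    matching.sort(key=lambda e: e.timestamp, reverse=True)
    return matching[:limit]
```

For example, after recording `Entry(2, "ERROR", "mailin", "bounce")`, the index's `categories` set contains `"mailin"`, so a filter link for it can be rendered, and `recent_entries(entries, category="mailin")` returns only that category's entries.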

If an alarm is set for a particular category, it is shown at the top of the screen along with a link to clear it. The alarm should affect gocept's Nagios monitoring in the same way the old system did.
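How the alarm reaches Nagios isn't spelled out here. Assuming a standard Nagios plugin, which reports status through its exit code (0 for OK, 2 for CRITICAL), a check could report CRITICAL whenever any category has an alarm set. `check_alarms` is a hypothetical helper, and the way gocept's monitoring actually reads the alarm state is an assumption.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_alarms(alarms):
    """Hypothetical mapping from per-category alarm state to a Nagios status.

    `alarms` maps category name -> whatever was recorded when the alarm was set.
    Any active alarm is reported as CRITICAL until a human clears it.
    """
    if not alarms:
        return OK, "OK - no log alarms set"
    names = ", ".join(sorted(alarms))
    return CRITICAL, f"CRITICAL - log alarm set for: {names}"
```

A plugin script built on this would print the message and then call `sys.exit(status)`, so clearing the alarm in the admin view is what returns the check to OK.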