[2.5] MAAS services may attempt to log to dead maas-syslog service, causing dropped log messages

Bug #1801389 reported by Ashley Lai
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
Fix Released
High
Blake Rouse

Bug Description

Maas release: 2.5.0~beta4-7361-g401d6c73d-0ubuntu1~18.04.1

Some Pod VMs show in "Ready" state in syslog but not showing "Ready" state in maas.log. We need help on figuring out why the state is not showing in the maas log and when querying the state it returns "Commissioning".

Below are the Pod VMs:
Status: {u'landscapeamqp-2': u'Ready', u'landscapeha-1': u'Commissioning', u'landscape-1': u'Ready', u'landscapeamqp-1': u'Ready', u'juju-1': u'Ready', u'juju-3': u'Ready', u'grafana-1': u'Commissioning', u'prometheus-3': u'Ready', u'landscapesql-1': u'Ready', u'landscapesql-2': u'Ready', u'juju-2': u'Ready', u'graylog-3': u'Commissioning', u'elastic-3': u'Ready', u'elastic-2': u'Ready', u'landscape-3': u'Ready', u'landscapeamqp-3': u'Ready', u'thruk-2': u'Commissioning', u'nagios-1': u'Commissioning', u'landscape-2': u'Ready'}

landscapeha-1 shows "Ready" in syslog only:
alai@ant:~/Downloads/tmp/home/ubuntu/project/generated/maas/logs-2018-10-31-21.44.55$ grep -r landscapeha-1 | grep READY
10.244.40.32/var/log/syslog:Oct 31 06:53:06 swoobat maas.node: [info] landscapeha-1: Status transition from TESTING to READY
10.244.40.32/var/log/syslog:Oct 31 07:48:02 swoobat maas.node: [info] landscapeha-1: Status transition from READY to ALLOCATED
10.244.40.32/var/log/syslog:Oct 31 12:44:45 swoobat maas.node: [info] landscapeha-1: Status transition from TESTING to READY
10.244.40.32/var/log/syslog:Oct 31 13:38:47 swoobat maas.node: [info] landscapeha-1: Status transition from READY to ALLOCATED
10.244.40.32/var/log/syslog:Oct 31 17:07:20 swoobat maas.node: [info] landscapeha-1: Status transition from TESTING to READY
10.244.40.32/var/log/syslog:Oct 31 18:01:51 swoobat maas.node: [info] landscapeha-1: Status transition from READY to ALLOCATED

For Pod VMs that worked, both syslog and maas.log show they are in "Ready" state:
alai@ant:~/Downloads/tmp/home/ubuntu/project/generated/maas/logs-2018-10-31-21.44.55$ grep -r landscapeamqp-2 | grep READY
10.244.40.30/var/log/syslog:Oct 31 06:54:58 leafeon maas.node: [info] landscapeamqp-2: Status transition from TESTING to READY
10.244.40.30/var/log/syslog:Oct 31 12:47:18 leafeon maas.node: [info] landscapeamqp-2: Status transition from TESTING to READY
10.244.40.30/var/log/syslog:Oct 31 17:08:59 leafeon maas.node: [info] landscapeamqp-2: Status transition from TESTING to READY
10.244.40.30/var/log/maas/maas.log:2018-10-31T21:26:05.789133+00:00 leafeon maas.node: [info] landscapeamqp-2: Status transition from TESTING to READY
10.244.40.31/var/log/maas/maas.log:2018-10-31T21:26:05+00:00 leafeon maas.node: [info] landscapeamqp-2: Status transition from TESTING to READY
10.244.40.32/var/log/syslog:Oct 31 07:47:44 swoobat maas.node: [info] landscapeamqp-2: Status transition from READY to ALLOCATED
10.244.40.32/var/log/syslog:Oct 31 13:38:43 swoobat maas.node: [info] landscapeamqp-2: Status transition from READY to ALLOCATED
10.244.40.32/var/log/syslog:Oct 31 18:01:33 swoobat maas.node: [info] landscapeamqp-2: Status transition from READY to ALLOCATED
10.244.40.32/var/log/maas/maas.log:2018-10-31T21:26:05+00:00 leafeon maas.node: [info] landscapeamqp-2: Status transition from TESTING to READY

Tags: cdo-qa

Related branches

Revision history for this message
Ashley Lai (alai) wrote :
Ashley Lai (alai)
summary: - Pod VMs don't return "READY" state
+ maas-rackd logs maas-syslog that is not up
Revision history for this message
Mike Pontillo (mpontillo) wrote : Re: maas-rackd logs maas-syslog that is not up

If I understand the issue correctly, maas-syslog service isn't logging everything it should to maas.log, but the logging is (correctly) present in syslog.

Given that the expected behavior is now that `maas.log` will contain logging from every MAAS region[1], this feels to me like a glitch in the maas-syslog service. Can you attach attach the journalctl output for maas-syslog, such as by running:

    journalctl -u maas-syslog > maas-syslog.log

... and attach the output?

[1]:
https://github.com/maas/maas/commit/fe00d1b7b6ba82e78cb43c927242a34431c6fb42

Revision history for this message
Mike Pontillo (mpontillo) wrote :

I grepped through the attached logs, and found some evidence that the syslog service is misbehaving:

https://paste.ubuntu.com/p/jgHNKHDm3T/

I don't see anything that corresponds to the exact time you see the missing log output, though.

I don't think we have enough information to move forward with this one. We need to narrow down how potential race conditions in the maas-syslog service might cause logging to be dropped - and what those conditions are.

Changed in maas:
status: New → Incomplete
summary: - maas-rackd logs maas-syslog that is not up
+ [2.5] MAAS services may attempt to log to dead maas-syslog service,
+ causing dropped log messages
Revision history for this message
Blake Rouse (blake-rouse) wrote :

The issue is that on a fresh installation the maas-syslog service is not running yet. The regiond or rackd will then try to use the syslog socket that maas-syslog should create and listen on, but with it not being up yet its failing.

The code needs to handle delaying the writting of the syslog until the maas-syslog service is up and running, which MAAS actually configures and starts.

Its a chicken and egg problem really, but we can solve it by handling the failure case in MAAS better.

Changed in maas:
status: Incomplete → Triaged
importance: Undecided → High
milestone: none → 2.5.0rc1
assignee: nobody → Blake Rouse (blake-rouse)
Changed in maas:
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.