Changing "agent-logfile-max-backups" and "agent-logfile-max-size" doesn't work

Bug #1997022 reported by Marco Marino
This bug affects 2 people
Affects: Canonical Juju
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

Hello,
it seems that changing the "agent-logfile-max-backups" and "agent-logfile-max-size" controller settings does not work.

Env details:
3 juju controllers in HA, version 2.9.35 on Ubuntu Focal.

I changed them with:
juju controller-config agent-logfile-max-size="108M"
juju controller-config agent-logfile-max-backups=5

Then, I checked the status with:

ubuntu@marino-mrc-bastion:~$ juju controller-config
Attribute Value
agent-logfile-max-backups "5"
agent-logfile-max-size 108M
api-port 17070
api-port-open-delay 2s
audit-log-capture-args false
audit-log-exclude-methods ReadOnlyMethods
audit-log-max-backups 10
audit-log-max-size 300M
auditing-enabled true
batch-raft-fsm false
ca-cert |
  -----BEGIN CERTIFICATE-----
### Omitted output
  -----END CERTIFICATE-----
charmstore-url https://api.jujucharms.com/charmstore
controller-name focal-controller
controller-uuid 5f048128-3119-4b4b-825c-2bd64fbfaec9
juju-db-snap-channel 4.4/stable
max-agent-state-size 524288
max-charm-state-size 2.097152e+06
max-debug-log-duration 24h0m0s
max-prune-txn-batch-size 1e+06
max-prune-txn-passes 100
max-txn-log-size 10M
metering-url https://api.jujucharms.com/omnibus/v3
migration-agent-wait-time 15m
model-logfile-max-backups 2
model-logfile-max-size 105M
model-logs-size 20M
mongo-memory-profile default
non-synced-writes-to-raft-log false
prune-txn-query-count 1000
prune-txn-sleep-time 10ms
set-numa-control-policy false
state-port 37017

The controller configuration looks correct, but the changes are not propagated to the agent configuration file (agent.conf). In addition, one of my controllers (machine 0) shows a value of 0 for both parameters after the change:

ubuntu@marino-mrc-bastion:~$ juju ssh -m controller 2 "sudo cat /var/lib/juju/agents/*/agent.conf | grep agent-logfile"
agent-logfile-max-size: 100
agent-logfile-max-backups: 2
Connection to 252.3.44.1 closed.

ubuntu@marino-mrc-bastion:~$ juju ssh -m controller 1 "sudo cat /var/lib/juju/agents/*/agent.conf | grep agent-logfile"
agent-logfile-max-size: 100
agent-logfile-max-backups: 2
Connection to 252.2.249.1 closed.

ubuntu@marino-mrc-bastion:~$ juju ssh -m controller 0 "sudo cat /var/lib/juju/agents/*/agent.conf | grep agent-logfile"
agent-logfile-max-size: 0
agent-logfile-max-backups: 0
Connection to 10.5.2.219 closed.

I also checked a few randomly chosen units in the "openstack" model, and their agent.conf files still contain the default values:

ubuntu@marino-mrc-bastion:~$ juju ssh nova-compute/0 "sudo cat /var/lib/juju/agents/*/agent.conf | grep agent-logfile"
agent-logfile-max-size: 100
agent-logfile-max-backups: 2
agent-logfile-max-size: 100
agent-logfile-max-backups: 2
agent-logfile-max-size: 100
agent-logfile-max-backups: 2
agent-logfile-max-size: 100
agent-logfile-max-backups: 2
agent-logfile-max-size: 100
agent-logfile-max-backups: 2
Connection to 10.5.1.217 closed.

ubuntu@marino-mrc-bastion:~$ juju ssh ceph-osd/0 "sudo cat /var/lib/juju/agents/*/agent.conf | grep agent-logfile"
agent-logfile-max-size: 100
agent-logfile-max-backups: 2
agent-logfile-max-size: 100
agent-logfile-max-backups: 2
agent-logfile-max-size: 100
agent-logfile-max-backups: 2
agent-logfile-max-size: 100
agent-logfile-max-backups: 2
Connection to 252.2.199.1 closed.
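
The comparison done by eye in the transcripts above can be scripted. This is a minimal Python sketch, assuming the grep output format shown here and assuming agent.conf stores the size as a plain number of megabytes (so "108M" would appear as 108). The function names are hypothetical and not part of Juju:

```python
def parse_agent_logfile_settings(text):
    """Return (key, value) pairs for every agent-logfile-* line, in order."""
    pairs = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # Skip unrelated lines such as "Connection to ... closed."
        if sep and key.startswith("agent-logfile-") and value.isdigit():
            pairs.append((key, int(value)))
    return pairs


def find_problems(text, expected):
    """List settings that are 0 or differ from the expected values."""
    problems = []
    for key, value in parse_agent_logfile_settings(text):
        if value == 0:
            problems.append(f"{key} is 0")
        elif key in expected and value != expected[key]:
            problems.append(f"{key} is {value}, expected {expected[key]}")
    return problems


if __name__ == "__main__":
    # Assumed expected values after the controller-config change.
    expected = {"agent-logfile-max-size": 108, "agent-logfile-max-backups": 5}
    sample = "agent-logfile-max-size: 0\nagent-logfile-max-backups: 0\n"
    for problem in find_problems(sample, expected):
        print(problem)
```

Fed the output from controller 0, it reports both values as 0. Fed the output from the other machines, it reports the stale defaults 100 and 2.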

Please help to fix this.
Also, is there a workaround to change these values on all units? Having them set to 0 looks like a serious problem for log rotation.

Additional notes:
1. I think all parameters in the following list could be impacted in some way:

agent-logfile-max-backups
agent-logfile-max-size
model-logfile-max-backups
model-logfile-max-size
model-logs-size

2. I ran some tests in a single-controller environment and the problem persists there too. The value '0' does not appear in that case, but the new values are still not propagated to the controller or to other units (apparently they are only changed in the DB).

Thank you.
Regards,
Marco

Tags: seg sts
summary: Changing "agent-logfile-max-backups" and "agent-logfile-max-size"
- doesn't work in a HA env
+ doesn't work
tags: added: sts
Revision history for this message
Ponnuvel Palaniyappan (pponnuvel) wrote :

Another sub-problem.

It seems that when the value is modified via `juju controller-config`, `agent-logfile-max-backups` is stored as a string in the db, which causes a problem.
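
To illustrate the failure mode, here is a minimal Python sketch of a strict integer accessor. It is not Juju's code (which is Go); it only mirrors the shape suggested by the `mustInt` and `intOrDefault` frames in the trace in this comment, and the function names and the settings dict are hypothetical:

```python
def must_int(settings, key):
    """Return settings[key], failing hard if it is not stored as an integer."""
    value = settings[key]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(
            f"{key}: expected int, got {type(value).__name__} {value!r}"
        )
    return value


def int_or_default(settings, key, default):
    """Return the default when the key is absent, otherwise require an int."""
    if key not in settings:
        return default
    return must_int(settings, key)


if __name__ == "__main__":
    print(int_or_default({}, "agent-logfile-max-backups", 2))
    try:
        # The same number stored as a string is rejected instead of parsed.
        int_or_default({"agent-logfile-max-backups": "3"},
                       "agent-logfile-max-backups", 2)
    except TypeError as err:
        print("startup would fail:", err)
```

With an accessor like this, a value that is numerically fine but stored with the wrong type aborts startup every time, which would match the restart loop described here.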

Reproducer:
1. Bootstrap a controller and enable HA
2. juju controller-config agent-logfile-max-backups=3 # default is 2
3. Reboot one of the controllers; its controller agent won't come back (the machine itself is ok)

The machine agent tries to start and repeatedly crashes because it can't read `agent-logfile-max-backups` as an integer:
```
goroutine 492 [running]:
github.com/juju/juju/controller.Config.mustInt(0x5719d60?, {0x58e56c9, 0x19})
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/controller/config.go:568 +0xf8
github.com/juju/juju/controller.Config.intOrDefault(...)
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/controller/config.go:575
github.com/juju/juju/controller.Config.AgentLogfileMaxBackups(...)
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/controller/config.go:867
github.com/juju/juju/apiserver.newServer({{0x623d8c8, 0x9191b20}, {0x0, 0x0}, {0x6233900, 0xc0007fd030}, {0xc0002ac0c0, 0xd}, {0xc0002ac150, 0xd}, ...})
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/apiserver/apiserver.go:412 +0xc32
github.com/juju/juju/apiserver.NewServer({{0x623d8c8, 0x9191b20}, {0x0, 0x0}, {0x6233900, 0xc0007fd030}, {0xc0002ac0c0, 0xd}, {0xc0002ac150, 0xd}, ...})
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/apiserver/apiserver.go:289 +0xf8
github.com/juju/juju/worker/apiserver.newServerShim({{0x623d8c8, 0x9191b20}, {0x0, 0x0}, {0x6233900, 0xc0007fd030}, {0xc0002ac0c0, 0xd}, {0xc0002ac150, 0xd}, ...})
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/worker/apiserver/worker.go:183 +0x58
github.com/juju/juju/worker/apiserver.NewWorker({{0x6273e50, 0xc000ac24e0}, {0x623d8c8, 0x9191b20}, 0xc0007e2600, {0x6250a60, 0xc0000d8800}, 0xc0004a9b00, {0x6215f08, 0xc0007f3e60}, ...})
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/worker/apiserver/worker.go:179 +0x4d9
github.com/juju/juju/worker/apiserver.ManifoldConfig.start({{0x587ba94, 0x5}, {0x58a6d86, 0x10}, {0x587bc7e, 0x5}, {0x588ef51, 0xb}, {0x5893781, 0xc}, ...}, ...)
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/worker/apiserver/manifold.go:243 +0x843
github.com/juju/juju/cmd/jujud/agent/engine.flagStart.func1({0x6216b38, 0xc000c3c480})
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/cmd/jujud/agent/engine/housing.go:108 +0xb6
github.com/juju/worker/v3/dependency.(*Engine).runWorker.func1()
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/vendor/github.com/juju/worker/v3/dependency/engine.go:518 +0x399
github.com/juju/worker/v3/dependency.(*Engine).runWorker.func2()
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/vendor/github.com/juju/worker/v3/dependency/engine.go:522 +0x53
github.com/juju/worker/v3/dependency.(*Engine).runWorker(0xc00036e480, {0x5889d2e, 0xa}, 0x18da6d0?, 0xc000af2300?, 0xc000c3c480)
        /home/jenkins/workspace/build-juju/build/src/github.com/juj...
```

Revision history for this message
Ponnuvel Palaniyappan (pponnuvel) wrote :

A workaround for the crash in comment #1 is to change the value directly in the db so that the controller can start again.

```
db.controllers.updateOne({ "_id": "controllerSettings" }, { $set: { "settings.agent-logfile-max-backups": 2, }})
```

But this requires access to MongoDB, which may not be available once the controller(s) are stuck in that restart loop. There may be other ways to get the value set in the db, but they take time, and this could cause a serious outage, especially in non-HA environments.
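
The update document used in the workaround can also be generated rather than typed by hand. This is a minimal Python sketch under stated assumptions: the `settings.` key prefix is taken from the command above, while the function name, the input dict and the list of integer keys are hypothetical. It only builds the document and does not talk to MongoDB:

```python
def build_int_fix_update(settings, int_keys):
    """Build a {"$set": ...} document that rewrites string values as integers.

    settings: current controller settings as a plain dict (assumed shape).
    int_keys: names of the settings that must be stored as integers.
    Returns None when nothing needs fixing.
    """
    fixes = {}
    for key in int_keys:
        value = settings.get(key)
        if isinstance(value, str):
            # int() raises ValueError if the string is not a whole number.
            fixes[f"settings.{key}"] = int(value)
    return {"$set": fixes} if fixes else None


if __name__ == "__main__":
    current = {"agent-logfile-max-backups": "3"}
    print(build_int_fix_update(current, ["agent-logfile-max-backups"]))
```

The design choice is to keep the number the operator asked for and change only its stored type, instead of resetting it to the default as the manual command does.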

tags: added: seg
Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.9.38
status: New → Triaged
importance: Undecided → Critical
Revision history for this message
Harry Pidcock (hpidcock) wrote :

I have split the panic out into https://bugs.launchpad.net/juju/+bug/2001732. This bug stays open to track refreshing the agent-logfile-* config, which is non-critical because these values currently don't take effect until a restart.

Changed in juju:
importance: Critical → Medium
milestone: 2.9.38 → 2.9-next
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.9-next → none