i/o timeout from mongodb
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
juju-core | Incomplete | Undecided | Unassigned | | |
Bug Description
A landscape-driven cloud deployment failed and we noticed this in our juju client logs:
Mar 14 04:55:55 juju-sync-1 INFO Handling failure RequestError: read tcp 10.96.15.100:37017: i/o timeout (code: '')
We didn't retry that, and filed bug #1556937 about it.
10.96.15.100 is the state server, and 37017 is mongo's port. We don't talk to mongo directly, so that was an internal juju connection.
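For illustration, here is a minimal sketch of how a client could treat the timeout above as transient and retry it. It is not the juju or Landscape code: `is_transient`, `call_with_retry`, and the `request` callable are hypothetical names, and the only detail taken from the log is the "i/o timeout" message text.

```python
import time


def is_transient(message):
    """Return True if the error text looks like the timeout seen in the log."""
    return "i/o timeout" in message


def call_with_retry(request, attempts=3, delay=1.0, sleep=time.sleep):
    """Call request() and retry when it fails with a transient-looking error.

    request is any zero-argument callable (a hypothetical stand-in for the
    API call that failed).
    """
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except Exception as exc:
            # Give up on non-transient errors or once attempts are exhausted.
            if not is_transient(str(exc)) or attempt == attempts:
                raise
            sleep(delay * attempt)  # linear backoff between attempts
```

Matching on the message text is fragile and is used here only because the logged error carries no error code (code: '').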
machine-0.log ends with these three lines:
2016-03-14 04:19:51 ERROR juju.worker.
2016-03-14 04:55:55 ERROR juju.state status.go:216 failed to write status history: read tcp 10.96.15.100:37017: i/o timeout
2016-03-14 04:55:56 ERROR juju.state.
After that, all other units have error lines like this one:
unit-neutron-
Notably, all-machines.log only received logs from one unit, not from all of them (!). I also spotted an rsyslog restart in /var/log/syslog:
Mar 14 04:55:59 albany rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="660130" x-info="http://
/var/log/syslog got quite big (over 300MB).
I'm attaching the relevant log files from the bootstrap node. This is from a CI job, so the environment is no longer up, but I do have logs from all units if you want them (https:/
tags: removed: kanban-cross-team
Changed in juju-core:
milestone: 1.25.5 → 1.25.6
Changed in juju-core:
milestone: 1.25.6 → 1.25.7
Changed in juju-core:
milestone: 1.25.7 → none
I think this is caused by bug 1539656 (which was fixed in 1.25.4). I could verify that by checking the unit logs, but I don't have access to the CI job. Can you add me?