Slave went offline during log upload
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Core Infrastructure | Fix Released | High | Jeremy Stanley |
Bug Description
Encountered this issue, where it seems like a node went offline during log upload:
https:/
http://
2014-01-13 14:50:16.795 | PASSED (id=0)
2014-01-13 14:50:16.795 | Slowest Tests
2014-01-13 14:50:16.795 | Test id Runtime (s)
2014-01-13 14:50:16.795 | -------
2014-01-13 14:50:16.796 | tempest.
2014-01-13 14:50:16.835 | _______
2014-01-13 14:50:16.835 | large-ops: commands succeeded
2014-01-13 14:50:16.835 | congratulations :)
2014-01-13 14:50:16.836 | /opt/stack/
2014-01-13 14:50:42.221 | Process leaked file descriptors. See http://
2014-01-13 14:50:42.789 | [gate-tempest-
2014-01-13 14:50:42.824 | Detailed logs: http://
2014-01-13 14:50:43.237 | Looks like the node went offline during the build. Check the slave log for the details.FATAL: /var/lib/
2014-01-13 14:50:43.237 | java.io.
2014-01-13 14:50:43.237 | at java.io.
2014-01-13 14:50:43.238 | at java.io.
2014-01-13 14:50:43.238 | at org.kohsuke.
2014-01-13 14:50:43.238 | at org.kohsuke.
2014-01-13 14:50:43.238 | at org.kohsuke.
2014-01-13 14:50:43.238 | at hudson.
2014-01-13 14:50:43.238 | at hudson.
2014-01-13 14:50:43.239 | at hudson.
2014-01-13 14:50:43.239 | at hudson.
2014-01-13 14:50:43.239 | at hudson.
2014-01-13 14:50:43.239 | at hudson.
Changed in openstack-ci: | |
status: | New → Confirmed |
Changed in openstack-ci: | |
status: | Confirmed → In Progress |
importance: | Undecided → Critical |
assignee: | nobody → Jeremy Stanley (fungi) |
milestone: | none → icehouse |
no longer affects: | heat |
Caught one of these on a full tempest run for a heat change...
http://logs.openstack.org/35/68135/1/gate/gate-tempest-dsvm-full/b17cdc5/console.html#_2014-01-23_16_26_41_918
I tried to hold the offending node but deletion was already in progress so it got yanked out from under me before I could spot any obvious local issue on it. THEN I spotted this in the log of the corresponding Jenkins master...
Jan 23, 2014 4:25:17 PM hudson.node_monitors.AbstractDiskSpaceMonitor markNodeOfflineIfDiskspaceIsTooLow
WARNING: Making devstack-precise-hpcloud-az2-1187454 offline temporarily due to the lack of disk space
Since this is happening between tests completing and log collection, I think we're filling up the remaining space on the very constrained root filesystem on these nodes when copying logs from /opt to the workspace in /home/jenkins. I'm trying to address it with...
https://review.openstack.org/68706
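To illustrate the hypothesis above (copying logs from /opt into the workspace fills the root filesystem), here is a minimal sketch of a pre-copy free-space check. It is not the change proposed in the review linked above; the function names `tree_size` and `has_room_for_copy` and the `reserve_bytes` parameter are hypothetical, chosen only for this example.

```python
import os
import shutil


def tree_size(path):
    """Return the total size in bytes of regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so a link's target is not counted here.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def has_room_for_copy(src, dest, reserve_bytes=0):
    """Return True if dest's filesystem can hold a copy of src.

    reserve_bytes is a hypothetical safety margin to keep free
    after the copy, so the node is not left at zero free space.
    """
    free = shutil.disk_usage(dest).free
    return free >= tree_size(src) + reserve_bytes
```

A job could call something like `has_room_for_copy("/opt/stack/logs", "/home/jenkins/workspace", reserve_bytes=...)` before collecting logs and skip or trim the copy when it returns False; both paths here are placeholders, not the exact directories used on these nodes.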