Tempest ssh to guest intermittently fails, "GROWROOT: NOCHANGE: partition 1 is size 2078687. it cannot be grown" seen in guest console log

Bug #1843610 reported by Matt Riedemann on 2019-09-11
This bug affects 1 person
Affects: OpenStack-Gate
Importance: Undecided
Assigned to: Unassigned

Bug Description

Seen here for example:

https://00b9edcb114e0ac8e05a-b611493cf8fd4459149d00d14c03b361.ssl.cf5.rackcdn.com/670715/15/check/cinder-tempest-dsvm-lvm-lio-barbican/bb230c6/job-output.txt

2019-09-11 11:50:09.034110 | primary | sh: write error: No space left on device
2019-09-11 11:50:09.034181 | primary | Top of dropbear init script
2019-09-11 11:50:09.034251 | primary | Starting dropbear sshd: OK
2019-09-11 11:50:09.034376 | primary | GROWROOT: NOCHANGE: partition 1 is size 2078687. it cannot be grown
2019-09-11 11:50:09.034456 | primary | resize-rootfs already run per once
2019-09-11 11:50:09.034579 | primary | /run/cirros/datasource/data/user-data was not '#!' or executable

Note that this might not be the reason for the ssh failure into the guest. We could be hitting this in successful runs as well, but we only see it on ssh failures because that's when we dump the console log. Note that the network info was retrieved:

2019-09-11 11:50:30.311189 | primary | === network info ===
2019-09-11 11:50:30.311262 | primary | if-info: lo,up,127.0.0.1,8,,
2019-09-11 11:50:30.311377 | primary | if-info: eth0,up,10.1.0.14,28,fe80::f816:3eff:fec5:b98b/64,
2019-09-11 11:50:30.311465 | primary | ip-route:default via 10.1.0.1 dev eth0
2019-09-11 11:50:30.311561 | primary | ip-route:10.1.0.0/28 dev eth0 src 10.1.0.14
2019-09-11 11:50:30.311659 | primary | ip-route:169.254.169.254 via 10.1.0.1 dev eth0
2019-09-11 11:50:30.311749 | primary | ip-route6:fe80::/64 dev eth0 metric 256
2019-09-11 11:50:30.311864 | primary | ip-route6:unreachable default dev lo metric -1 error -101
2019-09-11 11:50:30.311952 | primary | ip-route6:ff00::/8 dev eth0 metric 256
2019-09-11 11:50:30.312068 | primary | ip-route6:unreachable default dev lo metric -1 error -101

We should, however, attempt to get rid of that growroot error so it's not a red herring in debugging.
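To check whether a given console dump contains this message, something like the following could be used. This is only a sketch: the function name and the regular expression are made up for illustration and are based solely on the message format shown in the log excerpt above.

```python
import re

# Pattern for the growroot message as it appears in the console dump above.
# Illustrative only: other growroot messages may be formatted differently.
GROWROOT_RE = re.compile(
    r"GROWROOT: NOCHANGE: partition (?P<part>\d+) is size (?P<size>\d+)\."
)


def find_growroot_nochange(lines):
    """Return (partition, size) tuples for every GROWROOT NOCHANGE line."""
    hits = []
    for line in lines:
        match = GROWROOT_RE.search(line)
        if match:
            hits.append((int(match.group("part")), int(match.group("size"))))
    return hits


# Two lines copied from the job-output.txt excerpt in this report.
sample = [
    "2019-09-11 11:50:09.034251 | primary | Starting dropbear sshd: OK",
    "2019-09-11 11:50:09.034376 | primary | GROWROOT: NOCHANGE: "
    "partition 1 is size 2078687. it cannot be grown",
]
hits = find_growroot_nochange(sample)
print(hits)
```

Running this over console logs from passing jobs, if those were captured, would show whether the message also appears in successful runs.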

19 hits in 7 days, check and gate, all failures:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22GROWROOT%3A%20NOCHANGE%3A%20partition%201%20is%20size%5C%22%20AND%20message%3A%5C%22it%20cannot%20be%20grown%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d
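For readability, the query embedded in that URL can be decoded with the Python standard library. The encoded string below is copied from the link above; nothing else is assumed.

```python
from urllib.parse import unquote

# The value of the "query" parameter, copied from the logstash URL above.
encoded = (
    "message%3A%5C%22GROWROOT%3A%20NOCHANGE%3A%20partition%201%20is%20size%5C%22"
    "%20AND%20message%3A%5C%22it%20cannot%20be%20grown%5C%22"
    "%20AND%20tags%3A%5C%22console%5C%22"
)

# Percent-decoding turns %3A into ':', %20 into ' ', and %5C%22 into '\"'.
decoded = unquote(encoded)
print(decoded)
```

The decoded query matches console-tagged messages that contain both "GROWROOT: NOCHANGE: partition 1 is size" and "it cannot be grown".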

Matt Riedemann (mriedem) on 2019-09-11
description: updated
Clark Boylan (cboylan) wrote :

Note we only dump console logs during failures. It is possible that this happens on successful jobs too and isn't the cause of these failures (we just don't have that data).

That said, I think fixing errors like this (the job in question should have a 1GB boot-from-volume disk) is likely to fix bugs and avoid distracting errors when debugging underlying issues.
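As a rough sanity check on the numbers, the reported partition size can be compared against a 1GB disk. This rests on two assumptions that are not confirmed in this report: that growroot reports the size in 512-byte sectors, and that "1GB" means 1 GiB.

```python
SECTOR_BYTES = 512           # assumption: size is reported in 512-byte sectors
PARTITION_SECTORS = 2078687  # from the GROWROOT message in the console log
DISK_BYTES = 1 * 1024**3     # assumption: the "1GB" volume is exactly 1 GiB

partition_bytes = PARTITION_SECTORS * SECTOR_BYTES
disk_sectors = DISK_BYTES // SECTOR_BYTES
free_sectors = disk_sectors - PARTITION_SECTORS

print(f"partition: {partition_bytes} bytes")
print(f"disk:      {disk_sectors} sectors")
print(f"outside partition 1: {free_sectors} sectors "
      f"({free_sectors * SECTOR_BYTES / 1024**2:.1f} MiB)")
```

Under those assumptions, partition 1 already covers all but about 9 MiB of the disk, which would be consistent with growroot reporting NOCHANGE. That is an interpretation, not a confirmed diagnosis.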
