Guest VM FS corruption after compute host reboot
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Fix Released | High | James Polley |
Bug Description
Rebooting NovaCompute0 left the guest VM hosted on it unpingable because of filesystem (FS) corruption.
nova list
+------
| ID | Name | Status | Task State | Power State | Networks |
+------
| 04a7f53f-
| 2267f56c-
| 6b527a34-
+------
nova reboot 04a7f53f-
From the console log of the guest VM hosted on NovaCompute0:
ci-info: +++++++
ci-info: +------
ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
ci-info: +------
ci-info: | 0 | 0.0.0.0 | 10.0.0.1 | 0.0.0.0 | eth0 | UG |
ci-info: | 1 | 10.0.0.0 | 0.0.0.0 | 255.0.0.0 | eth0 | U |
ci-info: +------
[ 143.350298] EXT4-fs error (device vda1): ext4_find_
[ 143.393895] EXT4-fs error (device vda1): ext4_find_
[ 143.408435] EXT4-fs error (device vda1): ext4_find_
* Starting AppArmor profiles Skipping profile in /etc/apparmor.
[ OK ]
* Starting iSCSI initiator service iscsid [ OK ]
* Setting up iSCSI targets
iscsiadm: No records found
[ OK ]
* Mounting network filesystems [ OK ]
landscape-client is not configured, please run landscape-config.
Cloud-init v. 0.7.3 running 'modules:config' at Wed, 07 May 2014 10:14:00 +0000. Up 155.44 seconds.
* Restoring resolver state... [80G [74G[ OK ]
grub-editenv: error: invalid environment block.
Cloud-init v. 0.7.3 running 'modules:final' at Wed, 07 May 2014 10:14:20 +0000. Up 176.30 seconds.
Cloud-init v. 0.7.3 finished at Wed, 07 May 2014 10:14:22 +0000. Datasource DataSourceEc2. Up 178.27 seconds
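The corruption shows up in the log above as kernel `EXT4-fs error (device vda1)` lines. When checking other guests after a host reboot, a small filter over the captured console output (for example, text saved from `nova console-log <server>`) can flag the same symptom. This is a minimal sketch that assumes the kernel line format shown above; the function name `find_ext4_errors` is hypothetical and not part of any OpenStack tool.

```python
import re

# Matches kernel lines such as:
#   "[ 143.350298] EXT4-fs error (device vda1): ext4_find_..."
# Assumes the "[ uptime] message" prefix seen in the console log above.
EXT4_ERROR = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+EXT4-fs error \(device (?P<dev>[^)]+)\)"
)


def find_ext4_errors(console_log: str) -> list[tuple[float, str]]:
    """Return (uptime_seconds, device) for each EXT4-fs error line found."""
    hits = []
    for line in console_log.splitlines():
        m = EXT4_ERROR.match(line)
        if m:
            hits.append((float(m.group("ts")), m.group("dev")))
    return hits


if __name__ == "__main__":
    sample = (
        "ci-info: +------\n"
        "[ 143.350298] EXT4-fs error (device vda1): ext4_find_\n"
        "[ 143.393895] EXT4-fs error (device vda1): ext4_find_\n"
        "* Mounting network filesystems [ OK ]\n"
    )
    for ts, dev in find_ext4_errors(sample):
        print(f"{ts:.6f}s {dev}")
```

Any non-empty result means the guest's kernel saw filesystem errors on that device after boot; an empty list only means none were logged in the captured output.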
Changed in tripleo | Change
---|---
importance | Undecided → Critical
status | New → Triaged
assignee | nobody → Roman Podoliaka (rpodolyaka)
assignee | Roman Podoliaka (rpodolyaka) → nobody
assignee | nobody → James Polley (tchaypo)
This seems odd. On a clean reboot, the host OS should send libvirt a SIGTERM, which should in turn cleanly shut down all of the instances, so their filesystems should not end up corrupted. We definitely need to investigate.
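The expected chain described in the comment (host reboot → SIGTERM to the VM manager → clean shutdown of every guest) can be modelled as a signal handler. This is a toy sketch of that expectation, not libvirt's actual implementation; `Instance`, `shutdown_all` and `install_sigterm_handler` are hypothetical names, and `request_shutdown` merely stands in for a real guest shutdown request.

```python
import signal
import sys
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical stand-in for a guest VM running on the compute host."""
    name: str
    running: bool = True
    fs_clean: bool = False

    def request_shutdown(self) -> None:
        # Models a clean (ACPI-style) shutdown: the guest gets to flush and
        # unmount its filesystems before it powers off.
        self.fs_clean = True
        self.running = False


def shutdown_all(instances: list[Instance]) -> None:
    """Ask every still-running instance to shut down cleanly."""
    for inst in instances:
        if inst.running:
            inst.request_shutdown()


def install_sigterm_handler(instances: list[Instance]) -> None:
    """On SIGTERM, shut all guests down cleanly, then exit the manager."""
    def handler(signum, frame):
        shutdown_all(instances)
        sys.exit(0)

    # Must be called from the main thread, as with any Python signal handler.
    signal.signal(signal.SIGTERM, handler)


if __name__ == "__main__":
    guests = [Instance("guest-a"), Instance("guest-b")]
    shutdown_all(guests)  # what the SIGTERM handler would do on host reboot
    for g in guests:
        print(g.name, "running" if g.running else "stopped", "clean" if g.fs_clean else "dirty")
```

If any step of this chain is skipped on the real host (for example, guests are killed rather than asked to shut down), the guests are effectively powered off mid-write, which would be consistent with the EXT4 errors seen after the reboot.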