Failed to boot overcloud compute node due to sfdisk error

Bug #1326172 reported by Dan Prince
This bug affects 2 people
Affects   Status    Importance   Assigned to   Milestone
tripleo   Expired   Medium       Unassigned

Bug Description

Saw this trace in the seed VM while trying to deploy an Overcloud compute instance, which failed (the instance went into the ERROR state).

This was on Fedora 20:

Jun 3 18:22:36 localhost kernel: [ 1883.060983] sd 9:0:0:1: [sdb] Synchronizing SCSI cache
Jun 3 18:22:36 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:36.898 3523 ERROR nova.virt.baremetal.deploy_helper [-] Cmd : sudo nova-rootwrap /etc/nova/rootwrap.conf sfdisk -uM /dev/disk/by-path/ip-192.0.2.5:3260-iscsi-iqn-1740ad5e-4d96-482f-a927-c0439faa07bc-lun-1
Jun 3 18:22:36 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:36.899 3523 ERROR nova.virt.baremetal.deploy_helper [-] StdOut : ''
Jun 3 18:22:37 localhost kernel: [ 1883.146980] quiet_error: 163 callbacks suppressed
Jun 3 18:22:37 localhost kernel: [ 1883.147419] Buffer I/O error on device sdb3, logical block 0
Jun 3 18:22:37 localhost kernel: [ 1883.148444] Buffer I/O error on device sdb3, logical block 2
Jun 3 18:22:37 localhost kernel: [ 1883.148857] Buffer I/O error on device sdb3, logical block 16
Jun 3 18:22:37 localhost kernel: [ 1883.149373] Buffer I/O error on device sdb3, logical block 16
Jun 3 18:22:37 localhost kernel: [ 1883.149770] Buffer I/O error on device sdb3, logical block 16
Jun 3 18:22:37 localhost kernel: [ 1883.150189] Buffer I/O error on device sdb3, logical block 2
Jun 3 18:22:37 localhost kernel: [ 1883.150555] Buffer I/O error on device sdb3, logical block 16
Jun 3 18:22:37 localhost kernel: [ 1883.150968] Buffer I/O error on device sdb3, logical block 8
Jun 3 18:22:37 localhost kernel: [ 1883.151415] Buffer I/O error on device sdb1, logical block 126
Jun 3 18:22:37 localhost kernel: [ 1883.151796] Buffer I/O error on device sdb1, logical block 126
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:36.899 3523 ERROR nova.virt.baremetal.deploy_helper [-] StdErr : 'sfdisk: Checking that no-one is using this disk right now ...\nsfdisk: BLKRRPART: Device or resource busy\nsfdisk: \nThis disk is currently in use - repartitioning is probably a bad idea.\nUmount all file systems, and swapoff all swap partitions on this disk.\nUse the --no-reread flag to suppress this check.\nsfdisk: Use the --force flag to overrule all checks.\n'
Jun 3 18:22:37 localhost kernel: sd 9:0:0:1: [sdb] Synchronizing SCSI cache
Jun 3 18:22:37 localhost systemd-udevd: inotify_add_watch(7, /dev/sdb3, 10) failed: No such file or directory
Jun 3 18:22:37 localhost systemd-udevd: inotify_add_watch(7, /dev/sdb1, 10) failed: No such file or directory
Jun 3 18:22:37 localhost systemd-udevd: inotify_add_watch(7, /dev/sdb, 10) failed: No such file or directory
Jun 3 18:22:37 localhost systemd-udevd: inotify_add_watch(7, /dev/sdb, 10) failed: No such file or directory
Jun 3 18:22:37 localhost systemd-udevd: inotify_add_watch(7, /dev/sdb, 10) failed: No such file or directory
Jun 3 18:22:37 localhost kernel: quiet_error: 163 callbacks suppressed
Jun 3 18:22:37 localhost kernel: Buffer I/O error on device sdb3, logical block 0
Jun 3 18:22:37 localhost kernel: Buffer I/O error on device sdb3, logical block 2
Jun 3 18:22:37 localhost kernel: Buffer I/O error on device sdb3, logical block 16
Jun 3 18:22:37 localhost kernel: Buffer I/O error on device sdb3, logical block 16
Jun 3 18:22:37 localhost kernel: Buffer I/O error on device sdb3, logical block 16
Jun 3 18:22:37 localhost kernel: Buffer I/O error on device sdb3, logical block 2
Jun 3 18:22:37 localhost kernel: Buffer I/O error on device sdb3, logical block 16
Jun 3 18:22:37 localhost kernel: Buffer I/O error on device sdb3, logical block 8
Jun 3 18:22:37 localhost kernel: Buffer I/O error on device sdb1, logical block 126
Jun 3 18:22:37 localhost kernel: Buffer I/O error on device sdb1, logical block 126
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 ERROR nova.virt.baremetal.deploy_helper [-] deployment to node 2 failed
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper Traceback (most recent call last):
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/cmd/baremetal_deploy_helper.py", line 289, in run
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper deploy(**params)
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/cmd/baremetal_deploy_helper.py", line 254, in deploy
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper LOG.error(_("StdErr : %r"), err.stderr)
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/openstack/common/excutils.py", line 82, in __exit__
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper six.reraise(self.type_, self.value, self.tb)
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/cmd/baremetal_deploy_helper.py", line 248, in deploy
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper image_path, preserve_ephemeral)
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/cmd/baremetal_deploy_helper.py", line 212, in work_on_disk
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper make_partitions(dev, root_mb, swap_mb, ephemeral_mb)
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/cmd/baremetal_deploy_helper.py", line 106, in make_partitions
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper check_exit_code=[0])
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/utils.py", line 164, in execute
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper return processutils.execute(*cmd, **kwargs)
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/openstack/common/processutils.py", line 194, in execute
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper cmd=' '.join(cmd))
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper ProcessExecutionError: Unexpected error while running command.
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper Command: sudo nova-rootwrap /etc/nova/rootwrap.conf sfdisk -uM /dev/disk/by-path/ip-192.0.2.5:3260-iscsi-iqn-1740ad5e-4d96-482f-a927-c0439faa07bc-lun-1
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper Exit code: 1
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper Stdout: ''
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper Stderr: 'sfdisk: Checking that no-one is using this disk right now ...\nsfdisk: BLKRRPART: Device or resource busy\nsfdisk: \nThis disk is currently in use - repartitioning is probably a bad idea.\nUmount all file systems, and swapoff all swap partitions on this disk.\nUse the --no-reread flag to suppress this check.\nsfdisk: Use the --force flag to overrule all checks.\n'
Jun 3 18:22:37 localhost nova-baremetal-deploy-helper: 2014-06-03 18:22:37.554 3523 TRACE nova.virt.baremetal.deploy_helper
Jun 3 18:22:37 localhost nova-compute: 2014-06-03 18:22:37.952 3575 ERROR nova.virt.baremetal.driver [req-76b7a09d-41f0-4f65-8681-f2183703c7cc None] Error deploying instance 1740ad5e-4d96-482f-a927-c0439faa07bc on baremetal node c33f54fb-628e-40e3-aeea-e096cc01458c.
Jun 3 18:22:38 localhost iscsid: Connection2:0 to [target: iqn-1740ad5e-4d96-482f-a927-c0439faa07bc, portal: 192.0.2.5,3260] through [iface: default] is shutdown.
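
For anyone triaging a similar failure, the key signal in the trace above is the sfdisk stderr line "BLKRRPART: Device or resource busy". The following is only a sketch of how a caller could recognise that condition and retry a few times in case the device is transiently busy. It is not the nova deploy helper's actual behaviour and not a confirmed fix for this bug; the names is_device_busy and run_with_busy_retry, the run callable, and the retry count and delay are all hypothetical.

```python
import time

# Marker copied from the sfdisk stderr in the trace above.
BUSY_MARKER = "BLKRRPART: Device or resource busy"


def is_device_busy(stderr):
    """Return True if stderr contains the 'device busy' message from sfdisk."""
    return BUSY_MARKER in stderr


def run_with_busy_retry(run, attempts=3, delay=1.0, sleep=time.sleep):
    """Call run() until it succeeds, fails for another reason, or attempts run out.

    run is a hypothetical callable returning (exit_code, stderr); in practice it
    would wrap whatever executes the partitioning command.
    Returns (exit_code, stderr, attempts_used).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        code, stderr = run()
        # Stop on success, or on any failure that is not the busy condition.
        if code == 0 or not is_device_busy(stderr):
            return code, stderr, attempt
        if attempt < attempts:
            sleep(delay)
    return code, stderr, attempts
```

Whether a retry would actually help here is an open question: the Buffer I/O errors in the kernel log suggest the iSCSI device itself may have been in a bad state at the time.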

Dan Prince (dan-prince)
Changed in tripleo:
status: New → Triaged
importance: Undecided → Low
Richard Su (rwsu) wrote :
Changed in tripleo:
importance: Low → Medium
James Polley (tchaypo)
Changed in tripleo:
status: Triaged → Incomplete
James Polley (tchaypo) wrote :

This looks like a real issue, but I can't see any action we could take on it right now.

Logstash isn't finding any recent "Buffer I/O error on device " messages, nor any messages mentioning sfdisk.

Logs for https://review.openstack.org/#/c/107447/ don't seem to be available any longer so I can't check what happened there.

I'm marking this as Incomplete: it looks like a real problem to me, but it was probably a random glitch rather than something we can fix.
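
If someone wants to check their own seed VM logs for the same signatures without Logstash, a small script along these lines could do it. This is a sketch only: the two patterns are taken from the messages quoted in this report, while the function name count_signatures and the signature labels are made up for illustration.

```python
import re

# Patterns based on the log messages quoted in this bug report.
SIGNATURES = {
    "buffer_io_error": re.compile(
        r"Buffer I/O error on device (\w+), logical block (\d+)"
    ),
    "sfdisk_busy": re.compile(r"sfdisk: BLKRRPART: Device or resource busy"),
}


def count_signatures(lines):
    """Count how many lines match each known failure signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

For example, passing an open log file (for instance /var/log/messages, assuming that is where syslog writes on the host) as lines returns a dict with one count per signature.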

Launchpad Janitor (janitor) wrote :

[Expired for tripleo because there has been no activity for 60 days.]

Changed in tripleo:
status: Incomplete → Expired