Failed to provision cluster with cinder volumes

Bug #1326021 reported by Dmitry Mescheryakov
This bug affects 2 people
Affects: Sahara
Status: Fix Released
Importance: Medium
Assigned to: Andrew Lazarev
Milestone: 2014.2

Bug Description

Create a cluster with node groups that use cinder volumes as a backend. Use a flavor that has a non-zero ephemeral drive, non-zero swap, or both. During provisioning you will see the following error:

2014-06-03 16:37:09.917 5530 TRACE sahara.context Traceback (most recent call last):
2014-06-03 16:37:09.917 5530 TRACE sahara.context File "/usr/lib/python2.6/site-packages/sahara/context.py", line 124, in _wrapper
2014-06-03 16:37:09.917 5530 TRACE sahara.context func(*args, **kwargs)
2014-06-03 16:37:09.917 5530 TRACE sahara.context File "/usr/lib/python2.6/site-packages/sahara/service/api.py", line 202, in _provision_cluster
2014-06-03 16:37:09.917 5530 TRACE sahara.context INFRA.create_cluster(cluster)
2014-06-03 16:37:09.917 5530 TRACE sahara.context File "/usr/lib/python2.6/site-packages/sahara/service/direct_engine.py", line 82, in create_cluster
2014-06-03 16:37:09.917 5530 TRACE sahara.context self._rollback_cluster_creation(cluster, ex)
2014-06-03 16:37:09.917 5530 TRACE sahara.context File "/usr/lib/python2.6/site-packages/sahara/openstack/common/excutils.py", line 68, in __exit__
2014-06-03 16:37:09.917 5530 TRACE sahara.context six.reraise(self.type_, self.value, self.tb)
2014-06-03 16:37:09.917 5530 TRACE sahara.context File "/usr/lib/python2.6/site-packages/sahara/service/direct_engine.py", line 65, in create_cluster
2014-06-03 16:37:09.917 5530 TRACE sahara.context volumes.attach(cluster)
2014-06-03 16:37:09.917 5530 TRACE sahara.context File "/usr/lib/python2.6/site-packages/sahara/service/volumes.py", line 33, in attach
2014-06-03 16:37:09.917 5530 TRACE sahara.context attach_to_instances, node_group.instances)
2014-06-03 16:37:09.917 5530 TRACE sahara.context File "/usr/lib/python2.6/site-packages/sahara/context.py", line 190, in __exit__
2014-06-03 16:37:09.917 5530 TRACE sahara.context self._wait()
2014-06-03 16:37:09.917 5530 TRACE sahara.context File "/usr/lib/python2.6/site-packages/sahara/context.py", line 183, in _wait
2014-06-03 16:37:09.917 5530 TRACE sahara.context raise exceptions.ThreadException(self.failed_thread, self.exc)
2014-06-03 16:37:09.917 5530 TRACE sahara.context ThreadException: An error occurred in thread 'attach-volumes-for-ng-bigD-slave': An error occurred in thread 'mount-volumes-to-node-BIGDATAMEC-bigD-slave-001': RemoteCommandException: Error during command execution: "sudo mkfs.ext4 /dev/vdb"
2014-06-03 16:37:09.917 5530 TRACE sahara.context Return code: 1
2014-06-03 16:37:09.917 5530 TRACE sahara.context STDERR:
2014-06-03 16:37:09.917 5530 TRACE sahara.context mke2fs 1.41.12 (17-May-2010)
2014-06-03 16:37:09.917 5530 TRACE sahara.context /dev/vdb is mounted; will not make a filesystem here!
2014-06-03 16:37:09.917 5530 TRACE sahara.context
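
For reference, a flavor matching these reproduction conditions can be created through python-novaclient; the credentials, flavor name, and sizes below are illustrative placeholders, not values taken from this report:

    from novaclient import client

    # Endpoint and credentials are illustrative placeholders.
    nova = client.Client("2", "admin", "password", "demo",
                         "http://controller:5000/v2.0")

    # A flavor with both a non-zero ephemeral drive (GB) and non-zero
    # swap (MB) -- either one is enough to trigger the bug.
    nova.flavors.create(name="bigd.slave", ram=4096, vcpus=2, disk=20,
                        flavorid="auto", ephemeral=10, swap=2048)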

The problem is caused by our logic for recognizing cinder volumes: it does not account for ephemeral drives or swap devices.
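
To illustrate the failure mode, here is a simplified sketch of the fragile "find unmounted devices" pattern, not Sahara's actual code ("remote" stands in for Sahara's SSH helper, and the command shapes are assumptions):

    # Simplified sketch only -- not Sahara's real implementation.
    def find_unmounted_devices(remote):
        # List non-root virtio disks, then drop anything that already
        # has a mount entry. Swap devices never show up in /proc/mounts,
        # and an ephemeral drive may not be mounted yet at scan time,
        # so both can pass this check alongside real cinder volumes.
        _code, disks = remote.execute_command("ls /dev/vd[b-z]")
        _code, mounts = remote.execute_command("cat /proc/mounts")
        return [d for d in disks.split() if d not in mounts]

    def format_and_mount_all(remote):
        for dev in find_unmounted_devices(remote):
            # Fails with "is mounted" if cloud-init grabbed the
            # ephemeral drive in the meantime -- the traceback above.
            remote.execute_command("sudo mkfs.ext4 %s" % dev)
            remote.execute_command("sudo mount %s /volumes" % dev)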

Changed in sahara:
assignee: nobody → Andrew Lazarev (alazarev)
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to sahara (master)

Fix proposed to branch: master
Review: https://review.openstack.org/103691

Changed in sahara:
status: Confirmed → In Progress
Changed in sahara:
milestone: none → juno-2
Revision history for this message
Luigi Toscano (ltoscano) wrote :

I'd update the title: I hit the issue also with non-cinder volumes (ephemeral drive), as long as I added a non-zero swap file, at least on Icehouse.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to sahara (master)

Reviewed: https://review.openstack.org/103691
Committed: https://git.openstack.org/cgit/openstack/sahara/commit/?id=b5ed2f41da16755b1bbf6c9e47427db2a28b65e3
Submitter: Jenkins
Branch: master

commit b5ed2f41da16755b1bbf6c9e47427db2a28b65e3
Author: Andrew Lazarev <email address hidden>
Date: Tue Jul 1 13:41:16 2014 -0700

    Fixed volumes mount in case of existing volumes

    Changed the pattern of volume mounting from "create all volumes,
    find unmounted partitions, mount them" to "create volumes,
    remember their device names, wait for those specific devices
    (not abstract ones, as before), mount them". This ensures that
    we mount only the devices we requested and do not interfere
    with other devices.

    Change-Id: Ib2f7a801f3e45020d24a59564a102c2fd77ab525
    Closes-Bug: #1326021
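
A rough sketch of the pattern the commit describes, using python-novaclient; the waiting loop and helper names are illustrative assumptions, not Sahara's actual implementation (in particular, it assumes the attachment object returned by Nova exposes the assigned device name):

    import time

    def attach_and_mount(nova, remote, server_id, volume_id):
        # Attach via the API and remember the device name Nova reports
        # (e.g. /dev/vdc) instead of guessing from the guest afterwards.
        attachment = nova.volumes.create_server_volume(
            server_id, volume_id, device=None)
        device = attachment.device

        # Wait for that specific device node, not "any unmounted disk".
        for _ in range(60):
            code, _out = remote.execute_command(
                "test -b %s" % device, raise_when_error=False)
            if code == 0:
                break
            time.sleep(2)
        else:
            raise RuntimeError("device %s never appeared" % device)

        # Only now format and mount -- this device is known to be ours,
        # so ephemeral and swap devices are left alone.
        remote.execute_command("sudo mkfs.ext4 %s" % device)
        remote.execute_command("sudo mount %s /volumes/disk1" % device)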

Changed in sahara:
status: In Progress → Fix Committed
Changed in sahara:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in sahara:
milestone: juno-2 → 2014.2
Revision history for this message
Andrew Lazarev (alazarev) wrote :

One more way to hit the issue:

1. Use a flavor with a swap file for a node group without volumes.

Observed behavior: the cluster stays in the 'Waiting' state forever.
