Insufficient disk space on deploy after deployment error

Bug #1262973 reported by Jason Venner
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Evgeniy L
Milestone: 4.0

Bug Description

Fuel 4.0, build 167
Host OS selected: Ubuntu 12.04
Physical hardware with 3 disks per machine

On the first deploy, one host became unresponsive and the deployment failed during the OpenStack install.
Note that the base host OS install had completed on the nodes, as had the OpenStack deploy on the controller.
The node that failed was being provisioned as a Ceph OSD.

I added a new node in that role and deleted the failed node.
On hitting the Deploy button, the following error is shown:

Node 'Untitled (AE:E6)' has insufficient disk space
Volume group 'Cinder' requires a minimum of 1536MB
Volume group 'Image Storage' requires a minimum of 5120MB

AE:E6 has 3 disks
Disk one is 148.4GB and is fully allocated to the Base System
Disk two is 1.8T and is fully allocated to Cinder
Disk three is 1.8T and is fully allocated to Image Storage

Note that node id 1 is AE:E6; the node_1/disks.yaml file is pasted below.

fuel node list
id | status | name | cluster | mac | roles | pending_roles | online
---|----------|------------------|---------|-------------------|----------------------------|---------------|-------
17 | discover | Untitled (AB:74) | 2 | 00:25:90:66:ab:74 | [] | [u'ceph-osd'] | True
1 | ready | Untitled (AE:E6) | 2 | 00:30:48:f7:ae:e6 | [u'controller', u'cinder'] | [] | True
16 | ready | Untitled (AB:80) | 2 | 00:25:90:66:ab:80 | [u'compute'] | [] | True
13 | error | Untitled (60:88) | 2 | 00:25:90:c1:60:88 | [u'ceph-osd'] | [] | True
14 | ready | Untitled (AB:92) | 2 | 00:25:90:66:ab:92 | [u'compute'] | [] | True
15 | ready | Untitled (EE:CC) | 2 | 00:25:90:c2:ee:cc | [u'compute'] | [] | True
12 | ready | Untitled (AF:24) | 2 | 00:30:48:f7:af:24 | [u'ceph-osd'] | [] | True
10 | ready | Untitled (AF:42) | 2 | 00:30:48:f7:af:42 | [u'ceph-osd'] | [] | True
11 | discover | Untitled (AE:E0) | None | 00:30:48:f7:ae:e0 | [] | [] | True
[root@fuel-master tmp]# fuel node --download --disk --node 1
disks configuration downloaded to /tmp/node_1/disks.yaml
[root@fuel-master tmp]# cd /tmp/node_1/
[root@fuel-master node_1]# more disks.yaml
- id: disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0
  name: sda
  size: 151935
  volumes:
  - name: os
    size: 151935
  - name: cinder
    size: 0
  - name: image
    size: 0
- id: disk/by-path/pci-0000:00:1f.2-scsi-1:0:0:0
  name: sdb
  size: 1907037
  volumes:
  - name: os
    size: 0
  - name: cinder
    size: 1907037
  - name: image
    size: 0
- id: disk/by-path/pci-0000:00:1f.2-scsi-2:0:0:0
  name: sdc
  size: 1907037
  volumes:
  - name: os
    size: 0
  - name: cinder
    size: 0
  - name: image
    size: 1907037
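
The error quoted above comes from Fuel's volume-group space validation. A minimal sketch of that kind of check, in Python with illustrative names (not the actual nailgun code), using the structure of disks.yaml and the minimums quoted in the error message:

```python
# Minimums (in MB) taken from the error message above; the names and
# function are illustrative, not the real nailgun identifiers.
MIN_MB = {"cinder": 1536, "image": 5120}

def check_volume_groups(disks, minimums=MIN_MB):
    """Sum each volume group's allocation across all disks and
    report any group that falls below its required minimum."""
    totals = {}
    for disk in disks:
        for vol in disk["volumes"]:
            totals[vol["name"]] = totals.get(vol["name"], 0) + vol["size"]
    return [
        "Volume group '%s' requires a minimum of %dMB" % (name, required)
        for name, required in minimums.items()
        if totals.get(name, 0) < required
    ]
```

With the disks.yaml above (cinder and image each get a full 1.8T disk) this check passes; the bug is that after deployment the disk identifiers no longer match the saved allocations, so the allocated sizes effectively read as zero and the check fails.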

Mike Scherbakov (mihgen)
Changed in fuel:
milestone: none → 4.0
assignee: nobody → Fuel Python Team (fuel-python)
Evgeniy L (rustyrobot)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Evgeniy L (rustyrobot)
status: New → In Progress
importance: Undecided → Medium
Revision history for this message
Evgeniy L (rustyrobot) wrote :

The problem is that the disk naming changes after deployment.

Before deployment it looked fine:
[
    {
        "disk": "disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0",
        "model": "ST3160815AS",
        "size": 160041885696,
        "name": "sda"
    },
    {
        "disk": "disk/by-path/pci-0000:00:1f.2-scsi-2:0:0:0",
        "model": "ST2000DM001-1CH1",
        "size": 2000398934016,
        "name": "sdc"
    },
    {
        "disk": "disk/by-path/pci-0000:00:1f.2-scsi-1:0:0:0",
        "model": "ST2000DM001-1CH1",
        "size": 2000398934016,
        "name": "sdb"
    }
]

But after deployment the format had changed:

[
    {
        "model": "ST2000DM001-1CH1",
        "disk": "sdc",
        "size": 2000398934016,
        "name": "sdc"
    },
    {
        "model": "ST2000DM001-1CH1",
        "disk": "sdb",
        "size": 2000398934016,
        "name": "sdb"
    },
    {
        "model": "ST3160815AS",
        "disk": "disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0",
        "size": 160041885696,
        "name": "sda"
    }
]

Changed in fuel:
status: In Progress → Triaged
assignee: Evgeniy L (rustyrobot) → Andrey Korolyov (xdeller)
Revision history for this message
Evgeniy L (rustyrobot) wrote :

We use the `disk` field as the unique disk identifier.

Revision history for this message
Evgeniy L (rustyrobot) wrote :

A name like sdb/sdc is used in the `disk` field when the disk wasn't found in /dev/disk/by-path.
See https://github.com/stackforge/fuel-web/blob/master/bin/agent#L249-L253
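
The fallback described above can be sketched as follows (a Python sketch of the logic; the real agent is Ruby, linked above, and the function name is illustrative):

```python
import os

def disk_identifier(dev_name, by_path_dir="/dev/disk/by-path"):
    """Prefer the by-path symlink for /dev/<dev_name>; fall back to
    the bare kernel name (e.g. 'sdb') when no by-path entry resolves
    to the device, which is the case described above."""
    if os.path.isdir(by_path_dir):
        for entry in sorted(os.listdir(by_path_dir)):
            target = os.path.realpath(os.path.join(by_path_dir, entry))
            if os.path.basename(target) == dev_name:
                return "disk/by-path/" + entry
    return dev_name  # unstable: kernel names can change across reboots
```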

Revision history for this message
Ryan Moe (rmoe) wrote :

udev in Ubuntu returns the same path ID for all of these ATA devices (CentOS returns unique paths).

root@node-24:/lib/udev# udevadm test-builtin path_id /block/sda/
ID_PATH=pci-0000:00:1f.2-scsi-0:0:0:0
ID_PATH_TAG=pci-0000_00_1f_2-scsi-0_0_0_0
root@node-24:/lib/udev# udevadm test-builtin path_id /block/sdb/
ID_PATH=pci-0000:00:1f.2-scsi-0:0:0:0
ID_PATH_TAG=pci-0000_00_1f_2-scsi-0_0_0_0
root@node-24:/lib/udev# udevadm test-builtin path_id /block/sdc/
ID_PATH=pci-0000:00:1f.2-scsi-0:0:0:0
ID_PATH_TAG=pci-0000_00_1f_2-scsi-0_0_0_0

Therefore only one symlink is created in /dev/disk/by-path. We should use /dev/disk/by-id/ instead, which is consistent across CentOS and Ubuntu, and fall back to /dev/disk/by-path when there is no entry in by-id (this should only happen with virtio drives).
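
The preference order proposed above could look like this (a Python sketch with illustrative names, not the actual agent code):

```python
import os

def _symlink_for(dev_name, link_dir):
    """First symlink in link_dir that resolves to /dev/<dev_name>, or None."""
    if not os.path.isdir(link_dir):
        return None
    for entry in sorted(os.listdir(link_dir)):
        target = os.path.realpath(os.path.join(link_dir, entry))
        if os.path.basename(target) == dev_name:
            return entry
    return None

def stable_disk_id(dev_name, by_id="/dev/disk/by-id", by_path="/dev/disk/by-path"):
    # Proposed order: by-id first (consistent across CentOS and Ubuntu),
    # then by-path (virtio drives have no by-id entry), and only as a
    # last resort the bare kernel name.
    entry = _symlink_for(dev_name, by_id)
    if entry is not None:
        return "disk/by-id/" + entry
    entry = _symlink_for(dev_name, by_path)
    if entry is not None:
        return "disk/by-path/" + entry
    return dev_name
```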

Revision history for this message
Ryan Moe (rmoe) wrote :

Additional info:

- During the Ubuntu installer the by-path entries are present, even though the udev version is the same.

- This is reproducible in KVM by setting your disks to SATA. Deploy an environment, add a new node, and deploy the change. You'll see the same error because the disk identifiers of the already-deployed nodes have changed.

- In udev versions > 182, udev won't even attempt to create by-path entries for ATA devices: http://git.kernel.org/cgit/linux/hotplug/udev.git/commit/?id=481dcf7c8f. At some point in the future we'll have to come up with a different way to identify disks anyway.

Revision history for this message
Mike Scherbakov (mihgen) wrote :

Looks like this affects one of our major features, adding a node after deployment, in many cases, so I'm increasing the priority to High. Let's see if we can provide a quick workaround in 4.0: why do we need any disk calculations for already-deployed nodes? Can we skip them?

Changed in fuel:
importance: Medium → High
Revision history for this message
Evgeniy L (rustyrobot) wrote :

>> Can we skip this?

We have two statuses (error, operational) in which redeployment could be required, so we can't skip validation entirely: the user could remove some of the nodes, and we should check that the cluster still has enough space.

We could use the disk names (sda, sdb, sdc) as unique disk IDs, but I'm not sure that's a good idea.

Revision history for this message
Andrey Korolyov (xdeller) wrote :

We could add disk discovery by UUID using sysfs and thereby avoid relying on the devfs paths, but that looks quite complex.

Revision history for this message
Evgeniy L (rustyrobot) wrote :

We decided to make a workaround on the nailgun side, but the problem on the agent side should be resolved anyway, so I created a separate ticket for it: https://bugs.launchpad.net/fuel/+bug/1263648

Changed in fuel:
assignee: Andrey Korolyov (xdeller) → Evgeniy L (rustyrobot)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/63739

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/63739
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=531fe28a2e614938dccd1f192939bdf54d9a8dd4
Submitter: Jenkins
Branch: master

commit 531fe28a2e614938dccd1f192939bdf54d9a8dd4
Author: Evgeniy L <email address hidden>
Date: Mon Dec 23 16:53:40 2013 +0400

    Don't run volumes/disks checking in case
    if node was provisioned

    * add checking if node was provisioned
    * make checking methods as protected
      instead of private, because they
      shouldn't be private from design point
      of view, it simplify unit testing

    Closes-bug: #1262973
    Change-Id: I036c16138c48054579eb1d8388d048ac7331c041
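
In outline, the merged workaround amounts to a guard of roughly this shape (a sketch with illustrative names, not the actual nailgun code; the exact set of statuses treated as provisioned is an assumption):

```python
def should_check_disks(node_status):
    """Skip the disk/volume space validation for nodes that were
    already provisioned: their on-disk layout is fixed, and, as this
    bug shows, re-validating against renamed disk metadata produces
    false 'insufficient disk space' errors.

    Which statuses count as provisioned is assumed here, not taken
    from the real code."""
    return node_status not in ("provisioned", "deploying", "ready")
```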

Changed in fuel:
status: In Progress → Fix Committed
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
status: Fix Committed → Fix Released