Insufficient disk space on deploy after deployment error

Bug #1262973 reported by Jason Venner
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Evgeniy L
Milestone: 4.0

Bug Description

Fuel 4.0, build 167
Host OS selected: Ubuntu 12.04
Physical hardware with 3 disks per machine

On the first deploy, one host became unresponsive and the deployment failed during the OpenStack install.
Note that the base host OS install had completed on the nodes, as had the OpenStack deploy on the controller.
The node that failed was being provisioned as a Ceph OSD.

I added a new node in that role and deleted the failed node.
On hitting the Deploy button, the following error is shown:

Node 'Untitled (AE:E6)' has insufficient disk space
Volume group 'Cinder' requires a minimum of 1536MB
Volume group 'Image Storage' requires a minimum of 5120MB

AE:E6 has 3 disks
Disk one is 148.4GB and is fully allocated to the Base System
Disk two is 1.8T and is fully allocated to Cinder
Disk three is 1.8T and is fully allocated to Image Storage

Note that node id 1 is AE:E6; the node_1/disks.yaml file is pasted below.

fuel node list
id | status | name | cluster | mac | roles | pending_roles | online
---|----------|------------------|---------|-------------------|----------------------------|---------------|-------
17 | discover | Untitled (AB:74) | 2 | 00:25:90:66:ab:74 | [] | [u'ceph-osd'] | True
1 | ready | Untitled (AE:E6) | 2 | 00:30:48:f7:ae:e6 | [u'controller', u'cinder'] | [] | True
16 | ready | Untitled (AB:80) | 2 | 00:25:90:66:ab:80 | [u'compute'] | [] | True
13 | error | Untitled (60:88) | 2 | 00:25:90:c1:60:88 | [u'ceph-osd'] | [] | True
14 | ready | Untitled (AB:92) | 2 | 00:25:90:66:ab:92 | [u'compute'] | [] | True
15 | ready | Untitled (EE:CC) | 2 | 00:25:90:c2:ee:cc | [u'compute'] | [] | True
12 | ready | Untitled (AF:24) | 2 | 00:30:48:f7:af:24 | [u'ceph-osd'] | [] | True
10 | ready | Untitled (AF:42) | 2 | 00:30:48:f7:af:42 | [u'ceph-osd'] | [] | True
11 | discover | Untitled (AE:E0) | None | 00:30:48:f7:ae:e0 | [] | [] | True
[root@fuel-master tmp]# fuel node --download --disk --node 1
disks configuration downloaded to /tmp/node_1/disks.yaml
[root@fuel-master tmp]# cd /tmp/node_1/
[root@fuel-master node_1]# more disks.yaml
- id: disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0
  name: sda
  size: 151935
  volumes:
  - name: os
    size: 151935
  - name: cinder
    size: 0
  - name: image
    size: 0
- id: disk/by-path/pci-0000:00:1f.2-scsi-1:0:0:0
  name: sdb
  size: 1907037
  volumes:
  - name: os
    size: 0
  - name: cinder
    size: 1907037
  - name: image
    size: 0
- id: disk/by-path/pci-0000:00:1f.2-scsi-2:0:0:0
  name: sdc
  size: 1907037
  volumes:
  - name: os
    size: 0
  - name: cinder
    size: 0
  - name: image
    size: 1907037
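
The error quoted above comes from Fuel's volume-group space validation. A minimal sketch of that kind of check, in Python with illustrative names (not the actual nailgun code), using the structure of disks.yaml and the minimums quoted in the error message:

```python
# Minimums (in MB) taken from the error message above; the names and
# function are illustrative, not the real nailgun identifiers.
MIN_MB = {"cinder": 1536, "image": 5120}

def check_volume_groups(disks, minimums=MIN_MB):
    """Sum each volume group's allocation across all disks and
    report any group that falls below its required minimum."""
    totals = {}
    for disk in disks:
        for vol in disk["volumes"]:
            totals[vol["name"]] = totals.get(vol["name"], 0) + vol["size"]
    return [
        "Volume group '%s' requires a minimum of %dMB" % (name, required)
        for name, required in minimums.items()
        if totals.get(name, 0) < required
    ]
```

With the disks.yaml above (cinder and image each get a full 1.8T disk) this check passes; the bug is that after deployment the disk identifiers no longer match the saved allocations, so the allocated sizes effectively read as zero and the check fails.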

Mike Scherbakov (mihgen)
Changed in fuel:
milestone: none → 4.0
assignee: nobody → Fuel Python Team (fuel-python)
Evgeniy L (rustyrobot)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Evgeniy L (rustyrobot)
status: New → In Progress
importance: Undecided → Medium
Revision history for this message
Evgeniy L (rustyrobot) wrote :

The problem is that the disk naming changes after deployment.

Before deployment it looked fine:
[
    {
        "disk": "disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0",
        "model": "ST3160815AS",
        "size": 160041885696,
        "name": "sda"
    },
    {
        "disk": "disk/by-path/pci-0000:00:1f.2-scsi-2:0:0:0",
        "model": "ST2000DM001-1CH1",
        "size": 2000398934016,
        "name": "sdc"
    },
    {
        "disk": "disk/by-path/pci-0000:00:1f.2-scsi-1:0:0:0",
        "model": "ST2000DM001-1CH1",
        "size": 2000398934016,
        "name": "sdb"
    }
]

But after deployment the format had changed:

[
    {
        "model": "ST2000DM001-1CH1",
        "disk": "sdc",
        "size": 2000398934016,
        "name": "sdc"
    },
    {
        "model": "ST2000DM001-1CH1",
        "disk": "sdb",
        "size": 2000398934016,
        "name": "sdb"
    },
    {
        "model": "ST3160815AS",
        "disk": "disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0",
        "size": 160041885696,
        "name": "sda"
    }
]

Changed in fuel:
status: In Progress → Triaged
assignee: Evgeniy L (rustyrobot) → Andrey Korolyov (xdeller)
Revision history for this message
Evgeniy L (rustyrobot) wrote :

We use the `disk` field as the unique disk identifier.

Revision history for this message
Evgeniy L (rustyrobot) wrote :

A name like sdb/sdc is used in the `disk` field when the disk wasn't found in /dev/disk/by-path.
See https://github.com/stackforge/fuel-web/blob/master/bin/agent#L249-L253
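
The fallback described above can be sketched as follows (a Python sketch of the logic; the real agent is Ruby, linked above, and the function name is illustrative):

```python
import os

def disk_identifier(dev_name, by_path_dir="/dev/disk/by-path"):
    """Prefer the by-path symlink for /dev/<dev_name>; fall back to
    the bare kernel name (e.g. 'sdb') when no by-path entry resolves
    to the device, which is the case described above."""
    if os.path.isdir(by_path_dir):
        for entry in sorted(os.listdir(by_path_dir)):
            target = os.path.realpath(os.path.join(by_path_dir, entry))
            if os.path.basename(target) == dev_name:
                return "disk/by-path/" + entry
    return dev_name  # unstable: kernel names can change across reboots
```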

Revision history for this message
Ryan Moe (rmoe) wrote :

udev in Ubuntu returns the same path ID for all of these ATA devices (CentOS returns unique paths).

root@node-24:/lib/udev# udevadm test-builtin path_id /block/sda/
ID_PATH=pci-0000:00:1f.2-scsi-0:0:0:0
ID_PATH_TAG=pci-0000_00_1f_2-scsi-0_0_0_0
root@node-24:/lib/udev# udevadm test-builtin path_id /block/sdb/
ID_PATH=pci-0000:00:1f.2-scsi-0:0:0:0
ID_PATH_TAG=pci-0000_00_1f_2-scsi-0_0_0_0
root@node-24:/lib/udev# udevadm test-builtin path_id /block/sdc/
ID_PATH=pci-0000:00:1f.2-scsi-0:0:0:0
ID_PATH_TAG=pci-0000_00_1f_2-scsi-0_0_0_0

Therefore only one symlink is created in /dev/disk/by-path. We should use /dev/disk/by-id/ instead, which is consistent across CentOS and Ubuntu, and fall back to /dev/disk/by-path when there is no entry in by-id (this should only happen with virtio drives).
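
The preference order proposed above could look like this (a Python sketch with illustrative names, not the actual agent code):

```python
import os

def _symlink_for(dev_name, link_dir):
    """First symlink in link_dir that resolves to /dev/<dev_name>, or None."""
    if not os.path.isdir(link_dir):
        return None
    for entry in sorted(os.listdir(link_dir)):
        target = os.path.realpath(os.path.join(link_dir, entry))
        if os.path.basename(target) == dev_name:
            return entry
    return None

def stable_disk_id(dev_name, by_id="/dev/disk/by-id", by_path="/dev/disk/by-path"):
    # Proposed order: by-id first (consistent across CentOS and Ubuntu),
    # then by-path (virtio drives have no by-id entry), and only as a
    # last resort the bare kernel name.
    entry = _symlink_for(dev_name, by_id)
    if entry is not None:
        return "disk/by-id/" + entry
    entry = _symlink_for(dev_name, by_path)
    if entry is not None:
        return "disk/by-path/" + entry
    return dev_name
```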

Revision history for this message
Ryan Moe (rmoe) wrote :

Additional info:

- During the Ubuntu installer the by-path entries are present, even though the udev version is the same.

- This is reproducible in KVM by setting your disks to SATA. Deploy an environment, add a new node, and deploy the change. You'll see the same error because the disk identifiers of the already-deployed nodes have changed.

- In udev versions > 182, udev won't even attempt to create by-path entries for ATA devices: http://git.kernel.org/cgit/linux/hotplug/udev.git/commit/?id=481dcf7c8f. At some point in the future we'll have to come up with a different way to identify disks anyway.

Revision history for this message
Mike Scherbakov (mihgen) wrote :

Looks like this affects one of our major features, adding a node after deployment, in many cases, so I'm increasing the priority to High. Let's see if we can provide a quick workaround in 4.0: why do we need any disk calculations for already-deployed nodes? Can we skip them?

Changed in fuel:
importance: Medium → High
Revision history for this message
Evgeniy L (rustyrobot) wrote :

>> Can we skip this?

We have two statuses (error, operational) in which redeployment could be required, so we can't skip validation entirely: the user could remove some of the nodes, and we should check that the cluster still has enough space.

We could use the disk names (sda, sdb, sdc) as unique disk IDs, but I'm not sure that's a good idea.

Revision history for this message
Andrey Korolyov (xdeller) wrote :

We could add disk discovery by UUID using sysfs and thereby avoid relying on the devfs paths, but that looks quite complex.

Revision history for this message
Evgeniy L (rustyrobot) wrote :

We decided to make a workaround on the nailgun side, but the problem on the agent side should be resolved anyway, so I created a separate ticket for it: https://bugs.launchpad.net/fuel/+bug/1263648

Changed in fuel:
assignee: Andrey Korolyov (xdeller) → Evgeniy L (rustyrobot)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/63739

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/63739
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=531fe28a2e614938dccd1f192939bdf54d9a8dd4
Submitter: Jenkins
Branch: master

commit 531fe28a2e614938dccd1f192939bdf54d9a8dd4
Author: Evgeniy L <email address hidden>
Date: Mon Dec 23 16:53:40 2013 +0400

    Don't run volumes/disks checking in case
    if node was provisioned

    * add checking if node was provisioned
    * make checking methods as protected
      instead of private, because they
      shouldn't be private from design point
      of view, it simplify unit testing

    Closes-bug: #1262973
    Change-Id: I036c16138c48054579eb1d8388d048ac7331c041
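
In outline, the merged workaround amounts to a guard of roughly this shape (a sketch with illustrative names, not the actual nailgun code; the exact set of statuses treated as provisioned is an assumption):

```python
def should_check_disks(node_status):
    """Skip the disk/volume space validation for nodes that were
    already provisioned: their on-disk layout is fixed, and, as this
    bug shows, re-validating against renamed disk metadata produces
    false 'insufficient disk space' errors.

    Which statuses count as provisioned is assumed here, not taken
    from the real code."""
    return node_status not in ("provisioned", "deploying", "ready")
```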

Changed in fuel:
status: In Progress → Fix Committed
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
status: Fix Committed → Fix Released