Failing to provision nodes with large storages

Bug #1526845 reported by Jeff Bilder
This bug affects 2 people
Affects             Status        Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Released  High        Dmitry Bilunov
7.0.x               Fix Released  High        Alexey Stupnikov
8.0.x               Fix Released  High        Alexander Gordeev

Bug Description

I'm attempting to deploy an environment with one Controller, two Storage nodes (Cinder), and two Compute nodes. The two storage nodes are identical:

Chassis:
SuperMicro 6047R-E1R36N
http://www.supermicro.com/products/system/4U/6047/SSG-6047R-E1R36N.cfm

Motherboard:
X9DRi-LN4F+
http://www.supermicro.com/products/motherboard/Xeon/C600/X9DRi-LN4F_.cfm

root@bootstrap:~# parted /dev/sda print
Model: SMC SMC2108 (scsi)
Disk /dev/sda: 21.0TB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number Start End Size File system Name Flags

root@bootstrap:~# parted /dev/sdb print
Model: SMC SMC2108 (scsi)
Disk /dev/sdb: 12.0TB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number Start End Size File system Name Flags

root@bootstrap:~#

The deployment goes through the step of installing the operating system (Ubuntu). The system then reboots and boots back into PXE, because the disks were never partitioned and nothing was installed on them.

Jeff Bilder (jeffry-f) wrote :
Roman Podoliaka (rpodolyaka) wrote :

Jeff, could you please state the exact MOS version you are using? (/etc/fuel/version.yaml on the master node)

Changed in mos:
status: New → Incomplete
assignee: nobody → MOS Puppet Team (mos-puppet)
milestone: none → 8.0
Changed in mos:
assignee: MOS Puppet Team (mos-puppet) → Fuel Library Team (fuel-library)
Jeff Bilder (jeffry-f) wrote :

[root@fuel ~]# cat /etc/fuel/version.yaml
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "301"
  build_id: "301"
  nailgun_sha: "4162b0c15adb425b37608c787944d1983f543aa8"
  python-fuelclient_sha: "486bde57cda1badb68f915f66c61b544108606f3"
  fuel-agent_sha: "50e90af6e3d560e9085ff71d2950cfbcca91af67"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "6c5b73f93e24cc781c809db9159927655ced5012"
  fuel-library_sha: "5d50055aeca1dd0dc53b43825dc4c8f7780be9dd"
  fuel-ostf_sha: "2cd967dccd66cfc3a0abd6af9f31e5b4d150a11c"
  fuelmain_sha: "a65d453215edb0284a2e4761be7a156bb5627677"

Changed in mos:
status: Incomplete → New
Ilya Kutukov (ikutukov)
Changed in mos:
status: New → Confirmed
Changed in mos:
importance: Undecided → High
tags: added: area-library
tags: added: customer-found
tags: added: team-bugfix
Dmitry Bilunov (dbilunov) wrote :

Jeff, could you please generate a diagnostic snapshot using the "fuel snapshot" CLI command and provide us with the generated file?

Changed in mos:
assignee: Fuel Library Team (fuel-library) → Dmitry Bilunov (dbilunov)
Changed in mos:
status: Confirmed → Incomplete
Jeff Bilder (jeffry-f) wrote :
Changed in mos:
status: Incomplete → New
status: New → Confirmed
Dmitry Bilunov (dbilunov) wrote :

Sorry, Jeff, we still don't have enough data to investigate your issue.
Could you please provide us with a copy of /var/log/docker-logs/ from the master node?

Changed in mos:
status: Confirmed → Incomplete
Jeff Bilder (jeffry-f) wrote :

Here are the logs from the Fuel node's docker-logs directory. Please let me know if anything else is needed. I appreciate the assistance.

Changed in mos:
status: Incomplete → In Progress
Dmitry Bilunov (dbilunov) wrote :

It seems there is a bug in fuel-agent's kickstart spaces validation code. It works fine on node 19 (SuperMicro1-Storage) but crashes on node 20 (SuperMicro2-Storage) because the two nodes have different volume sets: node 19 has a 2.6M "cinder" PV, while node 20 has a 19.0M "cinder" PV.

You may be able to avoid triggering this bug by changing the partition scheme on node 20 so that every volume without an assigned mountpoint is smaller than 16M.

https://github.com/openstack/fuel-agent/blob/50e90af6e3d560e9085ff71d2950cfbcca91af67/fuel_agent/drivers/ks_spaces_validator.py#L144

Dmitry Bilunov (dbilunov) wrote :

Sorry, please multiply all volume sizes in my previous comment by 1M (so 2.6M becomes 2.6T, 19.0M becomes 19T and 16M becomes 16T).
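
The behaviour described in the two comments above can be illustrated with a minimal sketch. It assumes that volume sizes are expressed in MiB (so 16T is 16 * 1024 * 1024), which is what the unit correction suggests, and that the check combines a size comparison with a direct `volume["mount"]` lookup. The function name `check_root_size` and the sample volumes are hypothetical and are not copied from fuel-agent.

```python
# Hypothetical reconstruction of the failing check; not the fuel-agent source.
SIXTEEN_T_IN_MIB = 16 * 1024 * 1024  # assumption: sizes are stored in MiB


def check_root_size(volumes):
    """Raise ValueError if a root volume is larger than 16T (pre-fix logic)."""
    for volume in volumes:
        # `and` short-circuits: 'mount' is only read for volumes above 16T,
        # so smaller volumes without a mountpoint never hit the KeyError.
        if volume["size"] > SIXTEEN_T_IN_MIB and volume["mount"] == "/":
            raise ValueError("Root file system must be less than 16T")


# Illustrative volumes: an LVM PV carries no 'mount' key.
node_19 = [{"type": "pv", "vg": "cinder", "size": 2600000}]   # about 2.6T
node_20 = [{"type": "pv", "vg": "cinder", "size": 19000000}]  # about 19T

check_root_size(node_19)  # passes: the size test is False, 'mount' is not read

try:
    check_root_size(node_20)
    crashed = False
    missing_key = None
except KeyError as exc:
    crashed = True
    missing_key = exc.args[0]
```

Under these assumptions the suggested workaround follows directly: keeping every volume without a mountpoint below the 16T threshold means the `mount` lookup is never evaluated for it.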

Changed in mos:
status: In Progress → Fix Committed
Dmitry Bilunov (dbilunov) wrote :

MOS Maintenance, please apply the same patch to 7.0.

Dmitry Bilunov (dbilunov) wrote :
Changed in fuel:
status: New → Fix Committed
importance: Undecided → High
assignee: nobody → Dmitry Bilunov (dbilunov)
milestone: none → 9.0
no longer affects: mos
no longer affects: mos/7.0.x
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-agent (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/267047

summary: - Failing to provision storage node
+ Failing to provision storage node. fuel_agent.cmd.agent KeyError:
+ 'mount'
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-agent (stable/8.0)

Reviewed: https://review.openstack.org/267047
Committed: https://git.openstack.org/cgit/openstack/fuel-agent/commit/?id=0c1b17537a35d1b56f382fcc48197d0ee0c02d90
Submitter: Jenkins
Branch: stable/8.0

commit 0c1b17537a35d1b56f382fcc48197d0ee0c02d90
Author: Dmitry Bilunov <email address hidden>
Date: Wed Dec 30 11:53:34 2015 +0300

    Fix fuel-agent crashes during ks_spaces validation

    fuel-agent validates pm_data.ks_spaces data structure to have a valid
    schema, to be non-empty and to have a root partition less than 16T.

    If it encounters a volume which does not have an assigned mountpoint
    (could be an LVM PV) and at the same time is larger than 16T, it crashes
    with a KeyError.

    This change avoids a crash by using dict.get which does not throw an
    exception in case a key is not present in the dict.

    Change-Id: I34971f756d6b17c334dd5f7834af5ca778f2462a
    Closes-Bug: 1526845
    (cherry picked from commit 96a19bd7911a7c535078284acfe355b850fd4f24)
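
As a rough illustration of the approach named in this commit message, the sketch below uses `dict.get` for the mountpoint lookup. It is not the actual patch: the helper name `root_too_large`, the MiB size unit, and the sample volumes are assumptions made for the example.

```python
# Illustrative only; the real change lives in fuel-agent's ks_spaces validator.
SIXTEEN_T_IN_MIB = 16 * 1024 * 1024  # assumption: sizes are stored in MiB


def root_too_large(volume):
    """Return True only for a root ('/') volume larger than 16T."""
    # dict.get returns None when 'mount' is absent, so a volume without a
    # mountpoint (for example an LVM PV) compares unequal to '/' and no
    # KeyError is raised.
    return volume["size"] > SIXTEEN_T_IN_MIB and volume.get("mount") == "/"


large_pv = {"type": "pv", "vg": "cinder", "size": 19000000}
large_root = {"type": "partition", "mount": "/", "size": 17000000}
small_root = {"type": "partition", "mount": "/", "size": 50000}

results = [root_too_large(v) for v in (large_pv, large_root, small_root)]
```

With this lookup, a large volume without a mountpoint is simply skipped by the root-size check, while an oversized root partition is still rejected.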

Alexey Stupnikov (astupnikov) wrote : Re: Failing to provision storage node. fuel_agent.cmd.agent KeyError: 'mount'

Steps to reproduce in the Maintenance Lab (tested for MOS 7.0):
1. Create a lab environment with slave nodes that have 20TB HDDs.
2. Connect to the Fuel web UI and check the node count. Result: zero active slave nodes, although the bootstrap image is loaded on the slave nodes.

After patching and restarting the container:
1. All slave nodes are available.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-agent (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/267493

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/fuel-agent (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Dmitry Bilunov <email address hidden>
Review: https://review.fuel-infra.org/16191

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/fuel-agent (openstack-ci/fuel-8.0/liberty)

Change abandoned by Alexander Evseev <email address hidden> on branch: openstack-ci/fuel-8.0/liberty
Review: https://review.fuel-infra.org/16191

Artem Panchenko (apanchenko-8) wrote : Re: Failing to provision storage node. fuel_agent.cmd.agent KeyError: 'mount'

Verified on a bare-metal lab.

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "506"
  build_id: "506"
  fuel-nailgun_sha: "8e954abd70ef0083109f34289de2553dcda544d4"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "658be72c4b42d3e1436b86ac4567ab914bfb451b"
  fuel-nailgun-agent_sha: "b2bb466fd5bd92da614cdbd819d6999c510ebfb1"
  astute_sha: "b81577a5b7857c4be8748492bae1dec2fa89b446"
  fuel-library_sha: "ec7e212972ead554f21b52b9e165156665f659df"
  fuel-ostf_sha: "ab5fd151fc6c1aa0b35bc2023631b1f4836ecd61"
  fuel-mirror_sha: "351d568fa3b3e4dd062054b91d766aa54d379867"
  fuelmenu_sha: "234cb4cbb30fbd2df00f388c28f31606d9cae15f"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "a43cf96cd9532f10794dce736350bf5bed350e9d"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "94507c5e4dad6d8cfbd8f5d41aa8389d5335990a"

Changed in fuel:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-agent (stable/7.0)

Reviewed: https://review.openstack.org/267493
Committed: https://git.openstack.org/cgit/openstack/fuel-agent/commit/?id=57d3855758d00c1921eedc93ff7006b5a629fa5b
Submitter: Jenkins
Branch: stable/7.0

commit 57d3855758d00c1921eedc93ff7006b5a629fa5b
Author: Dmitry Bilunov <email address hidden>
Date: Wed Dec 30 11:53:34 2015 +0300

    Fix fuel-agent crashes during ks_spaces validation

    fuel-agent validates pm_data.ks_spaces data structure to have a valid
    schema, to be non-empty and to have a root partition less than 16T.

    If it encounters a volume which does not have an assigned mountpoint
    (could be an LVM PV) and at the same time is larger than 16T, it crashes
    with a KeyError.

    This change avoids a crash by using dict.get which does not throw an
    exception in case a key is not present in the dict.

    Change-Id: I34971f756d6b17c334dd5f7834af5ca778f2462a
    Closes-Bug: 1526845
    (cherry picked from commit 96a19bd7911a7c535078284acfe355b850fd4f24)

tags: added: on-verification
TatyanaGladysheva (tgladysheva) wrote : Re: Failing to provision storage node. fuel_agent.cmd.agent KeyError: 'mount'

Verification is postponed. A bare-metal lab is needed for testing.

TatyanaGladysheva (tgladysheva) wrote :

Verified on MOS 7.0 + mu3 updates.

Result: The operating system (Ubuntu) is installed successfully on a node with a 20TB HDD.

tags: removed: on-verification
summary: - Failing to provision storage node. fuel_agent.cmd.agent KeyError:
- 'mount'
+ Failing to provision storage nodes with large storages
summary: - Failing to provision storage nodes with large storages
+ Failing to provision nodes with large storages