WrongPartitionSchemeError on RAID

Bug #1543233 reported by Sergey Galkin on 2016-02-08
This bug affects 3 people
Affects             Importance  Assigned to
Fuel for OpenStack  High        Alexander Gordeev
8.0.x               High        Alexey Stupnikov
Mitaka              High        Alexander Gordeev
Newton              High        Alexander Gordeev

Bug Description

Steps to reproduce:
1. Create an environment.
2. Add 1 controller, 2 compute nodes, and 3 Ceph nodes on 6 ProLiant DL380 Gen9 servers with an HP Smart Array P840 controller.
3. Change the disk configuration from the default. In my case: on sda, Base System: 0.3TB, Virtual storage: 2.8TB; remove all partitions from sdb <attached as before.png>.
4. Start the deployment.

Deployment of the controller nodes failed with the error 'WrongPartitionSchemeError: Invalid boundaries: begin and end are not inside available free space'

2016-02-08 17:05:29.430 6277 DEBUG fuel_agent.utils.partition [-] Info output:
BYT;
/dev/sda:3433759MiB:scsi:512:512:gpt:HP LOGICAL VOLUME;
1:0.02MiB:1.50MiB:1.48MiB:free;
1:1.50MiB:620395MiB:620394MiB::primary:;
1:620396MiB:3433759MiB:2813363MiB:free;

2016-02-08 17:05:29.430 6277 DEBUG fuel_agent.utils.partition [-] Info result: {'generic': {'dev': '/dev/sda', 'physical_block': 512, 'table': 'gpt', 'logical_block': 512, 'model': 'HP LOGICAL VOLUME', 'size': 3433759}, 'parts': [{'begin': 1, 'num': 1, 'end': 2, 'fstype': 'free', 'size': 2}, {'begin': 2, 'num': 1, 'end': 620395, 'fstype': None, 'size': 620394}, {'begin': 620396, 'num': 1, 'end': 3433759, 'fstype': 'free', 'size': 2813363}]}
2016-02-08 17:05:29.431 6277 ERROR fuel_agent.cmd.agent [-] Invalid boundaries: begin and end are not inside available free space
2016-02-08 17:05:29.431 6277 TRACE fuel_agent.cmd.agent Traceback (most recent call last):
2016-02-08 17:05:29.431 6277 TRACE fuel_agent.cmd.agent File "/usr/lib/python2.7/dist-packages/fuel_agent/cmd/agent.py", line 101, in main
2016-02-08 17:05:29.431 6277 TRACE fuel_agent.cmd.agent getattr(mgr, action)()
2016-02-08 17:05:29.431 6277 TRACE fuel_agent.cmd.agent File "/usr/lib/python2.7/dist-packages/fuel_agent/manager.py", line 776, in do_provisioning
2016-02-08 17:05:29.431 6277 TRACE fuel_agent.cmd.agent self.do_partitioning()
2016-02-08 17:05:29.431 6277 TRACE fuel_agent.cmd.agent File "/usr/lib/python2.7/dist-packages/fuel_agent/manager.py", line 226, in do_partitioning
2016-02-08 17:05:29.431 6277 TRACE fuel_agent.cmd.agent pu.make_partition(prt.device, prt.begin, prt.end, prt.type)
2016-02-08 17:05:29.431 6277 TRACE fuel_agent.cmd.agent File "/usr/lib/python2.7/dist-packages/fuel_agent/utils/partition.py", line 154, in make_partition
2016-02-08 17:05:29.431 6277 TRACE fuel_agent.cmd.agent 'Invalid boundaries: begin and end '
2016-02-08 17:05:29.431 6277 TRACE fuel_agent.cmd.agent WrongPartitionSchemeError: Invalid boundaries: begin and end are not inside available free space
2016-02-08 17:05:29.431 6277 TRACE fuel_agent.cmd.agent

After this, the disk configuration returned to the default <attached as after.png>

Snapshot attached

fuel - version
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "539"
  build_id: "539"
  fuel-nailgun_sha: "baec8643ca624e52b37873f2dbd511c135d236d9"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "658be72c4b42d3e1436b86ac4567ab914bfb451b"
  fuel-nailgun-agent_sha: "b2bb466fd5bd92da614cdbd819d6999c510ebfb1"
  astute_sha: "b81577a5b7857c4be8748492bae1dec2fa89b446"
  fuel-library_sha: "e2d79330d5d708796330fac67722c21f85569b87"
  fuel-ostf_sha: "3bc76a63a9e7d195ff34eadc29552f4235fa6c52"
  fuel-mirror_sha: "fb45b80d7bee5899d931f926e5c9512e2b442749"
  fuelmenu_sha: "78ffc73065a9674b707c081d128cb7eea611474f"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "a43cf96cd9532f10794dce736350bf5bed350e9d"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "87dfb6bc25d4650264f09c338ed77c21a3d6fe87"

Sergey Galkin (sgalkin) wrote :

Configured disks

Sergey Galkin (sgalkin) wrote :

disk configuration after error

Ilya Kutukov (ikutukov) on 2016-02-09
tags: added: area-python

Fix proposed to branch: master
Review: https://review.openstack.org/278492

Changed in fuel:
status: Confirmed → In Progress
Alexander Gordeev (a-gordeev) wrote :

fuel-agent raised the "Invalid boundaries" exception because of the partition boundary alignment done by parted.

Under certain circumstances, that alignment can push the end of a partition across a 1M boundary. Because the actual partition boundaries are rounded up, fuel-agent mistakenly assumes that the partition cannot fit within the provided boundaries and raises errors.WrongPartitionSchemeError.

In addition, parted always creates a relatively large 1.5M gap before the first partition on this particular combination of h/w (HP Gen9 and RAID). However, this looks unrelated to the original issue, unless one cares that the first partition becomes 2M smaller than expected.
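
A small arithmetic sketch of the rounding described above. It assumes, for illustration only, that an aligned boundary is snapped up to the next multiple of the alignment, and that the reported MiB values are then rounded up to whole numbers (which is what the 'Info result' line in the log suggests). The helpers `align_up` and `ceil_mib` are hypothetical, not fuel-agent or parted functions, and the 1 MiB requested start is an assumed value.

```python
KIB_PER_MIB = 1024


def align_up(offset_kib, alignment_kib):
    """Snap an offset up to the next multiple of the alignment (both in KiB)."""
    return -(-offset_kib // alignment_kib) * alignment_kib


def ceil_mib(offset_kib):
    """Round a KiB offset up to a whole number of MiB."""
    return -(-offset_kib // KIB_PER_MIB)


# Assumed values: a partition requested at 1 MiB on a 1.5 MiB alignment grid.
requested_begin_kib = 1 * KIB_PER_MIB
alignment_kib = 1536  # 1.5 MiB expressed in KiB

aligned_begin_kib = align_up(requested_begin_kib, alignment_kib)
print(aligned_begin_kib / float(KIB_PER_MIB))  # actual start, in MiB
print(ceil_mib(aligned_begin_kib))             # start after rounding up to whole MiB
```

Under these assumptions the partition starts at 1.5 MiB and shows up as 2 after rounding, which is consistent with the `1.50MiB` and `'begin': 2` values in the log above.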

To mitigate the alignment-related issue, I proposed adding 1M of room between partitions. I will test it against the h/w and post the result here.

The bad news is that the partition alignment issue is intermittent and occurs only under very specific conditions. It looks like it depends only on the generated partitioning layout. The provided snapshot indicates that a few nodes were provisioned smoothly and without errors.

The chance of reproducing the issue is probably high under every version of Fuel, since only the partitioning layout matters. I think it's critical: we need to fix it ASAP and arrange backported fixes as updates for all supported versions of Fuel.

Alexander Gordeev (a-gordeev) wrote :

Eg:

2016-02-08T17:05:29.720187+00:00 info: 2016-02-08 17:05:29.422 6277 DEBUG fuel_agent.utils.partition [-] Trying to create a partition: dev=/dev/sda begin=620395 end=3433259
2016-02-08T17:05:29.720331+00:00 info: 2016-02-08 17:05:29.425 6277 DEBUG fuel_agent.utils.utils [-] Trying to execute command: parted -s /dev/sda -m unit MiB print free
2016-02-08T17:05:29.720413+00:00 info: 2016-02-08 17:05:29.430 6277 DEBUG fuel_agent.utils.partition [-] Info output:
2016-02-08T17:05:29.720466+00:00 info: BYT;
2016-02-08T17:05:29.720532+00:00 info: /dev/sda:3433759MiB:scsi:512:512:gpt:HP LOGICAL VOLUME;
2016-02-08T17:05:29.720597+00:00 info: 1:0.02MiB:1.50MiB:1.48MiB:free;
2016-02-08T17:05:29.720664+00:00 info: 1:1.50MiB:620395MiB:620394MiB::primary:;
2016-02-08T17:05:29.720730+00:00 info: 1:620396MiB:3433759MiB:2813363MiB:free;
2016-02-08T17:05:29.720793+00:00 info:

The free space starts at 620396:
> 1:620396MiB:3433759MiB:2813363MiB:free;

but fuel-agent tried to create a partition with the following boundaries: begin=620395 end=3433259

that's exactly why the exception was thrown.
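
The comparison above can be reproduced with a short sketch. It parses the machine-readable parted output from the log and applies a "must fit inside a free segment" check. This is an approximation written for this report, not the fuel-agent source: `parse_parts` and `fits_in_free_space` are hypothetical names, and rounding fractional MiB values up is an assumption chosen because it reproduces the logged 'Info result'.

```python
import math

# Output of "parted -s /dev/sda -m unit MiB print free", copied from the log.
PARTED_OUTPUT = """BYT;
/dev/sda:3433759MiB:scsi:512:512:gpt:HP LOGICAL VOLUME;
1:0.02MiB:1.50MiB:1.48MiB:free;
1:1.50MiB:620395MiB:620394MiB::primary:;
1:620396MiB:3433759MiB:2813363MiB:free;
"""


def mib(field):
    """Convert a field such as '1.50MiB' to whole MiB, rounding up (assumed)."""
    return int(math.ceil(float(field[:-len("MiB")])))


def parse_parts(output):
    """Parse the partition/free-space lines; skip the 'BYT;' and device lines."""
    lines = [l.strip().rstrip(";") for l in output.splitlines() if l.strip()]
    parts = []
    for line in lines[2:]:
        fields = line.split(":")
        parts.append({
            "num": int(fields[0]),
            "begin": mib(fields[1]),
            "end": mib(fields[2]),
            "size": mib(fields[3]),
            "fstype": fields[4] or None,
        })
    return parts


def fits_in_free_space(parts, begin, end):
    """True if [begin, end] lies entirely inside one free segment."""
    return any(p["fstype"] == "free" and begin >= p["begin"] and end <= p["end"]
               for p in parts)


parts = parse_parts(PARTED_OUTPUT)
# Boundaries from the "Trying to create a partition" log line.
print(fits_in_free_space(parts, 620395, 3433259))
# The same request shifted by one MiB to the start of the free segment.
print(fits_in_free_space(parts, 620396, 3433259))
```

With the logged values the first check fails because begin=620395 is one MiB before the start of the free segment, while the shifted request passes.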

Dmitry Pyzhov (dpyzhov) on 2016-02-10
no longer affects: fuel/mitaka
Alexander Gordeev (a-gordeev) wrote :

I've applied a fix which adds 1M of room at the beginning of each new partition: https://review.openstack.org/#/c/278492/1

The results are quite shocking: http://paste.openstack.org/show/486595/

For some reason, parted often adds a 1.5M gap between partitions. I don't know why.

Alexander Gordeev (a-gordeev) wrote :

Here is the answer:

man parted:

              minimal
                     Use minimum alignment as given by the disk topology information. This and the opt value will use layout information provided by the disk to align the logical partition table addresses to actual physical blocks on the disk. The min value is the minimum alignment needed to align the partition properly to physical blocks, which avoids performance degradation.

              optimal
                     Use optimum alignment as given by the disk topology information. This aligns to a multiple of the physical block size in a way that guarantees optimal performance.

So disk /dev/sda reported that its optimal alignment is 1.5M, and parted applied it.
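
For anyone who wants to check what a device reports, here is a minimal sketch that reads the I/O topology values the Linux kernel exposes in sysfs. Whether parted derived its 1.5M figure from exactly these files on the affected hardware is an assumption; `read_topology` and `read_int` are hypothetical helpers, and the `sysfs_root` parameter exists only so the function can be pointed at a test directory.

```python
import os


def read_int(path):
    """Return the integer stored in a sysfs file, or None if it is unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (IOError, OSError, ValueError):
        return None


def read_topology(dev, sysfs_root="/sys/block"):
    """Collect the topology values (in bytes) reported for a block device."""
    base = os.path.join(sysfs_root, dev)
    return {
        "minimum_io_size": read_int(os.path.join(base, "queue", "minimum_io_size")),
        "optimal_io_size": read_int(os.path.join(base, "queue", "optimal_io_size")),
        "alignment_offset": read_int(os.path.join(base, "alignment_offset")),
    }


if __name__ == "__main__":
    # "sda" is the device from this report; adjust as needed.
    for key, value in sorted(read_topology("sda").items()):
        print("%s: %s" % (key, value))
```

A device that reports an optimal I/O size far above the usual 4k would be a candidate for the behaviour described in this bug.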

http://paste.openstack.org/show/486604/ gives the following results http://paste.openstack.org/show/486595/

Change abandoned by Aleksandr Gordeev (<email address hidden>) on branch: master
Review: https://review.openstack.org/278492
Reason: didn't resolve the issue

Alexander Gordeev (a-gordeev) wrote :

So the bug is reproducible only on specific disk h/w that reports relatively large (more than the standard 4k) alignment values.

Tricky RAID arrays/SAN/HBA will probably be affected as well, while regular rotating HDDs won't be.

tags: added: tricky
Alexander Gordeev (a-gordeev) wrote :

The proposed fix was abandoned since it's not a fix but rather an ugly workaround for a particular combination of h/w.

It's not that easy to teach fuel-agent to calculate partition boundaries using the optimal/minimum alignment values reported by the disk subsystem.

Dmitry Pyzhov (dpyzhov) on 2016-03-02
tags: added: module-volumes
Vasyl Saienko (vsaienko) wrote :

The Ironic team faced the same problem in the Intel scale lab. Here are the logs from fuel-agent: http://paste.openstack.org/show/492903/

Reviewed: https://review.openstack.org/278492
Committed: https://git.openstack.org/cgit/openstack/fuel-agent/commit/?id=4c74eaa1875dd28d2e2c8256281d9088f83f25fb
Submitter: Jenkins
Branch: master

commit 4c74eaa1875dd28d2e2c8256281d9088f83f25fb
Author: Alexander Gordeev <email address hidden>
Date: Wed Feb 10 19:07:48 2016 +0300

    Switch from optimal alignment to minimum

    fuel-agent always uses optimal alignment for partitions and
    thoroughly relies on parted to perform that aligning. (-a optimal)

    The issue is that under certain circumstances due to that alignment,
    the end of particular partition could cross 1M boundary. And due to
    actual partition boundaries being rounded up, fuel-agent mistakenly
    assumes that partition couldn't fit within provided boundaries and
    raises errors.WrongPartitionSchemeError.

    However, some h/w data storages are well known for reporting
    relatively huge optimal IO sizes (16M or even bigger), so the 1M room
    can't be enough. Thus, optimal aligning is not an option. Therefore,
    parted has been switched to the minimum alignment.

    The min value is the minimum alignment needed to align the partition
    properly to physical blocks, which avoids performance degradation.

    Change-Id: I83116ccc9236053a93664c1cf40a3ef0c1a189b7
    Closes-Bug: #1543233

Changed in fuel:
status: In Progress → Fix Committed

Change abandoned by Leontii Istomin (<email address hidden>) on branch: stable/mitaka
Review: https://review.openstack.org/304561

Reviewed: https://review.openstack.org/312933
Committed: https://git.openstack.org/cgit/openstack/fuel-agent/commit/?id=af247ea07a469c467b83687013a68b0e3cff05fe
Submitter: Jenkins
Branch: master

commit af247ea07a469c467b83687013a68b0e3cff05fe
Author: Alexander Gordeev <email address hidden>
Date: Thu May 5 15:52:40 2016 +0300

    Add option to choose partition alignment mode

    Under certain circumstances due to that alignment, the end of
    particular partition could cross 1M boundary. And due to actual
    partition boundaries being rounded up, fuel-agent mistakenly
    assumes that partition couldn't fit within provided boundaries
    and raises errors.WrongPartitionSchemeError.

    However, some h/w data storages are well known for reporting
    relatively huge optimal IO sizes (16M or even bigger), so the 1M
    room can't be enough. Thus, optimal aligning is not an option.
    In such cases, it's better to proceed with minimal mode.

    It's the minimum alignment needed to align the partition properly
    to physical blocks which avoids performance degradation.

    Partial-Bug: #1543233
    DocImpact

    Change-Id: I1dd94731f497f3deb47cec9a23957e3706c2fb3b
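
A minimal sketch of what a selectable alignment mode could look like when the parted command line is assembled. This is not the code from the commit above: `build_mkpart_cmd` and `ALIGNMENT_MODES` are hypothetical names. The `-a` and `-s` options, the `unit MiB` and `mkpart` commands, and the four alignment values are taken from parted's own documentation.

```python
# Alignment types accepted by parted's -a option.
ALIGNMENT_MODES = ("none", "cylinder", "minimal", "optimal")


def build_mkpart_cmd(dev, begin_mib, end_mib, ptype="primary",
                     alignment="optimal"):
    """Build the argument list for creating one partition with parted."""
    if alignment not in ALIGNMENT_MODES:
        raise ValueError("unsupported alignment mode: %r" % alignment)
    if begin_mib >= end_mib:
        raise ValueError("begin must be smaller than end")
    return ["parted", "-a", alignment, "-s", dev, "unit", "MiB",
            "mkpart", ptype, str(begin_mib), str(end_mib)]


# Example with the minimal mode on the device from this report.
print(" ".join(build_mkpart_cmd("/dev/sda", 1, 620395, alignment="minimal")))
```

Returning a list rather than a shell string keeps the sketch usable with `subprocess` without quoting concerns; the mode would come from configuration in a real deployment.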

Reviewed: https://review.openstack.org/320050
Committed: https://git.openstack.org/cgit/openstack/fuel-agent/commit/?id=bf4ed24462375ed3c1741a3a28b5840f582f1588
Submitter: Jenkins
Branch: stable/mitaka

commit bf4ed24462375ed3c1741a3a28b5840f582f1588
Author: Alexander Gordeev <email address hidden>
Date: Thu May 5 15:52:40 2016 +0300

    Add option to choose partition alignment mode

    Under certain circumstances due to that alignment, the end of
    particular partition could cross 1M boundary. And due to actual
    partition boundaries being rounded up, fuel-agent mistakenly
    assumes that partition couldn't fit within provided boundaries
    and raises errors.WrongPartitionSchemeError.

    However, some h/w data storages are well known for reporting
    relatively huge optimal IO sizes (16M or even bigger), so the 1M
    room can't be enough. Thus, optimal aligning is not an option.
    In such cases, it's better to proceed with minimal mode.

    It's the minimum alignment needed to align the partition properly
    to physical blocks which avoids performance degradation.

    Partial-Bug: #1543233
    DocImpact

    Change-Id: I1dd94731f497f3deb47cec9a23957e3706c2fb3b
    (cherry picked from commit af247ea07a469c467b83687013a68b0e3cff05fe)

tags: added: in-stable-mitaka
Vitaly Sedelnik (vsedelnik) wrote :

Targeted to 8.0-mu-2 as scale bug.

Change abandoned by Vasyl Saienko (<email address hidden>) on branch: stable/8.0
Review: https://review.openstack.org/301248

Vitaly Sedelnik (vsedelnik) wrote :

Per Alexander Gordeev the recommendation for 8.0 is to backport https://review.openstack.org/#/c/320050/

Alexey Stupnikov (astupnikov) wrote :

Steps to reproduce.

Note: I couldn't find a way to reproduce specific disk architectures in virtualized environments, so we can't explicitly confirm that this patch will work in production environments.

On the other hand, we can still confirm that the patch behaves as intended. Since the problem itself and the way to fix it are both clear enough, we can simplify the testing process and just confirm that the patch is OK.

Confirm that the additional 1MB indent was added:
1. Create a new environment and add one controller node.
2. Deploy the environment.
3. Go to the Logs tab, select the slave's logs, and open fuel-agent's logs.
4. Look for lines containing the text "parted -a optimal -s /dev/vda" (there should be 9 lines).
After the fix, 1MB should be added at the end of every partition.

Confirm that the "partition_alignment" option was implemented:
1. Same as above.
2. Edit /etc/fuel-agent/fuel-agent.conf on the slave node and change the partition_alignment parameter to "minimal".
3. Repeat steps 2-4 from the previous description. In step 4, look for the same log messages, but check the value of the -a option.

You can do both checks at the same time.
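
The log check in step 4 can be scripted. The sketch below pulls the `-a` value out of every parted invocation found in a list of log lines. It relies on the "parted -a <mode> -s <device>" text quoted in the steps above; the sample log lines are made up for illustration, and `alignment_modes` and `all_use_mode` are hypothetical helpers.

```python
import re

# Matches the command text quoted in the verification steps.
PARTED_CMD = re.compile(r"parted -a (?P<mode>\S+) -s (?P<dev>\S+)")


def alignment_modes(log_lines):
    """Return (device, alignment mode) for each parted call found in the log."""
    found = []
    for line in log_lines:
        match = PARTED_CMD.search(line)
        if match:
            found.append((match.group("dev"), match.group("mode")))
    return found


def all_use_mode(log_lines, expected):
    """True if at least one parted call was found and all use the expected mode."""
    modes = alignment_modes(log_lines)
    return bool(modes) and all(mode == expected for _, mode in modes)


# Hypothetical sample lines, shaped like the fuel-agent debug output.
sample = [
    "DEBUG fuel_agent.utils.utils [-] Trying to execute command: "
    "parted -a minimal -s /dev/vda unit MiB mkpart primary 1 25",
    "DEBUG fuel_agent.utils.utils [-] Trying to execute command: "
    "parted -s /dev/vda -m unit MiB print free",
]
print(alignment_modes(sample))
print(all_use_mode(sample, "minimal"))
```

Lines without an `-a` option, such as the `print free` calls, are ignored by the pattern.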

Reviewed: https://review.openstack.org/333837
Committed: https://git.openstack.org/cgit/openstack/fuel-agent/commit/?id=536d9ef089b9ade35d455f1afadea9f2f70b390f
Submitter: Jenkins
Branch: stable/8.0

commit 536d9ef089b9ade35d455f1afadea9f2f70b390f
Author: Alexander Gordeev <email address hidden>
Date: Thu May 5 15:52:40 2016 +0300

    Add option to choose partition alignment mode

    Under certain circumstances due to that alignment, the end of
    particular partition could cross 1M boundary. And due to actual
    partition boundaries being rounded up, fuel-agent mistakenly
    assumes that partition couldn't fit within provided boundaries
    and raises errors.WrongPartitionSchemeError.

    However, some h/w data storages are well known for reporting
    relatively huge optimal IO sizes (16M or even bigger), so the 1M
    room can't be enough. Thus, optimal aligning is not an option.
    In such cases, it's better to proceed with minimal mode.

    It's the minimum alignment needed to align the partition properly
    to physical blocks which avoids performance degradation.

    Closes-Bug: #1543233

    Change-Id: I1dd94731f497f3deb47cec9a23957e3706c2fb3b
    (cherry picked from commit bf4ed24462375ed3c1741a3a28b5840f582f1588)

tags: added: on-verification