Partitioning should align for performance

Bug #1513085 reported by Mark Shuttleworth on 2015-11-04
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
High
Jeffrey C Jones
curtin
Medium
Unassigned

Bug Description

Watching a recent MAAS install I noticed this in the console output:

mdadm: No arrays found in config file or automatically
Error: /dev/sdb: unrecognised disk label
mdadm: No arrays found in config file or automatically
Error: /dev/sdb: unrecognised disk label
Error: /dev/sdc: unrecognised disk label
mdadm: No arrays found in config file or automatically
Error: /dev/sdc: unrecognised disk label
Error: /dev/sdd: unrecognised disk label
mdadm: No arrays found in config file or automatically
Error: /dev/sdd: unrecognised disk label
Warning: The resulting partition is not properly aligned for best performance.
Error: /dev/sdd: unrecognised disk label
UUID: ac6fd136-f33f-47cf-93f2-9502e3ad67f9
Set UUID: 98ff872c-cd60-435e-9751-6af08d1dd74a
version: 0
nbuckets: 488396
block_size: 1
bucket_size: 1024
nr_in_set: 1
nr_this_dev: 0
first_bucket: 1
UUID: 14e85839-2906-4b76-8bf0-17c98d3f75ee
Set UUID: 98ff872c-cd60-435e-9751-6af08d1dd74a
version: 1
block_size: 1
data_offset: 16
Error: /dev/bcache0: unrecognised disk label
Error: /dev/bcache0: unrecognised disk label
Warning: The resulting partition is not properly aligned for best performance.
Error: /dev/bcache0: unrecognised disk label
--2015-11-04 14:19:59-- http://192.168.9.2:5248/images/ubuntu/amd64/generic/trusty/daily/root-tgz
Connecting to 192.168.9.2:5248... connected.
HTTP request sent, awaiting response... 200 OK
Length: 322584502 (308M) [text/html]

In particular I noticed the "Warning: The resulting partition is not properly aligned for best performance." Using parted, I've found that it's best to specify the first partitioning as starting at "1" using the compact units, which avoids this error, though TBH I have no idea why that works and look forward to having that all explained to me ;)

Related branches

Changed in maas:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 1.9.0
importance: Critical → High
Blake Rouse (blake-rouse) wrote :

It might be best for MAAS to make sure that all partition sizes are a multiple of 4MiB. That should align the partitions correctly on disk and fix an issue where the LVM PV extents need to be a multiple of 4MiB.

Download full text (3.3 KiB)

A few comments for the "look forward to having that all explained to me" part.

As already identified the message is from parted as a warning, not something like mdadm/lvm/dm that actually detect a real issue (yet).

At the core of the check it does:
ok = (part->geom.start % pa->grain_size == pa->offset);
Attributes of "part" are our configuration and "pa" are disk attributes it derives disk attributes.

Parted defines:
ped_device_get_minimum_alignment
ped_device_get_optimum_alignment
The values it considers are from /sys/block/<disk>/queue/

The workaround of "1 compact" works because compact it is defined as:
"This is a special unit that defaults to megabytes for input, and picks a unit that gives a compact human readable representation for output."
That means "1 compact" translates into "1 MB" which is a good alignment for the usual values of 4k or so.
In fact >=1M is parteds default expected value for "good" alignment if it has no further disk information.

I can reproduce and test the behaviour with almost any current block testcase we have. in curtin vmtests
In there curtin just does what it is told - no magic to avoid alignment issues there.
So the following yaml translates to the latter commands:
      - id: sda
        type: disk
        ptable: msdos
        model: QEMU HARDDISK
        path: /dev/vdb
        name: main_disk
      - id: sda1
        type: partition
        size: 3GB
        device: sda
        flag: boot
      - id: sda_extended
        type: partition
        size: 5G
        flag: extended
        device: sda
      - id: sda2
        type: partition
        size: 2G
        flag: logical
        device: sda
      - id: sda3
        type: partition
        size: 2G
        flag: logical
        device: sda

Running command ['parted', '/dev/vdb', '--script', 'mklabel', 'msdos'] with allowed return codes [0] (shell=False, capture=False)
Running command ['parted', '/dev/vdb', '--script', 'mkpart', 'primary', '2048s', '6293504s'] with allowed return codes [0] (shell=False, capture=False)
Running command ['parted', '/dev/vdb', '--script', 'mkpart', 'extended', '6293506s', '16779266s'] with allowed return codes [0] (shell=False, capture=False)
Warning: The resulting partition is not properly aligned for best performance.
Running command ['parted', '/dev/vdb', '--script', 'mkpart', 'logical', '6293509s', '10487813s'] with allowed return codes [0] (shell=False, capture=False)
Warning: The resulting partition is not properly aligned for best performance.
Running command ['parted', '/dev/vdb', '--script', 'mkpart', 'logical', '10487815s', '14682119s'] with allowed return codes [0] (shell=False, capture=False)
Warning: The resulting partition is not properly aligned for best performance.

Curtin starts the first partition at 1 MB and ends at 1MB+Size
In our example we requested 3GB
START of 1st Partition = 2048
END => 2048+((2**30)*3/512)
So the partition goes 2048s to 6293504s
Note ((6293504-2048)*512)/(2**30) = 3GB

If the config is rather vague like in the example above we might try to auto-align in curtin at the cost of some space.

As soon as the yaml from maas will contain sector specifications curtin has to obey ...

Read more...

Changed in curtin:
status: New → Confirmed

It's fine by me if the finest granularity we allow is 4MB and also fine
if we waste up to that amount at the start of any disk that MAAS
touches. Seriously, disks are cheap these days :)

So let's just round off to the nearest 4MB and optimise for clean
performance rather than squeezing every byte off the disk. All SSDs are
overprovisioned anyhow, the actual size of the disk bears no correlation
to these numbers!

I'd absolutely NOT want the admin to be thinking about sectors AT ALL.
Just let them specify sizes down to the nearest 4MB and we'll be fine.
Let parted handle the actual micro-layout. If using "1 compact unit" is
the best way to be optimised at the start, let's do that. And to be
safe, feed it the end of the final partition as "-1" in compact units
rather than running the risk that it fails due to a small block count issue.

Mark

if I read the curtin method commands/block_meta.py:partition_handler correctly there is no way to directly specify sectors in the yaml.
With that in mind IMHO MAAS can't easily force the hard 4MB alignment so curtin has to deal with the fix.

After an experimental version 1 I found http://users.suse.com/~garloff/align_partition.py which is GPL and contains almost all we need.

The patch:
- aligns partition start and end at 4MB
- also considers disk alignment offset when aligning
- adds the feature of taking care of 512b/4k logical sector size when partitioning

The following patch is is a suggestion to start review and discussion before going further down that path.

The former test log now looks like:
Running command ['parted', '/dev/vdb', '--script', 'mkpart', 'primary', '8192s', '6299648s']
Running command ['parted', '/dev/vdb', '--script', 'mkpart', 'extended', '6307840s', '16793600s']
Running command ['parted', '/dev/vdb', '--script', 'mkpart', 'logical', '6316032s', '10510336s']
Running command ['parted', '/dev/vdb', '--script', 'mkpart', 'logical', '10518528s', '14712832s']

Changed in curtin:
status: Confirmed → In Progress
Mark Shuttleworth (sabdfl) wrote :

It may not be appropriate to merge code just based on a license, but in
principle that's great.

Mark

Scott Moser (smoser) on 2015-11-12
Changed in curtin:
status: In Progress → Fix Committed
importance: Undecided → Medium
Changed in maas:
assignee: nobody → Jeffrey C Jones (trapnine)
status: Triaged → In Progress

Thanks Scott for merging the changes. These should fix the issue as soon as the new curtin is used.

@Jeffrey - FYI - there should be no MAAS code change required. But I'd recommend running all kind of tests you have to make sure everything is still fine.

Long Story short (for you to focus testing):

ISSUE 1: Partitions were actually always "requested size+1sector"
=> Now they are exactly the requested size

ISSUE 2: An extended partition always needed "requested size + 2 sectors per logical partition" and e.g. 10G partition could not fit e-g- 3G+3G+4G logical partitions
=> there are various constraints with extended partitions, to avoid these they will now "requested size + 1MB per logical partition"
=> also as "Humans would assume" 3G+3G+4G now fits into 10G

ISSUE 3: Due to the former two issues usually only the first partition was properly aligned to 1MB
=> now they are all aligned as long as the sizes you request are aligned to 1MB (which they usually are as MAAS exposes 1MB granularity)
Most issues came up with extended/logical msdos partitions, but the non-alignment issue also hit gpt partitions.

Jeffrey C Jones (trapnine) wrote :

We decided to go ahead and align the partitions in MAAS to 4MiB anyway, so that the sizes in the DB would match those created on the real machines exactly in all cases.

Tested curtin #298 and so far, so good with several lvm and flat setups. curtin is making the exact partition sizes requested.

Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released

This bug is believed to be fixed in curtin in 17.1. If this is still a problem for you, please make a comment and set the state back to New

Thank you.

Changed in curtin:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers