use larger chunks to avoid write amplification

Bug #1612465 reported by Seth Arnold
This bug affects 1 person
Affects:      Snapcraft
Status:       Invalid
Importance:   Medium
Assigned to:  Sergio Schvezov
Milestone:    none

Bug Description

https://github.com/snapcore/snapcraft/pull/615 was proposed as a fix for https://bugs.launchpad.net/snapcraft/+bug/1597387 .

This pull request included code that writes the downloaded data in 1 KiB chunks:

+ with open(destination, 'wb') as destination_file:
+     for buf in request_stream.iter_content(1024):
+         destination_file.write(buf)

Because Advanced Format disks use 4K sectors and SSDs typically use 8K sectors (I've heard anywhere from 2K to 32K before), this may introduce write amplification. Hopefully the OS can coalesce these small writes, but we could avoid the issue entirely by using larger write(2) sizes. Most disks appear happiest writing 128K blocks, but even 8K would be an improvement for many environments.
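
For illustration, here is a rough sketch of what a larger chunk size could look like. It assumes the stream still comes from requests; the 128K figure is only the starting point suggested above, and the URL and destination path are placeholders, not snapcraft's actual values:

#!/usr/bin/python3
import requests

CHUNK_SIZE = 128 * 1024  # suggested starting point, not a tuned value

# Placeholder URL and path for this sketch.
url = 'https://example.com/some-part.tar.gz'
destination = '/tmp/some-part.tar.gz'

request_stream = requests.get(url, stream=True)
request_stream.raise_for_status()
with open(destination, 'wb') as destination_file:
    # Ask requests for 128 KiB at a time so each write(2) spans many sectors.
    for buf in request_stream.iter_content(CHUNK_SIZE):
        destination_file.write(buf)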

Thanks

Kyle Fazzari (kyrofa)
Changed in snapcraft:
status: New → Confirmed
importance: Undecided → Medium
Changed in snapcraft:
assignee: nobody → Sergio Schvezov (sergiusens)
milestone: none → 2.28
status: Confirmed → Triaged
Changed in snapcraft:
milestone: 2.28 → none
Revision history for this message
Kyle Fazzari (kyrofa) wrote :

I've looked into this a bit more. Reading the Python docs for open(), where we do _not_ supply the `buffering` parameter:

buffering is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size in bytes of a fixed-size chunk buffer. When no buffering argument is given, the default buffering policy works as follows:

- Binary files are buffered in fixed-size chunks; the size of the buffer is chosen using a heuristic trying to determine the underlying device’s “block size” and falling back on io.DEFAULT_BUFFER_SIZE. On many systems, the buffer will typically be 4096 or 8192 bytes long.
- “Interactive” text files (files for which isatty() returns True) use line buffering. Other text files use the policy described above for binary files.

Thanks to Python's default buffering, which uses a heuristic to determine the underlying device's block size, I think this bug is invalid. Please let me know if you disagree.
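
For anyone curious, here's a quick sketch for checking what the default buffering has to work with. The path is just a placeholder, and consulting st_blksize is a CPython implementation detail of the heuristic quoted above:

#!/usr/bin/python3
import io
import os

with open("/tmp/deleteme", "wb") as f:
    print(type(f))                          # _io.BufferedWriter wrapping the raw FileIO
    print(io.DEFAULT_BUFFER_SIZE)           # fallback buffer size, typically 8192
    print(os.fstat(f.fileno()).st_blksize)  # device "block size" the heuristic consults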

Changed in snapcraft:
status: Triaged → Invalid
Revision history for this message
Seth Arnold (seth-arnold) wrote :

Thanks for investigating; I believe you're right:

open("/home/sarnold/tmp/deleteme", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
ioctl(3, TCGETS, 0x7ffc50d63690) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
write(3, " "..., 4096) = 4096
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
write(3, " "..., 4096) = 4096
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
write(3, " "..., 2048) = 2048
close(3) = 0

#!/usr/bin/python3
# Write 1 KiB ten times, one second apart; the strace above shows the kernel
# only sees 4 KiB (and one final 2 KiB) writes thanks to Python's buffering.
import time

with open("/home/sarnold/tmp/deleteme", 'wb') as f:
    for x in range(0, 10):
        f.write(b" " * 1024)
        time.sleep(1)

Thanks
