Replace hacky /usr/bin/fallocate subprocess call

Bug #824891 reported by Jason Gerard DeRose
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
FileStore
Fix Released
High
Jason Gerard DeRose

Bug Description

As with many peer-to-peer sorts of software, file pre-allocation is important in dmedia as it ensures the needed space for a file is reserved at the start of a file transfer.

However, pre-allocation is also extremely important in dmedia for performance reasons, especially as video editing is perhaps the most important use-case for dmedia. Particularly when a FileStore resides on an ext4 filesystem (which is what we recommend), using fallocate allows us to get instant pre-allocation without writing zeros, to get the most contiguous allocation possible, and to take advantage of ext4 extents as much as possible.

When importing or transferring, the file-size is known in advance, so FileStore uses fallocate (when available). The only time FileStore doesn't use fallocate is when writing a file into the FileStore as it is rendered (in which case the final file-size isn't known in advance).

Currently FileStore calls `/usr/bin/fallocate`, which gets the job done but is kinda hacky and dirty. Instead, we should use the posix_fallocate() system call.

I'd personally really like to see os.fallocate(fd, offset, length) added to the standard library in Python 3.3, but in the meantime, we can carry a small extension module in filestore to do the same.

Tags: fallocate

Related branches

Revision history for this message
Manish Sinha (मनीष सिन्हा) (manishsinha) wrote :

I would suggest keeping it open. This might be a hacky solution but works perfectly.

Just think about the solutions.
1) Getting fallocate implemented upstream - Takes time
2) Create your own goddamn wrapper library - One more goddamn library to maintain

I would prefer 1. Solution no 2 would make maintenance tougher.
It would be better to remove this bug from that milestone. Maintainer - take your call.

Changed in filestore:
milestone: 11.09 → 11.10
Changed in filestore:
status: Triaged → Fix Committed
assignee: nobody → Jason Gerard DeRose (jderose)
milestone: 11.10 → 11.09
Revision history for this message
Jason Gerard DeRose (jderose) wrote :

Well, I was fixing FileStore.allocate_partial() so tmp_fp would be in mode 'wb' when the file didn't exist before, and mode 'rb+' only when truly resuming a download (so higher level code knows whether to check the content hashes of existing leaves)... and one thing led to another.

As far as I can tell, tmp_fp.truncate() is using fallocate when available. Timing things when allocating a 4GB file strongly suggests as much.

When I tried it with a FileStore in my ecryptfs home partition (which does not support fallocate), it took close to a minute. But when I tried it on an ext4 partition I use just for dmedia testing, it took like half a second. In both cases I included a call to /bin/sync in the elapsed time, just in case that was masking things in some weird way. Oh, and both are on rotational drives... actually, the 2nd test was on a slower WDC green drive.

I'm not sure if this is a Python3 feature, or just some glibc magic behind the scenes. Either way, cool. And now the files are guaranteed to be pre-allocated, even if tmp_fp.truncate() resorts to slow writing of zero blocks. This is better.

Note to self: we should talk to the ecryptfs folks about whether fallocate support could be added in the future.

Another note to self: to be conservative, I'm thinking initially we should not recommend FileStore be on encrypts partitions, just to be extra safe about people's data. I use ecryptfs on all my installs, never had a problem, and I've testing dmedia/filestore a lot of it, never had a problem... but I'd like to have a little more field time there just in case. dmedia is slightly weird as far as file-system usage patterns go, so I'm a touch worried we might bump up into some weird corner case.

Revision history for this message
Jason Gerard DeRose (jderose) wrote :

Oops, spoke too soon. `df` shows that the space isn't actually getting reserved... and you can tmp_fp.truncate() files till past the available space on the drive. Apparently there are some POSIX file system semantics I don't understand.

So check_call() is back, although these two methods did get some needed cleanup in the process.

Changed in filestore:
status: Fix Committed → Triaged
assignee: Jason Gerard DeRose (jderose) → nobody
milestone: 11.09 → 11.10
Changed in filestore:
milestone: 11.10 → 11.11
Changed in filestore:
milestone: 11.11 → 11.12
Revision history for this message
Jason Gerard DeRose (jderose) wrote :

Hot damn, posix_fallocate() is coming to Python3.3, my prayers have been answered:

http://docs.python.org/dev/library/os.html#os.posix_fallocate

Changed in filestore:
milestone: 11.12 → 12.01
Changed in filestore:
milestone: 12.01 → 12.02
Revision history for this message
Jason Gerard DeRose (jderose) wrote :

I re-targeted this for 12.06 as Python 3.3 beta1 comes out in June, so that would a good time to implement this and start serious testing with Python 3.3. We'll have a fallback to use check_call() for Python 3.2.

http://www.python.org/dev/peps/pep-0398/#id2

Changed in filestore:
milestone: 12.02 → 12.06
Revision history for this message
Pádraig Brady (p-draigbrady) wrote :

Careful! posix_fallocate() is a different beast to fallocate().
posix_fallocate() will use fallocate() if available but fall back to writing NULs if not.
I.E. write the file twice if not!
Really posix_fallocate() is of limited use and python should IMHO be providing os.fallocate(),
in preference to os.posix_fallocate().

Revision history for this message
Jason Gerard DeRose (jderose) wrote :

Pádraig,

Thanks for the warning! Yeah, we want real-fallocate if available, but don't want to write zeros. So with posix_fallocate(), is there no way to know ahead of time if real fallocate will be used?

Hmm, maybe we'll need a small extension we ship that gives us access to real fallocate instead.

Cheers!

Changed in filestore:
milestone: 12.06 → 12.07
Changed in filestore:
milestone: 12.07 → 12.08
Changed in filestore:
milestone: 12.08 → 12.09
Changed in filestore:
milestone: 12.09 → 12.10
Changed in filestore:
milestone: 12.10 → 12.11
Changed in filestore:
status: Triaged → In Progress
assignee: nobody → Jason Gerard DeRose (jderose)
Revision history for this message
Jason Gerard DeRose (jderose) wrote :

I decided to just byte the bullet and write a small C extension to give us fallocate(), and to also backport posix_fadvise() from Python 3.3.

Changed in filestore:
status: In Progress → Fix Committed
Changed in filestore:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.