Page MenuHomePhabricator

Define fs::allocate_file which preallocates disk blocks.
Needs ReviewPublic

Authored by ruiu on Oct 31 2017, 11:09 AM.

Details

Reviewers
zturner
espindola
Summary

Unlike fs::resize_file, the new function guarantees that disk blocks
for a given size are actually allocated on a file system. In other
words, it doesn't create a sparse file.

If you want to do mmap IO, you generally want to preallocate disk
blocks if you can, because if you mmap a sparse file for writing and
the disk becomes full, your process will be killed by SIGBUS. There's
no way to gracefully handle that error condition. If you use fallocate(2)
or equivalent, you can handle that error before calling mmap.

My plan is to use this function in FileOutputBuffer so that we do mmap
IO only when we can preallocate disk blocks. That prevents LLVM tools
from crashing with a mysterious bus error when a disk is almost full.

Event Timeline

ruiu created this revision.Oct 31 2017, 11:09 AM
ruiu updated this revision to Diff 121021.Oct 31 2017, 11:12 AM
  • Fix typo
mgorny added inline comments.Oct 31 2017, 4:15 PM
llvm/lib/Support/Unix/Path.inc
453

Any reason not to use posix_fallocate() which is more portable by definition?

ruiu added inline comments.Oct 31 2017, 4:21 PM
llvm/lib/Support/Unix/Path.inc
453

posix_fallocate does not fail even on filesystems that do not support block preallocation. What it does (at least in glibc) when the underlying filesystem doesn't have the feature is to write a zero byte for each disk block to force the filesystem to actually allocate disk blocks. It is as you can imagine pretty slow.

mgorny added inline comments.Nov 1 2017, 12:55 AM
llvm/lib/Support/Unix/Path.inc
453

Ok then.

kettenis added inline comments.
llvm/lib/Support/Unix/Path.inc
453

Can I ask you to reconsider this? fallocate(2) is Linux-specific and other systems are likely to only implement the standardized posix_fallocate(2). And while the posix_fallocate(2) implementation might be somewhat suboptimal on filesystems that don't implement fallocate(2), this should only affect the small number of users that use lld on non-standard filesystems.

When I said that we might consider implementing fallocate(2) on OpenBSD, I really meant to say that we will consider implementing posix_fallocate(2).

ruiu added inline comments.Nov 2 2017, 11:32 AM
llvm/lib/Support/Unix/Path.inc
453

I don't know if we really want to hide fallocate and posix_fallocate because the performance characteristics of the two functions are so different when they are used on a file system that doesn't support block preallocation. I believe posix_fallocate shouldn't have completely abstracted away that difference.

That said I understand your concern too. This code might be too specific to Linux, and it just does nothing for other Unix systems. That's bad.

How about this: we can make this function to call posix_fallocate and make FileOutputBuffer to use this function only when the disk is almost full. Then in most cases we don't need to call posix_fallocate at all on any system, and it can still catch disk full errors in practice.

mgorny added inline comments.Nov 2 2017, 12:14 PM
llvm/lib/Support/Unix/Path.inc
453

I'd dare say relying on 'free space' to be meaningful and stable is a very bad idea.

I'm going to hate myself for suggesting this but how about using fallocate() specifically on Linux, and posix_fallocate() otherwise?

ruiu added inline comments.Nov 2 2017, 12:30 PM
llvm/lib/Support/Unix/Path.inc
453

Why do you think that is so bad? In the worse case scenario, we can't get a correct value of disk free space and lld is killed by sigbus due to a disk full error. But that's the same as the current situation. (I'm not suggesting we return a disk full error if a disk looks like almost full. What I'm suggesting is to call posix_fallocate only when a disk looks almost full.)

I'm going to hate myself for suggesting this but how about using fallocate() specifically on Linux, and posix_fallocate() otherwise?

I'm reluctant to do that because it feels like we are treating non-Linux Unix systems as second class citizens -- we know that posix_fallocate is slow on many Unixes, so calling that function makes lld slower on many non-Linux systems.

mgorny added inline comments.Nov 2 2017, 1:37 PM
llvm/lib/Support/Unix/Path.inc
453

But AFAIU avoiding SIGBUS is the purpose of this whole effort — and with modern filesystems (I'm thinking btrfs) 'free space' became something very inaccurate to rely on. I think it's better to have a safe behavior that avoids SIGBUS altogether.

I'm reluctant to do that because it feels like we are treating non-Linux Unix systems as second class citizens -- we know that posix_fallocate is slow on many Unixes, so calling that function makes lld slower on many non-Linux systems.

Then just make it not use mmap when it can't ensure a safe & fast allocation?

ruiu added inline comments.Nov 2 2017, 1:46 PM
llvm/lib/Support/Unix/Path.inc
453

But AFAIU avoiding SIGBUS is the purpose of this whole effort — and with modern filesystems (I'm thinking btrfs) 'free space' became something very inaccurate to rely on. I think it's better to have a safe behavior that avoids SIGBUS altogether.

How much inaccurate thy are? If you are using a filesystem nearly full, that is slow anyway, so I'm not worry too much about calling posix_fallocate conservatively. Say, we can call posix_fallocate() when disk look like 90% full. is this still unacceptably inaccurate?

Then just make it not use mmap when it can't ensure a safe & fast allocation?

That's slower than mmap. Basically, I want to use mmap, but the issue of the sparse file & mmap write came up, so I'm trying to find a way to detect an error without sacrificing performance too much.

mgorny added inline comments.Nov 2 2017, 1:53 PM
llvm/lib/Support/Unix/Path.inc
453

If you're talking about the data from statvfs(), then you shouldn't trust its value at all. The data straight from btrfs tools might be better but with all the high magic involved in it, it can fail to fail to write with any apparent free space.

Then, there's the matter of quotas.

emaste added a subscriber: emaste.Nov 4 2017, 6:25 PM
emaste added inline comments.
llvm/lib/Support/Unix/Path.inc
453

posix_fallocate does not fail even on filesystems that do not support block preallocation.

It can, in 1003.1-2008: [EINVAL] The len argument is less than zero, or the offset argument is less than zero, or the underlying file system does not support this operation.

ruiu updated this revision to Diff 122167.Nov 8 2017, 3:25 PM
  • Rename resize_file allocate_file
  • Make FileOutputBuffer to use allocate_file instead of resize_file
ruiu updated this revision to Diff 122173.Nov 8 2017, 3:46 PM
  • Fix tests
bcain added a subscriber: bcain.Nov 8 2017, 7:45 PM

I think it's a mistake to have the behavior be contingent on available disk space for the destination partition. Has that proposal been ruled out now?

llvm/lib/Support/Unix/Path.inc
443

Does this code compile with __APPLE__? It looks like this typo means it won't compile. s/fd/FD/

krytarowski added inline comments.
llvm/lib/Support/Unix/Path.inc
436

NetBSD needs ftruncate(2) as a fallback for posix_fallocate(2).

In the default setup posix_fallocate() returns EOPNOTSUPP.

ruiu added inline comments.Nov 12 2017, 11:56 PM
llvm/lib/Support/Unix/Path.inc
436

The current code use ftruncate if the function is available. So it doesn't use it as a fallback, but it uses only fallocate. It seems like a OK behavior to me on NetBSD if posix_fallocate always returns EOPNOTSUPP, but can you confirm?

krytarowski added inline comments.Nov 13 2017, 1:48 AM
llvm/lib/Support/Unix/Path.inc
436

By "the current code" I understand the pristine one. It calls posix_fallocate(2) and if it fails with EOPNOTSUPP, it moves on to the ftruncate(2) fallback.

As far as I can tell, this fallocate is a missing feature as of today in the FFSv2 file-system, the default one on NetBSD.

ruiu added inline comments.Nov 13 2017, 1:54 AM
llvm/lib/Support/Unix/Path.inc
436

Ah, sorry, I misread ftruncate fallocate.

I intentionally do not fallback to ftruncate in this function. This function's contract is to allocate actual disk blocks to a given file or return with a failure code. On the caller side, if this function returns an error, we use an in-memory buffer instead of an on-disk temporary file.

joerg added a subscriber: joerg.Nov 13 2017, 3:35 PM

I really dislike this direction. fallocate can double the amount of disk IO and increase cache trashing, especially when linking large programs with debug information. Keeping more things in memory doesn't sound like an actual improvement either. If the goal is really only to improve the diagnostics in tools, I think a better idea would be to figure out a good way to handle this from a SIGBUS handler based on the passed in siginfo_t.

espindola edited reviewers, added: espindola; removed: rafael.Mar 15 2018, 9:07 AM