This is an archive of the discontinued LLVM Phabricator instance.

Try reading a smaller chunk when (p)read fails
ClosedPublic

Authored by JDevlieghere on Oct 26 2020, 5:28 PM.

Details

Summary

On macOS, the read and pread calls return EINVAL when the number of bytes exceeds INT32_MAX: https://github.com/apple/darwin-xnu/blob/a449c6a3b8014d9406c2ddbdc81795da24aa7443/bsd/kern/sys_generic.c#L355

I found out when passing a 2.5G Mach-O to llvm-dwarfdump. Normally we would mmap the file, but this particular file happens to be the exact multiple of the page size (2628505600 bytes) in which case llvm prefers not to mmap (as to not waste a whole page for the terminating NULL byte).

rdar://68751407

Diff Detail

Event Timeline

JDevlieghere created this revision.Oct 26 2020, 5:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 26 2020, 5:28 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
JDevlieghere requested review of this revision.Oct 26 2020, 5:28 PM
JDevlieghere edited the summary of this revision. (Show Details)Oct 26 2020, 5:34 PM

Seems like the issue might be specific to macOS (with a fixed max value of INT32_MAX).

thegameg accepted this revision.Oct 26 2020, 8:06 PM
thegameg added inline comments.
llvm/lib/Support/Unix/Path.inc
1079

= std::min(Buf.size(), INT32_MAX) ?

This revision is now accepted and ready to land.Oct 26 2020, 8:06 PM
thegameg added inline comments.Oct 26 2020, 8:13 PM
llvm/lib/Support/Unix/Path.inc
1068

Actually, is this also affected?

JDevlieghere edited the summary of this revision. (Show Details)Oct 26 2020, 8:30 PM

Big thanks to Francis who discovered the max value is INT32_MAX. I had only tried UIN32_MAX.

This revision was automatically updated to reflect the committed changes.

There's also a similar piece of code in raw_ostream.cpp (raw_fd_ostream::write_impl), only it hard-limits the size to 2G everywhere, and additionally to 1G on linux. It would probably be a good idea to introduce a helper function for this functionality.

this particular file happens to be the exact multiple of the page size (2628505600 bytes) in which case llvm prefers not to mmap (as to not waste a whole page for the terminating NULL byte).

Do we really need the terminating nul byte? It should be possible to avoid this behavior by passing RequiresNullTerminator=false to the relevant MemoryBuffer method...

rnk added a comment.Oct 27 2020, 10:55 AM

... in which case llvm prefers not to mmap (as to not waste a whole page for the terminating NULL byte)

We should really re-evaluate this behavior at some point. It is quite surprising.

llvm/lib/Support/Unix/Path.inc
1069

You know, reading maximum 2GB size chunks seems like reasonable behavior for any OS. We do it for Windows too:
https://github.com/llvm/llvm-project/blob/master/llvm/lib/Support/Windows/Path.inc#L1235

IMO we should be brave, remove the ifdef, and do 2GB reads on all OSs.

In D90201#2357105, @rnk wrote:

... in which case llvm prefers not to mmap (as to not waste a whole page for the terminating NULL byte)

We should really re-evaluate this behavior at some point. It is quite surprising.

I think the problem is that if the filesize is exactly a multiple of the page size, there isn't any simple way to allocate a null at the end of the file. Certain uses (for example, clang's lexer) require null termination. Most uses probably shouldn't be requesting null termination.

joerg added a subscriber: joerg.Oct 27 2020, 11:51 AM

You can do an anon-mapping with size+pagesize in that case and map over it with MAP_FIXED.