This is an archive of the discontinued LLVM Phabricator instance.

Support: mapped_file_region: Pass MAP_NORESERVE to mmap
ClosedPublic

Authored by JosephTremoulet on Feb 12 2021, 12:18 PM.

Details

Summary

This allows mapping larger files, delaying OOM failures until too many
pages of them are accessed. This is makes the behavior of the
mapped_file_region in this regard consistent between its "Unix" and
"Windows" implementations.

Guard the code witih #if defined(MAP_NORESERVE), consistent with other
uses of MAP_NORESERVE in llvm-project, because some FreeBSD versions do
not provide this flag.

Diff Detail

Event Timeline

JosephTremoulet requested review of this revision.Feb 12 2021, 12:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2021, 12:18 PM

What led me here was failure in LLDB to open large core dumps because the mmap call was failing with the default memory overcommit policy. It's easy to override the policy as a workaround, but it would be nice not to have to. I considered making this a configurable attribute of mapped_file_region (as in a constructor parameter specifying whether OOM failure should be prompt or delayed), which could then be plumbed through so specifically opening core files in LLDB requests it, but I figured this should be consistent across the "Unix" and "Windows" implementations, and AFAICT there's not a good way to have a Linux-style eager-commit/failure on the Windows side, and since delaying the failure is the Windows behavior already, this change does make things consistent. So this is what I ended up with.

I also would have liked to include an LLDB regression test that opens a very large core file, but neither option of committing or dynamically generating a very large input file for that test seemed like a great idea; I'm happy for suggestions on the testing front.

MAP_NORESERVE is dead on NetBSD as this is the default mode always.

MAP_NORESERVE is dead on NetBSD as this is the default mode always.

Good to know, thanks. So that's all the more reason to consistently use that mode everywhere, I think. Or am I misunderstanding your point?

emaste added a comment.Mar 3 2021, 8:38 AM

So that's all the more reason to consistently use that mode everywhere, I think.

I think that's a correct interpretation.

MAP_NORESERVE does not exist on any supported FreeBSD version, so no objection from me.

From my brief reading of the man pages on linux I don't see any issue with adding this flag.

I would mark as accepted but I am not the code owner for llvm/lib/Support/Unix.

I would mark as accepted but I am not the code owner for llvm/lib/Support/Unix.

Understood. According to llvm/CODE_OWNERS.txt, Chandler is code owner for Support.

According to author history for Support/Unix:

~/source/llvm-project/llvm/lib/Support/Unix$ git log --format=tformat:%an -- . |sort |uniq -c |sort -g |tail -5
     19 Benjamin Kramer
     30 Chandler Carruth
     32 Zachary Turner
     42 Michael J. Spencer
    103 Rafael Espindola

According to reviewer history for the same:

~/source/llvm-project/llvm/lib/Support/Unix$ git log -- . |grep 'Reviewed By' |sed -e 's/    Reviewed By: //' |sed -e 's/,//g' |sed -e 's/ /\n/' |sort |uniq -c |sort -g |tail -5
      3 hubert.reinterpretcast
      3 xingxue
      4 lhames
      4 sammccall
      5 krytarowski

So, @chandlerc or @Bigcheese or @zturner or @krytarowski or @sammccall, any of you care to weigh in?

Thanks.

Sorry about the lack of response here.

Reading the docs, my understanding is MAP_NORESERVE on file-based mappings:

  • is only relevant for writable mappings, read-only mappings should not fail on account of swap space in either case
  • allows the mapping to succeed if there's not enough swap space to write it all, but if you actually write to too much of it, it will segfault

Is that about right?

If so, the risk is we're turning a handleable error_code into a crash. This might be the right tradeoff for a load coredump in LLDB (where we're very likely to open a huge file RW and then modify very little of it), but seems like a regression for other callers like FileOutputBuffer::create, in turn called by the LLD elf linker with a large filesize.
So this seems like it might not be safe in general, but if it were an option LLD might specify it.

Am I missing something?

(Re: Windows, my understanding matches yours, that we've got non-reserving behavior right now. But that doesn't tell us whether we consistency would be better achieved by adding SEC_COMMIT to the windows version. Confusingly, windows seems to use "commit" to refer to what unix calls "reserve", and SEC_RESERVE is something else).

Thanks for taking a look!

Reading the docs, my understanding is MAP_NORESERVE on file-based mappings:

  • is only relevant for writable mappings, read-only mappings should not fail on account of swap space in either case
  • allows the mapping to succeed if there's not enough swap space to write it all, but if you actually write to too much of it, it will segfault

Is that about right?

I think that's close enough (and I'm no expert). The language I see on the man page is this:

MAP_NORESERVE
       Do not reserve swap space for this mapping.  When swap space is reserved, one has the guarantee that it is possible to modify the mapping.  When swap space is not  reserved  one  might
       get  SIGSEGV  upon  a  write  if  no physical memory is available.  See also the discussion of the file /proc/sys/vm/overcommit_memory in proc(5).  In kernels before 2.6, this flag had
       effect only for private writable mappings.

So the only error it mentions as a possibility is indeed "upon a write", but then at the end it says "In kernels before 2.6, this flag had effect only for private writable mappings". I don't know what changed in 2.6, if it now has effect for shared mappings, some esoteric read-only case, or what. But doing a simple experiment on the Linux machine I have handy (Ubuntu 18.04), yes I'm seeing success for a readonly mapping regardless of MAP_NORESERVE, and success or failure depending on the flag for a read/write mapping.

If so, the risk is we're turning a handleable error_code into a crash. This might be the right tradeoff for a load coredump in LLDB (where we're very likely to open a huge file RW and then modify very little of it), but seems like a regression for other callers like FileOutputBuffer::create, in turn called by the LLD elf linker with a large filesize.
So this seems like it might not be safe in general, but if it were an option LLD might specify it.

Am I missing something?

No, I don't think you're missing anything. But I'll note that the "handleable error_code" case is platform-specific (i.e. regardless of MAP_NORESERVE you'll get the later segfault behavior on NetBSD, FreeBSD, and Windows, IIUC) and so perhaps a bit of false comfort. Also that without MAP_NORESERVE, whether the error code is returned here depends on how the user's system is configured wrt memory_overcommit. And lastly that arguably the segfault/AV case could be handled somewhat gracefully, albeit with a lot of effort, by LLD or whatever, using a signal handler / SEH (though of course the prompt failure would be way better for any callers wanting to handle this gracefully).

(Re: Windows, my understanding matches yours, that we've got non-reserving behavior right now. But that doesn't tell us whether we consistency would be better achieved by adding SEC_COMMIT to the windows version. Confusingly, windows seems to use "commit" to refer to what unix calls "reserve", and SEC_RESERVE is something else).

Sadly, no, SEC_COMMIT doesn't seem to be useful here. The docs[1] have this to say:

If the file mapping object is backed by the operating system paging file (the hfile parameter is INVALID_HANDLE_VALUE), specifies that ...
This attribute has no effect for file mapping objects that are backed by executable image files or data files (the hfile parameter is a handle to a file).

And my local experiments indeed seem to bear that out.

So, while I certainly agree that a prompt failure is easier to handle gracefully, ISTM that making that work on NetBSD or FreeBSD or Windows would require writing our own check of the file size against available memory. And I don't know if we want to do that or if any of the code using this utility is actively expecting that guarantee (which, again, is only a "guarantee" if the system is configured thus).

Thoughts?

1 - https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-createfilemappinga

[Disclaimer: I also don't consider myself to be the owner of this code.]

I was thinking whether lldb should even be mapping the entirety of the core file if it knows that a typical session will only access a small fraction of it. I was about to propose some sort of lazy loading scheme for the memory contents, but then I realized that is kind of exactly what we get with mmap.

Are you sure that the lld case is relevant here? If lld using this function to write the output file, then I assume it's passing MAP_SHARED to mmap(2) (i.e., Mode==readwrite to mapped_file_region), as it wants (unlike lldb) the changes to be visible after it is done. I haven't checked the manpage or the linux kernel behavior, but intuitively, I would expect that a shared mapping should not trigger an OOM, because at any point the kernel can decide to flush the dirty pages to disk (that's actually one technique for creating a per-process swap file).

I haven't checked the manpage or the linux kernel behavior, but intuitively, I would expect that a shared mapping should not trigger an OOM, because at any point the kernel can decide to flush the dirty pages to disk (that's actually one technique for creating a per-process swap file).

I tried it out empirically and yes, I'm seeing success for a shared read/write mapping of a very large file without needing MAP_NORESERVE. But the line in the man page saying "In kernels before 2.6, this flag had effect only for private writable mappings" was bugging me, so I went looking through kernel and manpage history. What I found is that shared read/write mappings using "huge TLB pages" will do the reservation check (and this was added in 2.6) and thus can fail without MAP_NORESERVE. And the docs for the huge TLB page feature [1] explain "Huge pages cannot be swapped out under memory pressure." Since we're never passing MAP_HUGETLB in the mapped_file_region code, I believe this case is moot and it's only private read/write mappings that can have the prompt failure.

Perhaps @ruiu has thoughts about the linker's perspective? (though I realize LLD was brought up as just one example of a caller that might care about this flag)

Thanks.

1 - https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt

Ping. I'd like to figure out how to make forward progress with this. A couple points:

1 - Several commenters have mentioned they don't own this code and thus aren't the right people to officially approve/reject. @sammccall, do you feel the same way, or are you comfortable making the call here?

2 - Which approach to take. I see broadly four options:

option A - Never do the up-front reserve/commit check. The failure mode in the OOM case isn't great, but neither will we artificially fail when we don't actually need that much physical memory (as we currently do with huge core files). And it's at least consistent across platforms. This is what the code in the patch currently does.

option B - Always do whatever is the host platform's default up-front reserve/commit check. This is what the code in main currently does. But on some Linux systems it requires users to adjust system settings to avoid those artificial failures, which is why I'm proposing a change.

option C - Parameterize mapped_file_region (and containing/calling utilities) to give callers the option of either doing no up-front reserve/commit check or doing whatever is the host platform's default up-front reserve/commit check. My hesitation with this approach is that it adds complexity to the API for what seems like little benefit (and some false comfort at that, given the platform differences), and also that it has a bit of a "leaky abstraction code smell" in exposing a flag directly tied to a setting on certain host platforms in a utility that's supposed to provide a platform-agnostic interface.

option D - Parameterize mapped_file_region (and containing/calling utilities) to give callers the option of either doing no up-front reserve/commit check or doing some custom hand-written up-front reserve/commit check. My hesitation with this approach is that it would be difficult to design a custom check that would work well without having artificial failures, and doubly difficult to do so when we don't have the actual use cases for that check in hand. But if we decide that's The Right Thing long-term, then at least I'll have my answer.

Thoughts?

Thanks

I am fine with always adding MAP_NORESERVE unless anyone can think of a reason not to add it. Or we can modify this patch to add the MAP_NORESERVE only when PROT_WRITE isn't specified if that is worrying anyone.

clayborg accepted this revision.Apr 7 2021, 2:22 PM

Since no one else is responding, lets try this out and see how things go.

This revision is now accepted and ready to land.Apr 7 2021, 2:22 PM

Since no one else is responding, lets try this out and see how things go.

SGTM, thanks.

This revision was landed with ongoing or failed builds.Apr 8 2021, 6:07 AM
This revision was automatically updated to reflect the committed changes.