This is an archive of the discontinued LLVM Phabricator instance.

Replace shadow space zero-out by madvise at mmap
ClosedPublic

Authored by stephan.yichao.zhao on Oct 2 2020, 2:03 PM.

Details

Summary

After D88686, munmap uses MADV_DONTNEED to ensure zero-out before the
next access. Because the entire shadow space is created by MAP_PRIVATE
and MAP_ANONYMOUS, the first access is also on zero-filled values.

So it is fine to not zero-out data, but use madvise(MADV_DONTNEED) at mmap. This reduces runtime
overhead.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptOct 2 2020, 2:03 PM
Herald added subscribers: Restricted Project, jfb. · View Herald Transcript
stephan.yichao.zhao requested review of this revision.Oct 2 2020, 2:03 PM

We can't remove the mmap interceptors completely, since munmap doesn't always precede mmap.

For example, it is sometimes useful to mmap over an existing mapping to zero it out quickly and/or adjust the flags.

But we can replace dfsan_set_label with madvise(DONT_NEED) as an optimization.

addressed comments

morehouse added inline comments.Oct 5 2020, 8:34 AM
compiler-rt/test/dfsan/mmap_release_shadow.c
1 ↗(On Diff #295942)

Can we combine this with the munmap test since they are essentially the same? Probably a single ifdef or if (strcmp(argv[1], ...) == 0) would be sufficient to separate the differences.

34 ↗(On Diff #295942)

Can we do a second mmap to make sure memory usage goes back down to before?

41 ↗(On Diff #295942)

Why would the second memset increase memory usage further? Essentially these asserts are expecting a 4x memory increase factor, when I would expect at most 3x...

eugenis added a subscriber: eugenis.Oct 5 2020, 9:23 AM

I would move this under dfsan_set_label for consistency with asan & msan, and also to optimize large memset calls outside of the mmap interceptor (I assume they result in large dfsan_set_label calls?).

compiler-rt/lib/dfsan/dfsan_interceptors.cpp
39

"Release" - typo

stephan.yichao.zhao marked 3 inline comments as done.

addressed comments

compiler-rt/test/dfsan/mmap_release_shadow.c
1 ↗(On Diff #295942)

Yes. These two cases can be merged. I found it is fine not to distinguish them, because the tests need to interleave mmap and munmap for either test.

34 ↗(On Diff #295942)

After merging them into one file, we have munmap and another mmap in the unified test case.

https://github.com/llvm/llvm-project/commit/88c9162c9d47ef43a505bc5301dc626f3cd4f437 fixed the original test case about measuring RSS after munmap.
On LLVM buildbot, after munmap, RSS does not decrease, which is unlike my local testing.
So we only measure after another mmap to check RSS after the second mmap does not increase memory too much.

41 ↗(On Diff #295942)

The first memset sets a value without labels. Its memory cost is 1x.
The second memset sets a value with labels. So it increases another 2x.
So the total overhead is 3x. This tests that if after mmap no labeled values are stored into the mmap addresses, it does not need additional shadow space cost.

stephan.yichao.zhao marked an inline comment as done.Oct 5 2020, 10:00 AM
Harbormaster completed remote builds in B74017: Diff 296213.
Harbormaster completed remote builds in B74016: Diff 296212.

I would move this under dfsan_set_label for consistency with asan & msan, and also to optimize large memset calls outside of the mmap interceptor (I assume they result in large dfsan_set_label calls?).

Did you mean calling ReleaseShadowMemoryPagesToOS here (https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/dfsan/dfsan.cpp#L296)?

void __dfsan_set_label(dfsan_label label, void *addr, uptr size) {
  if (label == 0) {
       ReleaseShadowMemoryPagesToOS(addr, size);
       return;
  }

  for (dfsan_label *labelp = shadow_for(addr); size != 0; --size, ++labelp) {
    // Don't write the label if it is already the value we need it to be.
    // In a program where most addresses are not labeled, it is common that
    // a page of shadow memory is entirely zeroed.  The Linux copy-on-write
    // implementation will share all of the zeroed pages, making a copy of a
    // page when any value is written.  The un-sharing will happen even if
    // the value written does not change the value in memory.  Avoiding the
    // write when both |label| and |*labelp| are zero dramatically reduces
    // the amount of real memory used by large programs.
    if (label == *labelp)
      continue;

    *labelp = label;
  }
}

This may help memset and strncpy too if their labels are 0.

But if users call dfsan_set_label on small bytes. Does the madvise system call introduce additional overhead?

The alternative is calling ReleaseShadowMemoryPagesToOS explicitly at memset and strncpy if madvise has overhead.

Sure, madvise has overhead. Check SetShadow in MSan and FastPoisonShadow in ASan.

I'm confused - I don't see allocator replacement in dfsan. How does it handle the situation where deallocated memory that contains some labelled data is reused for a new allocation - the labels must be cleared somehow, right?

Sure, madvise has overhead. Check SetShadow in MSan and FastPoisonShadow in ASan.

I'm confused - I don't see allocator replacement in dfsan. How does it handle the situation where deallocated memory that contains some labelled data is reused for a new allocation - the labels must be cleared somehow, right?

Labels aren't cleared until new data is written there. In some sense this is preferable since it can detect taint propagation through "uninitialized" heap memory.

Sure, madvise has overhead. Check SetShadow in MSan and FastPoisonShadow in ASan.

It seems that FastPoisonShadow (that calls madvise) is called by asan when ever a memory needs to release. Maybe this means the overhead is fine.
I have planned to do some evaluation on dfsan soon, during the evaluation, I will be testing this. If it works, will be making a separate patch.

I'm confused - I don't see allocator replacement in dfsan. How does it handle the situation where deallocated memory that contains some labelled data is reused for a new allocation - the labels must be cleared somehow, right?

I am new to dfsan, not sure about this.

I agree with @eugenis that the madvise call is more beneficial as a special case for large sizes in dfsan_set_label.

dfsan_set_label is only called from wrappers (which already have lots of branches and shadow propagation), so the overhead from the branch is likely insignificant.

That said, I'm fine with making that change in follow-up patch if you prefer.

compiler-rt/lib/dfsan/dfsan_interceptors.cpp
23–35

Nit: Since you added an unnamed namespace, let's move this variable into there and remove the static.

compiler-rt/test/dfsan/release_shadow_space.c
26

Why is this additional memset needed?

42

Since we now do madvise instead of writing zeros, I expect memory usage to decrease the same as with munmap.

E.g., memory usage decrease here:

void *p = mmap(NULL, size, ...);
memset(p, 0xff, size);
size_t before = get_rss_kb();
munmap(p, size);
size_t after = get_rss_kb();

should be the same as the decrease with:

void *p = mmap(NULL, size, ...);
memset(p, 0xff, size);
size_t before = get_rss_kb();
mmap(p, size, PROT_READ | PROT_WRITE, MAP_FIXED ...)
size_t after = get_rss_kb();

Please test this.

stephan.yichao.zhao marked an inline comment as done.

addressed comments

stephan.yichao.zhao marked an inline comment as done.Oct 5 2020, 8:31 PM

I agree with @eugenis that the madvise call is more beneficial as a special case for large sizes in dfsan_set_label.

dfsan_set_label is only called from wrappers (which already have lots of branches and shadow propagation), so the overhead from the branch is likely insignificant.

That said, I'm fine with making that change in follow-up patch if you prefer.

Will be making another patch after this.

compiler-rt/test/dfsan/release_shadow_space.c
26

without this memset RSS shows the same amount as before mmap. It seems that even at the anonymous mmap, OS still does lazy allocation until we use it.

stephan.yichao.zhao marked an inline comment as done.Oct 5 2020, 8:32 PM
morehouse added inline comments.Oct 6 2020, 9:58 AM
compiler-rt/test/dfsan/release_shadow_space.c
43

after_munmap appears unused.

50

repeating the dfsan_set_label multiple times is unnecessary.

84

I didn't realize this before, but this is 50MB slack. i.e. 50% of map_size.

Is there a chance that your method of measuring RSS isn't very precise or is delayed? IIRC, using /proc/self/smaps is much more precise.

changed from statm to smap

stephan.yichao.zhao marked 3 inline comments as done.Oct 6 2020, 10:43 AM
stephan.yichao.zhao added inline comments.
compiler-rt/test/dfsan/release_shadow_space.c
43

after changing to smaps, this is used by assert.

50

changed to smaps, and removed the additional dfsan_set_label. hopefully LLVM buildbot behaves the same to the local testing environemnt.

84

changed to smaps

morehouse added inline comments.Oct 6 2020, 12:50 PM
compiler-rt/test/dfsan/release_shadow_space.c
56

We should be able to remove this mmap by moving the fixed mmap below in between the first mmap and the munmap.

i.e. mmap -> memset -> mmap(MAP_FIXED) -> memset -> munmap

57

This dfsan_set_label is unnecessary. We already assigned the label to val.

60

s/mumap/munmap

81

delta is unused.

84

Why two different delta methods? Can we use delta for both?

stephan.yichao.zhao marked 8 inline comments as done.

update

compiler-rt/test/dfsan/release_shadow_space.c
57

Thank you. Removed.

81

replaced 5000 by delta.

84

changed them to use delta.

morehouse accepted this revision.Oct 6 2020, 1:23 PM
morehouse added inline comments.
compiler-rt/test/dfsan/release_shadow_space.c
69

Should we be using %zu instead of %td?

This revision is now accepted and ready to land.Oct 6 2020, 1:23 PM

FYI I believe linux updates the rss count only on every 40th-or-so change to the page tables.
We have a test somewhere that runs a single page mmap-munmap in a loop for a while before taking a measurement.

FYI I believe linux updates the rss count only on every 40th-or-so change to the page tables.
We have a test somewhere that runs a single page mmap-munmap in a loop for a while before taking a measurement.

I believe using smaps should avoid this issue. At least I've written a GWP-ASan test in the past that had page precision immediately using smaps.

@stephan.yichao.zhao Please also update the commit description.

stephan.yichao.zhao retitled this revision from Remove shadow space zero-out at mmap to Replace shadow space zero-out by madvise at mmap.Oct 6 2020, 1:35 PM
stephan.yichao.zhao edited the summary of this revision. (Show Details)
stephan.yichao.zhao marked an inline comment as done.Oct 6 2020, 1:39 PM
stephan.yichao.zhao edited the summary of this revision. (Show Details)

update