This is an archive of the discontinued LLVM Phabricator instance.

[dfsan] Add a DFSan allocator
ClosedPublic

Authored by stephan.yichao.zhao on Apr 30 2021, 2:30 PM.

Diff Detail

Event Timeline

stephan.yichao.zhao requested review of this revision.Apr 30 2021, 2:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 30 2021, 2:30 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript
stephan.yichao.zhao edited the summary of this revision. (Show Details)Apr 30 2021, 2:32 PM
stephan.yichao.zhao retitled this revision from [dfsan] Add a DFSan allocator to [dfsan] Add a DFSan allocator.

added missing files

fixed test/build failures

fixed failed test cases

fixed failed test cases

morehouse added inline comments.May 3 2021, 1:35 PM
compiler-rt/lib/dfsan/dfsan.cpp
24

We still need dfsan_flags.h.

544

The start + size implementation seems cleaner to me. Can we use that for WriteShadowWithSize and make WriteShadowInRange call that instead?

626

This might be redundant since on Linux releasing the memory zeroes it out on the next load.

637

Should we just call ReleaseOrigins?

654–655

What's the reason to change this from ReleaseMemoryPagesToOS?

696–697

Does this do anything since we already called MmapFixedSuperNoReserve?

705

Can we simplify by moving all this logic into WriteShadowWithSize? It's confusing to have multiple layers of functions to set the shadow, with some repeated checks in each layer.

951

What's going on here? intercept_tld_get_addr appears unused, so why do we need to override it?

1066

dfsan_init is now unused. Why do we need this?

compiler-rt/lib/dfsan/dfsan.h
23

This is unused.

65

Why int instead of bool?

66

Since we currently only call dfsan_init from preinit_array, which happens once in a single thread, do we actually need to guard against multiple initialization?

117

This looks unused.

compiler-rt/lib/dfsan/dfsan_allocator.cpp
32

Maybe we don't need this. We release (zero) shadow on unmap, so it should still be zero if we mmap it again.

43
  1. This appears unused except for kSpaceBeg below.
  2. The name sounds like the max size of the heap, rather than the start address of it.
stephan.yichao.zhao marked 4 inline comments as done.May 3 2021, 4:28 PM
stephan.yichao.zhao added inline comments.
compiler-rt/lib/dfsan/dfsan.cpp
24

dfsan_flags.h is included in dfsan.h

544

If we use "WriteShadowWithSize(dfsan_label label, uptr beg_shadow_addr, uptr size)" with the base function,
here the size is preferred to be the number of labels. If not, we may have to assert size % sizeof(dfsan_label) == 0.

But if size is # fo labels, we would have

WriteShadowInRange(dfsan_label label, uptr beg_shadow_addr,
                               uptr end_shadow_addr)  {
  assert(beg_shadow_addr <= end_shadow_addr);
  assert((end_shadow_addr - beg_shadow_addr) % 2 == 0);
  WriteShadowWithSize(label, (end_shadow_addr - beg_shadow_addr) / 2);
}

I feel after the fast16label -> fast8label change, it is possible to do this because then #bytes == #labels.

626

This mainly follows how MSan does when allocator does unmmap:

https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/msan/msan_allocator.cpp#L33

dfsan_set_label only does mmap, at unmmap event, we call ReleaseMemoryPagesToOS explicitly.

637

ReleaseOrigins does not call ReleaseMemoryPagesToOS.

654–655

This mainly follows MSan's approach.

Although MSan does not have a similar function to release origin, it has SetShadow

https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/msan/msan_poisoning.cpp#L216

to zero out Shadow, it calls MmapFixedSuperNoReserve w/o ReleaseMemoryPagesToOS,

To be consistent with this, ReleaseOrigins replaces ReleaseMemoryPagesToOS by MmapFixedSuperNoReserve.

696–697
705

WriteShadowInRange works for both 0 and non-zero labels.
Refactored this code into ReleaseOrClearShadows.

951

This will be used by D101204 when interceptors are defined.

1066

This will be used by D101204.

compiler-rt/lib/dfsan/dfsan.h
23

This would be used by those new/delete injections dfsan_new_delete.cpp in D101204.

66

I found dfsan_init_is_running is mainly to ensure interceptors use real calls w/o wrappers before dfsan_init is done.
See dfsan_interceptors.cpp from D101204.
Otherwise mmap called by dfsan_init can be intercepted too.

117

This will be called by dfsan_interceptors.cpp in D101204.
It makes sure that when interceptors are called, dfsan_init (shadow/origin allocation) must be done because wrappers code run.

I also feel like dfsan_init/dfsan_is_running/dfsan_inited are not straightforward. But from MSan's code blame, they seem useful to some corner cases.
For example, calloc, realloc and malloc can be called by DL before preinit_array runs, etc.

compiler-rt/lib/dfsan/dfsan_allocator.cpp
32

Thank you. Removed.

43

inlined the constant.

stephan.yichao.zhao marked 3 inline comments as done.

update

morehouse added inline comments.May 4 2021, 11:17 AM
compiler-rt/lib/dfsan/dfsan.cpp
24

Yes. It's just a nit, but I usually try to explicitly include the things used in each file, so that we can change header files without touching all the places where they're included.

547–554
637

Shouldn't mmap(MAP_NORESERVE) have a similar effect on resident memory to madvise(MADV_DONTNEED)?

Besides, we already have ReleaseOrClearShadow and ReleaseOrigins. Why do we need to reimplement that functionality in this function?

654–655

Don't they have a similar effect? It's confusing to use two different ways of releasing memory without at least a comment explaining why.

compiler-rt/lib/dfsan/dfsan.h
23

In that case, maybe we should define it in the file it will be used in, instead of here.

stephan.yichao.zhao marked 5 inline comments as done.May 4 2021, 2:07 PM
stephan.yichao.zhao added inline comments.
compiler-rt/lib/dfsan/dfsan.cpp
24

moved #include "dfsan/dfsan_flags.h" to each cc file.

547–554

Thank you.

637

removed madvise(MADV_DONTNEED), and replaced dfsan_release_meta_memory by dfsan_set_label(0).

654–655

removed those madvise calls.

compiler-rt/lib/dfsan/dfsan.h
23

moved to dfsan_new_delete.cpp

stephan.yichao.zhao marked 5 inline comments as done.

update

morehouse accepted this revision.May 4 2021, 3:46 PM
morehouse added inline comments.
compiler-rt/lib/dfsan/dfsan_allocator.cpp
32

Looks like you added it back. I think we only need to set 0 label on unmap.

This revision is now accepted and ready to land.May 4 2021, 3:46 PM
compiler-rt/lib/dfsan/dfsan_allocator.cpp
32

I found those mmap interceptors in dfsan_intercerptors.cp also do dfsan_set_label(0).

https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/dfsan/dfsan_interceptors.cpp#L41

For the same reason they are not needed either.
In the next change, I will remove all mmap related dfsan_set_label(0) together.

This revision was landed with ongoing or failed builds.May 4 2021, 5:52 PM
This revision was automatically updated to reflect the committed changes.
compiler-rt/lib/dfsan/dfsan_allocator.cpp
32

Thought about the problem a bit more.

I think our discussion is based on an assumption that all unmmap will be precisely intercepted.
This may not be true since if some code calls unmmap by system call number directly like what this sanitizer allocator does, there is no way to intercept that kind of unmmap.
So in general, we may not be able to remove intercepting those mmap and mmap64.

But for this OnMap, it should be safe because its corresponding OnUmmap is injected by our code.

So in the following change, I will only remove dfsan_set_label(0) for OnMap.

morehouse added inline comments.May 5 2021, 8:37 AM
compiler-rt/lib/dfsan/dfsan_allocator.cpp
32

Thought about the problem a bit more.

I think our discussion is based on an assumption that all unmmap will be precisely intercepted.
This may not be true since if some code calls unmmap by system call number directly like what this sanitizer allocator does, there is no way to intercept that kind of unmmap.
So in general, we may not be able to remove intercepting those mmap and mmap64.

Ah yes, good point.

But for this OnMap, it should be safe because its corresponding OnUmmap is injected by our code.

Well, I'm not 100% sure it's safe. Something else could do syscall(SYS_mmap), then taint the memory, then syscall_(SYS_munmap). Then we could end up mapping that same region for our heap and calling this OnMap callback.

So in the following change, I will only remove dfsan_set_label(0) for OnMap.