Page MenuHomePhabricator

[MemProf] Memory profiling runtime support
ClosedPublic

Authored by tejohnson on Sep 3 2020, 5:26 PM.

Details

Summary

See RFC for background:
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html

Follow on companion to the clang/llvm instrumentation support in D85948
and committed earlier.

This patch adds the compiler-rt runtime support for the memory
profiling.

Note that much of this support was cloned from asan (and then greatly
simplified and renamed). For example the interactions with the
sanitizer_common allocators, error handling, interception, etc.

The bulk of the memory profiling specific code can be found in the
MemInfoBlock, MemInfoBlockCache, and related classes defined and used
in memprof_allocator.cpp.

For now, the memory profile is dumped to text (stderr by default, but
honors the sanitizer_common log_path flag). It is dumped in either a
default verbose format, or an optional terse format.

I have kept some of the same error handling as in asan, such as for
general memory allocation issues like OOM, but removed the error
handling relating to asan specific errors.

This patch also adds a set of tests for the core functionality.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
vitalybuka added inline comments.Fri, Oct 2, 5:27 PM
compiler-rt/CMakeLists.txt
77–88

these flags were added recently to support some platform with memory constraints do not allow to use default vales.
I would recommend do copy them, as corresponding -mllvm flags unless memprof is actually needs them.

compiler-rt/test/memprof/TestCases/default_options.cpp
2–3

for 1:1 compile:run statements this way more convenient to copy entire testcase command line when debugging
same other tests

compiler-rt/test/memprof/TestCases/dump_process_map.cpp
5

maybe better to add print_module_map=3 and document something
3 - print module map for debugging
and assume sanitizer may use it when ever it wants.

compiler-rt/test/memprof/TestCases/free_hook_realloc.cpp
4

Looks like this test belong to sanitizer_common

compiler-rt/test/memprof/TestCases/function-sections-are-bad.cpp
2 ↗(On Diff #292874)

Why this test is needed?

compiler-rt/test/memprof/TestCases/memprof_rt_conflict_test.cpp
8 ↗(On Diff #292874)

Is anyone is going to make it work for android?
If not yet, then I'd remove such UNSUPPORTED and XFAILs .

compiler-rt/test/memprof/TestCases/sleep_after_init.c
2 ↗(On Diff #292874)

I never used this sleep_*, they are so specific, not sure why they are even exist. Also no one bother to introduce them into other sanitizers.
I recommend to not copy them

compiler-rt/test/memprof/TestCases/stress_dtls.c
6

2.15 is quite old, I don't think anyone care. I guess it's copied from asan. I recommend to remove this comment to avoid confusion.

vitalybuka added inline comments.Sat, Oct 3, 12:27 AM
compiler-rt/lib/memprof/CMakeLists.txt
87–115

text is misaligned

compiler-rt/lib/memprof/memprof_allocator.cpp
17–35

clang-formated version works for me

89–94

we don't have to use bit fields

321–325

sprintf can be avoided

364

same here

555–556

magic should better be atomic, set when entire Chunk is initialized and after chunk_beg is set
as FinishAndPrint may run in another thread.
I updated asan code, see LargeChunkHeader.

Also ForEachChunk appyes to all, including non allocated and deallocated chunks.
So Deallocate needs to reset magic as well as we don't have guaranty that FinishAndPrint can see chunk_beg for deallocated block.
Probably user_requested_size needs to be atomic as well?

620

Better to call MEMPROF_FREE_HOOK before ptr is released to allocator

680

can you use "uptr chunk_beg = p - kChunkHeaderSize;" as in deallocate ?
to remove one kAllocBegMagic user

823–830

unused?

compiler-rt/lib/memprof/memprof_flags.inc
20–26

I think both these other flags copied from asan were added after asan release. So asan needed to them to avoid breaking existing users.
Existing users is not a problem for memprof so I recommend to remove them.

Also in future it's going to be easier to add them than remove without breaking users. So I advice to review the list and keep as few as possible.

28

same for sleep_*

compiler-rt/lib/memprof/memprof_init_version.h
19–25

I think it's not needed.
Only asan has this and it was not updated as expected.

compiler-rt/lib/memprof/memprof_interceptors.cpp
131

memprof does not need intercept signal related calls?

compiler-rt/lib/memprof/memprof_mapping.h
19–21

not needed

31–41

this looks like some pieces of debugging code

compiler-rt/lib/memprof/memprof_posix.cpp
20–34

number of includes in many files is suspiciously large.

42

it's already in __memprof namespace
maybe prefix can be avoided.

compiler-rt/lib/memprof/memprof_report.cpp
142–203 ↗(On Diff #292874)

Most of these errors are already in sanitizer_common. and all sanitizers call them from there.
Asan probably needs to include some additional info.

Having that memprof has no own errors, why it does not use common report functions?

I guess entire file can be removed.
ReportRssLimitExceeded and ReportOutOfMemory can be avoidede or moved into sanitizer_allocator_report

compiler-rt/lib/memprof/memprof_rtl.cpp
178

this is Darwin specific stuff SANITIZER_SUPPORTS_INIT_FOR_DLOPEN, should not be needed.

184–185

Probably not needed, I don't see similar code in other sanitizers

186

maybe not needed. it's something s390 specific and old

compiler-rt/lib/memprof/memprof_stack.cpp
33–51

most sanitizers does not need this
I suspect that it's not needed here.

compiler-rt/lib/memprof/memprof_stack.h
55

unused here and in asan

compiler-rt/lib/memprof/memprof_stats.cpp
26

I assuming that this call only one per thread and can happen early it can be replaced with internal_memset
and CHECK removed

internal_memset always works
REAL(memset) can be better optimized.

compiler-rt/lib/memprof/memprof_thread.cpp
101

I suspect no signal related code is needed for memprof.
Asan handles some signals which should not be needed there.

111

Asan version is over complicated as it expects call for stack bounds from other threads (LSAN).
I suspect it's not true for memprof.
You can take a look at corresponding code in Msan.

However all these stuff is needed for __sanitizer_start_switch_fiber implementation which is called by some libs only with asan. I see one boost.
Msan didn't need it until recent GO integration.
Unless you are going to update that software it's just a dead code. I advise to remove it and add only when needed.

compiler-rt/lib/memprof/memprof_thread.h
38

struct?

tejohnson marked 28 inline comments as done.Thu, Oct 8, 2:51 PM

Thanks for the detailed comments! I think I have addressed them all, except for a few as noted in the comments below.

compiler-rt/CMakeLists.txt
77–88

At some point we may want to play around with other shadow granularities, but that will require adjusting the mask as well. For now I have removed it, we can add something back if we want to adjust this in the future.

compiler-rt/lib/memprof/memprof_allocator.cpp
89–94

True. I was trying to keep it to the minimal number of bits, but I can leave a comment that notes these can be shrunk to provide space for additional fields if needed.

555–556

Great, I stole your LargeChunkHeader code from asan and used that approach here. =)
I added the appropriate check to FinishAndPrint that we're getting back a non-null MemprofChunk from getMemprofChunk. With that change I'm not sure that we need to make user_requested_size atomic?

680

This is only used by memprof_malloc_usable_size. Looks like it handles being invoked on a pointer that may not be the start of the block. I couldn't find any documentation about how malloc_usable_size should behave if called on a pointer in the middle of an allocation. I tried using the system version and it doesn't error (like a deallocation does), but returns 0 in that case. My guess is behavior is undefined? In any case, not sure if it is worth optimizing for this case. WDYT?

compiler-rt/lib/memprof/memprof_init_version.h
19–25

I actually prefer to keep this to have some ability to detect inconsistencies.

compiler-rt/lib/memprof/memprof_mapping.h
31–41

Yeah, not sure what this is for, was cloned from asan. Removed it and the related code.

compiler-rt/lib/memprof/memprof_report.cpp
142–203 ↗(On Diff #292874)

Ah, I didn't realize that there were baseline versions of these functions in sanitizer_common. It had all of them except ReportRssLimitExceeded, which I have added there as well. This allowed me to remove memprof_report.* and memprof_errors.*.

compiler-rt/lib/memprof/memprof_rtl.cpp
184–185

Since these seem to be checking for valid inconsistencies, I'd prefer to leave them, unless there's a reason they are not needed or problematic.

186

Ditto here. I see calls to this in the other *san as well.

compiler-rt/lib/memprof/memprof_thread.cpp
101

Just to clarify, the comment appears on the line with the call to CommitBack(), but I assume you are talking about the alternate signal stack handling below. I see that the other sanitizers don't have this handling, and I've removed it from here and ThreadStart.

compiler-rt/test/memprof/TestCases/default_options.cpp
2–3

Fixed here and in other tests where applicable.

compiler-rt/test/memprof/TestCases/dump_process_map.cpp
5

I'm a little unsure of what you are suggesting here. There is an existing DumpProcessMap(), which is only implemented by sanitizer_posix. This is different than the PrintModuleMap() controlled by print_module_map that only has a non-empty implementation for sanitizer_mac. Are you suggesting that instead of adding dump_process_map, I add print_module_map=3 and have it invoke DumpProcessMap? What is the intended distinction between PrintModuleMap and DumpProcessMap?

compiler-rt/test/memprof/TestCases/free_hook_realloc.cpp
4

I don't think so, since this is testing that memprof sets up the __sanitizer_free_hook and invokes it correctly. This is almost an exact copy of an asan Posix test case that does the same thing there. (I did try moving the asan/TestCases/Posix version to sanitizer_common/TestCases/Posix and it fails for other sanitizers that apparently don't set up the free hook, so I think it is better to have copies for the clients that do).

compiler-rt/test/memprof/TestCases/function-sections-are-bad.cpp
2 ↗(On Diff #292874)

This is a copy of the asan version. Did some more digging - there is a comment in compiler-rt/CMakeLists.txt that it is for an issue on powerpc64le. Since there isn't support in memprof currently for that architecture, I've deleted this test.

tejohnson updated this revision to Diff 297062.Thu, Oct 8, 2:52 PM
tejohnson marked 7 inline comments as done.

Address comments

LGTM
I'd like to accept next revision (if any).
Other reviewers, please comment if you have something to add.

compiler-rt/lib/memprof/memprof_allocator.cpp
555–556

Please check GetMemprofChunk comment below.

680

When I commented I assumed that kAllocBegMagic can be removed, but it's still used by FinishAndPrint.
So either way is fine.

702

if p == nullptr then
p is either:

  1. tiny chunk from primary allocator
  2. chunk under construction from the secondary allocator

for the [1] "return reinterpret_cast<MemprofChunk *>(alloc_beg);" makes sense, but it sill possible to have uninitialized check
for the [2] it's better to return null

asan has the following here:

AsanChunk *p = reinterpret_cast<LargeChunkHeader *>(alloc_beg)->Get();
    if (!p) {
      if (!allocator.FromPrimary(alloc_beg))
        return nullptr;
      p = reinterpret_cast<AsanChunk *>(alloc_beg);
    }

then asan does the following which is relevant for [1] ([2] must always has correct state here):

u8 state = atomic_load(&p->chunk_state, memory_order_relaxed);
// It does not guaranty that Chunk is initialized, but it's
// definitely not for any other value.
if (state == CHUNK_ALLOCATED || state == CHUNK_QUARANTINE)
  return p;
return nullptr;

memprof has no state, so maybe allocation size == 0 can be used as such sentinel, but it should be atomic.

compiler-rt/lib/memprof/memprof_mapping.h
65

clang format warning?

94

Probably "static inline -> inline" should suppress "unused" warnings

compiler-rt/lib/memprof/memprof_rtl.cpp
184–185

I suspect it was needed when asan was developed before into llvm repository. Now compiler and runtime should be always on the same version.

compiler-rt/test/memprof/TestCases/dump_process_map.cpp
5

Looks like they are different.
However HWAsan uses

if (common_flags()->print_module_map)
    DumpProcessMap();

So I recommend to avoid new flags and use this one for now. Maybe we some update to the doc string.

tejohnson marked 7 inline comments as done.Thu, Oct 15, 6:00 PM

Thanks, I've addressed your comments. I also have implemented a couple additional fixes I found through internal testing. Added implementations of some sanitizer_* functions that need to be implemented by sanitizer_common users (e.g. sanitizer_get_heap_size) and added a test to ensure they are all defined. I fixed a bug in the timestamp computation code. I also added a flag to ensure that we don't try to access the dynamically allocated CacheSet on the MemInfoBlockCache before it is constructed (e.g. before the Allocator object is initialized).

compiler-rt/lib/memprof/memprof_allocator.cpp
680

Ok I'll leave it in for now

702

Changed user_requested_size to be atomic, and updated this code as suggested. Since we are now doing the check for user_requested_size != 0 here, I have removed that check from the callsite in FinishAndPrint. I also updated GetMemprofChunk to pass back the user_requested_size in a parameter, so we don't need to do another atomic load in the callsites.

compiler-rt/lib/memprof/memprof_mapping.h
94

Yep, fixed all of the static inline in headers to be just inline.

compiler-rt/lib/memprof/memprof_rtl.cpp
184–185

Ok removed these.

compiler-rt/test/memprof/TestCases/dump_process_map.cpp
5

Ah, didn't see that hwasan was using DumpProcessMap with this option. I went ahead and changed memprof to do the same thing.

I tried to update the description of that flag, but it is a little hard because there are inconsistencies. E.g. HWAsan always calls DumpProcessMap, which is implemented for all OSes AFAICT. Asan, on the other hand, always calls PrintModuleMap, which is only implemented for Mac. It isn't clear to me how different PrintModuleMap will behave from DumpProcessMap on a Mac (which will pick up the posix version of DumpProcessMap). The implementations are different, but do they provide fundamentally different info? Should Asan be changed to be like HWAsan and call DumpProcessMap? In that case PrintModuleMap will have no callers. Another possibility is to change DumpProcessMap so that for SANITIZER_MAC it calls PrintModuleMap instead, then change the Asan callsites to DumpProcessMap. WDYT?

There were also 2 other places in memprof that were calling PrintModuleMap because the code was cloned from Asan (on error reports when print_module_map >= 2 and on death when print_module_map=1). HWAsan calls DumpProcessMap in both of those cases. I've gone ahead and changed memprof to also call DumpProcessMap in those cases since it is more widely implemented.

New revision has not been uploaded yet?

tejohnson updated this revision to Diff 298532.Thu, Oct 15, 6:14 PM
tejohnson marked 5 inline comments as done.

Address comments and misc fixes.

New revision has not been uploaded yet?

Sorry, had some weird issues, it should be there now.

vitalybuka accepted this revision.Thu, Oct 15, 7:32 PM

Thanks!
LGTM

compiler-rt/lib/memprof/memprof_allocator.cpp
622

could this be just exchange ?

This revision is now accepted and ready to land.Thu, Oct 15, 7:32 PM
vitalybuka added inline comments.Thu, Oct 15, 7:45 PM
compiler-rt/test/memprof/TestCases/dump_process_map.cpp
5

Ah, didn't see that hwasan was using DumpProcessMap with this option. I went ahead and changed memprof to do the same thing.

I tried to update the description of that flag, but it is a little hard because there are inconsistencies. E.g. HWAsan always calls DumpProcessMap, which is implemented for all OSes AFAICT. Asan, on the other hand, always calls PrintModuleMap, which is only implemented for Mac. It isn't clear to me how different PrintModuleMap will behave from DumpProcessMap on a Mac (which will pick up the posix version of DumpProcessMap). The implementations are different, but do they provide fundamentally different info? Should Asan be changed to be like HWAsan and call DumpProcessMap? In that case PrintModuleMap will have no callers. Another possibility is to change DumpProcessMap so that for SANITIZER_MAC it calls PrintModuleMap instead, then change the Asan callsites to DumpProcessMap. WDYT?

LGTM, but I assume we are not going to do that in this patch.

There were also 2 other places in memprof that were calling PrintModuleMap because the code was cloned from Asan (on error reports when print_module_map >= 2 and on death when print_module_map=1). HWAsan calls DumpProcessMap in both of those cases. I've gone ahead and changed memprof to also call DumpProcessMap in those cases since it is more widely implemented.

Thanks for the review!

compiler-rt/lib/memprof/memprof_allocator.cpp
622

Good idea, will fix that before I commit.

compiler-rt/test/memprof/TestCases/dump_process_map.cpp
5

LGTM, but I assume we are not going to do that in this patch.

Yep, will send a follow on patch to clean that up.

tejohnson updated this revision to Diff 298543.Thu, Oct 15, 9:41 PM

Use atomic_exchange as suggested

This revision was landed with ongoing or failed builds.Fri, Oct 16, 9:47 AM
This revision was automatically updated to reflect the committed changes.
tejohnson added inline comments.Sat, Oct 17, 10:52 AM
compiler-rt/test/memprof/TestCases/dump_process_map.cpp
5

Sent for review in D89630.