This is an archive of the discontinued LLVM Phabricator instance.

[MemProf] Memory profiling runtime support
ClosedPublic

Authored by tejohnson on Sep 3 2020, 5:26 PM.

Details

Summary

See RFC for background:
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html

Follow on companion to the clang/llvm instrumentation support in D85948
and committed earlier.

This patch adds the compiler-rt runtime support for the memory
profiling.

Note that much of this support was cloned from asan (and then greatly
simplified and renamed). For example the interactions with the
sanitizer_common allocators, error handling, interception, etc.

The bulk of the memory profiling specific code can be found in the
MemInfoBlock, MemInfoBlockCache, and related classes defined and used
in memprof_allocator.cpp.

For now, the memory profile is dumped to text (stderr by default, but
honors the sanitizer_common log_path flag). It is dumped in either a
default verbose format, or an optional terse format.

I have kept some of the same error handling as in asan, such as for
general memory allocation issues like OOM, but removed the error
handling relating to asan specific errors.

This patch also adds a set of tests for the core functionality.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

It is difficult to differentiate stack/static data/heap memory (int foo(int *a) { return *a; } may access static .data). The instrumentation implementation/runtime is actually generic and the shadow memory works for stack/static data as well. Will the profiler be focused on heap so 'heapprof' will still be an appropriate name? Can .rodata/.data allocation be optimized if their access patterns are known?

When porting lib/asan/asan_linux.cpp lib/asan/asan_malloc_linux.cpp, it may make sense renaming linux to posix because the files aren't really entirely Linux specific.

One question to sanitizers folks: test/asan and test/hwasan have a TestCases subdirectory. test/tsan, test/msan and test/cfi don't. Does heapprof need a TestCases subdirectory?

compiler-rt/lib/heapprof/heapprof_flags.cpp
26 ↗(On Diff #289838)

We can get rid of such functions (Maybe*DefaultOptions). See D87175

59 ↗(On Diff #289838)

intercept_tls_get_addr is untested

How will it be used? At runtime the PT_TLS block is copied to memory allocated by mmap by individual threads, so the TLS blocks can be counted as heap memory. If the runtime is not ready to recognize such memory references, intercept_tls_get_addr should be postponed.

compiler-rt/lib/heapprof/heapprof_interface.inc
1 ↗(On Diff #289838)

This file is only needed when the Windows port is implemented.

compiler-rt/lib/heapprof/heapprof_internal.h
47 ↗(On Diff #290175)

This does not need indentation

compiler-rt/lib/heapprof/heapprof_linux.cpp
1 ↗(On Diff #290175)

The name of asan_linux.cpp is actually confusing. *BSD and Solaris implementations also share that file.

Rename this to heapprof_posix.cpp? Though, on asan side, there is also an asan_posix.cpp ...

90 ↗(On Diff #289838)

Remove internal_strstr(libname, "libheapprof.so")

There is no gcc port.

132 ↗(On Diff #289838)

This logic needs a port of asan_rt_conflict_test-1.cpp

compiler-rt/lib/heapprof/heapprof_malloc_linux.cpp
1 ↗(On Diff #290175)

heapprof_malloc_posix.cpp may be more appropriate for future platform additions.

compiler-rt/lib/heapprof/heapprof_report.cpp
83 ↗(On Diff #290175)

Does print_stats=1 have a test demonstrating how it works?

I picked one test and added HEAPPROF_OPTIONS=print_stats=1 but did not observe any difference.

143 ↗(On Diff #290175)
compiler-rt/lib/heapprof/heapprof_rtl.cpp
107 ↗(On Diff #289838)

Space after ,

313 ↗(On Diff #289838)

There needs to be a port of asan/TestCases/unaligned_loads_and_stores.cpp to test these interfaces.

compiler-rt/lib/heapprof/heapprof_stats.cpp
1 ↗(On Diff #290175)

It seems that this important file is untested.

compiler-rt/test/heapprof/TestCases/test_memintrin.cpp
11 ↗(On Diff #290175)

Just write a space, instead of using [[:space:]]

[[#%u,STACKIDP:]] can match an unsigned decimal.

compiler-rt/test/heapprof/lit.cfg.py
36 ↗(On Diff #290175)

This line is not needed

43 ↗(On Diff #290175)

Prefer single quotes for string literals. ditto below

Thanks for the comments @vitalybuka and @MaskRay. Starting with Vitaly's comments. Responses and follow up questions below.

compiler-rt/lib/heapprof/heapprof_allocator.cpp
62–113 ↗(On Diff #289838)

Right, it's an optimization to reduce the memory overhead. Since it was straightforward to implement, I think it makes sense to use this support to enable reduced alignment requirements.

561 ↗(On Diff #289838)

BTW I found that the "meta[0] = size" was removed in D86922, you might want to mark that as a parent to D86932. Question on that though. In D86922 you are using a total of at most 45 bits for holding the size. What is in fact the max size we need to represent?

78–95 ↗(On Diff #290165)

Thanks for the suggestions. You are right, I can increase the allocation size here given the extra space I currently have. Question on that further below.

And good suggestion about alloc_context_id, I see StackDepotPut returns a u32 anyway. I'm not sure why I changed this to u64 here - looks like it has been u32 in asan.

compiler-rt/lib/heapprof/heapprof_allocator.h
63–76 ↗(On Diff #289838)

Will fix

67 ↗(On Diff #290165)

Will fix

compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
750

Let me try doing that, it will need to be a child revision of this so it can be used/tested.

vitalybuka added inline comments.Sep 8 2020, 1:25 PM
compiler-rt/lib/heapprof/heapprof_allocator.cpp
561 ↗(On Diff #289838)

I guess it's is not a problem for heapprov, it has plenty of bits for full uptr?

I am not sure what is required max size should be. I hope 45bit is enough.
If not, asan has large red zones for large allocations. So I still can avoid using metadata.
Those changes are not realy important for asan. I was fixing bug with lsan, found another one with allot_tid, and metadata patches are just a cleanup.

Responses to some of @MaskRay's comments below (if no explicit response then ack).

I pushed a5d6af421d625c78bfb0f63830b51863ff0f0877 to remove annoying clang-tidy readability-identifier-naming warnings. There is a minor compilation error related to stdlib.h. After fixing it, many tests fail because -fmemprof was recently renamed to -fmemory-profile. Can you upload a new diff?

Thanks for fixing those. I'll fix the option in the tests when I upload a new diff addressing the comments.

It is difficult to differentiate stack/static data/heap memory (int foo(int *a) { return *a; } may access static .data). The instrumentation implementation/runtime is actually generic and the shadow memory works for stack/static data as well. Will the profiler be focused on heap so 'heapprof' will still be an appropriate name? Can .rodata/.data allocation be optimized if their access patterns are known?

Yes this came up on the instrumentation thread, and hence I made the option name more general (-fmemprof and now -fmemory-profile). I'll go ahead and do a mass rename to MemProf on this patch. But I will probably do that as a follow on after addressing the other comments, so that it is easier to see the more substantive changes in a single diff.

When porting lib/asan/asan_linux.cpp lib/asan/asan_malloc_linux.cpp, it may make sense renaming linux to posix because the files aren't really entirely Linux specific.

I kept the same file structure as asan, with the assumption that it made sense. I'm not an OS expert, so unfortunately not sure how to distinguish linux vs posix requirement.

One question to sanitizers folks: test/asan and test/hwasan have a TestCases subdirectory. test/tsan, test/msan and test/cfi don't. Does heapprof need a TestCases subdirectory?

compiler-rt/lib/heapprof/heapprof_flags.cpp
59 ↗(On Diff #289838)

This is a sanitizer_common flag and the handling is there. I didn't see a reason to disable it here (it seems to be enabled for all the other clients). I did find an asan test for this, will port that over.

compiler-rt/lib/heapprof/heapprof_linux.cpp
1 ↗(On Diff #290175)

Indeed, but as noted above I kept the same structure (with the same functionality mapped over) as for asan, since I assume it makes sense. Note that POSIX seems to include other OS such as Mac.

compiler-rt/lib/heapprof/heapprof_malloc_linux.cpp
1 ↗(On Diff #290175)

This again mirrors the asan naming.

compiler-rt/lib/heapprof/heapprof_report.cpp
83 ↗(On Diff #290175)

I thought so but I guess not, will add.

compiler-rt/test/heapprof/TestCases/test_memintrin.cpp
11 ↗(On Diff #290175)

I am using [[:space:]] per the instructions at:
https://llvm.org/docs/CommandGuide/FileCheck.html#matching-newline-characters

Since I need to match newlines

vitalybuka added inline comments.Sep 9 2020, 4:14 AM
compiler-rt/lib/heapprof/heapprof_allocator.cpp
561 ↗(On Diff #289838)

I just noticed that Asan already has hardcoded check that size fits in 40bit. see kMaxAllowedMallocSize

689 ↗(On Diff #290175)

AsanChunk is used in such way that any garbage works there.
Here we return alloc_beg but we have no prove that it's initialized.
There is a data race by design, I guess, when lsan can start using that data when it's not yet fully initialized.
lsan has sanity checks for all fields so it should not crash/hang and in the worst case it just misses memory leaks.
However If block is alive for user or it's in quarantine then chunk is valid and lsan will do correct leak checking.
I can imagine that author tried to avoid locking between lsan and fast thread-local allocator cache.

Is it true for heapprof?
Looks like heapprof_mz_size can get here. Can valid, data race free, program get into GetHeapprofChunk in one thread for a memory block which is being allocated/deallocated by another thread?

BTW. Here is my another Asan patch to avoid metadata and more importantly make sure header is in consistent state D87359

These "linux" files will be ported to BSD so we might pick better names now.

tejohnson added inline comments.Sep 9 2020, 9:45 AM
compiler-rt/lib/heapprof/heapprof_allocator.cpp
561 ↗(On Diff #289838)

Ok thanks. My inclination then is to make the size a 40 bit field, so that I can make note of the extra bits available if needed in the future for something else, and then have a check somewhere to ensure that they stay in sync.

689 ↗(On Diff #290175)

The user_requested_size is set to 0 on deallocation (min size is 1 per Allocate). All of the callers of this function do specifically check for a 0 user_requested_size and handle it specially.

vitalybuka added inline comments.Sep 9 2020, 4:57 PM
compiler-rt/lib/heapprof/heapprof_allocator.cpp
689 ↗(On Diff #290175)

If block is allocated from primary allocator, there is not guaranty that it's zeroed. E.g. allocator may uses that mem to maintain free list.
So user_requested_size may not be 0 just after return from allocator.Allocate
If HeapprofChunk is alloc_beg then we have no kAllocBegMagic and GetHeapprofChunk will return this not yet initialized HeapprofChunk.

For the secondary it's not a problem as it's from mmap so it's zero-ed.
For the primary with Magic also should not be a problem (if we fix Allocate to set Magic after initialization)

tejohnson updated this revision to Diff 291094.Sep 10 2020, 3:19 PM
tejohnson marked 18 inline comments as done and 2 inline comments as done.

Rebase and update flag

tejohnson added a comment.EditedSep 10 2020, 3:34 PM

The next patch update should address all of the comments except:

    1. @vitalybuka's last response, I have a follow up question to help me decide what to do there (thread safety related to user_requested_size)
    2. Renaming heapprof to memprof - will do in a follow up update
    3. Renaming the *_linux files to *_posix. It's not clear to me why they are named as such on asan and whether everything in them in fact is posix compliant.
    4. Moving some code from heapprof_allocator.h into the cc file as suggested by @vitalybuka
  • will do that in a follow up update.
compiler-rt/lib/heapprof/heapprof_allocator.cpp
689 ↗(On Diff #290175)

How does asan ensure that chunk_state is CHUNK_INVALID (0) on allocation from the primary allocator?

compiler-rt/lib/heapprof/heapprof_allocator.h
44–95 ↗(On Diff #289838)

I will do this as a follow on patch update, so it is easier to see the diffs.

compiler-rt/lib/heapprof/heapprof_report.cpp
83 ↗(On Diff #290175)

This stats printing option only affects error conditions, so I have added a test of it to malloc-size-too-big.cpp. There was already some testing of the stats printing infrastructure as noted on your other comment about that.

compiler-rt/lib/heapprof/heapprof_stats.cpp
1 ↗(On Diff #290175)

Note that atexit=1 also causes the stats printing to occur (on clean exit). I was testing that in atexit_stats.cpp, but I have beefed up the matching a bit. I don't want to match exact numbers though, since a number of allocations occur from places like libc.so and the dynamic loader, which might vary in behavior across systems/versions.

I've also added another test that exercises it via print_stats.

tejohnson updated this revision to Diff 291097.Sep 10 2020, 3:50 PM

Address most of the comments

vitalybuka added inline comments.Sep 10 2020, 4:53 PM
compiler-rt/lib/heapprof/heapprof_allocator.cpp
689 ↗(On Diff #290175)

Unfortunately It does not.
If we see kAllocBegMagic than we quite sure that it's already updated AsanChunk.
As I wrote, without kAllocBegMagic (for smallest allocations) we have no idea if this real chunk state which is in the beginning of allocated block, or it's random garbage from primary allocator. D87359 was attempt to always have kAllocBegMagic to avoid that.
But it's not important for Asan. Asan theoretically should work with any random bytes in AsanChunk.

tejohnson added inline comments.Sep 10 2020, 10:29 PM
compiler-rt/lib/heapprof/heapprof_allocator.cpp
78–95 ↗(On Diff #290165)

Actually, I don't believe I need to encode the offset. Just using a bit like asan to flag from_memalign should be sufficient. The offset I added was only needed in AllocBeg(), but I can simply use the same technique as asan here: if from_memalign then use the base allocator's GetBlockBegin mechanism to get from the header to the alloc begin. (To go from the alloc begin to the header we simply check the first uptr to see if it is kAllocBegMagic.)

689 ↗(On Diff #290175)

If we see kAllocBegMagic than we quite sure that it's already updated AsanChunk.

Is that true though? In asan_allocator.cpp's Allocate(), the first uptr is set to kAllocBegMagic before the chunk header is completely updated. E.g. the user_requested_size and some other fields are not yet initialized.

In any case, presumably the same issue could occur for asan_mz_size if there are random bytes in AsanChunk. But back to your original question: Can valid, data race free, program get into Get{Heapprof,Asan}Chunk in one thread for a memory block which is being allocated/deallocated by another thread? For an interface like asan_mz_size or heapprof_mz_size, I don't think that would be true.

The other place where I'm calling GetHeapprofChunk here is for FinishAndPrint where we are iterating all chunks. But that is called on Allocator destruction. I do plan to add an interface to dump the profile in a similar manner during the runtime, so presumably there we could invoke GetHeapprofChunk on one that is mid allocation or destruction in another thread. But in the rare case that this hits such a race at the worst we would end up with some corruption in the profile data. We also lock the Allocator for the full chunk iteration during that profile dumping which should minimize this risk.

Remove offset field and use from_memalign

Fix incorrect clang-format

Manually fixed the formatting issues causing clang-format to get confused

tejohnson added inline comments.Sep 11 2020, 12:53 PM
compiler-rt/lib/heapprof/heapprof_allocator.h
44–95 ↗(On Diff #289838)

None of this can move to the cc file. HeapprofThreadLocalMallocStorage is used by heapprof_thread and kNumberOfSizeClasses is also used by heapprof_stats

Rename from heapprof to memprof.

Remove a few extraneous files from commit. I will be splitting out the
sanitizer_common changes into another patch shortly, and will update the
patch with a note afterwards when this is ready to review again.

tejohnson retitled this revision from [HeapProf] Heap profiling runtime support to [MemProf] Memory profiling runtime support.Sep 15 2020, 11:12 AM
tejohnson edited the summary of this revision. (Show Details)
tejohnson updated this revision to Diff 292342.Sep 16 2020, 2:23 PM

Remove sanitizer common handling from this patch

PTAL

I pulled out the sanitizer_common handling for printing the stack depot into another patch. I also removed the internal_getcpu handling as I couldn't reproduce why I needed it in the first place. Calling sched_getcpu seems to work fine without it.

I believe I have addressed the comments other than renaming linux files to posix (not sure how to tell if everything there is posix compliant, since it is named linux on asan), and one outstanding question to @vitalybuka. Unfortunately the old comments did not get copied over when I renamed all the files in a recent update. To see the outstanding question look here: https://reviews.llvm.org/D87120?id=290175#inline-809785.

@vitalybuka Ping - let me know if you want me to split it up further before more review. I'm not sure the best way to split it up, since a lot of this is the basic infrastructure that works together. The main thing I can think of is to split it up onto 2 pieces: 1) with the infrastructure that just allows the heap profiler to intercept and handle the memory allocations/deallocations/intrinsics (which was largely cloned from the asan code); and 2) the part on top of that which does the actual heap tracking (the MemInfoBlock and MemInfoBlockCache stuff in memprof_allocator.cpp). If I split it I would keep most of the the chunk header part in the first patch, since it affects the memory allocations/alignment, but just not fill in the fields related to the heap tracking.

Tests mostly LGTM.
I'lll send more comments a little bit later today.

compiler-rt/CMakeLists.txt
77–88

these flags were added recently to support some platform with memory constraints do not allow to use default vales.
I would recommend do copy them, as corresponding -mllvm flags unless memprof is actually needs them.

compiler-rt/test/memprof/TestCases/default_options.cpp
2–3

for 1:1 compile:run statements this way more convenient to copy entire testcase command line when debugging
same other tests

vitalybuka added inline comments.Oct 2 2020, 5:27 PM
compiler-rt/test/memprof/TestCases/dump_process_map.cpp
5

maybe better to add print_module_map=3 and document something
3 - print module map for debugging
and assume sanitizer may use it when ever it wants.

compiler-rt/test/memprof/TestCases/free_hook_realloc.cpp
4

Looks like this test belong to sanitizer_common

compiler-rt/test/memprof/TestCases/function-sections-are-bad.cpp
3

Why this test is needed?

compiler-rt/test/memprof/TestCases/memprof_rt_conflict_test.cpp
9

Is anyone is going to make it work for android?
If not yet, then I'd remove such UNSUPPORTED and XFAILs .

compiler-rt/test/memprof/TestCases/sleep_after_init.c
3

I never used this sleep_*, they are so specific, not sure why they are even exist. Also no one bother to introduce them into other sanitizers.
I recommend to not copy them

compiler-rt/test/memprof/TestCases/stress_dtls.c
6

2.15 is quite old, I don't think anyone care. I guess it's copied from asan. I recommend to remove this comment to avoid confusion.

vitalybuka added inline comments.Oct 3 2020, 12:27 AM
compiler-rt/lib/memprof/CMakeLists.txt
87–115

text is misaligned

compiler-rt/lib/memprof/memprof_allocator.cpp
17–35

clang-formated version works for me

89–94

we don't have to use bit fields

321–325

sprintf can be avoided

364

same here

555–556

magic should better be atomic, set when entire Chunk is initialized and after chunk_beg is set
as FinishAndPrint may run in another thread.
I updated asan code, see LargeChunkHeader.

Also ForEachChunk appyes to all, including non allocated and deallocated chunks.
So Deallocate needs to reset magic as well as we don't have guaranty that FinishAndPrint can see chunk_beg for deallocated block.
Probably user_requested_size needs to be atomic as well?

620

Better to call MEMPROF_FREE_HOOK before ptr is released to allocator

680

can you use "uptr chunk_beg = p - kChunkHeaderSize;" as in deallocate ?
to remove one kAllocBegMagic user

823–830

unused?

compiler-rt/lib/memprof/memprof_flags.inc
20–26

I think both these other flags copied from asan were added after asan release. So asan needed to them to avoid breaking existing users.
Existing users is not a problem for memprof so I recommend to remove them.

Also in future it's going to be easier to add them than remove without breaking users. So I advice to review the list and keep as few as possible.

28

same for sleep_*

compiler-rt/lib/memprof/memprof_init_version.h
19–25

I think it's not needed.
Only asan has this and it was not updated as expected.

compiler-rt/lib/memprof/memprof_interceptors.cpp
131

memprof does not need intercept signal related calls?

compiler-rt/lib/memprof/memprof_mapping.h
19–21

not needed

31–41

this looks like some pieces of debugging code

compiler-rt/lib/memprof/memprof_posix.cpp
20–34

number of includes in many files is suspiciously large.

42

it's already in __memprof namespace
maybe prefix can be avoided.

compiler-rt/lib/memprof/memprof_report.cpp
143–204

Most of these errors are already in sanitizer_common. and all sanitizers call them from there.
Asan probably needs to include some additional info.

Having that memprof has no own errors, why it does not use common report functions?

I guess entire file can be removed.
ReportRssLimitExceeded and ReportOutOfMemory can be avoidede or moved into sanitizer_allocator_report

compiler-rt/lib/memprof/memprof_rtl.cpp
178

this is Darwin specific stuff SANITIZER_SUPPORTS_INIT_FOR_DLOPEN, should not be needed.

184–185

Probably not needed, I don't see similar code in other sanitizers

186

maybe not needed. it's something s390 specific and old

compiler-rt/lib/memprof/memprof_stack.cpp
33–51

most sanitizers does not need this
I suspect that it's not needed here.

compiler-rt/lib/memprof/memprof_stack.h
55

unused here and in asan

compiler-rt/lib/memprof/memprof_stats.cpp
26

I assuming that this call only one per thread and can happen early it can be replaced with internal_memset
and CHECK removed

internal_memset always works
REAL(memset) can be better optimized.

compiler-rt/lib/memprof/memprof_thread.cpp
101

I suspect no signal related code is needed for memprof.
Asan handles some signals which should not be needed there.

111

Asan version is over complicated as it expects call for stack bounds from other threads (LSAN).
I suspect it's not true for memprof.
You can take a look at corresponding code in Msan.

However all these stuff is needed for __sanitizer_start_switch_fiber implementation which is called by some libs only with asan. I see one boost.
Msan didn't need it until recent GO integration.
Unless you are going to update that software it's just a dead code. I advise to remove it and add only when needed.

compiler-rt/lib/memprof/memprof_thread.h
38

struct?

tejohnson marked 28 inline comments as done.Oct 8 2020, 2:51 PM

Thanks for the detailed comments! I think I have addressed them all, except for a few as noted in the comments below.

compiler-rt/CMakeLists.txt
77–88

At some point we may want to play around with other shadow granularities, but that will require adjusting the mask as well. For now I have removed it, we can add something back if we want to adjust this in the future.

compiler-rt/lib/memprof/memprof_allocator.cpp
89–94

True. I was trying to keep it to the minimal number of bits, but I can leave a comment that notes these can be shrunk to provide space for additional fields if needed.

555–556

Great, I stole your LargeChunkHeader code from asan and used that approach here. =)
I added the appropriate check to FinishAndPrint that we're getting back a non-null MemprofChunk from getMemprofChunk. With that change I'm not sure that we need to make user_requested_size atomic?

680

This is only used by memprof_malloc_usable_size. Looks like it handles being invoked on a pointer that may not be the start of the block. I couldn't find any documentation about how malloc_usable_size should behave if called on a pointer in the middle of an allocation. I tried using the system version and it doesn't error (like a deallocation does), but returns 0 in that case. My guess is behavior is undefined? In any case, not sure if it is worth optimizing for this case. WDYT?

compiler-rt/lib/memprof/memprof_init_version.h
19–25

I actually prefer to keep this to have some ability to detect inconsistencies.

compiler-rt/lib/memprof/memprof_mapping.h
31–41

Yeah, not sure what this is for, was cloned from asan. Removed it and the related code.

compiler-rt/lib/memprof/memprof_report.cpp
143–204

Ah, I didn't realize that there were baseline versions of these functions in sanitizer_common. It had all of them except ReportRssLimitExceeded, which I have added there as well. This allowed me to remove memprof_report.* and memprof_errors.*.

compiler-rt/lib/memprof/memprof_rtl.cpp
184–185

Since these seem to be checking for valid inconsistencies, I'd prefer to leave them, unless there's a reason they are not needed or problematic.

186

Ditto here. I see calls to this in the other *san as well.

compiler-rt/lib/memprof/memprof_thread.cpp
101

Just to clarify, the comment appears on the line with the call to CommitBack(), but I assume you are talking about the alternate signal stack handling below. I see that the other sanitizers don't have this handling, and I've removed it from here and ThreadStart.

compiler-rt/test/memprof/TestCases/default_options.cpp
2–3

Fixed here and in other tests where applicable.

compiler-rt/test/memprof/TestCases/dump_process_map.cpp
5

I'm a little unsure of what you are suggesting here. There is an existing DumpProcessMap(), which is only implemented by sanitizer_posix. This is different than the PrintModuleMap() controlled by print_module_map that only has a non-empty implementation for sanitizer_mac. Are you suggesting that instead of adding dump_process_map, I add print_module_map=3 and have it invoke DumpProcessMap? What is the intended distinction between PrintModuleMap and DumpProcessMap?

compiler-rt/test/memprof/TestCases/free_hook_realloc.cpp
4

I don't think so, since this is testing that memprof sets up the __sanitizer_free_hook and invokes it correctly. This is almost an exact copy of an asan Posix test case that does the same thing there. (I did try moving the asan/TestCases/Posix version to sanitizer_common/TestCases/Posix and it fails for other sanitizers that apparently don't set up the free hook, so I think it is better to have copies for the clients that do).

compiler-rt/test/memprof/TestCases/function-sections-are-bad.cpp
3

This is a copy of the asan version. Did some more digging - there is a comment in compiler-rt/CMakeLists.txt that it is for an issue on powerpc64le. Since there isn't support in memprof currently for that architecture, I've deleted this test.

tejohnson updated this revision to Diff 297062.Oct 8 2020, 2:52 PM
tejohnson marked 7 inline comments as done.

Address comments

LGTM
I'd like to accept next revision (if any).
Other reviewers, please comment if you have something to add.

compiler-rt/lib/memprof/memprof_allocator.cpp
555–556

Please check GetMemprofChunk comment below.

680

When I commented I assumed that kAllocBegMagic can be removed, but it's still used by FinishAndPrint.
So either way is fine.

702

if p == nullptr then
p is either:

  1. tiny chunk from primary allocator
  2. chunk under construction from the secondary allocator

for the [1] "return reinterpret_cast<MemprofChunk *>(alloc_beg);" makes sense, but it sill possible to have uninitialized check
for the [2] it's better to return null

asan has the following here:

AsanChunk *p = reinterpret_cast<LargeChunkHeader *>(alloc_beg)->Get();
    if (!p) {
      if (!allocator.FromPrimary(alloc_beg))
        return nullptr;
      p = reinterpret_cast<AsanChunk *>(alloc_beg);
    }

then asan does the following which is relevant for [1] ([2] must always has correct state here):

u8 state = atomic_load(&p->chunk_state, memory_order_relaxed);
// It does not guaranty that Chunk is initialized, but it's
// definitely not for any other value.
if (state == CHUNK_ALLOCATED || state == CHUNK_QUARANTINE)
  return p;
return nullptr;

memprof has no state, so maybe allocation size == 0 can be used as such sentinel, but it should be atomic.

compiler-rt/lib/memprof/memprof_mapping.h
65

clang format warning?

94

Probably "static inline -> inline" should suppress "unused" warnings

compiler-rt/lib/memprof/memprof_rtl.cpp
184–185

I suspect it was needed when asan was developed before into llvm repository. Now compiler and runtime should be always on the same version.

compiler-rt/test/memprof/TestCases/dump_process_map.cpp
5

Looks like they are different.
However HWAsan uses

if (common_flags()->print_module_map)
    DumpProcessMap();

So I recommend to avoid new flags and use this one for now. Maybe we some update to the doc string.

tejohnson marked 7 inline comments as done.Oct 15 2020, 6:00 PM

Thanks, I've addressed your comments. I also have implemented a couple additional fixes I found through internal testing. Added implementations of some sanitizer_* functions that need to be implemented by sanitizer_common users (e.g. sanitizer_get_heap_size) and added a test to ensure they are all defined. I fixed a bug in the timestamp computation code. I also added a flag to ensure that we don't try to access the dynamically allocated CacheSet on the MemInfoBlockCache before it is constructed (e.g. before the Allocator object is initialized).

compiler-rt/lib/memprof/memprof_allocator.cpp
680

Ok I'll leave it in for now

702

Changed user_requested_size to be atomic, and updated this code as suggested. Since we are now doing the check for user_requested_size != 0 here, I have removed that check from the callsite in FinishAndPrint. I also updated GetMemprofChunk to pass back the user_requested_size in a parameter, so we don't need to do another atomic load in the callsites.

compiler-rt/lib/memprof/memprof_mapping.h
94

Yep, fixed all of the static inline in headers to be just inline.

compiler-rt/lib/memprof/memprof_rtl.cpp
184–185

Ok removed these.

compiler-rt/test/memprof/TestCases/dump_process_map.cpp
5

Ah, didn't see that hwasan was using DumpProcessMap with this option. I went ahead and changed memprof to do the same thing.

I tried to update the description of that flag, but it is a little hard because there are inconsistencies. E.g. HWAsan always calls DumpProcessMap, which is implemented for all OSes AFAICT. Asan, on the other hand, always calls PrintModuleMap, which is only implemented for Mac. It isn't clear to me how different PrintModuleMap will behave from DumpProcessMap on a Mac (which will pick up the posix version of DumpProcessMap). The implementations are different, but do they provide fundamentally different info? Should Asan be changed to be like HWAsan and call DumpProcessMap? In that case PrintModuleMap will have no callers. Another possibility is to change DumpProcessMap so that for SANITIZER_MAC it calls PrintModuleMap instead, then change the Asan callsites to DumpProcessMap. WDYT?

There were also 2 other places in memprof that were calling PrintModuleMap because the code was cloned from Asan (on error reports when print_module_map >= 2 and on death when print_module_map=1). HWAsan calls DumpProcessMap in both of those cases. I've gone ahead and changed memprof to also call DumpProcessMap in those cases since it is more widely implemented.

New revision has not been uploaded yet?

tejohnson updated this revision to Diff 298532.Oct 15 2020, 6:14 PM
tejohnson marked 5 inline comments as done.

Address comments and misc fixes.

New revision has not been uploaded yet?

Sorry, had some weird issues, it should be there now.

vitalybuka accepted this revision.Oct 15 2020, 7:32 PM

Thanks!
LGTM

compiler-rt/lib/memprof/memprof_allocator.cpp
622

could this be just exchange ?

This revision is now accepted and ready to land.Oct 15 2020, 7:32 PM
vitalybuka added inline comments.Oct 15 2020, 7:45 PM
compiler-rt/test/memprof/TestCases/dump_process_map.cpp
5

Ah, didn't see that hwasan was using DumpProcessMap with this option. I went ahead and changed memprof to do the same thing.

I tried to update the description of that flag, but it is a little hard because there are inconsistencies. E.g. HWAsan always calls DumpProcessMap, which is implemented for all OSes AFAICT. Asan, on the other hand, always calls PrintModuleMap, which is only implemented for Mac. It isn't clear to me how different PrintModuleMap will behave from DumpProcessMap on a Mac (which will pick up the posix version of DumpProcessMap). The implementations are different, but do they provide fundamentally different info? Should Asan be changed to be like HWAsan and call DumpProcessMap? In that case PrintModuleMap will have no callers. Another possibility is to change DumpProcessMap so that for SANITIZER_MAC it calls PrintModuleMap instead, then change the Asan callsites to DumpProcessMap. WDYT?

LGTM, but I assume we are not going to do that in this patch.

There were also 2 other places in memprof that were calling PrintModuleMap because the code was cloned from Asan (on error reports when print_module_map >= 2 and on death when print_module_map=1). HWAsan calls DumpProcessMap in both of those cases. I've gone ahead and changed memprof to also call DumpProcessMap in those cases since it is more widely implemented.

Thanks for the review!

compiler-rt/lib/memprof/memprof_allocator.cpp
622

Good idea, will fix that before I commit.

compiler-rt/test/memprof/TestCases/dump_process_map.cpp
5

LGTM, but I assume we are not going to do that in this patch.

Yep, will send a follow on patch to clean that up.

tejohnson updated this revision to Diff 298543.Oct 15 2020, 9:41 PM

Use atomic_exchange as suggested

This revision was landed with ongoing or failed builds.Oct 16 2020, 9:47 AM
This revision was automatically updated to reflect the committed changes.
tejohnson added inline comments.Oct 17 2020, 10:52 AM
compiler-rt/test/memprof/TestCases/dump_process_map.cpp
5

Sent for review in D89630.