Page MenuHomePhabricator

[HeapProf] Clang and LLVM support for heap profiling instrumentation

Authored by tejohnson on Aug 13 2020, 5:13 PM.



See RFC for background:

Note that the runtime changes will be sent separately (hopefully this
week, need to add some tests).

This patch includes the LLVM pass to instrument memory accesses with
either inline sequences to increment the access count in the shadow
location, or alternatively to call into the runtime. It also changes
calls to memset/memcpy/memmove to the equivalent runtime version.
The pass is modeled on the address sanitizer pass.

The clang changes add the driver option to invoke the new pass, and to
link with the upcoming heap profiling runtime libraries.

Currently there is no attempt to optimize the instrumentation, e.g. to
aggregate updates to the same memory allocation. That will be
implemented as follow on work.

Diff Detail

Event Timeline

tejohnson created this revision.Aug 13 2020, 5:13 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 13 2020, 5:13 PM
tejohnson requested review of this revision.Aug 13 2020, 5:13 PM

LGTM with some nits


maybe remove all static about and extend namespace {}


this could be ShadowMapping::ShadowMapping()


params are not used?


For consistency
or even "?:"


private/public can be reordered and struct converted to class


F->getIntrinsicID() == Intrinsic::masked_load
F->getIntrinsicID() == Intrinsic::masked_store


probably nicer to put output param into struct {} and rely on RVO
as is it's hard to track if they are initialized

one nit: since the same instrumentation can be used to profiling global variable accesses (especially those indirect accessed), the option name seems excluding those cases. Shall it be renamed to fmem-prof?

davidxl added inline comments.Aug 13 2020, 10:25 PM

define a macro for the version number?

Thanks for the comments @vitalybuka and @davidxl, will address those shortly.

one nit: since the same instrumentation can be used to profiling global variable accesses (especially those indirect accessed), the option name seems excluding those cases. Shall it be renamed to fmem-prof?

Hmm, good point. Do you think all the internals (pass name, runtime names, etc) need to be renamed similarly? Or just the external facing option?

MaskRay added inline comments.

OptInFFlag<"heapprof", "Enable", "Disable", " heap profiling">


If there is a default, the convention is to check just that flag:


and remove -fno-heapprof as a CC1 option.

1 ↗(On Diff #285527)

%clang++ does not work on Windows -> clang.exe++

Use %clangxx

2 ↗(On Diff #285527)

It is better surrounding the argument with double quotes

I think the user visible option needs to match the functionality. Internal naming just needs some comments to document.

tejohnson marked 12 inline comments as done.Aug 14 2020, 4:01 PM

Addressed everyone's comments as well as the clang tidy warnings.

I had hoped to get the runtime part ready and uploaded today, but it looks like I won't make it. I'll be out all next week, so will get that sent when I'm back.


I decided to put them in a struct as you suggest but make it return Optional as that seemed cleaner.

tejohnson updated this revision to Diff 285777.Aug 14 2020, 4:01 PM
tejohnson marked an inline comment as done.

address comments

davidxl accepted this revision.Aug 14 2020, 5:25 PM

lgtm. The user option should be documented after the runtime is ready.



This revision is now accepted and ready to land.Aug 14 2020, 5:25 PM
MaskRay added inline comments.Aug 14 2020, 6:01 PM

There is a tab on this line. This makes the code not indented in git diff output.


Consider constexpr int LLVM_HEAP_PROFILER_VERSION = 1;


For a constant in the source file, consider constexpr uint64_t DefaultShadowGranularity = 64; (constexpr implies const, which means internal linkage in a namespace scope, so you can avoid static)


constexpr char HeapProfInitName[] = "__heapprof_init";


Consider deleting DefaultShadowScale.

Alternatively, cl::init(DefaultShadowScale) and don't check getNumOccurrences() below.




Don’t duplicate function or class name at the beginning of the comment.


Since this is new. Perhaps not deal with legacy pass manager at all?

For example, recently there has been objection on porting CGProfile to the legacy pass manager. For an entirely new thing, not handling legacy pass managers can avoid many tests.

MaskRay added inline comments.Aug 14 2020, 6:11 PM
1 ↗(On Diff #285777)

The implementation file is HeapProfiler.cpp

Naming the test directory Heapprofiler can reduce the number of terms people have to memorize.

I am still trying to read the logic. Some comments to the tests. Some comments are applicable to other tests.

22 ↗(On Diff #285777)

It is important to add some -NEXT: here for better FileCheck diagnostics when things change.

23 ↗(On Diff #285777)


54 ↗(On Diff #285777)

FP80Test may be more suitable.

(-mlong-double-128 can change the width)

61 ↗(On Diff #285777)

CHECK-COUNT-1 here is actually identical to a CHECK. If there are two repeated lines, the pattern will still match.

Suggest deleting -COUNT-1. Change the next CHECK to a CHECK-NEXT: store x86_fp80 0xK3FFF8000000000000000, x86_fp80* %a

16 ↗(On Diff #285777)

With appropriate -NEXT: patterns, -NOT is not really needed.

It is also important to test the argument passed to __heapprof_load

13 ↗(On Diff #285777)

(Optional) One style I try to do to improve readability: when -LABEL or -NEXT is used, indent CHECK lines more than the -LABEL

; CHECK-GRAN-LABEL: @read_granularity(
; CHECK-GRAN:         and {{.*}} -32
; CHECK-GRAN-NEXT:    lshr {{.*}} 3
19 ↗(On Diff #285777)

The IR of these functions is the same. Consider using one function with different sets of CHECK lines.

4 ↗(On Diff #285777)

| is misaligned. Consider not aligning at all

11 ↗(On Diff #285777)

call void @__heapprof_version_mismatch_check_v1()

Otherwise there is no test checking the exact version.

tejohnson marked 18 inline comments as done.Aug 25 2020, 11:08 PM

I think I've addressed all the comments.


changed all of them.


Since the legacy pass manager is still the default, and adding the support was trivial, it makes sense to add the support there too.

61 ↗(On Diff #285777)

That doesn't work as that store x86_64 (the original store) is not NEXT. I just want to make sure that we have one shadow store, which was where the COUNT-1 came from. I've change that to a regular CHECK and added a CHECK-NOT store i64 before and after that sequence. Ditto elsewhere.

16 ↗(On Diff #285777)

The -NOT is needed to ensure there are no additional calls, without matching the full IR. I've expanded the testing to test the arguments though.

tejohnson marked 2 inline comments as done.

Address comments

vitalybuka accepted this revision.Aug 26 2020, 3:51 PM
MaskRay accepted this revision.Aug 26 2020, 4:16 PM
MaskRay added inline comments.

ClDebugFunc != F.getName()

tejohnson added inline comments.Aug 26 2020, 10:26 PM

Your suggestion doesn't work as that reverses the handling. But in any case, the empty check seems unnecessary (that was cloned from the asan code), so I've removed that.

Remove unnecessary check

This revision was landed with ongoing or failed builds.Aug 27 2020, 8:51 AM
This revision was automatically updated to reflect the committed changes.

This really needs some docs changes.

This really needs some docs changes.

Good point, I'll send a patch for that later today.

This really needs some docs changes.

Good point, I'll send a patch for that later today.

Actually, that might be better after the runtime is committed. Since the generated code with this option won't actually work without that part.

wenlei added a subscriber: wenlei.Sep 4 2020, 9:39 AM