This is an archive of the discontinued LLVM Phabricator instance.

[MemProf] Record accesses for all words touched in mem intrinsic
ClosedPublic

Authored by tejohnson on Sep 29 2021, 8:00 PM.

Details

Summary

Previously for mem* intrinsics we only incremented the access count for
the first word in the range. However, after thinking it through I think
it makes more sense to record an access for every word in the range.
This better matches the behavior of inlined memory intrinsics, and also
allows better analysis of utilization at a future date.

Diff Detail

Event Timeline

tejohnson requested review of this revision.Sep 29 2021, 8:00 PM
tejohnson created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptSep 29 2021, 8:00 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript
tejohnson added inline comments.Sep 29 2021, 8:01 PM
compiler-rt/test/memprof/TestCases/test_memintrin.cpp
40–42

Fixed the size on these calls to what was intended, since I had to update the access counts anyway

snehasish added inline comments.Sep 30 2021, 1:02 PM
compiler-rt/test/memprof/TestCases/test_memintrin.cpp
8

Update the count in the comments too?

I was trying to reason about the counts here:
For the first allocation for p = new int[10] - the allocation itself counts as 1 + memset counts for 5 (since kWordSize = 8). How do we account for the remaining 5 since the memcpy and memcmp have 2 full words and 1 half word access?

snehasish accepted this revision.Sep 30 2021, 2:37 PM

lgtm

compiler-rt/test/memprof/TestCases/test_memintrin.cpp
8

I guess since this is a primitive, the allocation and deallocation don't have accesses and thus the number of accesses is 5 + 3 + 3, rounding up for the half word accesses since the check is addr+size on L269.

This revision is now accepted and ready to land.Sep 30 2021, 2:37 PM
tejohnson added inline comments.Sep 30 2021, 3:05 PM
compiler-rt/test/memprof/TestCases/test_memintrin.cpp
8

Yep, the sizes are effectively rounded up by the traversal in __memprof_record_access_range, and as you noted there is no access on allocation, so the 5 +3 + 3 is correct.

Will fix the comments

tejohnson updated this revision to Diff 376378.Sep 30 2021, 3:08 PM

Update comments