This is an archive of the discontinued LLVM Phabricator instance.

Differential D110799

[MemProf] Record accesses for all words touched in mem intrinsic
ClosedPublic

Authored by tejohnson on Sep 29 2021, 8:00 PM.

Details

Reviewers

snehasish

Commits

rG0d8bdc17862e: [MemProf] Record accesses for all words touched in mem intrinsic

Summary

Previously for mem* intrinsics we only incremented the access count for
the first word in the range. However, after thinking it through I think
it makes more sense to record an access for every word in the range.
This better matches the behavior of inlined memory intrinsics, and also
allows better analysis of utilization at a future date.
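
The difference between the two behaviors can be sketched with a toy model. This is not the MemProf runtime: `AccessCounts`, `RecordRangeFirstWordOnly`, `RecordRangeAllWords`, and `TotalAccesses` are hypothetical names introduced for illustration, the map stands in for the runtime's real counters, and `kWordSize = 8` is taken from the review discussion rather than from the runtime headers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using uptr = std::uintptr_t;

// Assumed word granularity; the review thread mentions kWordSize = 8.
constexpr uptr kWordSize = 8;

// Hypothetical stand-in for the runtime's per-address access counters.
using AccessCounts = std::map<uptr, unsigned>;

// Toy replacement for __memprof::RecordAccess: bump the counter for addr.
inline void RecordAccess(AccessCounts &counts, uptr addr) { ++counts[addr]; }

// Old behavior: only the first location in the range is counted.
inline void RecordRangeFirstWordOnly(AccessCounts &counts, uptr addr,
                                     uptr size) {
  (void)size;  // The size was ignored before this change.
  RecordAccess(counts, addr);
}

// New behavior: one access per kWordSize step across [addr, addr + size).
inline void RecordRangeAllWords(AccessCounts &counts, uptr addr, uptr size) {
  assert(addr + size >= addr && "range must not wrap around");
  for (uptr a = addr; a < addr + size; a += kWordSize)
    RecordAccess(counts, a);
}

// Sum of all counters, analogous to an allocation's total access count.
inline unsigned TotalAccesses(const AccessCounts &counts) {
  unsigned total = 0;
  for (const auto &entry : counts)
    total += entry.second;
  return total;
}
```

In this model, a single 40-byte range contributes 1 access under the old scheme and 5 under the new one, which is the kind of shift the updated test expectations reflect.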

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tejohnson requested review of this revision. Sep 29 2021, 8:00 PM

tejohnson created this revision.

Herald added a project: Restricted Project. Sep 29 2021, 8:00 PM

Herald added a subscriber: Restricted Project.

tejohnson added inline comments. Sep 29 2021, 8:01 PM

compiler-rt/test/memprof/TestCases/test_memintrin.cpp
40–42	Fixed the size on these calls to what was intended, since I had to update the access counts anyway

Harbormaster completed remote builds in B126486: Diff 376094. Sep 29 2021, 8:30 PM

snehasish added inline comments. Sep 30 2021, 1:02 PM

compiler-rt/test/memprof/TestCases/test_memintrin.cpp
8	Update the count in the comments too? I was trying to reason about the counts here: For the first allocation for p = new int[10] - the allocation itself counts as 1 + memset counts for 5 (since kWordSize = 8). How do we account for the remaining 5 since the memcpy and memcmp have 2 full words and 1 half word access?

lgtm

compiler-rt/test/memprof/TestCases/test_memintrin.cpp
8	I guess since this is a primitive, the allocation and deallocation don't have accesses and thus the number of accesses is 5 + 3 + 3, rounding up for the half word accesses since the check is addr+size on L269.

This revision is now accepted and ready to land. Sep 30 2021, 2:37 PM

tejohnson added inline comments. Sep 30 2021, 3:05 PM

compiler-rt/test/memprof/TestCases/test_memintrin.cpp
8	Yep, the sizes are effectively rounded up by the traversal in __memprof_record_access_range, and as you noted there is no access on allocation, so the 5 + 3 + 3 is correct. Will fix the comments.
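
The 5 + 3 + 3 arithmetic above can be checked with a short compile-time sketch. It assumes `kWordSize = 8` (as stated in the thread) and a 4-byte `int` (consistent with the 40-byte size reported for `new int[10]`); `WordsTouched` is a hypothetical helper that counts iterations of the range loop and is not part of the runtime. The attribution of 3 + 3 accesses to `q` is inferred from the updated expected count of 6 in the test.

```cpp
#include <cstddef>

// Assumed word granularity, per the review discussion.
constexpr std::size_t kWordSize = 8;

// Assumed size of int on the test platform (40 bytes for new int[10]).
constexpr std::size_t kIntSize = 4;

// Number of iterations of
//   for (a = addr; a < addr + size; a += kWordSize)
// which is size / kWordSize rounded up, so a trailing partial word counts.
constexpr std::size_t WordsTouched(std::size_t size_in_bytes) {
  return (size_in_bytes + kWordSize - 1) / kWordSize;
}

// memset(p, 1, 10 * sizeof(int)) covers 40 bytes: 5 full words.
constexpr std::size_t kMemsetOnP = WordsTouched(10 * kIntSize);
// memcpy and memcmp each cover 20 bytes: 2 full words plus 1 half word.
constexpr std::size_t kMemcpyPerBuffer = WordsTouched(5 * kIntSize);
constexpr std::size_t kMemcmpPerBuffer = WordsTouched(5 * kIntSize);

// p is touched by all three intrinsics; q only by memcpy and memcmp.
static_assert(kMemsetOnP + kMemcpyPerBuffer + kMemcmpPerBuffer == 11,
              "expected access count for p");
static_assert(kMemcpyPerBuffer + kMemcmpPerBuffer == 6,
              "expected access count for q");
```

The rounding up comes from the loop bound being `addr + size` rather than a word-aligned end, so a 20-byte range takes three steps of 8 bytes.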

Update comments

Harbormaster completed remote builds in B126647: Diff 376378. Sep 30 2021, 3:08 PM

Closed by commit rG0d8bdc17862e: [MemProf] Record accesses for all words touched in mem intrinsic (authored by tejohnson). Sep 30 2021, 3:08 PM

This revision was automatically updated to reflect the committed changes.

tejohnson added a commit: rG0d8bdc17862e: [MemProf] Record accesses for all words touched in mem intrinsic.

Revision Contents

Path	Size
compiler-rt/lib/memprof/memprof_rtl.cpp	11 lines
compiler-rt/test/memprof/TestCases/test_memintrin.cpp	14 lines
compiler-rt/test/memprof/TestCases/unaligned_loads_and_stores.cpp	2 lines

Diff 376379

compiler-rt/lib/memprof/memprof_rtl.cpp

	void __memprof_preinit() { MemprofInitInternal(); }			void __memprof_preinit() { MemprofInitInternal(); }

	void __memprof_version_mismatch_check_v1() {}			void __memprof_version_mismatch_check_v1() {}

	void __memprof_record_access(void const volatile *addr) {			void __memprof_record_access(void const volatile *addr) {
	__memprof::RecordAccess((uptr)addr);			__memprof::RecordAccess((uptr)addr);
	}			}

	// We only record the access on the first location in the range,			void __memprof_record_access_range(void const volatile *addr, uptr size) {
	// since we will later accumulate the access counts across the			for (uptr a = (uptr)addr; a < (uptr)addr + size; a += kWordSize)
	// full allocation, and we don't want to inflate the hotness from			__memprof::RecordAccess(a);
	// a memory intrinsic on a large range of memory.
	// TODO: Should we do something else so we can better track utilization?
	void __memprof_record_access_range(void const volatile *addr,
	UNUSED uptr size) {
	__memprof::RecordAccess((uptr)addr);
	}			}

	extern "C" SANITIZER_INTERFACE_ATTRIBUTE u16			extern "C" SANITIZER_INTERFACE_ATTRIBUTE u16
	__sanitizer_unaligned_load16(const uu16 *p) {			__sanitizer_unaligned_load16(const uu16 *p) {
	__memprof_record_access(p);			__memprof_record_access(p);
	return *p;			return *p;
	}			}

compiler-rt/test/memprof/TestCases/test_memintrin.cpp

	// Check profile with calls to memory intrinsics.			// Check profile with calls to memory intrinsics.

	// RUN: %clangxx_memprof -O0 %s -o %t && %env_memprof_opts=log_path=stderr %run %t 2>&1 | FileCheck %s			// RUN: %clangxx_memprof -O0 %s -o %t && %env_memprof_opts=log_path=stderr %run %t 2>&1 | FileCheck %s

	// This is actually:			// This is actually:
	// Memory allocation stack id = STACKIDP			// Memory allocation stack id = STACKIDP
	// alloc_count 1, size (ave/min/max) 40.00 / 40 / 40			// alloc_count 1, size (ave/min/max) 40.00 / 40 / 40
	// access_count (ave/min/max): 3.00 / 3 / 3			// access_count (ave/min/max): 11.00 / 11 / 11
	// but we need to look for them in the same CHECK to get the correct STACKIDP.			// but we need to look for them in the same CHECK to get the correct STACKIDP.
	// CHECK-DAG: Memory allocation stack id = [[STACKIDP:[0-9]+]]{{[[:space:]].*}} alloc_count 1, size (ave/min/max) 40.00 / 40 / 40{{[[:space:]].*}} access_count (ave/min/max): 3.00 / 3 / 3			// CHECK-DAG: Memory allocation stack id = [[STACKIDP:[0-9]+]]{{[[:space:]].*}} alloc_count 1, size (ave/min/max) 40.00 / 40 / 40{{[[:space:]].*}} access_count (ave/min/max): 11.00 / 11 / 11
	//			//
	// This is actually:			// This is actually:
	// Memory allocation stack id = STACKIDQ			// Memory allocation stack id = STACKIDQ
	// alloc_count 1, size (ave/min/max) 20.00 / 20 / 20			// alloc_count 1, size (ave/min/max) 20.00 / 20 / 20
	// access_count (ave/min/max): 2.00 / 2 / 2			// access_count (ave/min/max): 6.00 / 6 / 6
	// but we need to look for them in the same CHECK to get the correct STACKIDQ.			// but we need to look for them in the same CHECK to get the correct STACKIDQ.
	// CHECK-DAG: Memory allocation stack id = [[STACKIDQ:[0-9]+]]{{[[:space:]].*}} alloc_count 1, size (ave/min/max) 20.00 / 20 / 20{{[[:space:]].*}} access_count (ave/min/max): 2.00 / 2 / 2			// CHECK-DAG: Memory allocation stack id = [[STACKIDQ:[0-9]+]]{{[[:space:]].*}} alloc_count 1, size (ave/min/max) 20.00 / 20 / 20{{[[:space:]].*}} access_count (ave/min/max): 6.00 / 6 / 6

	#include <stdio.h>			#include <stdio.h>
	#include <stdlib.h>			#include <stdlib.h>
	#include <string.h>			#include <string.h>

	int main() {			int main() {
	// This is actually:			// This is actually:
	// Stack for id STACKIDP:			// Stack for id STACKIDP:
	// #0 {{.*}} in operator new			// #0 {{.*}} in operator new
	// #1 {{.*}} in main {{.*}}:@LINE+1			// #1 {{.*}} in main {{.*}}:@LINE+1
	// but we need to look for them in the same CHECK-DAG.			// but we need to look for them in the same CHECK-DAG.
	// CHECK-DAG: Stack for id [[STACKIDP]]:{{[[:space:]].*}} #0 {{.*}} in operator new{{.*[[:space:]].*}} #1 {{.*}} in main {{.*}}:[[@LINE+1]]			// CHECK-DAG: Stack for id [[STACKIDP]]:{{[[:space:]].*}} #0 {{.*}} in operator new{{.*[[:space:]].*}} #1 {{.*}} in main {{.*}}:[[@LINE+1]]
	int *p = new int[10];			int *p = new int[10];

	// This is actually:			// This is actually:
	// Stack for id STACKIDQ:			// Stack for id STACKIDQ:
	// #0 {{.*}} in operator new			// #0 {{.*}} in operator new
	// #1 {{.*}} in main {{.*}}:@LINE+1			// #1 {{.*}} in main {{.*}}:@LINE+1
	// but we need to look for them in the same CHECK-DAG.			// but we need to look for them in the same CHECK-DAG.
	// CHECK-DAG: Stack for id [[STACKIDQ]]:{{[[:space:]].*}} #0 {{.*}} in operator new{{.*[[:space:]].*}} #1 {{.*}} in main {{.*}}:[[@LINE+1]]			// CHECK-DAG: Stack for id [[STACKIDQ]]:{{[[:space:]].*}} #0 {{.*}} in operator new{{.*[[:space:]].*}} #1 {{.*}} in main {{.*}}:[[@LINE+1]]
	int *q = new int[5];			int *q = new int[5];

	memset(p, 1, 10);			memset(p, 1, 10 * sizeof(int));
	memcpy(q, p, 5);			memcpy(q, p, 5 * sizeof(int));
	int x = memcmp(p, q, 5);			int x = memcmp(p, q, 5 * sizeof(int));

	delete[] p;			delete[] p;
	delete[] q;			delete[] q;

	return x;			return x;
	}			}

compiler-rt/test/memprof/TestCases/unaligned_loads_and_stores.cpp

	// RUN: %clangxx_memprof -O0 %s -o %t && %env_memprof_opts=log_path=stderr %run %t 2>&1 | FileCheck %s			// RUN: %clangxx_memprof -O0 %s -o %t && %env_memprof_opts=log_path=stderr %run %t 2>&1 | FileCheck %s

	// This is actually:			// This is actually:
	// Memory allocation stack id = STACKID			// Memory allocation stack id = STACKID
	// alloc_count 1, size (ave/min/max) 128.00 / 128 / 128			// alloc_count 1, size (ave/min/max) 128.00 / 128 / 128
	// but we need to look for them in the same CHECK to get the correct STACKID.			// but we need to look for them in the same CHECK to get the correct STACKID.
	// CHECK: Memory allocation stack id = [[STACKID:[0-9]+]]{{[[:space:]].*}}alloc_count 1, size (ave/min/max) 128.00 / 128 / 128			// CHECK: Memory allocation stack id = [[STACKID:[0-9]+]]{{[[:space:]].*}}alloc_count 1, size (ave/min/max) 128.00 / 128 / 128
	// CHECK-NEXT: access_count (ave/min/max): 7.00 / 7 / 7			// CHECK-NEXT: access_count (ave/min/max): 22.00 / 22 / 22

	#include <sanitizer/memprof_interface.h>			#include <sanitizer/memprof_interface.h>

	#include <stdlib.h>			#include <stdlib.h>
	#include <string.h>			#include <string.h>
	int main(int argc, char **argv) {			int main(int argc, char **argv) {
	// CHECK: Stack for id [[STACKID]]:			// CHECK: Stack for id [[STACKID]]:
	// CHECK-NEXT: #0 {{.*}} in operator new[](unsigned long)			// CHECK-NEXT: #0 {{.*}} in operator new[](unsigned long)