This is an archive of the discontinued LLVM Phabricator instance.


Differential D137937

[TableGen] Represent IntrHasSideEffects using inaccessiblemem read+write
Needs Review · Public

Authored by nikic on Nov 14 2022, 5:23 AM.


Details

Reviewers

jdoerfert
nhaehnle
arsenm
foad

Summary

Map IntrHasSideEffects to a read+write of inaccessible memory, which is the usual way to represent a side effect that cannot be modelled more precisely.

This means that IntrNoMem + IntrHasSideEffects is reduced from reading and writing all memory to only accessing inaccessible memory, while IntrReadMem + IntrHasSideEffects is no longer readonly. Treating the latter as readonly was clearly incorrect, as it allows, e.g., simply removing the side effect.

The fact that IntrNoMem + IntrHasSideEffects is no longer an arbitrary read and write may cause issues for existing intrinsics marked as such. In particular, I think the llvm.amdgcn.s.barrier test changes here look incorrect: if I understand the semantics of that intrinsic correctly, moving loads/stores across it is not legal, and its current IntrNoMem marking is simply incorrect. Can somebody from the AMDGPU side please confirm what the intended semantics are?
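The effect of the IntrinsicEmitter change on the two combinations described in this summary can be sketched with a small, self-contained C++ model. It is a hypothetical simplification, not LLVM's `MemoryEffects` class: it assumes two bits per location (read, write) and the location order argmem, inaccessiblemem, other, which is consistent with the `MemoryEffects::createFromIntValue(12)` value expected in the updated intrin-side-effects.td test. `Effects`, `oldLowering` and `newLowering` are illustrative names that mirror the removed and added lines of IntrinsicEmitter.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for llvm::MemoryEffects (not the real API).
// Assumption: 2 bits per location, bit 0 = Ref (read), bit 1 = Mod (write),
// locations ordered ArgMem, InaccessibleMem, Other.
enum ModRef : uint32_t { NoModRef = 0, Ref = 1, Mod = 2, ModRefAll = 3 };
enum Loc : uint32_t { ArgMem = 0, InaccessibleMem = 1, Other = 2, NumLocs = 3 };

struct Effects {
  uint32_t Bits = 0;

  // The same access kind on every location.
  static Effects everywhere(ModRef MR) {
    Effects E;
    for (uint32_t L = 0; L < NumLocs; ++L)
      E.Bits |= static_cast<uint32_t>(MR) << (2 * L);
    return E;
  }
  static Effects none() { return Effects{}; }            // IntrNoMem
  static Effects readOnly() { return everywhere(Ref); }  // IntrReadMem
  static Effects unknown() { return everywhere(ModRefAll); }
  static Effects inaccessibleMemOnly() {
    Effects E;
    E.Bits = static_cast<uint32_t>(ModRefAll) << (2 * InaccessibleMem);
    return E;
  }

  // Access kind (0..3) recorded for location L.
  uint32_t at(uint32_t L) const {
    assert(L < NumLocs && "location out of range");
    return (Bits >> (2 * L)) & 3u;
  }
  bool doesNotAccessMemory() const { return Bits == 0; }
  bool onlyReadsMemory() const {
    for (uint32_t L = 0; L < NumLocs; ++L)
      if (at(L) & Mod)
        return false;
    return true;
  }
  Effects &operator|=(const Effects &O) {
    Bits |= O.Bits;
    return *this;
  }
};

// Removed logic: only IntrNoMem + IntrHasSideEffects was widened (to unknown).
Effects oldLowering(Effects ME, bool HasSideEffects) {
  if (ME.doesNotAccessMemory() && HasSideEffects)
    ME = Effects::unknown();
  return ME;
}

// Added logic: every side-effecting intrinsic additionally reads and writes
// inaccessible memory, whatever its other memory effects are.
Effects newLowering(Effects ME, bool HasSideEffects) {
  if (HasSideEffects)
    ME |= Effects::inaccessibleMemOnly();
  return ME;
}
```

Under this model, `newLowering(Effects::none(), true).Bits` is 12 (inaccessiblemem: readwrite) instead of the old 63 (unknown), and `newLowering(Effects::readOnly(), true).Bits` is 29, which is no longer readonly, whereas the old lowering left it at 21 and still readonly.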

Diff Detail

Unit Tests Failed

	Time	Test
	90 ms	x64 debian > Clang.CodeGen::aarch64-tme.cpp

Event Timeline

nikic created this revision. Nov 14 2022, 5:23 AM

Herald added a project: Restricted Project. Nov 14 2022, 5:23 AM

Herald added subscribers: kosarev, kerbowa, tpr, jvesely.

nikic requested review of this revision. Nov 14 2022, 5:23 AM

Herald added a project: Restricted Project. Nov 14 2022, 5:23 AM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B197508: Diff 475113. Nov 14 2022, 5:48 AM

foad added a comment.

Map IntrHasSideEffects to a read+write of inaccessible memory, which is the usual way to represent a side effect that cannot be modelled more precisely. [emphasis mine]

Is it? Does that mean that things-with-side-effects only interact with other things-with-side-effects? Is that really the intention of the hasSideEffects flag on an intrinsic?

In D137937#3924807, @foad wrote:

Map IntrHasSideEffects to a read+write of inaccessible memory, which is the usual way to represent a side effect that cannot be modelled more precisely. [emphasis mine]

Is it? Does that mean that things-with-side-effects only interact with other things-with-side-effects? Is that really the intention of the hasSideEffects flag on an intrinsic?

It's the closest we can represent. Of course, if the intrinsic has other effects that we explicitly model (such as ordinary memory effects, unwinding effects or divergence effects), then those need to be explicitly stated. But if the effect doesn't fall into one of those categories, then we can currently only model it via inaccessible memory, which at least enforces some of the basic expectations for side-effecting instructions (no DCE, no reordering relative to other effects, no movement across control flow).
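The ordering and DCE constraints listed here, and the limitation foad asks about, can be illustrated with a second hypothetical C++ sketch that uses the same assumed 2-bits-per-location encoding. `mayReorder` and `mayDeleteIfUnused` are illustrative helpers, not LLVM APIs; real LLVM goes through alias analysis and also checks attributes such as willreturn and nounwind, which this model ignores. In the model, two side-effecting calls conflict on inaccessible memory and keep their relative order, but a load from ordinary memory (such as @G1 in nosync_nocallback.ll) does not conflict with them, which matches the updated GVN test, where the load of @G1 is no longer kept after llvm.amdgcn.s.barrier.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (assumed encoding, mirroring LLVM's MemoryEffects value):
// 2 bits per location, bit 0 = read, bit 1 = write,
// locations ordered ArgMem (0), InaccessibleMem (1), Other (2).
constexpr int kNumLocs = 3;
constexpr int kInaccessibleMem = 1;
constexpr int kOtherMem = 2;
constexpr uint32_t kRead = 1, kWrite = 2;

// Effect set that accesses only location Loc with access kind MR.
constexpr uint32_t effectAt(int Loc, uint32_t MR) { return MR << (2 * Loc); }

// Access kind (0..3) of effect set E on location Loc.
uint32_t accessAt(uint32_t E, int Loc) {
  assert(Loc >= 0 && Loc < kNumLocs);
  return (E >> (2 * Loc)) & 3u;
}

// Two operations may be swapped only if, on every location, neither one
// writes memory that the other reads or writes.
bool mayReorder(uint32_t A, uint32_t B) {
  for (int L = 0; L < kNumLocs; ++L) {
    uint32_t MA = accessAt(A, L), MB = accessAt(B, L);
    if (((MA & kWrite) && MB != 0) || ((MB & kWrite) && MA != 0))
      return false;
  }
  return true;
}

// An unused call may be deleted only if it writes nothing
// (real LLVM additionally requires e.g. willreturn and nounwind).
bool mayDeleteIfUnused(uint32_t E) {
  for (int L = 0; L < kNumLocs; ++L)
    if (accessAt(E, L) & kWrite)
      return false;
  return true;
}

// IntrNoMem + IntrHasSideEffects after this patch: memory(inaccessiblemem: readwrite).
constexpr uint32_t SideEffectCall = effectAt(kInaccessibleMem, kRead | kWrite);
// A plain load of a global such as @G1 reads "other" memory.
constexpr uint32_t LoadOfGlobal = effectAt(kOtherMem, kRead);
```

In this model, `mayReorder(SideEffectCall, effectAt(kOtherMem, kWrite))` is also true: a side effect expressed as inaccessible memory only orders against other such side effects and against calls whose memory effects are unknown, which is exactly the interaction foad's question points at.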

I think this is a step in the right direction.
If we need to change the memory effects of some intrinsics now, we should be able to do so, though I somewhat doubt we need it.

LG from me. Others might want to chime in.

uabelho added a subscriber: uabelho. Nov 16 2022, 2:13 AM

arsenm added inline comments. Nov 16 2022, 5:13 PM

llvm/test/CodeGen/AMDGPU/noclobber-barrier.ll
Line 225: llvm.amdgcn.s.barrier and llvm.amdgcn.wave.barrier should both act like synchronization without real memory effects, so I guess they should behave like a fence, but they're also typically emitted together with a fence.

arsenm added inline comments. Nov 17 2022, 3:38 PM

llvm/test/Analysis/GlobalsModRef/nosync_nocallback.ll
Line 34: As counterintuitive as it is, I think this is correct. Can you add a second copy of this test that uses a fence, to show that the load still isn't moved?

jdoerfert added inline comments. Nov 18 2022, 2:02 PM

llvm/test/Analysis/GlobalsModRef/nosync_nocallback.ll
Line 34: If this is legal, our GPU code isn't. Barrier should be `sync`, and this should not hoist. I'm confused.

arsenm added inline comments. Nov 18 2022, 2:10 PM

llvm/test/Analysis/GlobalsModRef/nosync_nocallback.ll
Line 34: Yes, it should be sync.

Opened https://github.com/llvm/llvm-project/issues/59076. For now, to move this ahead, we need to remove nomem from all AMDGPU barriers. https://github.com/llvm/llvm-project/blob/3c36de55f5e60dee8f1bc04bd201f6dd762b3423/llvm/include/llvm/IR/IntrinsicsAMDGPU.td#L220

In D137937#3938649, @jdoerfert wrote:

Opened https://github.com/llvm/llvm-project/issues/59076. For now, to move this ahead, we need to remove nomem from all AMDGPU barriers. https://github.com/llvm/llvm-project/blob/3c36de55f5e60dee8f1bc04bd201f6dd762b3423/llvm/include/llvm/IR/IntrinsicsAMDGPU.td#L220

So I tried this, but it doesn't seem to be that simple. Here's the diff: https://gist.github.com/nikic/507f6ee3276e66d76b0d4a0c2b9ad7ce

Notably, if we make the barrier read/write in IR, we also need to set mayLoad/mayStore in the DAG, and this impacts scheduling. I don't know enough about AMDGPU to really interpret those changes, but it looks like we were scheduling at least some loads across a barrier?

mayLoad and mayStore should be set to 0 for barriers. Setting either to one implies there should be a memory operand, which is not the case.

Unlike the IR, CodeGen has an explicit hasSideEffects flag for unmodeled / non-memory side effects. If TableGen is complaining about requiring mayLoad/mayStore here, then that's a TableGen bug. IntrHasSideEffects should be enough to cover this.

Herald added a subscriber: StephenFan. Aug 17 2023, 3:27 PM

Revision Contents

Path	Size
llvm/test/Analysis/GlobalsModRef/nosync_nocallback.ll	7 lines
llvm/test/CodeGen/AMDGPU/noclobber-barrier.ll	2 lines
llvm/test/TableGen/intrin-side-effects.td	1 line
llvm/test/Transforms/OpenMP/barrier_removal.ll	2 lines
llvm/utils/TableGen/IntrinsicEmitter.cpp	7 lines

Diff 475113

llvm/test/Analysis/GlobalsModRef/nosync_nocallback.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --check-attributes
 ; RUN: opt -aa-pipeline=basic-aa,globals-aa -passes='require<globals-aa>,gvn' -S < %s | FileCheck %s

 ; Make sure we do not hoist the load before the intrinsic, unknown function, or
 ; optnone function except if we know the unknown function is nosync and nocallback.

 @G1 = internal global i32 undef
 @G2 = internal global i32 undef
 @G3 = internal global i32 undef
 @G4 = internal global i32 undef

 define void @test_barrier(i1 %c) {
 ; CHECK-LABEL: define {{[^@]+}}@test_barrier
 ; CHECK-SAME: (i1 [[C:%.*]]) {
-; CHECK-NEXT: br i1 [[C]], label [[INIT:%.*]], label [[CHECK:%.*]]
+; CHECK-NEXT: br i1 [[C]], label [[INIT:%.*]], label [[DOTCHECK_CRIT_EDGE:%.*]]
+; CHECK: .check_crit_edge:
+; CHECK-NEXT: [[V_PRE:%.*]] = load i32, ptr @G1, align 4
+; CHECK-NEXT: br label [[CHECK:%.*]]
 ; CHECK: init:
 ; CHECK-NEXT: store i32 0, ptr @G1, align 4
 ; CHECK-NEXT: br label [[CHECK]]
 ; CHECK: check:
+; CHECK-NEXT: [[V:%.*]] = phi i32 [ [[V_PRE]], [[DOTCHECK_CRIT_EDGE]] ], [ 0, [[INIT]] ]
 ; CHECK-NEXT: call void @llvm.amdgcn.s.barrier()
-; CHECK-NEXT: [[V:%.*]] = load i32, ptr @G1, align 4
 ; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[V]], 0
 ; CHECK-NEXT: call void @llvm.assume(i1 [[CMP]])
 ; CHECK-NEXT: ret void
 ;
 br i1 %c, label %init, label %check
 init:
 store i32 0, ptr @G1
 br label %check
 check:
 call void @llvm.amdgcn.s.barrier()
 %v = load i32, ptr @G1
 %cmp = icmp eq i32 %v, 0
 call void @llvm.assume(i1 %cmp)
 ret void
 }

 define void @test_unknown(i1 %c) {
 ; CHECK-LABEL: define {{[^@]+}}@test_unknown
@@ … 94 unchanged lines not shown … @@

llvm/test/CodeGen/AMDGPU/noclobber-barrier.ll

@@ … 186 unchanged lines not shown … @@ if.end:
 %i2 = load i32, i32 addrspace(1)* %i1, align 4
 %i3 = add i32 %i2, %i
 %i4 = getelementptr inbounds i32, i32 addrspace(1)* %arg, i64 2
 store i32 %i3, i32 addrspace(1)* %i4, align 4
 ret void
 }

 ; GCN-LABEL: {{^}}no_clobbering_loop1:
-; GCN: s_load_dword s
+; GCN: s_load_dwordx2 s
 ; GCN: s_load_dword s
 ; GCN-NOT: global_load_dword
 ; GCN: global_store_dword
 define amdgpu_kernel void @no_clobbering_loop1(i32 addrspace(1)* %arg, i1 %cc) {
 ; CHECK-LABEL: @no_clobbering_loop1(
 ; CHECK-NEXT: bb:
 ; CHECK-NEXT: [[I:%.*]] = load i32, i32 addrspace(1)* [[ARG:%.*]], align 4, !amdgpu.noclobber !0
 ; CHECK-NEXT: br label [[WHILE_COND:%.*]], !amdgpu.uniform !0
@@ … 13 unchanged lines not shown … @@ bb:
 br label %while.cond

 while.cond:
 %i1 = getelementptr inbounds i32, i32 addrspace(1)* %arg, i64 1
 %i2 = load i32, i32 addrspace(1)* %i1, align 4
 %i3 = add i32 %i2, %i
 %i4 = getelementptr inbounds i32, i32 addrspace(1)* %arg, i64 2
 store i32 %i3, i32 addrspace(1)* %i4, align 4
 tail call void @llvm.amdgcn.wave.barrier()
 br i1 %cc, label %while.cond, label %end

 end:
 ret void
 }

 ; GCN-LABEL: {{^}}no_clobbering_loop2:
 ; GCN: s_load_dword s
@@ … 373 unchanged lines not shown … @@

llvm/test/TableGen/intrin-side-effects.td

@@ … 38 unchanged lines not shown … @@

 // ... this intrinsic.
 def int_random_gen : Intrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrHasSideEffects]>;

 // CHECK: static AttributeSet getIntrinsicFnAttributeSet(
 // CHECK: case 0:
 // CHECK-NEXT: return AttributeSet::get(C, {
 // CHECK-NEXT: Attribute::get(C, Attribute::NoUnwind),
+// CHECK-NEXT: Attribute::getWithMemoryEffects(C, MemoryEffects::createFromIntValue(12)),
 // CHECK-NEXT: });

 // CHECK: 1, // llvm.random.gen
 // CHECK: case 1:
 // CHECK-NEXT: AS[0] = {AttributeList::FunctionIndex, getIntrinsicFnAttributeSet(C, 0)};

llvm/test/Transforms/OpenMP/barrier_removal.ll

@@ … 240 unchanged lines not shown … @@
 !9 = !{void ()* @pos_priv_mem, !"kernel", i32 1}
 !10 = !{void ()* @neg_mem, !"kernel", i32 1}
 !11 = !{void ()* @pos_multiple, !"kernel", i32 1}
 !12 = !{i32 7, !"openmp", i32 50}
 !13 = !{i32 7, !"openmp-device", i32 50}
 ;.
 ; CHECK: attributes #[[ATTR0:[0-9]+]] = { "llvm.assume"="ompx_aligned_barrier" }
 ; CHECK: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind }
-; CHECK: attributes #[[ATTR2:[0-9]+]] = { convergent nounwind willreturn }
+; CHECK: attributes #[[ATTR2:[0-9]+]] = { convergent nounwind willreturn memory(inaccessiblemem: readwrite) }
 ;.
 ; CHECK: [[META0:![0-9]+]] = !{i32 7, !"openmp", i32 50}
 ; CHECK: [[META1:![0-9]+]] = !{i32 7, !"openmp-device", i32 50}
 ; CHECK: [[META2:![0-9]+]] = !{void ()* @pos_empty_1, !"kernel", i32 1}
 ; CHECK: [[META3:![0-9]+]] = !{void ()* @pos_empty_2, !"kernel", i32 1}
 ; CHECK: [[META4:![0-9]+]] = !{void ()* @pos_empty_3, !"kernel", i32 1}
 ; CHECK: [[META5:![0-9]+]] = !{void ()* @pos_empty_4, !"kernel", i32 1}
 ; CHECK: [[META6:![0-9]+]] = !{void ()* @pos_empty_5, !"kernel", i32 1}
 ; CHECK: [[META7:![0-9]+]] = !{void ()* @pos_empty_6, !"kernel", i32 1}
 ; CHECK: [[META8:![0-9]+]] = !{void ()* @neg_empty_7, !"kernel", i32 1}
 ; CHECK: [[META9:![0-9]+]] = !{void ()* @pos_constant_loads, !"kernel", i32 1}
 ; CHECK: [[META10:![0-9]+]] = !{void ()* @neg_loads, !"kernel", i32 1}
 ; CHECK: [[META11:![0-9]+]] = !{void ()* @pos_priv_mem, !"kernel", i32 1}
 ; CHECK: [[META12:![0-9]+]] = !{void ()* @neg_mem, !"kernel", i32 1}
 ; CHECK: [[META13:![0-9]+]] = !{void ()* @pos_multiple, !"kernel", i32 1}
 ;.

llvm/utils/TableGen/IntrinsicEmitter.cpp

@@ … 767 unchanged lines not shown … @@ for (const CodeGenIntrinsic &Intrinsic : Ints) {
 if (Intrinsic.isNoMerge)
 OS << " Attribute::get(C, Attribute::NoMerge),\n";
 if (Intrinsic.isConvergent)
 OS << " Attribute::get(C, Attribute::Convergent),\n";
 if (Intrinsic.isSpeculatable)
 OS << " Attribute::get(C, Attribute::Speculatable),\n";

 MemoryEffects ME = Intrinsic.ME;
-// TODO: IntrHasSideEffects should affect not only readnone intrinsics.
-if (ME.doesNotAccessMemory() && Intrinsic.hasSideEffects)
-ME = MemoryEffects::unknown();
+// Approximate side effects as a read and write of inaccessible memory.
+// This imposes an ordering-constraint on side effects.
+if (Intrinsic.hasSideEffects)
+ME |= MemoryEffects::inaccessibleMemOnly();
 if (ME != MemoryEffects::unknown()) {
 OS << " Attribute::getWithMemoryEffects(C, "
 << "MemoryEffects::createFromIntValue(" << ME.toIntValue() << ")),\n";
 }
 OS << " });\n";
 }
 OS << " }\n";
 OS << "}\n\n";
@@ … 176 unchanged lines not shown … @@