This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] always use underlying object in the pointsToConstantMemory
AbandonedPublic

Authored by rampitec on Mar 7 2022, 1:40 PM.

Details

Reviewers
arsenm
Summary

A global pointer cast to the constant address space does not necessarily
mean the memory will never be modified; it may only mean it will not be
modified prior to a specific use.

Fixes: SWDEV-326463

Diff Detail

Event Timeline

rampitec created this revision. Mar 7 2022, 1:40 PM
Herald added a project: Restricted Project. · View Herald Transcript Mar 7 2022, 1:40 PM
rampitec requested review of this revision. Mar 7 2022, 1:40 PM
Herald added a subscriber: wdng. · View Herald Transcript

I am not sure this is the correct approach, but getAliasResult() tells us that global, flat, and constant address spaces may alias, so aliasing is not ruled out.

This is a regression after D119886. An alternative fix would be to attach amdgpu.noclobber metadata to the load there instead of casting to the constant address space, but I cannot rule out that such a cast may appear for a different reason; I can write a source program like that.
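As an illustrative sketch (hypothetical IR, not code from this patch or from D119886; typed-pointer syntax matching the dumps below), the two approaches being compared look roughly like this:

```llvm
; Two ways to convey that a kernel-argument pointer is never written through.
define amdgpu_kernel void @example(i32 addrspace(1)* %arg) {
  ; 1. Cast the pointer to the constant address space (4); this is what
  ;    pointsToConstantMemory currently keys on.
  %arg.const = addrspacecast i32 addrspace(1)* %arg to i32 addrspace(4)*
  %v1 = load i32, i32 addrspace(4)* %arg.const, align 4

  ; 2. Keep the original address space and mark the load instead.
  %v2 = load i32, i32 addrspace(1)* %arg, align 4, !amdgpu.noclobber !0
  ret void
}
!0 = !{}
```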

arsenm added inline comments. Mar 7 2022, 1:50 PM
llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
135

I think using the getUnderlyingObject result has the same fundamental problem. It looks as far back as it can, but does not necessarily find the object definition; e.g. you could hit a call that returns the pointer and casted the pointer inside it.

rampitec added inline comments. Mar 7 2022, 1:53 PM
llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
135

A call shall return what the function type prescribes. I would guess that if a function returns a pointer to the constant address space, it promises the constness of the memory; otherwise constant would lose its whole point.

arsenm added inline comments. Mar 7 2022, 2:00 PM
llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
135

It's exactly the same logic as the direct cast. I agree the constant address space is useless and should be removed.

rampitec added inline comments. Mar 7 2022, 2:03 PM
llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
135

I see one exception though: a call originates in user code, and the call signature is a promise, whereas a direct cast can be compiler-generated. The difference is foggy but still seems to exist.

@arsenm Do you think it makes sense to check whether getUnderlyingObject actually returned one of Argument, GlobalValue, or ConstantAggregate?
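For illustration (hypothetical IR, not from the patch): getUnderlyingObject can strip GEPs and casts back to an Argument or GlobalValue, but it cannot look through inttoptr, in which case the walk stops at the instruction itself:

```llvm
@gv = addrspace(4) constant i32 7            ; a GlobalValue underlying object

define void @reaches_argument(i32 addrspace(4)* %arg) {
  %gep = getelementptr inbounds i32, i32 addrspace(4)* %arg, i64 4
  ; getUnderlyingObject(%gep) strips the GEP and returns the Argument %arg
  ret void
}

define void @blocked_by_inttoptr(i64 %bits) {
  %p = inttoptr i64 %bits to i32 addrspace(4)*
  ; getUnderlyingObject(%p) stops here: the result is the inttoptr
  ; instruction, which is none of Argument/GlobalValue/ConstantAggregate
  ret void
}
```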

arsenm added a comment. Mar 7 2022, 2:13 PM

> @arsenm Do you think it makes sense to check whether getUnderlyingObject actually returned one of Argument, GlobalValue, or ConstantAggregate?

I would assume the base implementation handles that, but we're already checking for GlobalValue.

> I would assume the base implementation handles that, but we're already checking for GlobalValue.

isConstant is not the same as the constant address space, though. And for an argument, a const qualifier does not replace constant either.

arsenm added a comment. Mar 7 2022, 2:21 PM

> isConstant is not the same as the constant address space, though. And for an argument, a const qualifier does not replace constant either.

The point of the method is to return whether the memory is modified, not whether it's in the constant address space.

> The point of the method is to return whether the memory is modified, not whether it's in the constant address space.

Right, but people use the constant address space to say the memory is not modified.

arsenm added a comment. Mar 7 2022, 2:25 PM

> Right, but people use the constant address space to say the memory is not modified.

Which is redundant with the constant modifier on a global variable; the address space is superfluous here. Loads can be marked with invariant metadata too.
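To sketch the redundancy being described (illustrative IR, not from the patch):

```llvm
; The 'constant' keyword already marks a global as immutable,
; independent of its address space:
@g = addrspace(1) constant i32 42

; And an individual load can carry !invariant.load metadata instead of
; routing the pointer through an address-space cast:
define i32 @f(i32 addrspace(1)* %p) {
  %v = load i32, i32 addrspace(1)* %p, align 4, !invariant.load !0
  ret i32 %v
}
!0 = !{}
```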

> Which is redundant with the constant modifier on a global variable; the address space is superfluous here. Loads can be marked with invariant metadata too.

For a global variable many people use just constant without const. For an argument, const does not help at all without noalias, and even with noalias it does not help outside of a kernel. And invariant metadata would cause the same regression, since an invariant load can be rescheduled virtually anywhere.

I see two options to fix the bug:

  1. Go with this patch, potentially checking what getUnderlyingObject returned;
  2. Change AMDGPUPromoteKernelArguments to attach noclobber metadata instead of a cast. That promise will hold, unlike an invariant.
arsenm added a comment. Mar 7 2022, 2:35 PM

> I see two options to fix the bug:
>
>   1. Go with this patch, potentially checking what getUnderlyingObject returned;
>   2. Change AMDGPUPromoteKernelArguments to attach noclobber metadata instead of a cast. That promise will hold, unlike an invariant.

I think both are necessary, and checking for the constant address space at all may be wrong.

> I think both are necessary, and checking for the constant address space at all may be wrong.

Ugh. Option 1 does not really work because of the inttoptr instructions our backend recently started to produce to get to the arguments.

rampitec updated this revision to Diff 413639. Mar 7 2022, 2:52 PM

Moved the test into the right folder.

arsenm added a comment. Mar 7 2022, 2:52 PM

> Ugh. Option 1 does not really work because of the inttoptr instructions our backend recently started to produce to get to the arguments.

Where is the inttoptr introduced? That should never happen.

> Where is the inttoptr introduced? That should never happen.

Agreed. It comes from the SILoadStoreOptimizer:

*** IR Dump After GPU Load and Store Vectorizer (load-store-vectorizer) ***
define amdgpu_kernel void @const_arg_does_not_alias_global(i32 addrspace(1)* %arg, i32 addrspace(4)* %arg.const) #0 {
entry:
  %const_arg_does_not_alias_global.kernarg.segment = call nonnull align 16 dereferenceable(52) i8 addrspace(4)* @llvm.amdgcn.kernarg.segment.ptr()
  %arg.kernarg.offset = getelementptr inbounds i8, i8 addrspace(4)* %const_arg_does_not_alias_global.kernarg.segment, i64 36
  %arg.kernarg.offset.cast = bitcast i8 addrspace(4)* %arg.kernarg.offset to i32 addrspace(1)* addrspace(4)*
  %0 = bitcast i32 addrspace(1)* addrspace(4)* %arg.kernarg.offset.cast to <2 x i64> addrspace(4)*
  %1 = load <2 x i64>, <2 x i64> addrspace(4)* %0, align 4, !invariant.load !0
  %arg.load2 = extractelement <2 x i64> %1, i32 0
  %2 = inttoptr i64 %arg.load2 to i32 addrspace(1)*
  %arg.const.load3 = extractelement <2 x i64> %1, i32 1
  %3 = inttoptr i64 %arg.const.load3 to i32 addrspace(4)*
  %arg.const.kernarg.offset = getelementptr inbounds i8, i8 addrspace(4)* %const_arg_does_not_alias_global.kernarg.segment, i64 44
  %arg.const.kernarg.offset.cast = bitcast i8 addrspace(4)* %arg.const.kernarg.offset to i32 addrspace(4)* addrspace(4)*
  %id = tail call i32 @llvm.amdgcn.workitem.id.x(), !range !1
  %idxprom = sext i32 %id to i64
  %ptr = getelementptr inbounds i32, i32 addrspace(1)* %2, i64 %idxprom
  %ptr.const = getelementptr inbounds i32, i32 addrspace(4)* %3, i64 %idxprom
  store i32 42, i32 addrspace(1)* %ptr, align 4
  %v = load i32, i32 addrspace(4)* %ptr.const, align 4
  store i32 %v, i32* undef, align 4
  ret void
}

Sorry, it is actually LoadStoreVectorizer.cpp.

> Sorry, it is actually LoadStoreVectorizer.cpp.

@arsenm believe it or not, this is actually your commit 42ad17059acc677b5d759cb02bd008e73ffacd4b from 2016! This:

+    if (LoadTy->isPtrOrPtrVectorTy()) {
+      LoadTy = Type::getIntNTy(F.getParent()->getContext(),
+                               DL.getTypeSizeInBits(LoadTy));
+      break;
+    }

Try to compile this:

define amdgpu_kernel void @const_arg_does_not_alias_global(i32 addrspace(1)* %arg, i32 addrspace(4)* %arg.const) {
entry:
  %id = tail call i32 @llvm.amdgcn.workitem.id.x()
  %ptr = getelementptr inbounds i32, i32 addrspace(1)* %arg, i32 %id
  %ptr.const = getelementptr inbounds i32, i32 addrspace(4)* %arg.const, i32 %id
  store i32 42, i32 addrspace(1)* %ptr
  %v = load i32, i32 addrspace(4)* %ptr.const
  store i32 %v, i32* undef
  ret void
}
arsenm added a comment. Mar 7 2022, 3:11 PM

> @arsenm believe it or not, this is actually your commit 42ad17059acc677b5d759cb02bd008e73ffacd4b from 2016! This:
>
> +    if (LoadTy->isPtrOrPtrVectorTy()) {
> +      LoadTy = Type::getIntNTy(F.getParent()->getContext(),
> +                               DL.getTypeSizeInBits(LoadTy));
> +      break;
> +    }

Ugh, there isn't another way to merge a pointer with a non-pointer load.

rampitec updated this revision to Diff 413640. Mar 7 2022, 3:11 PM

Added a test with the constant pointer argument.

> Ugh, there isn't another way to merge a pointer with a non-pointer load.

But both are pointers!

And even if it were not vectorized, we cannot get to the argument after argument lowering anymore anyway:

%arg.const.load = load i32 addrspace(4)*, i32 addrspace(4)* addrspace(4)* %arg.const.kernarg.offset.cast, align 4, !invariant.load !0

Apparently pointsToConstantMemory returns false for this.

rampitec updated this revision to Diff 413653. Mar 7 2022, 4:08 PM

Added one more test where we cannot get to the Argument with getUnderlyingObject.

rampitec abandoned this revision. Mar 11 2022, 3:18 PM