A global pointer cast to the constant address space does not necessarily
mean the memory will never be modified; it may only mean the memory will
not be modified prior to a specific use.
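The summary can be illustrated with a minimal, hypothetical IR sketch (function and value names are invented): the addrspacecast below does not make the underlying memory immutable, because the same pointer is still written through its original address space after the load.

```llvm
; Hypothetical kernel: casting a global pointer to the constant address
; space (addrspace(4) on AMDGPU) does not make the memory read-only.
define amdgpu_kernel void @cast_does_not_freeze(i32 addrspace(1)* %p) {
entry:
  %c = addrspacecast i32 addrspace(1)* %p to i32 addrspace(4)*
  %v = load i32, i32 addrspace(4)* %c    ; read through the "constant" view
  store i32 %v, i32 addrspace(1)* %p     ; the same memory is written later
  ret void
}
```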
Fixes: SWDEV-326463
Differential D121157
[AMDGPU] always use underlying object in the pointsToConstantMemory
Authored by rampitec on Mar 7 2022, 1:40 PM.
Comment:
I am not sure this is the correct approach, but the getAliasResult() we return says that global, flat, and constant address spaces may alias, so the aliasing is not ruled out there. This is a regression after D119886. The alternative fix would be to attach amdgpu.noclobber metadata to the load instead of casting to the constant address space, but I cannot rule out that such a cast may appear for a different reason: I can write a source program like that.
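A sketch of the alternative fix mentioned above, assuming the amdgpu.noclobber metadata spelling used by the AMDGPU backend (the function and metadata node are hypothetical): the "not clobbered before this use" fact is attached to one specific load rather than to the pointer's address space.

```llvm
; Hypothetical: annotate the load itself instead of casting the pointer,
; so only this one use is asserted to see unmodified memory.
define amdgpu_kernel void @noclobber_load(i32 addrspace(1)* %p) {
entry:
  %v = load i32, i32 addrspace(1)* %p, align 4, !amdgpu.noclobber !0
  store i32 %v, i32 addrspace(1)* %p, align 4
  ret void
}

!0 = !{}
```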
Comment:
@arsenm Do you think it makes sense to check whether getUnderlyingObject actually returned one of Argument, GlobalValue, or ConstantAggregate?

Comment:
I would assume the base implementation handles that, but we are already checking for GlobalValue.

Comment:
isConstant is not the same as the constant address space, though. Likewise, for an argument the const qualifier does not replace constant.

Comment:
The point of the method is to return whether the memory is modified, not whether it is in the constant address space.

Comment:
Which is redundant with the constant modifier on a global variable. The address space is superfluous here. Loads can be marked with invariant metadata too.

Comment:
For a global variable many people use just constant without const. For an argument, const does not help at all without noalias, and even with noalias it does not help outside of a kernel. Then the invariant metadata would result in the same regression, since an invariant load can be rescheduled virtually anywhere. I see 2 options to fix the bug:
Comment:
I think both are necessary, and checking for the constant address space at all may be wrong.

Comment:
Ugh. 1) does not really work because of the inttoptr our BE started to produce recently to get to the arguments.

Comment:
Agree. It is from the SILoadStoreOptimizer:

```
*** IR Dump After GPU Load and Store Vectorizer (load-store-vectorizer) ***
define amdgpu_kernel void @const_arg_does_not_alias_global(i32 addrspace(1)* %arg, i32 addrspace(4)* %arg.const) #0 {
entry:
  %const_arg_does_not_alias_global.kernarg.segment = call nonnull align 16 dereferenceable(52) i8 addrspace(4)* @llvm.amdgcn.kernarg.segment.ptr()
  %arg.kernarg.offset = getelementptr inbounds i8, i8 addrspace(4)* %const_arg_does_not_alias_global.kernarg.segment, i64 36
  %arg.kernarg.offset.cast = bitcast i8 addrspace(4)* %arg.kernarg.offset to i32 addrspace(1)* addrspace(4)*
  %0 = bitcast i32 addrspace(1)* addrspace(4)* %arg.kernarg.offset.cast to <2 x i64> addrspace(4)*
  %1 = load <2 x i64>, <2 x i64> addrspace(4)* %0, align 4, !invariant.load !0
  %arg.load2 = extractelement <2 x i64> %1, i32 0
  %2 = inttoptr i64 %arg.load2 to i32 addrspace(1)*
  %arg.const.load3 = extractelement <2 x i64> %1, i32 1
  %3 = inttoptr i64 %arg.const.load3 to i32 addrspace(4)*
  %arg.const.kernarg.offset = getelementptr inbounds i8, i8 addrspace(4)* %const_arg_does_not_alias_global.kernarg.segment, i64 44
  %arg.const.kernarg.offset.cast = bitcast i8 addrspace(4)* %arg.const.kernarg.offset to i32 addrspace(4)* addrspace(4)*
  %id = tail call i32 @llvm.amdgcn.workitem.id.x(), !range !1
  %idxprom = sext i32 %id to i64
  %ptr = getelementptr inbounds i32, i32 addrspace(1)* %2, i64 %idxprom
  %ptr.const = getelementptr inbounds i32, i32 addrspace(4)* %3, i64 %idxprom
  store i32 42, i32 addrspace(1)* %ptr, align 4
  %v = load i32, i32 addrspace(4)* %ptr.const, align 4
  store i32 %v, i32* undef, align 4
  ret void
}
```

Comment:
@arsenm believe it or not, this is actually your commit 42ad17059acc677b5d759cb02bd008e73ffacd4b from 2016! This:

```
+    if (LoadTy->isPtrOrPtrVectorTy()) {
+      LoadTy = Type::getIntNTy(F.getParent()->getContext(),
+                               DL.getTypeSizeInBits(LoadTy));
+      break;
+    }
```

Try to compile this:

```
define amdgpu_kernel void @const_arg_does_not_alias_global(i32 addrspace(1)* %arg, i32 addrspace(4)* %arg.const) {
entry:
  %id = tail call i32 @llvm.amdgcn.workitem.id.x()
  %ptr = getelementptr inbounds i32, i32 addrspace(1)* %arg, i32 %id
  %ptr.const = getelementptr inbounds i32, i32 addrspace(4)* %arg.const, i32 %id
  store i32 42, i32 addrspace(1)* %ptr
  %v = load i32, i32 addrspace(4)* %ptr.const
  store i32 %v, i32* undef
  ret void
}
```

Comment:
And even if it was not vectorized, we cannot get to the argument after the argument lowering anymore anyway:

```
%arg.const.load = load i32 addrspace(4)*, i32 addrspace(4)* addrspace(4)* %arg.const.kernarg.offset.cast, align 4, !invariant.load !0
```

Apparently pointsToConstantMemory returns false for this.
Comment:
I think using the getUnderlyingObject result has the same fundamental problem. It looks as far back as it can, but does not necessarily find the object definition; e.g. you could hit a call that returns the pointer, where the cast happened inside the callee.
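That failure mode can be sketched with a minimal, hypothetical example (both function names are invented): the addrspacecast lives inside the callee, so walking the def-use chain of %q in @caller stops at the call and never reaches the real underlying object.

```llvm
; getUnderlyingObject cannot see through the call: the addrspacecast is
; inside @hide_cast, so from @caller the walk stops at the call result.
define i32 addrspace(4)* @hide_cast(i32 addrspace(1)* %p) {
  %c = addrspacecast i32 addrspace(1)* %p to i32 addrspace(4)*
  ret i32 addrspace(4)* %c
}

define i32 @caller(i32 addrspace(1)* %p) {
  %q = call i32 addrspace(4)* @hide_cast(i32 addrspace(1)* %p)
  %v = load i32, i32 addrspace(4)* %q
  ret i32 %v
}
```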