This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix function pointer argument bug in AMDGPU Propagate Attributes pass.
ClosedPublic

Authored by jweightman on May 4 2021, 10:52 AM.

Details

Summary

This patch fixes a bug in the AMDGPU Propagate Attributes pass where a call
instruction with a function pointer argument is identified as a user of the
passed function, and illegally replaces the called function of the
instruction with the function argument.

For example, given functions f and g with appropriate types, the following
illegal transformation could occur without this fix:
call void @f(void ()* @g)
-->
call void @g(void ()* @g.1)

The solution introduced in this patch is to prevent the cloning and
substitution if the instruction's called function and the function which
might be cloned do not match.

Diff Detail

Event Timeline

jweightman created this revision.May 4 2021, 10:52 AM
jweightman requested review of this revision.May 4 2021, 10:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2021, 10:52 AM
jweightman added a project: Restricted Project.
arsenm added inline comments.May 4 2021, 11:04 AM
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
252–256

What if the callee was a bitcast of the function? Can you add another testcase with a constexpr function pointer cast?

jweightman added inline comments.May 4 2021, 2:08 PM
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
252–256

I tried adding this to the LIT test:

    %h_cast = bitcast i32* ()* @h to i8* ()*
    %1 = call i8* %h_cast()
...
define private i32* @h() #1 {
    %1 = alloca i32
    ret i32* %1
}

and the pass does not propagate the attributes like one would probably want. However, this seems pretty orthogonal to the problem I was trying to fix (preventing the illegal callee change), because the "user" is an llvm::BitCastInstr so CI is always nullptr in that case.

In order to handle bitcasts, I think we would need new logic to detect which bitcasted functions need to be replaced, a new data structure to track them in (similar to the existing ToReplace), and new logic to do that replacement — all of which would be on a separate logical branch from the existing code for handling call instructions! Unless you see an easier way, I think such a change is beyond the scope of what I was trying to do.

arsenm added inline comments.May 4 2021, 2:11 PM
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
252–256

This example isn't a constant expression cast. If you run opt -S -instcombine on this, it should turn it into the constant expr cast form. In that case the user will still be CallInst, it's just CI->getCalledFunction() will be a constantexpr cast of the function. stripPointerCasts should get you the base function

jweightman added inline comments.May 4 2021, 3:49 PM
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
252–256

I see. I misunderstood your initial comment. This seems like a reasonable thing to try to support with this diff.

The instcombine pass seems to prefer to cast the argument/return value (I modified the test to try both) rather than the function pointer itself. Going back to the IR Language Reference, I saw the constant expression section (I'm still new at this...), so I tried this IR:

%1 = alloca i8
%2 = call i8* bitcast (i32* ()* @h to i8* ()*)()

However, the Propagate Attributes pass doesn't produce a clone of @h in this situation. I see that the dyn_cast<Instruction>(U) fails because the user of h is just the constexpr bitcast:

> U->dump()
void (i8*)* bitcast (void (i32*)* @g to void (i8*)*)

Sorry for the back and forth, but is there some way to find an enclosing CallBase if it exists?

Drive by only

llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
257

The usual way to deal with this is bool CallBase::isCallee(const Use *U). It might make a difference if you have foo(&foo).

arsenm added inline comments.May 4 2021, 5:41 PM
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
252–256

I don't think there's a super easy way to do it. You would have to find the context. The normal way to check that would be to start from the instruction and look into the constantexpr for the global, which starts looking like general indirect call handling.

Ultimately this pass is a grotesque hack that should absolutely not exist, and only does because comgr still does not link the device libraries correctly for opencl. How did you run into this?

madhur13490 added inline comments.May 5 2021, 2:28 AM
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
252–256

Ultimately this pass is a grotesque hack that should absolutely not exist, and only does because comgr still does not link the device libraries correctly for opencl. How did you run into this?

I don't think that should be the only reason. In case of indirect calls, we may want to clone functions based on conflicting register budget. So, this should exist but probably with some fixes.

252–256

User of 'bitcast' operator is CallBase, so you will have to go one level up in user "chain".

arsenm added inline comments.May 5 2021, 4:05 PM
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
252–256

But you don't know how many layers up you would need to go. You could potentially need to do an exhaustive search which touches a lot of the program

jweightman updated this revision to Diff 343489.EditedMay 6 2021, 1:17 PM

I've attempted to handle constexpr function pointer casts. I had some issues, so I left some TODOs describing the problems I encountered in the code. This version isn't passing the new lit test. Thanks for all the help so far!

jweightman added inline comments.May 6 2021, 1:27 PM
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
252–256

@arsenm to your question of how I ran into this: I ran into this issue while working on the Cray compiler. With the way we have things set up, it is possible to have indirect calls in the IR when this pass runs, though I'm not sure if it's cause the issue with Clang.

252–256

See the loop that I added above. Hopefully that isn't too invasive, though it's possibly not the most robust way of doing it.

This should now propagate attributes to functions called through a constexpr bitcast.

Deleted a stray #include <iostream>

I think at this point I'd prefer to just not worry about the cast stripping case and go back to just handling the direct callee.

This pass should not be required for correctness, and is only because opencl is still not linking the device libraries correctly such that the linked in functions have the correct subtarget/subtarget features. Long term I would like to move this pass into a form that does conservative merges of attributes, which can safely give up on anything indirect looking

llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
257

This doesn't consider if there could be multiple users. It's a whole expression tree. It still feels wrong to be doing this style of search for this.

I've reverted back to not handling calls to constexpr function pointers. I've also
removed it from the test.

Cleaned up stray newline

I think at this point I'd prefer to just not worry about the cast stripping case and go back to just handling the direct callee.

This pass should not be required for correctness, and is only because opencl is still not linking the device libraries correctly such that the linked in functions have the correct subtarget/subtarget features. Long term I would like to move this pass into a form that does conservative merges of attributes, which can safely give up on anything indirect looking

I think the new diff can be kept as anyway you have spent effort on it. But I am fine either way.

llvm/test/CodeGen/AMDGPU/propagate-attributes-function-pointer-argument.ll
15

Probably can have CHECK-NOT too to explicitly call out what should not be generated.

madhur13490 added inline comments.May 13 2021, 11:20 AM
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
252–256

A comment here, please!

llvm/test/CodeGen/AMDGPU/propagate-attributes-function-pointer-argument.ll
15

CHECK-LABEL followed by CHECK-DAG would be more appropriate here.

Incorporated code review feedback from @madhur13490

arsenm accepted this revision.May 13 2021, 5:43 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
254

Add a fixme for the bitcasted callee case

This revision is now accepted and ready to land.May 13 2021, 5:43 PM

@arsenm @madhur13490 thank you both for the thorough review! I don't have commit access, so I will need someone to commit it for me.

This revision was landed with ongoing or failed builds.May 26 2021, 7:40 AM
This revision was automatically updated to reflect the committed changes.