This patch introduces a pass that uses the Attributor to deduce AMDGPU-specific attributes.
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
- Line 167: Why not just go over CalleeAttributes here?
- Line 192: Might be more interesting to print them all out.
- Line 223: It is odd that we initialize with a call graph; I'm not sure we need this at all. The TM can be checked in runOnModule, and we might want to not support the old PM right away. (See the legacy-PM sketch below.)
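A hedged sketch of what checking the TM in runOnModule could look like under the legacy pass manager, using the TargetPassConfig analysis that other AMDGPU passes rely on; the class name here is illustrative, not the patch's actual code.

```cpp
#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/Module.h"
#include "llvm/Pass.h"
#include "llvm/Target/TargetMachine.h"

using namespace llvm;

namespace {
// Illustrative legacy-PM module pass: no call-graph initialization; the
// TargetMachine is fetched on demand inside runOnModule.
class AMDGPUAttributorLegacy : public ModulePass {
public:
  static char ID;
  AMDGPUAttributorLegacy() : ModulePass(ID) {}

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.addRequired<TargetPassConfig>(); // supplies the TM
  }

  bool runOnModule(Module &M) override {
    TargetMachine &TM =
        getAnalysis<TargetPassConfig>().getTM<TargetMachine>();
    (void)TM; // drive the Attributor with TM/subtarget info here
    return false;
  }
};
} // end anonymous namespace

char AMDGPUAttributorLegacy::ID = 0;
```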
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
- Line 134: There should be at least a TODO. We should actually look at all call sites, and if that succeeds we can propagate information just fine; being address-taken doesn't need to be a bad thing per se. (See the call-site sketch below.)
- Line 207: I imagine we need to walk all instructions and look at all operands here, no?
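A minimal sketch of the call-site walk suggested for line 134, written as a fragment of a hypothetical AA's updateImpl; `checkForAllCallSites` is a real Attributor API, but its exact signature varies across LLVM versions.

```cpp
// Fragment of an AbstractAttribute subclass (illustrative, not the patch's
// code): instead of bailing on address-taken functions, enumerate all call
// sites and only give up if some call site is unknown.
ChangeStatus updateImpl(Attributor &A) override {
  auto CheckCallSite = [&](AbstractCallSite ACS) {
    // Inspect the caller's state here and merge it into ours.
    return true; // keep visiting call sites
  };

  bool AllCallSitesKnown = true;
  if (!A.checkForAllCallSites(CheckCallSite, *this,
                              /*RequireAllCallSites=*/true,
                              AllCallSitesKnown))
    return indicatePessimisticFixpoint(); // unknown caller: be conservative

  return ChangeStatus::UNCHANGED;
}
```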
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
- Line 207: Or you ask AAMemoryLocation for all globals that can be accessed. The downside is (potentially) that it won't track non-access uses, e.g. `return &shared_mem == ptr;`. Unsure if that is needed. (A sketch of the operand walk follows below.)
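For concreteness, a minimal sketch of the instruction/operand walk discussed above, assuming the globals of interest are LDS variables in `addrspace(3)`; `usesLDSGlobal` is a hypothetical helper name, not part of the patch.

```cpp
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/InstIterator.h"

using namespace llvm;

// Hypothetical helper: walk every instruction and every operand of F, looking
// through constant expressions, and report whether an LDS (addrspace(3))
// global is referenced. Unlike an AAMemoryLocation query, this also catches
// non-access uses such as merely taking a global's address.
static bool usesLDSGlobal(const Function &F) {
  SmallPtrSet<const Constant *, 16> Visited;
  SmallVector<const Constant *, 16> Worklist;
  for (const Instruction &I : instructions(F))
    for (const Value *Op : I.operands())
      if (auto *C = dyn_cast<Constant>(Op))
        if (Visited.insert(C).second)
          Worklist.push_back(C);
  while (!Worklist.empty()) {
    const Constant *C = Worklist.pop_back_val();
    if (auto *GV = dyn_cast<GlobalVariable>(C))
      if (GV->getAddressSpace() == 3) // AMDGPUAS::LOCAL_ADDRESS
        return true;
    for (const Value *Op : C->operands()) // recurse into constant expressions
      if (auto *COp = dyn_cast<Constant>(Op))
        if (Visited.insert(COp).second)
          Worklist.push_back(COp);
  }
  return false;
}
```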
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
- Lines 28–32: These are the attributes as they exist now, but I do think these need to be inverted to be more sound. Assuming they are present is the conservative direction, so ideally we would operate on a no-* basis.
- Line 192: I do not like this attribute and don't believe it's very sound, but I guess continuing with it doesn't make things worse.
- Lines 200–201: This also depends on the subtarget, since we don't need this on newer ones for addrspacecast.
@arsenm I'd suggest we first get to feature parity with the existing pass, then add more features as needed, and then change the names/polarity as we want. Doing that as part of this patch would cause too much complexity.
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
- Lines 28–32: I'd recommend doing that in a follow-up, so this is tested on its own first. Wdyt?
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
- Lines 28–32: Sure, but I would assume inference works naturally in the negative direction?
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
- Lines 28–32: Yes, it should. I'm not sure I would say the direction of the inference is different, but the lattice we use for the state is reversed; we are still pulling information into kernels (transitively).
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
- Lines 276–283: The flow of the deduction here is callee -> caller. I don't really understand why this needs to be done in the first place.
- Support more tests.
- Fix the problem with existing tests.
- Simplify logic.
- Misc changes.
Everything except the constant-expression address space cast tests is supported with the attributor now. I will add the missing feature soon.
Can you add a test:

```
kernel() { // uniform-work-group-size = true
  weak();
}

weak() { // weak linkage
  internal();
}

int G;
internal() { // internal linkage
  G = 0;
}
```

Here `internal` should have uniform-work-group-size = true.
Also, add:

```
kernel() { // uniform-work-group-size = true
  internal1();
}

internal1() { // internal linkage, same below
  internal2();
}

int G;
internal2() {
  if (G == 0) {
    internal3();
    internal2();
  }
}

internal3() {
  G = 1;
}
```

Everything should have uniform-work-group-size = true.
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
- Lines 171–176: As I said before, this is not needed.
- Line 196: You need to check the return value here, I think.
- Line 234: Maybe just use the boolean state instead? Having the additional boolean here doesn't seem to bring any benefit. You can probably just expose clampStateAndIndicateChange in Attributor.h (AA namespace) and then use it in the CheckCallSite callback. There should not be anything to do but ask for the callee AA and clamp; that should remove 20 or so lines here. (See the clamping sketch below.)
- Line 357: I don't think the address space stuff is good. We should use a dedicated AA to track global uses, or reuse AAMemoryLocation to ask what globals are accessed. The latter doesn't capture non-access uses though, so maybe the former is needed.
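A hedged sketch of the clamping scheme suggested for line 234, again as a fragment of a hypothetical AA; `AAAMDWorkGroupSize` is a stand-in name, and the `getAAFor`/`clampStateAndIndicateChange` signatures differ between LLVM versions.

```cpp
// Fragment (illustrative): merge each caller's assumed state into ours via
// clamping, rather than tracking a separate boolean by hand.
ChangeStatus updateImpl(Attributor &A) override {
  ChangeStatus Change = ChangeStatus::UNCHANGED;

  auto CheckCallSite = [&](AbstractCallSite CS) {
    Function *Caller = CS.getInstruction()->getFunction();
    const auto &CallerInfo = A.getAAFor<AAAMDWorkGroupSize>(
        *this, IRPosition::function(*Caller), DepClassTy::REQUIRED);
    // clampStateAndIndicateChange intersects the assumed states and reports
    // whether our state changed.
    Change = Change | clampStateAndIndicateChange(this->getState(),
                                                  CallerInfo.getState());
    return true;
  };

  bool AllCallSitesKnown = true;
  if (!A.checkForAllCallSites(CheckCallSite, *this,
                              /*RequireAllCallSites=*/true,
                              AllCallSitesKnown))
    return indicatePessimisticFixpoint();

  return Change;
}
```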
- Add support for constant exploration
- All tests are supported now
- Bug fix
There are some slight differences in the deduction of uniform-work-group-size: in some cases the attributor sets uniform-work-group-size to false where AMDGPUAnnotateKernelFeatures.cpp does not add any attribute at all.
It is not possible to match that behaviour exactly without some really hacky code.
I will address some of the reviews now.
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
- Lines 171–176: uniform-work-group-propagate-attribute.ll specifically checks for this on @weak_func.
- Line 234: I don't think this would work. In some cases we need to be able to turn the uniform-work-group-size attribute to false even if it is initialized with true. In uniform-work-group-attribute-missing.ll, @foo gets initialized with uniform-work-group-size true, but we turn it to false because the kernel doesn't have the attribute (for kernels we assume it is false if it is not present). If we were to use BooleanState, @foo would automatically be at a fixpoint on initialization, since it is set to true. (Illustrated in the sketch below.)
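For context, a small sketch of why that happens, based on the Attributor's integer-state semantics (a state is at a fixpoint once assumed == known); treat the exact member names as assumptions about the version in use.

```cpp
#include "llvm/Transforms/IPO/Attributor.h"

using namespace llvm;

void booleanStateFixpointDemo() {
  // Fresh BooleanState: assumed = true (best state), known = false (worst).
  BooleanState S;
  (void)S.isAtFixpoint(); // false: the assumed bit can still be revised

  // If initialization promotes "true" to known, assumed == known and the
  // state is immediately a fixpoint; updateImpl can no longer lower it.
  S.setKnown(true);
  (void)S.isAtFixpoint(); // true
}
```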
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
- Line 234: Also @jdoerfert, there are now several attributes that use BooleanState as a "placeholder" state.
Did you add the tests I suggested?
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
- Line 234: Hm, but we use the boolean in those cases, don't we? To distinguish "good" and "bad". We can talk offline.
- Line 234: It does; if you only finalize kernels in the initialization, everything else should not even read the attribute from the IR but just start optimistic and then go on. (See the initialization sketch after the inline comments.)

llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll
- Line 63: It doesn't?

llvm/test/CodeGen/AMDGPU/uniform-work-group-propagate-attribute.ll
- Line 50: weak_func can be annotated with "uniform-work-group-size"="true" because there is no way it is called from a kernel with "uniform-work-group-size"="false". The test is just too conservative here. Please don't keep the linkage check, as it is not needed.
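A minimal sketch of the initialization scheme suggested above: only kernels consult the IR attribute and are fixed immediately, while everything else starts optimistic. This is a fragment of a hypothetical AA; names and details are illustrative.

```cpp
// Fragment (illustrative): finalize only kernels at initialization time.
void initialize(Attributor &A) override {
  Function *F = getAssociatedFunction();

  // Non-kernels start optimistic and are refined by the fixpoint iteration;
  // they never read the IR attribute directly.
  if (F->getCallingConv() != CallingConv::AMDGPU_KERNEL)
    return;

  // Kernels take their value from the IR (absent means false) and are fixed.
  bool InitialValue = false;
  if (F->hasFnAttribute("uniform-work-group-size"))
    InitialValue = F->getFnAttribute("uniform-work-group-size")
                       .getValueAsString()
                       .equals("true");
  if (InitialValue)
    indicateOptimisticFixpoint();
  else
    indicatePessimisticFixpoint();
}
```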
Inline assembly call sites are no longer treated as unknown callees.
This fixes some differences in deduction.
I will double-check everything, but the attributor's deduction of attributes
should now be equivalent to the annotate-kernel-features pass.
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
- Line 124: This is used for the deduction of the amdgpu-queue-ptr attribute. For non-entry functions, if the function uses a DS global (even transitively), it gets the amdgpu-queue-ptr attribute. (See the sketch below.)
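A rough sketch of that rule, reusing the hypothetical `usesLDSGlobal` helper from the earlier sketch (which only covers direct uses; the transitive part would come from the Attributor propagating the attribute through the call graph):

```cpp
// Illustrative only: attach amdgpu-queue-ptr to callable (non-entry)
// functions that reference a DS/LDS global directly.
static void addQueuePtrForDSUse(Function &F) {
  if (F.getCallingConv() == CallingConv::AMDGPU_KERNEL)
    return; // entry functions are handled separately
  if (usesLDSGlobal(F)) // hypothetical helper from the sketch above
    F.addFnAttr("amdgpu-queue-ptr");
}
```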
LGTM with nit. I still think we should invert these attributes, though.
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
- Line 124: Oh right, this is in case we were to insert a trap while using the legacy trap ABI, which required the queue ptr. We can now handle some cases of DS instructions in functions without emitting traps, but I guess preserving this for now is fine.
LG
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
- Line 160: Nit: some Doxygen documentation for these methods and members would be good.