This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Deduce attributes with the Attributor
ClosedPublic

Authored by kuter on Jun 27 2021, 1:38 PM.

Details

Summary

This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.

Diff Detail

Event Timeline

kuter created this revision.Jun 27 2021, 1:38 PM
kuter requested review of this revision.Jun 27 2021, 1:38 PM
Herald added a reviewer: sstefan1.
Herald added a reviewer: baziotis.
Herald added a project: Restricted Project.
kuter updated this revision to Diff 354769.Jun 27 2021, 1:40 PM

License header.

jdoerfert added inline comments.Jun 27 2021, 2:50 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
129
167

Why not just go over CalleeAttributes here?

192

might be more interesting to print them all out.

223

It is odd that we initialize with a call graph. I'm not sure we need this at all. The TM can be checked in runOnModule, and we might not want to support the old PM right away.

kuter updated this revision to Diff 354775.Jun 27 2021, 3:00 PM

Implement more functionality.
Support more tests.

kuter marked an inline comment as done.Jun 27 2021, 3:02 PM
jdoerfert added inline comments.Jun 27 2021, 3:11 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
134

There should be at least a TODO. We should actually look at all call sites, and if that succeeds we can propagate information just fine. Being address-taken doesn't need to be a bad thing per se.

207

I imagine we need to walk all instructions and look at all operands here, no?

kuter added inline comments.Jun 27 2021, 3:47 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
134

I agree, but simple-indirect-call.ll depends on this behavior.

207

Yes, we do.

jdoerfert added inline comments.Jun 27 2021, 3:49 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
134

Add a FIXME for now, then.

207

Or, you start with all globals in the interesting address spaces and make your way down the use chains. Probably cheaper.
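
For context, the "start from the globals" direction could look roughly like the sketch below. This is only an illustration, not code from the patch; the helper name and the choice of AMDGPUAS::LOCAL_ADDRESS as the interesting address space are assumptions.

static void collectLDSUsingFunctions(Module &M,
                                     SmallPtrSetImpl<Function *> &Fns) {
  SmallVector<User *, 16> Worklist;
  SmallPtrSet<User *, 16> Visited;
  // Seed with every user of a global in the LDS address space.
  for (GlobalVariable &GV : M.globals())
    if (GV.getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS)
      for (User *U : GV.users())
        Worklist.push_back(U);
  // Walk the use chains downward; constant expressions forward to their users.
  while (!Worklist.empty()) {
    User *U = Worklist.pop_back_val();
    if (!Visited.insert(U).second)
      continue;
    if (auto *I = dyn_cast<Instruction>(U)) {
      Fns.insert(I->getFunction()); // this function touches an LDS global
      continue;
    }
    for (User *UU : U->users())
      Worklist.push_back(UU);
  }
}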

jdoerfert added inline comments.Jun 27 2021, 10:55 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
207

Or, you ask AAMemoryLocation for all globals that can be accessed. The downside is (potentially) that it won't track non-access uses, e.g. ptr = &shared_mem;. Unsure if that is needed.

arsenm added inline comments.Jun 28 2021, 6:26 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
28–32

These are the attributes as they exist now, but I do think they need to be inverted to be more sound. Assuming they are present is the conservative direction, so ideally we would operate on a no-* basis.

192

I do not like this attribute and don't believe it's very sound, but I guess continuing with it doesn't make things worse.

200–201

This also depends on the subtarget, since we don't need this on newer ones for addrspacecast.

@arsenm I'd suggest we get to feature parity with the existing pass, then add more features as needed, then modify the names/polarity as we want them to be. Doing that as part of this patch will cause too much complexity.

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
28–32

I'd recommend doing that in a follow-up so this is tested on its own first, wdyt?

@kuter Are there other tests we haven't ported to this yet? If there are things we don't do yet, just add FIXMEs for now.

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
185–186
arsenm added inline comments.Jun 28 2021, 1:34 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
28–32

Sure, but I would assume inference works naturally in the negative direction?

jdoerfert added inline comments.Jun 29 2021, 7:58 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
28–32

Yes, it should. I'm not sure I would say the direction of the inference is different, but the lattice we use for the state is reversed; we are still pulling information into kernels (transitively).

ormris removed a subscriber: ormris.Jun 29 2021, 10:13 AM
kuter updated this revision to Diff 355657.Jun 30 2021, 12:20 PM
  • Added Uniform Work Group deduction.
  • Addressed review.
jdoerfert added inline comments.Jul 3 2021, 9:41 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
126
276–283

Don't do this. Use the regular checkForAllCallSites mechanism instead. It will detect non-call-site uses just fine but also allow callbacks and implicit calls (soon).

kuter marked an inline comment as done.Jul 3 2021, 2:04 PM
kuter added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
276–283

The flow of the deduction here is callee -> caller, and I don't really understand why this needs to be done in the first place.
I just assumed that there is a special reason it is needed, since simple-indirect-call.ll explicitly checks for this.

kuter updated this revision to Diff 356365.Jul 3 2021, 8:47 PM
  • Support more tests.
  • Fix the problem with existing tests.
  • Simplify logic.
  • Misc changes.

Everything except the constant-expression address space cast tests is supported with the Attributor now.

I will add the missing feature soon.

Can you add a test:

kernel() { // uniform-work-group-size = true
  weak();
}

weak() {  // weak linkage
  internal();
}

int G;
internal() {  // internal linkage
  G = 0;
}

Here internal should have uniform-work-group-size = true.

Also, add:

kernel() {  // uniform-work-group-size = true
  internal1();
}
internal1() {  // internal linkage, same below.
  internal2();
}
int G;
internal2() {
  if (G == 0) {
    internal3();
    internal2();
  }
}
internal3() {
  G = 1;
}

Everything should have uniform-work-group-size = true.

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
171–176

As I said before, this is not needed.

196

You need to check the return value here, I think.

234

Maybe you just use the BooleanState instead? Having the additional boolean here doesn't seem to bring any benefit. You can probably just expose clampStateAndIndicateChange in Attributor.h (AA namespace) and then use it in the CheckCallSite callback. There should not be anything to do but ask for the callee AA and clamp. This should remove 20 or so lines here.
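
For reference, the suggested shape would look roughly like the sketch below. This is an illustration only, not code from the patch; the AA class name AAAMDWorkGroupSize is an assumption, and the exact checkForAllCallSites/getAAFor signatures vary between LLVM versions.

ChangeStatus updateImpl(Attributor &A) override {
  ChangeStatus Change = ChangeStatus::UNCHANGED;
  auto CheckCallSite = [&](AbstractCallSite CS) {
    Function *Caller = CS.getInstruction()->getFunction();
    // Ask for the caller's AA and clamp our state against it; nothing else
    // is needed to pull the kernel's value down to this function.
    const auto &CallerAA = A.getAAFor<AAAMDWorkGroupSize>(
        *this, IRPosition::function(*Caller), DepClassTy::REQUIRED);
    Change = Change | clampStateAndIndicateChange(getState(), CallerAA.getState());
    return true;
  };
  bool AllCallSitesKnown = true;
  if (!A.checkForAllCallSites(CheckCallSite, *this,
                              /*RequireAllCallSites=*/true, AllCallSitesKnown))
    return indicatePessimisticFixpoint();
  return Change;
}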

357

I don't think the address space stuff is good. We should use a dedicated AA to track global uses, or reuse AAMemoryLocation to ask what globals are accessed. The latter doesn't capture non-access uses though, so maybe the former is needed.

kuter updated this revision to Diff 357635.Jul 9 2021, 2:14 PM
kuter marked 2 inline comments as done.
  • Add support for constant exploration
  • All tests are supported now
  • Bug fix

There are some slight differences in the deduction of uniform-work-group-size:
in some cases the Attributor sets uniform-work-group-size to false where
AMDGPUAnnotateKernelFeatures.cpp does not add any attribute.

It is not possible to match the behaviour exactly without some really hacky code.
I will address some of the reviews now.

kuter added inline comments.Jul 10 2021, 3:15 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
171–176

uniform-work-group-propagate-attribute.ll specifically checks for this on @weak_func.

234

I don't think this would work. In some cases we need to be able to turn the uniform-work-group-size attribute to false even if it is initialized with true.

In uniform-work-group-attribute-missing.ll, @foo gets initialized with uniform-work-group-size=true, but we turn it to false because the kernel doesn't have the attribute (for kernels we assume it is false if it is not present).

If we were to use the BooleanState, @foo would automatically be at a fixpoint on initialization (since it is set to true), and we wouldn't be able to turn it to false.

kuter added inline comments.Jul 10 2021, 3:21 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
234

Also, @jdoerfert, there are now several attributes that use BooleanState as a "placeholder" state.
I think it would be great if we had something like a VoidState to avoid confusion,
even if it was as simple as using VoidState = BooleanState;

kuter updated this revision to Diff 357760.Jul 10 2021, 3:33 PM
  • Address review.
  • Small naming change.

Did you add the tests I suggested?

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
234

Hm, but we use the boolean in those cases, don't we? To distinguish "good" and "bad". We can talk offline.

234

I don't think this would work. ..

It does. If you only finalize kernels in the initialization, everything else should not even read the attribute from the IR but just start optimistic and then go on.
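
Concretely, the idea is something like the following initialize(); this is an illustrative sketch rather than the patch's code, and the method/member names are assumptions.

void initialize(Attributor &A) override {
  Function *F = getAssociatedFunction();
  if (F->getCallingConv() != CallingConv::AMDGPU_KERNEL)
    return; // non-kernels start optimistic and are clamped via their call sites

  // Kernels are finalized from the IR right away; a missing attribute means
  // "false", so their value is known either way.
  bool IsUniform =
      F->getFnAttribute("uniform-work-group-size").getValueAsString() == "true";
  if (IsUniform)
    indicateOptimisticFixpoint();
  else
    indicatePessimisticFixpoint();
}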

llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll
63

It doesn't?

llvm/test/CodeGen/AMDGPU/uniform-work-group-propagate-attribute.ll
50

weak_func can be annotated with "uniform-work-group-size"="true" because there is no way it is called from a kernel with "uniform-work-group-size"="false". The test is just too conservative here. Please don't keep the linkage check as it is not needed.

kuter updated this revision to Diff 357888 (Edited).Jul 12 2021, 3:55 AM
  • Added requested test.
  • Removed the check for the linkage type.
kuter updated this revision to Diff 358497.Jul 13 2021, 7:45 PM

Make the uniform-work-group-size deduction use the BooleanState.
Rebase.

kuter updated this revision to Diff 358498.Jul 13 2021, 7:50 PM

Small change.

kuter updated this revision to Diff 358834.Jul 14 2021, 8:41 PM

Inline assembly call sites are no longer treated as unknown callees.
This fixes some differences in deduction.

I will double-check everything, but the Attributor's deduction of attributes
should now be equivalent to that of the annotate-kernel-features pass.

kuter updated this revision to Diff 359413.Jul 16 2021, 12:24 PM
kuter marked an inline comment as done.

clang-format (sorry that I forgot)
Remove the WIP tag.

kuter retitled this revision from [WIP][AMDGPU] Deduce attributes with the Attributor to [AMDGPU] Deduce attributes with the Attributor.Jul 16 2021, 12:24 PM
arsenm added inline comments.Jul 16 2021, 1:06 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
124

I don't know why you are tracking this here. This isn't entirely true anymore, plus there's a dedicated pass for this?

227

Braces

kuter added inline comments.Jul 16 2021, 8:45 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
124

This is used for the deduction of the amdgpu-queue-ptr attribute. For non-entry functions, if the function uses a DS global (even transitively), it gets the amdgpu-queue-ptr attribute.
I tried to replicate what AnnotateKernelFeatures.cpp does, since there isn't any documentation available about the attribute (as far as I know).
How do you think we should proceed?

arsenm accepted this revision.Jul 19 2021, 4:19 PM

LGTM with a nit. I still think we should invert these attributes, though.

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
124

Oh right, this is in case we were to insert a trap while using the legacy trap ABI, which required the queue pointer. We can now handle some cases of DS instructions in functions without emitting traps, but I guess preserving this for now is fine.

This revision is now accepted and ready to land.Jul 19 2021, 4:19 PM
jdoerfert accepted this revision.Jul 19 2021, 9:45 PM

LG

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
160

Nit: Some doxygen documentation for these methods and members would be good.

This revision was landed with ongoing or failed builds.Jul 23 2021, 8:07 PM
This revision was automatically updated to reflect the committed changes.