This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Propagate amdgpu-waves-per-eu with attributor
ClosedPublic

Authored by arsenm on Dec 10 2021, 3:46 PM.

Details

Summary

This will do a value range merging down the callgraph, unlike the
current pass which can only propagate values to undecorated functions
from a kernel.

This one is a bit weird due to the interaction with the implied range
from amdgpu-flat-workgroup-size. At the default group range of 1,1024,
the minimum implied bounds is 4 so this ends up introducing the
attribute on undecorated functions. We could probably simplify this by
ignoring it and propagating the raw values. The subtarget interaction
and the interaction with amdgpu-flat-workgroup-size only really clamp
invalid values (plus the lower bound doesn't seem to do anything as
far as I can tell anyway).

Diff Detail

Event Timeline

arsenm created this revision.Dec 10 2021, 3:46 PM
arsenm requested review of this revision.Dec 10 2021, 3:46 PM
Herald added a reviewer: baziotis. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: wdng. · View Herald Transcript

I don't understand all the update logic but from an Attributor standpoint there are only two things that should be addressed.

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
680

Somewhat surprising you intersect the known range with an assumed result, if that is on purpose probably worth a comment explaining why this is reasonable.

714

It is important to return "CHANGED" here.

arsenm updated this revision to Diff 394857.Dec 16 2021, 6:55 AM
arsenm marked an inline comment as done.

Return indicatePessimisticFixpoint result

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
680

I'm confused by your surprise. This is the initialize, so the assumed state doesn't mean anything? All the other IntegerRangeState attributes start out with intersectKnown in initialize()

jdoerfert added inline comments.Dec 23 2021, 9:00 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
680

So, at this point you move something assumed by another AA into something known by this. However, we don't know the other AAs assumed state is valid yet. Only after the fixpoint is reached it is known to hold. So what other AAs do (I hope) is use IR information to setup the known state. Here we might end up assuming the best for the FlatWorkGroupSize (line 674) then making it to something known in 683, before the values are tightened during the fixpoint iteration causing the known set to be not actually valid anymore. Generally, assumed information should only flow into known information if a fixpoint is reached (at which point the assumed information becomes the known one anyway). Before that, assumed should flow into assumed only and known can flow in either.

Does that make some sense?

arsenm updated this revision to Diff 528169.Jun 3 2023, 5:33 PM

Make flat work group interaction work by querying the correct thing.

I can convince myself the merged ranges make sense but I'm not sure, particularly on the lower bound

Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2023, 5:33 PM
arsenm updated this revision to Diff 528172.Jun 3 2023, 6:08 PM
jhuber6 added a subscriber: jhuber6.Jun 5 2023, 8:17 AM

There is actually an assumption propagation AA in the Attributor already https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/AttributorAttributes.cpp#L11609. I don't know if that's relevant to what the AMDGPUAttributor wants to do however.

arsenm added a comment.Jun 8 2023, 4:31 PM

There is actually an assumption propagation AA in the Attributor already https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/AttributorAttributes.cpp#L11609. I don't know if that's relevant to what the AMDGPUAttributor wants to do however.

Don't see how assumes are involved. There isn't a way to introspect waves-per-eu

There is actually an assumption propagation AA in the Attributor already https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/AttributorAttributes.cpp#L11609. I don't know if that's relevant to what the AMDGPUAttributor wants to do however.

Don't see how assumes are involved. There isn't a way to introspect waves-per-eu

I was just wondering out loud if you could model this as "assume one thread" and then treat it the same way in the backend.

jdoerfert accepted this revision.Jun 14 2023, 10:06 AM

LG, I left some notes and nits below.

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
526

Docs: Base class to derive different size ranges.

536–538

Nit: I doubt you need these. StateWrapper should provide them, IIRC.

541

FWIW, we should be able to track how often we manifested, hence improved, the ranges. I think that would be good to have.

559

Can you add a TODO in this class. The functionality it offers should be in some helper header. Effectively it does call site -> callee lookups and clamping. We have similar helpers in AttributorAttributes.cpp, but not this one. At some point we should move them all out into a header...

582

Nit: You don't need a list with 8 slots, { Attr } probably works just as well.

661

Nit: Probably better to only redirect one level up (AAAMDSizeRangeAttribute::isValidState())

FWIW:
Empty should mean it's dead/misconfigured, no?

This revision is now accepted and ready to land.Jun 14 2023, 10:06 AM
arsenm marked 2 inline comments as done.Jun 15 2023, 10:07 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
536–538

They're already here so I'll try to delete separately

541

It looks like you're supposed to implement this with macros private to AttributorAttributes

arsenm added inline comments.Jun 15 2023, 10:13 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
661

This was one of the struggles I forgot about. I didn't find a better way to express the valid bounds for this to infer. By default it covers 0, -1, which exceeds the maximum of 10 and it produces broken attributes