This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Remove function with incompatible features
ClosedPublic

Authored by Pierre-vh on Nov 30 2022, 3:55 AM.

Details

Summary

Adds a new pass that removes functions
if they use features that are not supported on the current GPU.

This change is aimed at preventing crashes when building code at O0 that
uses idioms such as if (ISA_VERSION >= N) intrinsic_a(); else intrinsic_b();
where ISA_VERSION is not constexpr, and intrinsic_a is not selectable
on older targets.
This is a pattern that's used all over the ROCm device libs. The main
motive behind this change is to allow code using ROCm device libs
to be built at O0.

Note: the feature checking logic is done ad-hoc in the pass. There is no other
pass that needs (or will need in the foreseeable future) to do similar
feature-checking logic so I did not see a need to generalize the feature
checking logic yet. It can (and should probably) be generalized later and
moved to a TargetInfo-like class or helper file.

Diff Detail

Event Timeline

Pierre-vh created this revision.Nov 30 2022, 3:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 30 2022, 3:55 AM
Pierre-vh requested review of this revision.Nov 30 2022, 3:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 30 2022, 3:55 AM
foad added inline comments.Nov 30 2022, 4:14 AM
llvm/lib/Target/AMDGPU/AMDGPUClearIncompatibleFunctions.cpp
80 ↗(On Diff #478884)

isDeclaration() is a bit more self-documenting than empty().

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
220

Can you make the name of the option match the name of the pass? It seems like you have three variations in this patch:

  • clear incompatible functions
  • clear incompatible functions bodies
  • incompatible features clear fns
Pierre-vh updated this revision to Diff 478891.Nov 30 2022, 4:20 AM
Pierre-vh marked 2 inline comments as done.

Comments

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
220

Matching the name of the pass is not possible because the pass already generates a CL Option I think, but I tried to make it more consistent.

foad added inline comments.Nov 30 2022, 4:55 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
220

Maybe use -amdgpu-enable-* instead of just -amdgpu-* then? I see a few examples like that already.

Pierre-vh updated this revision to Diff 478904.Nov 30 2022, 5:08 AM
Pierre-vh marked an inline comment as done.

Comments

Overall this looks pretty good. As you say, the feature checking logic is quite limited, but that's not a problem.
I think after this patch lands https://reviews.llvm.org/D123693 can be reverted. Can you try to revert that with this patch and check if device libs can be built correctly at -O0?

Overall this looks pretty good. As you say, the feature checking logic is quite limited, but that's not a problem.
I think after this patch lands https://reviews.llvm.org/D123693 can be reverted. Can you try to revert that with this patch and check if device libs can be built correctly at -O0?

I did a quick test where I passed all .bc files from the device libs to Clang for fiji (gfx8) and it still doesn't build even with this patch. The pass kicks in a few times but there's some issues with "dot" instructions.
Not sure how to address those - should it be done in this pass? For instance device libs has a few places here it uses (target("dot8-insts") which allows selection to work (because that only checks the feature) but then it fails because there is no "real" instruction for GFX8 dot8, only GFX11 (it uses the generation).

Do I just go "whack a mole" and try to build, add more checks, try to build again, etc?
I'm worried about complexity exploding if the checks need to be more intricate. e.g. I see that dot instructions have been introduced in the middle of the GFX9 generation (GFX908?) so I'd already need to change the pass completely to check for GFX908

Overall this looks pretty good. As you say, the feature checking logic is quite limited, but that's not a problem.
I think after this patch lands https://reviews.llvm.org/D123693 can be reverted. Can you try to revert that with this patch and check if device libs can be built correctly at -O0?

I did a quick test where I passed all .bc files from the device libs to Clang for fiji (gfx8) and it still doesn't build even with this patch. The pass kicks in a few times but there's some issues with "dot" instructions.
Not sure how to address those - should it be done in this pass? For instance device libs has a few places here it uses (target("dot8-insts") which allows selection to work (because that only checks the feature) but then it fails because there is no "real" instruction for GFX8 dot8, only GFX11 (it uses the generation).

Do I just go "whack a mole" and try to build, add more checks, try to build again, etc?
I'm worried about complexity exploding if the checks need to be more intricate. e.g. I see that dot instructions have been introduced in the middle of the GFX9 generation (GFX908?) so I'd already need to change the pass completely to check for GFX908

Thanks for looking deeper into this. I believe it is the right functional test for this patch if it can correctly build device libs with D123693 reverted. So I would recommend continuing the implementation till that is true.

I would recommend one level of abstraction on the attribute checks so it is easier to extend. So something like this pseudocode.

foreach ( attribute in function)
    if (isIllegal(attribute, function.subtarget))
        remove = true;

isIllegal can be similar to your current FeatureAndMinGen, since you mentioned before there is no existing api for determining if an attribute is legal on a subtarget (though I'm surprised by that and think if you build it the users will come).

Leonc added a comment.Dec 1 2022, 7:39 PM

Is there a way to detect when new feature attributes are added and raise a warning if this pass hasn't been updated?

Pierre-vh updated this revision to Diff 479596.Dec 2 2022, 4:29 AM

Reworked the feature compatibility checking logic to use TableGen data.
I think this is a lot more robust. I only check the features we're interested in though - I tried checking all of them and there's too many edge cases to handle them all so it's better to do this on an "opt-in" basis IMO.

With this, device libs _mostly_ builds at O0 on fiji. There's just some small codegen issue with fsin but I will try to look into it (even if it's not a known issue)

Overall this is looking quite good.

do this on an "opt-in" basis

Makes sense to me

llvm/lib/Target/AMDGPU/AMDGPUClearIncompatibleFunctions.cpp
87 ↗(On Diff #479600)

Does this need to reach a fixed point over features implying each other? Maybe that has already been done in the Implies data structure.

Pierre-vh marked an inline comment as done.Dec 6 2022, 1:00 AM
Pierre-vh added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUClearIncompatibleFunctions.cpp
87 ↗(On Diff #479600)

It just needs to expand everything, not sure how the Implies data structure works exactly but I _think_ you can have implied feature that imply other feature, so there's a need to be recursive.
This function is inspired by SetImpliedBits in MCSubtargetInfo.cpp (which is private)

Pierre-vh marked an inline comment as done.Dec 12 2022, 1:59 AM

ping

The last functional status I see here is.

With this, device libs _mostly_ builds at O0 on fiji.

If this is building device libs correctly on all targets with D123693 reverted then it is good by me.

OutOfCache removed a subscriber: OutOfCache.

Last time I checked (before the holidays), it builds fine without the patch reverted. I can try reverting the patch later when this lands perhaps? Or should it be done at the same time?
For reference the only build issues I experienced where with hard f64 functions like llvm.sin.f64, which are apparently not (well) supported if I understand correctly

Joe_Nash accepted this revision.Jan 4 2023, 6:26 AM

Last time I checked (before the holidays), it builds fine without the patch reverted. I can try reverting the patch later when this lands perhaps? Or should it be done at the same time?
For reference the only build issues I experienced where with hard f64 functions like llvm.sin.f64, which are apparently not (well) supported if I understand correctly

I'm fine with you landing this patch then reverting the V_ILLEGAL patch later.

LGTM! Thanks

This revision is now accepted and ready to land.Jan 4 2023, 6:26 AM
arsenm requested changes to this revision.Jan 4 2023, 6:55 AM

I think this is too much magic to apply based on target-features, and is too specific for the library usage. I think this needs to be an explicit opt-in behavior for the function, maybe a form of linkage (target_weak)? I also think the library use case is insufficient to add such a thing. It's like 1 medium-small file that requires duplication, plus maybe 5 or so microscopic functions

This revision now requires changes to proceed.Jan 4 2023, 6:55 AM

I think this is too much magic to apply based on target-features, and is too specific for the library usage. I think this needs to be an explicit opt-in behavior for the function, maybe a form of linkage (target_weak)? I also think the library use case is insufficient to add such a thing. It's like 1 medium-small file that requires duplication, plus maybe 5 or so microscopic functions

I thought we agreed this was the solution we wanted, at least for now? I remember there was quite a bit of discussion but I ended up with a green light to work on this pass based on the current use-case (device libs)

I agree this should probably be opt-in though, but adding a linkage type seems overkill no? Could we do something simpler like a target-feature perhaps? Or just make the CL option off by default?

Rebase & refactor according to offline discussion

Pierre-vh retitled this revision from [AMDGPU] Clear bodies of function with incompatible features to [AMDGPU] Remove function with incompatible features.Feb 10 2023, 6:32 AM
Pierre-vh edited the summary of this revision. (Show Details)
Joe_Nash added inline comments.Feb 10 2023, 6:38 AM
llvm/lib/Target/AMDGPU/AMDGPURemoveIncompatibleFunctions.cpp
11

typo: functions repeated

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1077

For the sake of compile time, can we avoid running this pass on O2 and O3? We know dead code elimination will remove the functions anyway.

foad added inline comments.Feb 10 2023, 7:05 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1077

We don't know that. DCE will only remove a function if the compiler can easily prove that it is never called.

Joe_Nash accepted this revision.Feb 10 2023, 11:01 AM

LGTM besides my previously noted typo, but please wait for @arsenm

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1077

Ok, you're right. I was thinking of device libs only, but this pass can be used more generally.

Pierre-vh updated this revision to Diff 496883.Feb 13 2023, 2:11 AM
Pierre-vh marked 3 inline comments as done.

Fix typo

arsenm added inline comments.Feb 20 2023, 5:43 AM
llvm/lib/Target/AMDGPU/AMDGPURemoveIncompatibleFunctions.cpp
56

I didn't know there was a skipModule, is this the same as skipFunction in that we shouldn't be doing it for required passes?

66

Should replace with ConstantPointerNull, not undef

76

Start with lowercase

105

Braces

118

Don't need to do this per function, can just not run the pass in the first place

153–155

You can build this twine directly top ass to DiagnosticInfo, you don't need to build a std::string first

156

I'm not sure this really belongs under DiagnosticInfoUnsupported, seems like its own category (also, it's not a warning if it's supposed to happen)

llvm/test/CodeGen/AMDGPU/remove-incompatible-functions.ll
293

Use opaque pointers and named values

845

Also should include IR check lines

856

Missing uses for the functions? In addition to some direct callers, I would like to see some address captures and constantexpr uses

Pierre-vh updated this revision to Diff 498856.Feb 20 2023, 7:46 AM
Pierre-vh marked 10 inline comments as done.

Comments

llvm/lib/Target/AMDGPU/AMDGPURemoveIncompatibleFunctions.cpp
56

It's similar I think so I removed it

llvm/test/CodeGen/AMDGPU/remove-incompatible-functions.ll
845

Now that I think about it it's probably better to just check the IR and nothing else. I initially checked the asm because I used unreachable and wanted to check that it was correctly emitted but now it doesn't really matter.

arsenm accepted this revision.Feb 20 2023, 9:19 AM

Not my favorite but I guess we can do this for now

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1076

Extra whitespace change

llvm/test/CodeGen/AMDGPU/remove-incompatible-functions.ll
845

There'd be some value in testing codegen if these also were testing the failing intrinsics in the bodies

This revision is now accepted and ready to land.Feb 20 2023, 9:19 AM
Pierre-vh updated this revision to Diff 499077.Feb 21 2023, 1:39 AM
Pierre-vh marked an inline comment as done.

Comments

Pierre-vh marked an inline comment as done.Feb 21 2023, 1:39 AM
This revision was landed with ongoing or failed builds.Feb 21 2023, 1:42 AM
This revision was automatically updated to reflect the committed changes.
foad added inline comments.Apr 19 2023, 8:14 AM
llvm/test/CodeGen/AMDGPU/remove-incompatible-functions.ll
373

Typo GFX111

379

Typo GFX111