This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Enable `update/mov.dpp` selection with `+dpp`
AbandonedPublic

Authored by Pierre-vh on Oct 28 2022, 5:45 AM.

Details

Summary

ROCm device libs can emit those intrinsics w/ the +dpp attribute, and it counts on the optimizer to remove the call if the GPU is too old.
When built at O0 it caused codegen issues as Clang allowed this intrinsic to go through with just +dpp, but the backend wanted the GPU to be >=GFX8 as well.

This patch allows selecting that intrinsic with just +dpp

Depends on D136945

Diff Detail

Event Timeline

Pierre-vh created this revision.Oct 28 2022, 5:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2022, 5:45 AM
Pierre-vh requested review of this revision.Oct 28 2022, 5:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2022, 5:45 AM
foad added a comment.Oct 28 2022, 6:07 AM

Typo "gfx10" in title.

Why do you need +dpp as well as +gfx8-insts? Doesn't +gfx8-insts kind of imply +dpp already?

Pierre-vh retitled this revision from [AMDGPU] Enable `update/mov.dpp` selection with `+dpp,+gfx10-insts` to [AMDGPU] Enable `update/mov.dpp` selection with `+dpp,+gfx8-insts`.Oct 28 2022, 6:10 AM

Typo "gfx10" in title.

Why do you need +dpp as well as +gfx8-insts? Doesn't +gfx8-insts kind of imply +dpp already?

I'm not sure - I used both to be safe because the intrinsic is dpp specific. If everyone agrees we just need gfx8-insts then sure I'll remove the need for DPP.
Though, the change in pseudoToMCOpcode will still need to be DPP-specific because if I only check for GFX8-insts, it creates a mess (about 90 tests failures + various crashes). It seems like it needs to return -1 in some cases so I did a very targeted fix there (which I don't really like, if there's a better option I'd love to do something else really)

foad added a comment.Oct 28 2022, 6:22 AM

Though, the change in pseudoToMCOpcode will still need to be DPP-specific because if I only check for GFX8-insts, it creates a mess (about 90 tests failures + various crashes). It seems like it needs to return -1 in some cases so I did a very targeted fix there (which I don't really like, if there's a better option I'd love to do something else really)

I think most of those extra cases in pseudoToMCOpcode should be conditional on MCOp == (uint16_t)-1, i.e. only try them if we have so far failed to find an MC opcode. Does that help?

Pierre-vh planned changes to this revision.Nov 2 2022, 2:02 AM
arsenm added a comment.Nov 2 2022, 3:54 PM

I think we're underutilizing implied features. I do think HasDPP should be sufficient. If you're running into further trouble, I'd guess it's from generation encoding changes

Pierre-vh updated this revision to Diff 472915.Nov 3 2022, 6:09 AM

Only use hasDPP

Pierre-vh retitled this revision from [AMDGPU] Enable `update/mov.dpp` selection with `+dpp,+gfx8-insts` to [AMDGPU] Enable `update/mov.dpp` selection with `+dpp`.Nov 3 2022, 6:10 AM
Pierre-vh edited the summary of this revision. (Show Details)
Pierre-vh planned changes to this revision.Nov 4 2022, 2:07 AM

Removing from review queue until we decide whether it's the way to go

Pierre-vh abandoned this revision.Nov 15 2022, 12:25 AM