This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add new BFI intrinsic
Needs Revision · Public

Authored by tsymalla on Jul 4 2023, 2:31 AM.

Details

Reviewers
foad
arsenm
Summary

This adds a new BFI intrinsic which can be used to emit the v_bfi instruction
directly with a mask.

Event Timeline

tsymalla created this revision.Jul 4 2023, 2:31 AM
Herald added a project: Restricted Project.Jul 4 2023, 2:31 AM
tsymalla requested review of this revision.Jul 4 2023, 2:31 AM
arsenm requested changes to this revision.Jul 4 2023, 2:43 AM

We’ve specifically avoided adding intrinsics for easy to match instructions. This needs a semantic justification over just emitting the expanded bit sequence. It’s a huge amount of work teaching every part of the compiler the equivalent bit optimizations.

llvm/test/CodeGen/AMDGPU/bfi_nested.ll
3

Why lose a test?

This revision now requires changes to proceed.Jul 4 2023, 2:43 AM

We’ve specifically avoided adding intrinsics for easy to match instructions. This needs a semantic justification over just emitting the expanded bit sequence. It’s a huge amount of work teaching every part of the compiler the equivalent bit optimizations.

Unfortunately, BFI instructions are not so easy to match in LLVM. There are issues that prevent us from generating nested v_bfi instructions (e.g. one BFI serving as the base of another), so we generate far fewer v_bfi instructions than we could. I have been working on that for a while in https://reviews.llvm.org/D136432, and it is not easy to get it working properly. From my tests, there seems to be no real codegen advantage in trying to match the and / or patterns, so I thought that emitting the intrinsic directly would be sufficient and an improvement over the current state. This should not replace the few existing BFI ISel patterns but rather serve as a way to teach the middle-end to generate BFI instructions.
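For readers unfamiliar with the operation under discussion, here is a minimal Python sketch of the bit sequence that a bitfield insert stands for, assuming the usual V_BFI_B32 definition D = (S0 & S1) | (~S0 & S2) on 32-bit values. The function names are hypothetical and are not taken from the patch, and the operand order of the proposed intrinsic is not shown in this thread, so it may differ. The sketch also shows an xor-based expansion that computes the same result, and a nested case where one BFI is the base of another.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers with Python's unbounded ints


def bfi(mask: int, insert: int, base: int) -> int:
    """Assumed V_BFI_B32 semantics: take bits of `insert` where `mask`
    is 1 and bits of `base` where `mask` is 0."""
    return ((mask & insert) | (~mask & base)) & MASK32


def bfi_xor_form(mask: int, insert: int, base: int) -> int:
    """An equivalent expanded form using xor instead of and/or/not.
    A pattern matcher has to recognise every such variant."""
    return (((insert ^ base) & mask) ^ base) & MASK32


def nested_bfi(outer_mask: int, outer_insert: int,
               inner_mask: int, inner_insert: int, base: int) -> int:
    """Nested case: the result of the inner BFI is the base of the outer one."""
    return bfi(outer_mask, outer_insert, bfi(inner_mask, inner_insert, base))
```

For example, `bfi(0x0000FFFF, 0x12345678, 0xABCDEF00)` keeps the low 16 bits of the second operand and the high 16 bits of the third. Both expansions agree on every input, which illustrates why matching the expanded sequence means handling several equivalent shapes.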

llvm/test/CodeGen/AMDGPU/bfi_nested.ll
3

Because that was implemented as the base for the above-mentioned, abandoned BFI patch, which was never merged.