This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Try to eliminate clearing of high bits of 16-bit instructions
ClosedPublic

Authored by arsenm on Jun 18 2021, 3:09 PM.

Details

Summary

Before gfx9, 16-bit instructions consistently zeroed the high bits of
their results, but gfx9 complicated the situation: some instructions
still zero them and some don't. This also manages to pick up a few
cases that the pattern fails to optimize away.

We handle some cases with instruction patterns, but some get
through. In particular this improves the integer cases.
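
As a rough illustration of the idea in the summary (not the code from this patch), the check can be thought of as a predicate over the already-selected source instruction. Everything in this sketch is hypothetical: the `Gen` and `Op` enums, the function names, and especially the table of which gfx9 opcodes still zero the high bits are invented for illustration, and the 0xffff mask assumes a 16-bit value held in a 32-bit register.

```cpp
#include <cstdint>

// Hypothetical toy model; none of these names are LLVM's API.
enum class Gen { PreGfx9, Gfx9 };
enum class Op { Add16, Mul16, Mad16, Other32 };

// Returns true if the selected instruction is known to write zeros to the
// high 16 bits of its 32-bit result.
bool zeroesHigh16Bits(Gen G, Op Opcode) {
  if (Opcode == Op::Other32)
    return false; // not a 16-bit instruction, nothing is known
  if (G == Gen::PreGfx9)
    return true; // per the summary: consistently zeroed pre-gfx9
  // ASSUMPTION: this split is made up. The summary only says that on gfx9
  // some instructions still zero the high bits and some don't.
  return Opcode == Op::Add16 || Opcode == Op::Mul16;
}

// Models selecting `and Src, Mask`: the mask is redundant when it only
// clears the high 16 bits and the source instruction already zeroed them.
bool canDropHighBitsMask(Gen G, Op SrcOpcode, uint32_t Mask) {
  return Mask == 0xffffu && zeroesHigh16Bits(G, SrcOpcode);
}
```

In this model, `canDropHighBitsMask(Gen::Gfx9, Op::Mad16, 0xffff)` is false, so the explicit clear has to stay, while the same query for `Gen::PreGfx9` is true and the clear can be dropped.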

Event Timeline

arsenm created this revision. Jun 18 2021, 3:09 PM
arsenm requested review of this revision. Jun 18 2021, 3:09 PM
Herald added a project: Restricted Project. Jun 18 2021, 3:09 PM
Herald added a subscriber: wdng.
foad added inline comments. Jun 21 2021, 6:05 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
447

Where is this function defined?

arsenm updated this revision to Diff 353359. Jun 21 2021, 7:06 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
447

Posted the wrong version of the patch. I initially added a subtarget feature for this, but the actual behavior is a bit too convoluted and would require multiple variants of it.

foad accepted this revision. Jun 22 2021, 2:31 AM

Looks reasonable to me. I guess there's no way to do this as a DAGCombine instead?

This revision is now accepted and ready to land. Jun 22 2021, 2:31 AM

Looks reasonable to me. I guess there's no way to do this as a DAGCombine instead?

You would need to know exactly how the source node is going to be selected, which you can't know ahead of time. We do approximate this for FP instructions already.

foad added a comment. Jun 22 2021, 7:07 AM

Looks reasonable to me. I guess there's no way to do this as a DAGCombine instead?

You would need to know exactly how the source node is going to be selected, which you can't know ahead of time. We do approximate this for FP instructions already.

I was wondering if combines could be run on the MachineSDNodes immediately after selection. But I see there is nothing like that.

Looks reasonable to me. I guess there's no way to do this as a DAGCombine instead?

You would need to know exactly how the source node is going to be selected, which you can't know ahead of time. We do approximate this for FP instructions already.

I was wondering if combines could be run on the MachineSDNodes immediately after selection. But I see there is nothing like that.

I think there is a post-processing hook in the selector, but the instructions would be a bit off since SIFixSGPRCopies wouldn't have run yet, and there could be more intermediate nodes. Plus, why sink effort into a DAG-only solution at this point?