This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Add 24-bit mul intrinsics
ClosedPublic

Authored by arsenm on Jul 9 2019, 5:32 PM.

Details

Reviewers
rampitec
Summary

Insert these during codegenprepare.

This works around a DAG issue where generic combines eliminate the and
asserting the high bits are zero, which then exposes an unknown read
source to the mul combine. It doesn't worth the hassle of trying to
insert an AssertZext or something to try to deal with it.

Diff Detail

Event Timeline

arsenm created this revision.Jul 9 2019, 5:32 PM

Replacing it so early means we could miss other optimizations, especially on the DAG. Was any performance evaluation performed?

Replacing it so early means we could miss other optimizations, especially on the DAG. Was any performance evaluation performed?

It's about as late as possible for the IR. This isn't really moving it that far from there this happens already, which is typically combine 1. The library function call is still using the standard IR operations, and I would expect the useful optimizations to have happened by this point. We already understand the known bits for these in the DAG, so I don't they shouldn't hurt too much (although they still may need ComputeNumSignBitsForTargetNode). It's possible combine opportunities will appear after lowering, and they will need to be implemented for the mul24 nodes.

I don't know what I would try this on, besides Jeff's benchmark that this is intended to solve. The only lit tests that broke were improvements.

This revision is now accepted and ready to land.Jul 10 2019, 4:44 PM
arsenm closed this revision.Jul 15 2019, 10:51 AM

r366094