Page MenuHomePhabricator

AMDGPU: Should always start from the first register in VGPR indexing.
Needs ReviewPublic

Authored by cfang on Mon, Dec 3, 3:47 PM.

Details

Summary

SunReg should always be AMDGPU::sub0. The 8-bit m0 field for the index is unsigned.
We can guarantee the index non-negative (if the program itself is correct) only when we
start from the very first register in the vector.

The original optimization shifts the base to AMDGPU::sub0 + Offset, which leads to the situation
that the index could be negative to address the registers to the left of the base (Offset). Thus the
optimization is invalid.

Diff Detail

Event Timeline

cfang created this revision.Mon, Dec 3, 3:47 PM
arsenm added a comment.Mon, Dec 3, 3:52 PM

We should try to use some known bits information to keep this. I have a patch to add a machine version, but there might be a better way

cfang added a comment.Mon, Dec 3, 4:00 PM

We should try to use some known bits information to keep this. I have a patch to add a machine version, but there might be a better way

Would you please explain how would your knownbit approach resolve the negative index issue while keep the optimization for gfx9+?
Or just post your patch. Thanks.

msearles added inline comments.Mon, Dec 3, 4:28 PM
lib/Target/AMDGPU/SIISelLowering.cpp
3013–3017

Typo: SunReg (should be SubReg). Typo is repeated in the second comment block as well.

arsenm added a comment.Mon, Dec 3, 4:37 PM

We should try to use some known bits information to keep this. I have a patch to add a machine version, but there might be a better way

Would you please explain how would your knownbit approach resolve the negative index issue while keep the optimization for gfx9+?
Or just post your patch. Thanks.

If you know the base index isn't negative, you don't need to disable this

arsenm added a comment.Mon, Dec 3, 4:44 PM

D30466 is the primitive computeKnownBits

cfang added a comment.Wed, Dec 5, 10:26 AM

We should try to use some known bits information to keep this. I have a patch to add a machine version, but there might be a better way

Would you please explain how would your knownbit approach resolve the negative index issue while keep the optimization for gfx9+?
Or just post your patch. Thanks.

If you know the base index isn't negative, you don't need to disable this

Theoretically it is correct. But in the real world applications, the index should be unknown to the compiler, and most likely a variable.
Also, as the offset is a positive number and we choose to start from offset to indirect vgpr indexing, we are sure the index is negative if we are addressing
the registers left to "offset" in the vector.

I am thinking that in very rare case that the compiler can make sure that the base index is non-negative, and doubt whether it is
worthwhile to do the optimization to save one (ADD) instruction for such case.

cfang marked an inline comment as done.Wed, Dec 5, 10:28 AM
cfang added inline comments.
lib/Target/AMDGPU/SIISelLowering.cpp
3013–3017

Thanks. Will correct the typo (if we keep the code this way ).

cfang added a comment.Wed, Dec 5, 11:23 AM

D30466 is the primitive computeKnownBits

Has this patch been comitted to the Trunk?

cfang updated this revision to Diff 176882.Wed, Dec 5, 2:36 PM

Fix typos.

arsenm added a comment.Fri, Dec 7, 2:18 PM

No, it's not committed. Variable + constant is a common case in general.

It would probably be better to do this fold in the DAG for now though

No, it's not committed. Variable + constant is a common case in general.

It would probably be better to do this fold in the DAG for now though

How can you guarantee, at machine instruction level, the base is non-negative even though you do this fold in the DAG?
Something may happen in between.