This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fixed SIInstrInfo::getOpSize to handle subregs
ClosedPublic

Authored by rampitec on Oct 1 2018, 10:36 AM.

Details

Summary

Currently it returns incorrect operand size for a target independet
node such as COPY if operand is a register with subreg. Instead of
correct subreg size it returns a size of the whole superreg.

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec created this revision.Oct 1 2018, 10:36 AM
This revision is now accepted and ready to land.Oct 1 2018, 10:56 AM
This revision was automatically updated to reflect the committed changes.
arsenm added inline comments.Oct 1 2018, 7:13 PM
llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.h
735–736

I think this silently will break if we start using 16-bit subregs. Can you assert that this number is consistent with the size of the super-reg?

rampitec added inline comments.Oct 1 2018, 8:27 PM
llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.h
735–736

In fact a lot will break if we start using 16 bit subregs. I would like to add the assert, but that is exactly the problem, one cannot know subreg's size without assuming target specific subreg layout. I.e. I do not see how to do it w/o listing all possible subregs in our target. Also note that MRI.getMaxLaneMaskForVReg() will happily return -1 for the whole register. Any ideas?

arsenm added inline comments.Oct 1 2018, 9:19 PM
llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.h
735–736

You can just assert against the size of the subreg class? Something like (getRegSizeInBits(getSubClassWithSubReg(getRegClass(Reg)) >= 32)?

rampitec added inline comments.Oct 1 2018, 9:21 PM
llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.h
735–736

A subreg can be sub0_sub1, which will pass the check.

arsenm added inline comments.Oct 1 2018, 10:11 PM
llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.h
735–736

Yes, that should work. The issue is if a subregindex like sub0_lo16 is used

rampitec added inline comments.Oct 1 2018, 10:16 PM
llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.h
735–736

As long as you are not planning to have sub0_lo16_hi16 ;) I see the point, at least cases will fail, which is sufficient to catch.