This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix wait counts in the presence of 16bit subregisters
Closed · Public

Authored by vpykhtin on May 15 2020, 1:57 PM.

Event Timeline

vpykhtin created this revision. May 15 2020, 1:57 PM
Herald added a project: Restricted Project. May 15 2020, 1:57 PM
This revision is now accepted and ready to land. May 15 2020, 2:22 PM
foad added inline comments. May 16 2020, 12:42 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:508

I think there's a divideCeil function in MathExtras that you could use here.
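For context, a ceiling division of the kind a divideCeil-style helper provides is the usual way to turn a register size in bits into a count of 32-bit register slots. The sketch below assumes that is roughly what line 508 computes (the actual code is not quoted in this thread); `divideCeilSketch` and `numDwordSlots` are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical ceil-division helper (not the LLVM function itself):
// returns the smallest q such that q * d >= n.
// Assumes n + d - 1 does not overflow uint64_t.
constexpr uint64_t divideCeilSketch(uint64_t n, uint64_t d) {
  assert(d != 0 && "division by zero");
  return (n + d - 1) / d;
}

// Hypothetical: number of 32-bit register slots covered by a register of
// SizeInBits bits. A 16-bit register still occupies a whole 32-bit slot.
constexpr uint64_t numDwordSlots(uint64_t SizeInBits) {
  return divideCeilSketch(SizeInBits, 32);
}

static_assert(numDwordSlots(16) == 1, "16-bit register -> 1 slot");
static_assert(numDwordSlots(32) == 1, "32-bit register -> 1 slot");
static_assert(numDwordSlots(64) == 2, "64-bit register -> 2 slots");
static_assert(numDwordSlots(96) == 3, "96-bit register -> 3 slots");
```

Applied to sizes, rounding up is the intended behavior: anything from 1 to 32 bits maps to one slot.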

rampitec added inline comments. May 16 2020, 1:06 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:508

Please don't. This function does not work. I had to ditch it the other day specifically because of 16 vs 32 bits.

This revision was automatically updated to reflect the committed changes.
foad added inline comments. May 26 2020, 2:46 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:508

"This function does not work."

???

rampitec added inline comments. May 26 2020, 9:16 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:508

Found it: D78772
It was "divideCeil(getSubRegIdxOffset(SubReg), 32)", changed to "(getSubRegIdxOffset(SubReg) + 31) / 32".
The problem was with offset == 16. If we only operate on the size, this is probably OK.
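To make the offset-versus-size distinction concrete, here is a minimal C++ sketch, assuming subregister offsets and sizes are measured in bits and mapped onto 32-bit register slots. `slotContaining` and `roundUpSlots` are hypothetical helpers, not the code from D78772 or from this revision. Rounding a size up is correct, but rounding an offset of 16 up moves the high half of a register into the next slot.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; offsets and sizes are in bits, slots are 32 bits wide.

// Index of the 32-bit slot that contains bit OffsetInBits (floor division):
// bits 0..31 -> slot 0, bits 32..63 -> slot 1, and so on.
constexpr uint64_t slotContaining(uint64_t OffsetInBits) {
  return OffsetInBits / 32;
}

// Ceiling division by 32, i.e. what a divideCeil-style helper computes.
// Right for sizes: the number of slots needed to hold Bits bits.
constexpr uint64_t roundUpSlots(uint64_t Bits) {
  return (Bits + 31) / 32;
}

// Sizes: rounding up is what we want; a 16-bit subregister needs one slot.
static_assert(roundUpSlots(16) == 1, "");

// Offsets that are multiples of 32: floor and ceiling agree.
static_assert(slotContaining(0) == roundUpSlots(0), "");
static_assert(slotContaining(32) == roundUpSlots(32), "");

// Offset 16 (the high half of a 32-bit register): the bits live in slot 0,
// but rounding the offset up points at slot 1.
static_assert(slotContaining(16) == 0, "");
static_assert(roundUpSlots(16) == 1, "");
```

Under these assumptions an offset wants the floor and a size wants the ceiling; how SIInsertWaitcnts actually combines the two is not shown in this thread.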