This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Improve FLAT scratch detection
ClosedPublic

Authored by rampitec on Oct 30 2020, 2:53 PM.

Details

Summary

We were useing too broad check for isFLATScratch() which also
includes FLAT global.

Diff Detail

Event Timeline

rampitec created this revision.Oct 30 2020, 2:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 30 2020, 2:53 PM
rampitec requested review of this revision.Oct 30 2020, 2:53 PM
arsenm added inline comments.Oct 30 2020, 2:57 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.h
517

What's wrong with another bit? Don't we already have it for global?

rampitec added inline comments.Oct 30 2020, 3:52 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.h
517

We do not have it for global, otherwise I would just check it is unset.

I just see that we may soon exhaust the bitfield width.

rampitec added inline comments.Oct 30 2020, 4:16 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.h
517

We are actually using 56 bits out of 64. If I would use a bit for every single feature in every singe GPU we would go over the roof a long time ago (we actually did, so I had to remove some of the bits and use some non-obvious ways to replace it).

rampitec updated this revision to Diff 302156.Nov 1 2020, 10:30 AM

Check for isSegmentSpecificFLAT() first to keep it O(1) most of the times.

foad added inline comments.Nov 2 2020, 1:49 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.h
525

If getFlatScratchInst is just a table lookup, is there any need to do the isSegmentSpecificFLAT test first?

rampitec added inline comments.Nov 2 2020, 7:31 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.h
525

I have reetored it to have O(1) in most cases, table lookup will only run if it is already known segmented flat. It is purely optimization.

foad added inline comments.Nov 2 2020, 7:40 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.h
525

I see. I thought the table lookup was a direct O(1) lookup. I didn't realise it is a binary chop.

rampitec added inline comments.Nov 2 2020, 7:44 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.h
525

Right, it is O(log(N)). N is small but still.

arsenm added inline comments.Nov 2 2020, 9:10 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.h
525

I think using a bit here is fine. We're not out, and there are other bits we could prune out that are less important

rampitec updated this revision to Diff 302355.Nov 2 2020, 11:28 AM
rampitec edited the summary of this revision. (Show Details)

Changed to use TSFlags bit.

arsenm accepted this revision.Nov 2 2020, 11:30 AM
This revision is now accepted and ready to land.Nov 2 2020, 11:30 AM
This revision was automatically updated to reflect the committed changes.