We were using too broad a check for isFLATScratch(), which also
includes FLAT global.
llvm/lib/Target/AMDGPU/SIInstrInfo.h | ||
---|---|---|
517 | What's wrong with another bit? Don't we already have it for global? |
llvm/lib/Target/AMDGPU/SIInstrInfo.h | ||
---|---|---|
517 | We do not have it for global, otherwise I would just check it is unset. I just see that we may soon exhaust the bitfield width. |
llvm/lib/Target/AMDGPU/SIInstrInfo.h | ||
---|---|---|
517 | We are actually using 56 bits out of 64. If I used a bit for every single feature in every single GPU, we would have gone through the roof a long time ago (we actually did, so I had to remove some of the bits and replace them in some non-obvious ways). |
llvm/lib/Target/AMDGPU/SIInstrInfo.h | ||
---|---|---|
580 | If getFlatScratchInst is just a table lookup, is there any need to do the isSegmentSpecificFLAT test first? |
llvm/lib/Target/AMDGPU/SIInstrInfo.h | ||
---|---|---|
580 | I have restored it to be O(1) in most cases; the table lookup will only run if the instruction is already known to be segment-specific FLAT. It is purely an optimization. |
llvm/lib/Target/AMDGPU/SIInstrInfo.h | ||
---|---|---|
580 | I see. I thought the table lookup was a direct O(1) lookup. I didn't realise it is a binary chop. |
llvm/lib/Target/AMDGPU/SIInstrInfo.h | ||
---|---|---|
580 | Right, it is O(log(N)). N is small but still. |
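
For readers following the 580 thread, here is a minimal sketch of the pattern being discussed: an O(1) flag test that short-circuits before an O(log N) binary search over a sorted opcode table. The flag names, bit positions, opcode numbers, and the helpers `lookupScratchOpcode` and `isScratch` are all hypothetical and are not taken from SIInstrInfo.h.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

// Hypothetical flag bits; the real TSFlags layout is not reproduced here.
constexpr uint64_t FlagFLAT = UINT64_C(1) << 0;
constexpr uint64_t FlagSegmentSpecific = UINT64_C(1) << 1;

struct OpcodeEntry {
  uint16_t Opcode;        // segment-specific FLAT opcode (made-up number)
  uint16_t ScratchOpcode; // its scratch counterpart (made-up number)
};

// Must stay sorted by Opcode so std::lower_bound can binary-search it.
constexpr OpcodeEntry ScratchTable[] = {{10, 110}, {20, 120}, {30, 130}};

// O(log N) lookup; returns -1 when Opcode has no entry in the table.
inline int lookupScratchOpcode(uint16_t Opcode) {
  const OpcodeEntry *End = std::end(ScratchTable);
  const OpcodeEntry *It = std::lower_bound(
      std::begin(ScratchTable), End, Opcode,
      [](const OpcodeEntry &E, uint16_t Op) { return E.Opcode < Op; });
  return (It != End && It->Opcode == Opcode) ? It->ScratchOpcode : -1;
}

inline bool isScratch(uint64_t TSFlags, uint16_t Opcode) {
  // O(1) early exit: most instructions are not segment-specific FLAT,
  // so they never reach the binary search below.
  constexpr uint64_t Required = FlagFLAT | FlagSegmentSpecific;
  if ((TSFlags & Required) != Required)
    return false;
  // Only segment-specific FLAT instructions pay for the table lookup,
  // which separates scratch from global in this sketch.
  return lookupScratchOpcode(Opcode) != -1;
}
```

In this sketch a segment-specific opcode that is missing from the table (standing in for a FLAT global instruction) is rejected by the lookup, while everything else is rejected by the flag test alone.
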
llvm/lib/Target/AMDGPU/SIInstrInfo.h | ||
---|---|---|
580 | I think using a bit here is fine. We're not out, and there are other, less important bits we could prune. |
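
The alternative suggested here, spending one flag bit per segment, would reduce the check to a single mask test with no table involved. A rough sketch under assumed names follows; the bit positions and identifiers are hypothetical and do not reflect the actual flag enum.

```cpp
#include <cstdint>

// Hypothetical bit assignments, chosen only for illustration.
constexpr uint64_t FlagFLAT = UINT64_C(1) << 0;
constexpr uint64_t FlagFlatGlobal = UINT64_C(1) << 1;
constexpr uint64_t FlagFlatScratch = UINT64_C(1) << 2;

// With a dedicated bit, the scratch test is a single O(1) mask check.
inline bool isFLATScratchByBit(uint64_t TSFlags) {
  return (TSFlags & FlagFlatScratch) != 0;
}

// The global test is equally direct and cannot be confused with scratch.
inline bool isFLATGlobalByBit(uint64_t TSFlags) {
  return (TSFlags & FlagFlatGlobal) != 0;
}
```

The trade-off raised earlier in the thread still applies: each such bit comes out of a 64-bit budget of which 56 bits are reported as already in use.
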