We were using too broad a check for isFLATScratch(), which also
matched FLAT global instructions.
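
The problem can be shown with plain bit tests. This is a minimal sketch with hypothetical flag names and bit positions; it is not the real SIInstrFlags layout in the AMDGPU backend. It assumes one bit marks "segment-specific FLAT" (global or scratch) and, as the review below settles on, a dedicated bit marks scratch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical TSFlags bits, for illustration only: these names and bit
// positions are NOT the real SIInstrFlags layout in the AMDGPU backend.
constexpr uint64_t kFLAT = uint64_t(1) << 0;                // any FLAT-encoded instruction
constexpr uint64_t kSegmentSpecificFLAT = uint64_t(1) << 1; // FLAT global OR FLAT scratch
constexpr uint64_t kFlatScratch = uint64_t(1) << 2;         // FLAT scratch only

// Too broad: a segment-specific FLAT instruction may be global, not scratch,
// so this also returns true for FLAT global instructions.
inline bool isFLATScratchTooBroad(uint64_t TSFlags) {
  return (TSFlags & kSegmentSpecificFLAT) != 0;
}

// Narrowed: only the dedicated scratch bit identifies a scratch instruction.
inline bool isFLATScratch(uint64_t TSFlags) {
  return (TSFlags & kFlatScratch) != 0;
}
```

For a global load tagged `kFLAT | kSegmentSpecificFLAT`, `isFLATScratchTooBroad` returns `true` while `isFLATScratch` returns `false`.
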
| llvm/lib/Target/AMDGPU/SIInstrInfo.h | ||
|---|---|---|
| 517 | What's wrong with another bit? Don't we already have it for global? | |
| 517 | We do not have one for global; otherwise I would just check that it is unset. I am only concerned that we may soon exhaust the bitfield width. | |
| 517 | We are actually using 56 bits out of 64. If I used a bit for every single feature in every single GPU, we would have run out a long time ago (we actually did, so I had to remove some of the bits and replace them in non-obvious ways). | |
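
In numbers: with 56 of 64 bits allocated, 8 remain, and one more flag leaves 7. The sketch below makes that budget explicit. It assumes, purely for illustration, that the allocated bits are contiguous from bit 0; the helper names `freeTSFlagBits` and `nextFreeBit` are hypothetical and not part of LLVM.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical allocation mask: bits 0..55 in use, matching the "56 bits out
// of 64" figure from the thread. The real TSFlags layout need not be contiguous.
constexpr uint64_t kAllocatedMask = (uint64_t(1) << 56) - 1;

// Number of bits still free in a 64-bit TSFlags word.
inline unsigned freeTSFlagBits(uint64_t AllocatedMask) {
  return 64u - static_cast<unsigned>(std::bitset<64>(AllocatedMask).count());
}

// Lowest unallocated bit index, or 64 if every bit is taken.
inline unsigned nextFreeBit(uint64_t AllocatedMask) {
  for (unsigned I = 0; I < 64; ++I)
    if ((AllocatedMask & (uint64_t(1) << I)) == 0)
      return I;
  return 64;
}
```

With the mask above, `freeTSFlagBits` returns 8 and `nextFreeBit` returns 56; reserving bit 56 for a new flag leaves 7 free.
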
| llvm/lib/Target/AMDGPU/SIInstrInfo.h | ||
|---|---|---|
| 580 | If getFlatScratchInst is just a table lookup, is there any need to do the isSegmentSpecificFLAT test first? | |
| 580 | I have restored it so that the check is O(1) in most cases; the table lookup only runs if the instruction is already known to be segment-specific FLAT. It is purely an optimization. | |
| 580 | I see. I thought the table lookup was a direct O(1) lookup. I didn't realise it is a binary chop. | |
| 580 | Right, it is O(log N). N is small, but still. | |
| 580 | I think using a bit here is fine. We're not out yet, and there are other, less important bits we could prune. | |
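
The ordering discussed in the 580 thread, a cheap bit test guarding a binary search, can be sketched as follows. This is a minimal sketch under stated assumptions: the flag bit and the sorted opcode table are hypothetical stand-ins for a generated lookup such as the one behind getFlatScratchInst, and the opcode values are made up. It shows the shape of the check, not the code in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

// Hypothetical flag bit and opcode table, for illustration only. The table
// stands in for a sorted, generated lookup; the opcode values are made up.
constexpr uint64_t kSegmentSpecificFLAT = uint64_t(1) << 1;
constexpr uint16_t kScratchOpcodes[] = {300, 301, 302, 310}; // must stay sorted

inline bool isFLATScratch(uint64_t TSFlags, uint16_t Opcode) {
  // O(1) fast path: most instructions are not global/scratch FLAT at all,
  // so a single bit test rejects them without touching the table.
  if ((TSFlags & kSegmentSpecificFLAT) == 0)
    return false;
  // O(log N) slow path: binary search only for segment-specific FLAT.
  return std::binary_search(std::begin(kScratchOpcodes),
                            std::end(kScratchOpcodes), Opcode);
}
```

A dedicated scratch bit, as the reviewer suggests, would replace both steps with a single O(1) test at the cost of one more TSFlags bit.
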