This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Use single cache policy operand
ClosedPublic

Authored by rampitec on Feb 10 2021, 4:36 PM.

Details

Summary

Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.

Additional advantage that parser will accept these flags in any order unlike
now.

Diff Detail

Event Timeline

rampitec created this revision.Feb 10 2021, 4:36 PM
rampitec requested review of this revision.Feb 10 2021, 4:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2021, 4:36 PM
arsenm added inline comments.Feb 10 2021, 4:47 PM
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
4904

Don't see why this needs to check the mnemonic, but I guess that's part of the temporary hack

6868

i don't think anything with cache_policy in the string should be accepted, should have test to make sure

rampitec added inline comments.
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
4904

Yes, it is hack to only change small number of instructions for now.

6868

It does not, this operand type is skipped during parsing. I just need the table to have all types.

The main question I have to anyone interested: is that feasible? It is WIP anyway.

rampitec added inline comments.Feb 10 2021, 5:17 PM
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
4928

This is a bug BTW, I will fix it. It will erase any optional operand, cache policy or not, once first cache policy bit was met.

Another question, can we have a shorter name than "cache_policy"? "cpol" maybe? It blows formatting in many places.

t-tye added a comment.Feb 10 2021, 8:07 PM

Another question, can we have a shorter name than "cache_policy"? "cpol" maybe? It blows formatting in many places.

What about "cp"? We have "as" I think for address space.

t-tye added inline comments.Feb 10 2021, 8:10 PM
llvm/lib/Target/AMDGPU/SIDefines.h
274–279

Why is ALL needed? Does the code ever deliberately want to set them all? I would suspect places only even want to set specific bits.

Another question, can we have a shorter name than "cache_policy"? "cpol" maybe? It blows formatting in many places.

What about "cp"? We have "as" I think for address space.

Ack. Also note, it will never appear in asm, only in compiler sources.

llvm/lib/Target/AMDGPU/SIDefines.h
274–279

It is used as ~ALL for mask verification.

rampitec added a comment.EditedFeb 11 2021, 2:02 AM

Another question, can we have a shorter name than "cache_policy"? "cpol" maybe? It blows formatting in many places.

What about "cp"? We have "as" I think for address space.

I still cannot get rid of the connotations to 'cp' command. CPol looks less misleading to me.

dp added inline comments.Feb 11 2021, 3:47 AM
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
4904

It would be interesting to add global_ to play with hardcoded glc.

6952

Maybe it would be cleaner to parse this as a real sequence of operands instead of replacing parsed operands later?

rampitec added inline comments.Feb 11 2021, 9:04 AM
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
4904

Right, that's one problem to solve.

6952

I consider this a temp code. parseOptionalOperand() can create and update cache policy operand directly, but it cannot do so now because it has no access to mnemonics. When all instructions are handled it can be moved there.

rampitec updated this revision to Diff 323079.Feb 11 2021, 11:00 AM
rampitec marked an inline comment as done.

Fixed bug in combining bits into cache policy operand.
Renamed CachePolicy -> CPol, cache_policy -> cpol.
Added test for error checking (TODO: need to complain on repeating/unsupported bits).

rampitec updated this revision to Diff 323186.Feb 11 2021, 4:35 PM
rampitec marked an inline comment as done.

Added full FLAT support. Atomics with forced GLC work.

Ideally I want to have no special cpol_glc1 and cpol_glc0 operands, so that it would be alsways possible to just query named operand cpol, but that would need 2 bits in TSFlags. Otherwise it does not seem possible to make sure if we have forced 1 or 0 glc bit. At least this patch is no worse than using glc1 operand as now.

Added full FLAT support. Atomics with forced GLC work.

Ideally I want to have no special cpol_glc1 and cpol_glc0 operands, so that it would be alsways possible to just query named operand cpol, but that would need 2 bits in TSFlags. Otherwise it does not seem possible to make sure if we have forced 1 or 0 glc bit. At least this patch is no worse than using glc1 operand as now.

Actually maybe I need to burn these two bits anyway. I have realized that insert waitcount pass (maybe some other passes too) use noret/rtn map to decide if an instruction is atomic. This shall not work anymore when we have only ret or only noret versions of some of them. I will check tomorrow, but I am practically sure we have bug with waitcounts and such atomics.

rampitec updated this revision to Diff 323813.Feb 15 2021, 12:14 PM

Rebased and got rid of separate cpol_glc* operands.

dp added a comment.Feb 16 2021, 8:29 AM

Overall looks good

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
3839

Should be removed

4904

Should also combine cache policy operands on MatchOperand_NoMatch condition, otherwise these operands may break operand matcher later and may result in an invalid error position. For example:

error: invalid operand for instruction
global_atomic_add v0, v[1:2], v2, off glc 1
                                      ^

Hopefully the whole hack will be removed in final version.

rampitec updated this revision to Diff 324068.Feb 16 2021, 11:31 AM
rampitec marked 2 inline comments as done.
rampitec edited the summary of this revision. (Show Details)

Added buffer instructions. These were more difficult because ret and noret atomics only differ in glc and the rest of the instructions is the same.

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
4904

OK. Although it shall really migrate into parseOptionalOperand() when all instruction classes are covered.

rampitec updated this revision to Diff 324130.Feb 16 2021, 4:29 PM
rampitec edited the summary of this revision. (Show Details)

Patch now covers all instructions, added MIMG and SMRD.
Removed all individual GLC/SLC/DLC operands.
Removed hack to combine cpol operands after parsing.

It passes check-llvm-mc, the next thing is to fix codegen.

rampitec marked 2 inline comments as done.Feb 16 2021, 6:02 PM
foad added a subscriber: foad.Feb 17 2021, 12:09 AM
rampitec updated this revision to Diff 324808.Feb 18 2021, 4:37 PM

Rebased to support SCC.
I had to use bit 5 in CPol because bit 4 is used by SWZ in some buffer intrinsics which is already a public interface. Not sure why SWZ was combined with cache policy.

foad added a subscriber: piotr.Feb 19 2021, 12:51 AM

Rebased to support SCC.
I had to use bit 5 in CPol because bit 4 is used by SWZ in some buffer intrinsics which is already a public interface. Not sure why SWZ was combined with cache policy.

See @piotr's comment in https://reviews.llvm.org/D68200#1688133.

Rebased to support SCC.
I had to use bit 5 in CPol because bit 4 is used by SWZ in some buffer intrinsics which is already a public interface. Not sure why SWZ was combined with cache policy.

See @piotr's comment in https://reviews.llvm.org/D68200#1688133.

I can w/a swz by masking the bit like here, but it still doesn't belong to cache policy. We could argue that we can instead combine all optional operand bits into a single uint64_t which might make sense as it will allow free ordering in the asm, but that will lead to a huge custom code in asm parser too. Setting aside asm parsing problem from MIR representation I still think a single operand for a cache policy, and another for buf properties, and another for mimg properties is better, but I think that ship has sailed.

rampitec updated this revision to Diff 328261.Mar 4 2021, 11:40 AM
rampitec edited the summary of this revision. (Show Details)

Selection is done.
There are still ~200 mir tests to update

rampitec updated this revision to Diff 328334.Mar 4 2021, 4:22 PM
rampitec retitled this revision from [AMDGPU] WIP: use single cache policy operand to [AMDGPU] Use single cache policy operand.
rampitec edited the summary of this revision. (Show Details)

There are still 53 mir tests to update but that is mostly mechanical. I believe beyond that the change is ready.

rampitec updated this revision to Diff 328629.Mar 5 2021, 1:19 PM

The patch is ready now.

PSDB and ePSDB both passed.

This looks good to me - @critson has done a pre-emptive merge/test with this patch on our internal branch for graphics and it looks good.

dp added a comment.Mar 9 2021, 8:28 AM

MC part overall looks good with minor reservations:

  • The change breaks compatibility with existing code. The following code is no longer accepted:
buffer_load_dword v5, off, s[8:11], s3 lds dlc   // accepted if lds and dlc are swapped

Is this an issue? (Fortunately I found no other failures.)

  • With this change assembler will accept duplicate cache policy modifiers. This can be easily amended in some cases, others are more problematic:
global_atomic_add v[3:4], v5, off slc slc              # easy to detect
global_atomic_add v[3:4], v5, off slc noglc noglc      # not so easy to detect
  • Some combinations of modifiers will be accepted though they look weird:
global_atomic_add v[3:4], v5, off slc glc noglc
  • There is a case when assembler incorrectly detects an offending token:
global_atomic_add v[3:4], v5, off slc noglc glc
  • I'd have preferred to enforce cache policy modifiers go in one block (i.e. do not allow them mix with other modifiers). Unfortunately this would result in other incompatibilities with legacy code if implemented.
In D96469#2614150, @dp wrote:

MC part overall looks good with minor reservations:

  • The change breaks compatibility with existing code. The following code is no longer accepted:
buffer_load_dword v5, off, s[8:11], s3 lds dlc   // accepted if lds and dlc are swapped

Yes, although I doubt it was ever used. That was not a best idea to break cache policy blocks in the first place.

Is this an issue? (Fortunately I found no other failures.)

I hope not.

  • With this change assembler will accept duplicate cache policy modifiers. This can be easily amended in some cases, others are more problematic:
global_atomic_add v[3:4], v5, off slc slc              # easy to detect
global_atomic_add v[3:4], v5, off slc noglc noglc      # not so easy to detect
  • Some combinations of modifiers will be accepted though they look weird:
global_atomic_add v[3:4], v5, off slc glc noglc

That is possible to fix I suppose. Initialize CPolSeen = 0 variable at the beginning of the instruction parsing, set the bits for any seen modifier and complain for duplicates.

  • There is a case when assembler incorrectly detects an offending token:
global_atomic_add v[3:4], v5, off slc noglc glc

Should also be covered by CPolSeen I suppose.

  • I'd have preferred to enforce cache policy modifiers go in one block (i.e. do not allow them mix with other modifiers). Unfortunately this would result in other incompatibilities with legacy code if implemented.

In fact I would prefer to allow all modifiers in any order and mix, but that's not so easy unfortunately. We knew it from the beginning though.

dp accepted this revision.Mar 9 2021, 11:10 AM

LGTM

This revision is now accepted and ready to land.Mar 9 2021, 11:10 AM
rampitec updated this revision to Diff 329452.Mar 9 2021, 1:03 PM

Handle duplicate cache policy operands.

@arsenm any comments on the selection part?

arsenm added inline comments.Mar 9 2021, 6:52 PM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1519 ↗(On Diff #329452)

Don't need the .getNode. Also we should be removing this from the complex patterns since they're never used (I thought I made progress on this but don't remember what the issue was)

rampitec updated this revision to Diff 329661.Mar 10 2021, 7:28 AM
rampitec marked an inline comment as done.

Addressed review comment.
Addressed clang-tidy comments.

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1519 ↗(On Diff #329452)

I could not get a value of a default operand if a ComplexPattern is used.

rampitec added inline comments.Mar 10 2021, 11:56 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1519 ↗(On Diff #329452)

Actually it seems to be possible to remove at least some of these...

rampitec updated this revision to Diff 329732.Mar 10 2021, 12:14 PM

Removed ComplexPattern custom selection code related to GLC in MUBUF atomic handling.

rampitec updated this revision to Diff 329766.Mar 10 2021, 1:57 PM

Rebased.
Added fp atomic scc validation to the asm parser.

rampitec updated this revision to Diff 329787.Mar 10 2021, 3:00 PM

Removed CPol from MUBUFAddr64 patterns in couple cases.
Fixed formatting in BUFInstructions.td.
Removed unused and unhandled format argument from some patterns.

In general I think we can remove all CPol, TFE, SWZ arguments of buffer ComplexPatterns, these are all zeroes anyway, and then delete two select functions. It just does not seem to be a part of this change any longer.

Any more comments on selection? I guess it will be easier to further clean SelectMUBUF stuff after this patch.

rampitec updated this revision to Diff 330107.Mar 11 2021, 4:52 PM

Added asm parser validation for SMRD which only can use glc or dlc bits.

kzhuravl accepted this revision.Mar 11 2021, 5:36 PM

Not sure if @arsenm has any additional comments, but it LGTM. Should simplify further merges downstream drastically, thanks!

arsenm accepted this revision.Mar 15 2021, 11:25 AM
This revision was landed with ongoing or failed builds.Mar 15 2021, 1:01 PM
This revision was automatically updated to reflect the committed changes.
foad added inline comments.Mar 17 2021, 2:27 AM
llvm/lib/Target/AMDGPU/BUFInstructions.td
2306 ↗(On Diff #330773)

What are these new template parameters for? They don't seem to be used for anything.

rampitec marked an inline comment as done.Mar 17 2021, 12:02 PM
rampitec added inline comments.
llvm/lib/Target/AMDGPU/BUFInstructions.td
2306 ↗(On Diff #330773)

Thanks for spotting this! Removed in D98804.

uabelho added inline comments.
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
6586

gcc warns about this being unused:

23:39:40 ../lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp: At global scope:
23:39:40 ../lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp:6908:20: warning: '{anonymous}::AMDGPUOperand::Ptr {anonymous}::AMDGPUAsmParser::defaultCPol_GLC1() const' defined but not used [-Wunused-function]
23:39:40  6908 | AMDGPUOperand::Ptr AMDGPUAsmParser::defaultCPol_GLC1() const {
23:39:40       |                    ^~~~~~~~~~~~~~~

Will it be used or can it be removed?
rampitec marked an inline comment as done.Mar 19 2021, 8:40 AM
rampitec added inline comments.
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
6586

It's really dead. Removed by https://reviews.llvm.org/rG57effe22050f
Unfortunately clang does not give the same warning as gcc.