Details
- Reviewers
kzhuravl msearles t-tye - Commits
- rG47bac63d3f6b: [AMDGPU] gfx940 memory model
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/docs/AMDGPUUsage.rst | ||
---|---|---|
8673 | I don't know how to review 2500 lines of text. How is it different from GFX90A? | |
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | ||
1486 | Typo "hava" | |
1527 | Typo "hava" | |
1568 | Doesn't really matter but I find != a bit easier to understand than ^ on booleans. | |
1574 | Space after ( | |
1680 | Typo "hava" |
llvm/docs/AMDGPUUsage.rst | ||
---|---|---|
8673 | The main change is the replacement of glc/scc/slc with sc0/sc1/nt bits. While looks functionally similar it is conceptually different. Instead of locking individual caches we now requesting coherency at a specific memory scope. |
llvm/docs/AMDGPUUsage.rst | ||
---|---|---|
8673 | One approach is to isolate the "Memory Model GFX90A" and "Memory Model GFX940" sections into their own files, and then use your preferred diff-tool with more granular highlighting (i.e. word-based rather than line-based) to compare the two models. At that point the diff is actually very concise. |
I noted some small nits, but otherwise LGTM. I won't accept, as there are others who should probably review.
llvm/docs/AMDGPUUsage.rst | ||
---|---|---|
8715 | I think this can be dropped, as the sentence starts with the tgsplit condition. | |
8717 | Is this the correct scope? This seems like the same case as the last bullet, which states sc0. The table below also seems to indicate that sc0 will invalidate L1, not sc1. | |
8893–8895 | Duplicate clause, seems like it can be deleted | |
8912 | It might be useful to note the "re-use" of the sc0 bit in atomircrmw instructions somewhere, maybe above in the long-form overview portion. To give context to the possible confusion, my original comment here was: I'm trying to piece together an understanding of the memory model changes, so bear with me, but why is this not sc0=1 sc1=1? I guess my misunderstanding boils down to the difference in the sc0/sc1 bits between the _load/_store instructions and the _atomic instructions. | |
9642 | Can collapse these two spaces | |
9697 | 3 | |
10881 | Should this be must generate? It occurs in a few places | |
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | ||
1543 | I think including a paragraph along these lines in AMDGPUUsage.rst would cover the confusion I originally had |
Scott, I appreciate your doc review. It is a lot of work. Do you mind to fix doc typos in a followup change? It is a real pain to maintain the downstream merge, it will be easier to fix when it is merged back.
I don't know how to review 2500 lines of text. How is it different from GFX90A?