Attempt to further document the intended cache policies requested
by different combinations of GLC, SLC and DLC bits.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
LGTM.
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | ||
---|---|---|
881 | Not really related to your patch, but why do we return here? Doesn't that mean that IsNonTemporal is effectively ignored if IsVolatile is true? Wouldn't it be both better and simpler to fall through to the IsNonTemporal handling here? |
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | ||
---|---|---|
881 | This is a good point. From a bit setting perspective it would be fine. |
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | ||
---|---|---|
881 | I suspect this is because at one time relaxed atomics were marked as volatile. This may have been because the C/C++/OpenCL standards defined them that way, or because LLVM back then did not fully support atomics so marking them all as volatile made the existing passes "do the right thing". So this code may have been an attempt not to pessimize normal relaxed atomics. LLVM does not support non-temporal atomics currently. I am not sure if these reasons are still the case so would be good to investigate and potentially fix this code (or at least document why it is the way it is with a FIXME). That can be a separate review I think. |
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | ||
---|---|---|
844 | Please add back: /// There is no bypass control for the L2 cache at the isa level. The modified comment is only explaining the L1 cache and both caches are involved for system scope. | |
867–868 | How about: // Request L1 cache policy to be MISS_EVICT for load instructions and MISS_LRU for store instructions. Note that there is no L2 cache bypass policy at the isa level. | |
885–887 | I had not remembered that the GLC and SLC bits are used together to set the L1 cache policy. So how about: // Setting GLC and SLC both to 1 sets the L1 cache policy to MISS_EVICT for both loads and stores, and the L2 cache policy to STREAM. and delete the comment below. | |
1108 | MISS_EVICT is a policy that only applies when both SLC and GLC are set. Here only GLC is being set which specifies MISS_LRU and L2 LRU. How about: // Set the L1 cache policy to MISS_LRU. Note that there is no L2 cache bypass policy at the isa level. | |
1219–1220 | How about: // Request L1 cache policy to be MISS_EVICT for load instructions and MISS_LRU for store instructions. Note that there is no L2 cache bypass policy at the isa level. | |
1237–1240 | How about: // Setting GLC and SLC both to 1 sets the L1 cache policy to MISS_EVICT for both loads and stores, and the L2 cache policy to STREAM. | |
1400 | How about: // Set the L0 and L1 cache policies to MISS_EVICT. Note that there is no L2 cache bypass policy at the isa level. | |
1450–1452 | How about: // Request L0 and L1 cache policy to be MISS_EVICT for load instructions and MISS_LRU for store instructions. Note that there is no L2 cache coherent bypass policy at the isa level. | |
1468–1475 | It appears that this should be setting GLC=1 for stores so that L0 will be HIT_EVICT instead of MISS_EVICT. This must not be done for loads as that would make Lo MISS_EVICT. How about: // For loads setting GLC to 1 sets the L0 and L1 cache policy to HIT_EVICT and the L2 cache policy to STREAM. For stores setting GLC and SLC both to 1 sets the L0 and L1 cache policy to MISS_EVICT and the L2 cache policy to STREAM. |
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | ||
---|---|---|
844 | I deleted that text because there is a bypass for L2 stores and atomics on GFX10: SLC=0 DLC=1. | |
881 | Yes, a separate investigation and review. | |
1468–1475 | Do you have MISS_EVICT and HIT_EVICT flipped in your description? Do you mean: I can add the GLC bit for stores and this ceases to be NFC. |
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | ||
---|---|---|
844 | In GFX10 the bypass is only available for stores and not loads, and is not coherent so cannot be used anyway. That is why I added the word "coherent" in one of the comments below. So probably should do that here too. | |
1468–1475 | I believe I have it right according to the hardware GFX10 memory model spec. // For loads setting SLC to 1 sets the L0 and L1 cache policy to HIT_EVICT and the L2 cache policy to STREAM. For stores setting GLC and SLC both to 1 sets the L0 and L1 cache policy to MISS_EVICT and the L2 cache policy to STREAM. We have to state the policy for L1 too even though the hardware documentation does not state it. The L1 MUST be evict or a subsequent load could see stale data. Yes this changes ceases to be NFC and will need thorough testing. |
- Address reviewer comments.
- Normalise language to use "set" instead of "request".
- Add GLC to non-temporal stores on GFX10 meaning this is no longer NFC.
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | ||
---|---|---|
1108 |
This sounds like MISS_EVICT to me. | |
1468–1475 | Technically the L1 only has one policy described as "bypassed (but is coherent)" (RDNA1 Shader ISA p69), but on paper the behaviour of this looks the same as MISS_EVICT. So I guess I can accept just calling it that. Do you have any test cases which use non-temporal? |
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | ||
---|---|---|
1108 | GFX90A cache policies are different to other GFX9 and GFX10. If GLC=1 then the L1 policy is MISS_LRU for loads: any existing line is invalidated, then the cache line is loaded and remains in the cache with LRU policy. | |
1468–1475 | In the GFX10 memory model spec I have, it does not state the L1 policy for stores, but my understanding is that it behaves as MISS_EVICT. To me "bypass but coherent" is semantically the same thing as MISS_EVICT and it seems odd not to use the terminology that is already well defined. |
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | ||
---|---|---|
868 | I would not add "{write-combine)" as it really is not an exact match for that term. | |
1220 | Would eliminate "(write-combine)" as mentioned above. | |
1451 | Eliminate "(write-cobine)". Add: // Note that there is no L2 cache coherent bypass policy at the isa level. |
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | ||
---|---|---|
1473 | Does this change the documented code sequences: https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-memory-model-code-sequences-gfx10-table ? |
Please add back:
The modified comment is only explaining the L1 cache and both caches are involved for system scope.