FP atomics in system scope cannot be used and shall always
be expanded in a CAS loop.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11952–11957 | Is this actually target specific? I thought this might have been all targets | |
11954–11955 | Isn't there some scope greater or equal function? Should avoid directly referencing one-as |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11952–11957 | Yes, it is specific to this and only this target. | |
11954–11955 | It is, but it is located in the MMI whcih is not yet available here. In fact I don't even have MF yet to query it. Anyway since it is specific to the only target and that target does not have a higher scope I suppose it shall be OK. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11954–11955 | Can it be moved out of the MMI? That sounds like a weird place for it |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11954–11955 | Any good ideas where to? It needs an LLVMContext, so definitely something which has access to Module. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11954–11955 | It seems like its a target specific context, its own thing separate from MMI. I guess we don't have anything like that now, but I would think it would look like a function returning a struct initialized on the first call |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11954–11955 | Technically it depends on the target (scope ordering) and not on a subtarget. At least so far and at least so far we do not envision so drastic changes that would require a subtarget differentiation of the scopes. I guess we would resist as much as we can if that is proposed. So technically it should not need MF, only the Module. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11954–11955 | This needs to happen regardless of scope. We implement these operations as rmw atomic instructions regardless of the scope requested, so they will always be forwarded to the L2. If the memory address happens to have an MTYPE that causes them not to happen in the L2, then the expansion must happen. Since the compiler does not know what memory may be being accessed the expansion must happen for all scopes for all accesses. Unsafe-atomics can relax this for scopes <=agent. It is a promise that such atomics will never be to memory that will not be cached in the L2. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11954–11955 | Scopes are target specific. Whether they are hierarchical is target specific. I believe there are target functions to query if the target supports scope inclusion, and will compare them. The SIMemoryLegalizer is using them to determine the memory properties of atomic instructions. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11954–11955 | That's exactly what's written in the code. If unsafe atomics are not enabled it will bail to CAS two lines before. What exactly do you want to change here? |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11954–11955 | "scopes are target specific". And? What exactly in the statement they "are target specific" precludes them from being target specific? |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11954–11955 | Sounded like a question was being asked about scopes being hierarchical so just pointing to SIMemoryLegalizer that does use that concept. But reading the code more closely I see that that is not relevant here. | |
11954–11955 | Sorry, I misread the code so my comment can be ignored. But I did have a question about why the address space is being considered in deciding if expansion is needed. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11954–11955 | It is more about if it is target specific, or subtarget specific. So far it is not subtarget specific, I hope it will stay that way. | |
11954–11955 | That because of the ISA. If we don't have needed instructions we need to expand it. Compare exchange is universally available, but specific instructions only in some address spaces. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11954–11955 | So cmpxchg is only available for FLAT address space? So what happens for GLOBAL address space, doesn't that have the same issue? |
A side note: one in fact needs to be extremely lucky or insistent to get these instructions. First a proper C(++) atomic has to be used. Then denorm mode must match. Then usafe atomics option has to be used. A return value must be ignored on gfx908. Now the scope must be non-default and less than system. I wander if anyone will ever use it at all. Stars really need to align for someone to get it, and then he/she must be careful not to violate all these promises. Oh, well.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11954–11955 | Most of the cases end up with cmpxchg, which is universally available in isa and safe. As I said, stars have to align for someone to get it in a form different from a CAS loop. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11954–11955 | Comment is ignored, granted. What about "revision needs change" status? |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11954–11955 | In general I think all of this discussion about avoiding directly accessing "one-as" roots into the the issue that our scopes are strings and not standard symbolic enums and all the attached frustration. I completely agree, but this is really much bigger than this w/a. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11954–11955 |
I tried to remove the "revision needs change" status, did I succeed? |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
11954–11955 | Yep, thank you! |
I am looking where to move syncscope initialization and cannot find a place. I do not think I can just initialize it on a first query and cache it because it depends on the LLVMContext. I have also double checked that MF is not created yet and MMI is not created yet too.
Note what would happen if I cache this value at the first call:
SyncScope::ID LLVMContextImpl::getOrInsertSyncScopeID(StringRef SSN) { auto NewSSID = SSC.size(); assert(NewSSID < std::numeric_limits<SyncScope::ID>::max() && "Hit the maximum number of synchronization scopes allowed!"); return SSC.insert(std::make_pair(SSN, SyncScope::ID(NewSSID))).first->second; }
It will lock numeric value depending on the call history of getOrInsertSyncScopeID(). Caching without mapping to a context will prohibit any part of the compiler to ever call getOrInsertSyncScopeID() and then reuse AMDGPU BE. If the same instance of SIISelLowering will be called with a different module with a different Context cache may contain a wrong value. Besides even that depends on the internal implementation of the Context. AMDGPUMachineModuleInfo is fine do it because it has module context, but anything before MF is created not.
I didn't mean the LLVMContext, I mean a new target context which would be a new concept. That doesn't necessarily belong in this patch, but I do think it would be better
Isn't there some scope greater or equal function? Should avoid directly referencing one-as