This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Always expand system scope fp atomics on gfx90a
ClosedPublic

Authored by rampitec on Mar 5 2021, 3:33 PM.

Details

Summary

FP atomics in system scope cannot be used and shall always
be expanded in a CAS loop.

Diff Detail

Event Timeline

rampitec created this revision.Mar 5 2021, 3:33 PM
rampitec requested review of this revision.Mar 5 2021, 3:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 5 2021, 3:34 PM
Herald added a subscriber: wdng. · View Herald Transcript
arsenm added inline comments.Mar 5 2021, 5:14 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11952–11957

Is this actually target specific? I thought this might have been all targets

11954–11955

Isn't there some scope greater or equal function? Should avoid directly referencing one-as

rampitec added inline comments.Mar 5 2021, 5:18 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11952–11957

Yes, it is specific to this and only this target.

11954–11955

It is, but it is located in the MMI whcih is not yet available here. In fact I don't even have MF yet to query it.

Anyway since it is specific to the only target and that target does not have a higher scope I suppose it shall be OK.

arsenm added inline comments.Mar 5 2021, 5:24 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955

Can it be moved out of the MMI? That sounds like a weird place for it

rampitec added inline comments.Mar 5 2021, 5:27 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955

Any good ideas where to? It needs an LLVMContext, so definitely something which has access to Module.

arsenm added inline comments.Mar 5 2021, 5:37 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955

It seems like its a target specific context, its own thing separate from MMI. I guess we don't have anything like that now, but I would think it would look like a function returning a struct initialized on the first call

rampitec added inline comments.Mar 5 2021, 5:42 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955

Technically it depends on the target (scope ordering) and not on a subtarget. At least so far and at least so far we do not envision so drastic changes that would require a subtarget differentiation of the scopes. I guess we would resist as much as we can if that is proposed. So technically it should not need MF, only the Module.

t-tye requested changes to this revision.Mar 5 2021, 5:43 PM
t-tye added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955

This needs to happen regardless of scope. We implement these operations as rmw atomic instructions regardless of the scope requested, so they will always be forwarded to the L2. If the memory address happens to have an MTYPE that causes them not to happen in the L2, then the expansion must happen. Since the compiler does not know what memory may be being accessed the expansion must happen for all scopes for all accesses.

Unsafe-atomics can relax this for scopes <=agent. It is a promise that such atomics will never be to memory that will not be cached in the L2.

This revision now requires changes to proceed.Mar 5 2021, 5:43 PM
t-tye added inline comments.Mar 5 2021, 5:46 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955

Scopes are target specific. Whether they are hierarchical is target specific. I believe there are target functions to query if the target supports scope inclusion, and will compare them. The SIMemoryLegalizer is using them to determine the memory properties of atomic instructions.

rampitec added inline comments.Mar 5 2021, 5:49 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955

That's exactly what's written in the code. If unsafe atomics are not enabled it will bail to CAS two lines before. What exactly do you want to change here?

rampitec added inline comments.Mar 5 2021, 5:52 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955

"scopes are target specific". And? What exactly in the statement they "are target specific" precludes them from being target specific?

t-tye added inline comments.Mar 5 2021, 6:04 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955

Sounded like a question was being asked about scopes being hierarchical so just pointing to SIMemoryLegalizer that does use that concept. But reading the code more closely I see that that is not relevant here.

11954–11955

Sorry, I misread the code so my comment can be ignored.

But I did have a question about why the address space is being considered in deciding if expansion is needed.

rampitec added inline comments.Mar 5 2021, 6:10 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955

It is more about if it is target specific, or subtarget specific. So far it is not subtarget specific, I hope it will stay that way.

11954–11955

That because of the ISA. If we don't have needed instructions we need to expand it. Compare exchange is universally available, but specific instructions only in some address spaces.

t-tye added inline comments.Mar 5 2021, 6:19 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955

So cmpxchg is only available for FLAT address space? So what happens for GLOBAL address space, doesn't that have the same issue?

A side note: one in fact needs to be extremely lucky or insistent to get these instructions. First a proper C(++) atomic has to be used. Then denorm mode must match. Then usafe atomics option has to be used. A return value must be ignored on gfx908. Now the scope must be non-default and less than system. I wander if anyone will ever use it at all. Stars really need to align for someone to get it, and then he/she must be careful not to violate all these promises. Oh, well.

rampitec added inline comments.Mar 5 2021, 6:27 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955

Most of the cases end up with cmpxchg, which is universally available in isa and safe. As I said, stars have to align for someone to get it in a form different from a CAS loop.

rampitec added inline comments.Mar 5 2021, 6:31 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955

Comment is ignored, granted. What about "revision needs change" status?

t-tye resigned from this revision.Mar 5 2021, 6:44 PM

Discussed offline.

This revision now requires review to proceed.Mar 5 2021, 6:44 PM
rampitec added inline comments.Mar 5 2021, 11:15 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955

In general I think all of this discussion about avoiding directly accessing "one-as" roots into the the issue that our scopes are strings and not standard symbolic enums and all the attached frustration. I completely agree, but this is really much bigger than this w/a.

t-tye added inline comments.Mar 6 2021, 10:32 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955

What about "revision needs change" status?

I tried to remove the "revision needs change" status, did I succeed?

rampitec added inline comments.Mar 6 2021, 10:35 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955

Yep, thank you!

I am looking where to move syncscope initialization and cannot find a place. I do not think I can just initialize it on a first query and cache it because it depends on the LLVMContext. I have also double checked that MF is not created yet and MMI is not created yet too.

I am looking where to move syncscope initialization and cannot find a place. I do not think I can just initialize it on a first query and cache it because it depends on the LLVMContext. I have also double checked that MF is not created yet and MMI is not created yet too.

Note what would happen if I cache this value at the first call:

SyncScope::ID LLVMContextImpl::getOrInsertSyncScopeID(StringRef SSN) {
  auto NewSSID = SSC.size();
  assert(NewSSID < std::numeric_limits<SyncScope::ID>::max() &&
         "Hit the maximum number of synchronization scopes allowed!");
  return SSC.insert(std::make_pair(SSN, SyncScope::ID(NewSSID))).first->second;
}

It will lock numeric value depending on the call history of getOrInsertSyncScopeID(). Caching without mapping to a context will prohibit any part of the compiler to ever call getOrInsertSyncScopeID() and then reuse AMDGPU BE. If the same instance of SIISelLowering will be called with a different module with a different Context cache may contain a wrong value. Besides even that depends on the internal implementation of the Context. AMDGPUMachineModuleInfo is fine do it because it has module context, but anything before MF is created not.

arsenm added a comment.Mar 9 2021, 6:00 AM

I am looking where to move syncscope initialization and cannot find a place. I do not think I can just initialize it on a first query and cache it because it depends on the LLVMContext. I have also double checked that MF is not created yet and MMI is not created yet too.

Note what would happen if I cache this value at the first call:

SyncScope::ID LLVMContextImpl::getOrInsertSyncScopeID(StringRef SSN) {
  auto NewSSID = SSC.size();
  assert(NewSSID < std::numeric_limits<SyncScope::ID>::max() &&
         "Hit the maximum number of synchronization scopes allowed!");
  return SSC.insert(std::make_pair(SSN, SyncScope::ID(NewSSID))).first->second;
}

It will lock numeric value depending on the call history of getOrInsertSyncScopeID(). Caching without mapping to a context will prohibit any part of the compiler to ever call getOrInsertSyncScopeID() and then reuse AMDGPU BE. If the same instance of SIISelLowering will be called with a different module with a different Context cache may contain a wrong value. Besides even that depends on the internal implementation of the Context. AMDGPUMachineModuleInfo is fine do it because it has module context, but anything before MF is created not.

I didn't mean the LLVMContext, I mean a new target context which would be a new concept. That doesn't necessarily belong in this patch, but I do think it would be better

I didn't mean the LLVMContext, I mean a new target context which would be a new concept. That doesn't necessarily belong in this patch, but I do think it would be better

I completely agree, and this really does not belong to this patch.

I didn't mean the LLVMContext, I mean a new target context which would be a new concept. That doesn't necessarily belong in this patch, but I do think it would be better

I completely agree, and this really does not belong to this patch.

Probably something like this: D98304?

arsenm accepted this revision.Mar 10 2021, 11:54 AM
This revision is now accepted and ready to land.Mar 10 2021, 11:54 AM
This revision was automatically updated to reflect the committed changes.