This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Enable AtomicExpandPass for NVPTX
ClosedPublic

Authored by tianshilei1992 on May 15 2022, 11:23 AM.

Details

Summary

This patch enables AtomicExpandPass for NVPTX.

Depend on D125652.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptMay 15 2022, 11:23 AM
tianshilei1992 requested review of this revision.May 15 2022, 11:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 15 2022, 11:23 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
257 ↗(On Diff #429550)

Some targets, such as NVPTX, supports atomic load and store of floating-point variables. We don't need to convert it to integer.

tianshilei1992 edited the summary of this revision. (Show Details)May 15 2022, 6:09 PM
tra added inline comments.May 16 2022, 11:23 AM
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5159

According to https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-atom

64-bit atom.{and,or,xor,min,max} require sm_32 or higher.

This must be conditional on the GPU variant we're compiling for, similar to how we handle f64 above.

llvm/test/CodeGen/NVPTX/atomic-expand.ll.ll
7 ↗(On Diff #429584)

Your code also adds support for atomics on integers, those should be verified, too.

In general we do want to have tests for all types native to PTX.

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5159

Thanks for pointing it out. Will do.

I got a question actually. What's the minimum SM version do we currently support? I did find from one of your patch (https://reviews.llvm.org/D24943) that it had the function hasAtomAddF32, but now I can't find it. Was it removed because we support a minimum SM version, like SM20?

tra added a comment.May 16 2022, 1:06 PM

What's the minimum SM version do we currently support? I did find from one of your patch (https://reviews.llvm.org/D24943) that it had the function hasAtomAddF32, but now I can't find it. Was it removed because we support a minimum SM version, like SM20?

SM20 is the current minimum. Note that NVIDIA has already stopped supporting pre-SM35 GPUs and we should probably start considering removing support for sm_2x in LLVM, too.

tianshilei1992 marked an inline comment as done.May 16 2022, 2:07 PM
tra added inline comments.May 16 2022, 4:17 PM
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5160

Can we handle types other than 32 and 64-bit? If yes, we need tests at least for some of them (i8/i16/f16/i128). If not, this should probably be llvm_unreachable().

llvm/test/CodeGen/NVPTX/atomic-expand.ll
1 ↗(On Diff #429842)

Nit: You can specify multiple labels as one option --check-prefixes=A,B,C

I'd also rename the labels.

CHECK->COMMON or ALL -- makes the intent clear.

CHECK-SM30 -> SM30 -- reduces unnecessary clutter and makes it a bit easier to spot the differences between SM30 and SM60.

tianshilei1992 added inline comments.May 17 2022, 9:17 AM
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5160

This is quite interesting. I don't think CUDA can generate that kind of code, but OpenMP can. However, for other types, like i16, with the atomic expand, it's just converted to AtomicCmpSwap of type i16, and then the isel crashed. For now, I guess the best way is to use llvm_unreachable for other cases. In the long term, we may want to support the expansion of types with different bitwidth, but I don't have a clear idea how that would work. Otherwise, we will have to tell the front end that some types are not supported, which kind of breaks the assumption that front end can emit target agnostic code.

tra added inline comments.May 17 2022, 10:51 AM
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5160

I don't think CUDA can generate that kind of code,

Why do you believe that's the case? I would assume that if a C++ compilation can, so would CUDA, though we may be sort of lucky and may currently not have appropriate GPU-side overloads for std::atomic functions. We do want to make it work, so it's likely to get fixed sooner or later.

It may still be a good idea to add these extra test cases to the test, too, possibly commented out with a comment explaining what's going on.

tianshilei1992 marked 2 inline comments as done.May 17 2022, 7:58 PM
tianshilei1992 added inline comments.
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5160

I added tests for i8 and i16. Another types are not covered because based on https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#type-system they are not supported at the IR level.

tra added inline comments.May 18 2022, 11:19 AM
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5160

they are not supported [by NVVM] at the IR level.

NVVM != LLVM, so it's not a particularly strong reason.

LLVM does have limited support for i128 and we should cover it, even if the answer is "it does not work (yet?)".

setMinCmpXchgSizeInBits(32) should fix the i8/i16 testcases, I think.

tianshilei1992 marked an inline comment as done.

add i128

tianshilei1992 marked an inline comment as done.May 20 2022, 11:43 AM
tra accepted this revision.May 20 2022, 12:00 PM
This revision is now accepted and ready to land.May 20 2022, 12:00 PM

use setMinCmpXchgSizeInBits(32)

setMinCmpXchgSizeInBits(32) should fix the i8/i16 testcases, I think.

Yes, that works. Thanks for the info.

This revision was automatically updated to reflect the committed changes.