Page MenuHomePhabricator

Add intrinsics and builtins for PTX atomics with semantic orders
Needs ReviewPublic

Authored by t4c1 on Oct 28 2021, 6:59 AM.

Details

Diff Detail

Unit TestsFailed

TimeTest
70 msx64 debian > LLVM.Bindings/Go::go.test
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/llvm-go go=/usr/bin/go test llvm.org/llvm/bindings/go/llvm

Event Timeline

t4c1 created this revision.Oct 28 2021, 6:59 AM
t4c1 requested review of this revision.Oct 28 2021, 6:59 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 28 2021, 6:59 AM
bader added a reviewer: tra.Oct 28 2021, 7:30 AM
t4c1 updated this revision to Diff 397047.Jan 3 2022, 5:26 AM

Updated to the latest version that was merged into intel/llvm repo.

tra added inline comments.Jan 6 2022, 2:16 PM
clang/include/clang/Basic/BuiltinsNVPTX.def
1057

We need to figure out how address-space-specific builtins are supposed to work.
Right now two competing approaches.

This patch declares builtins with generic pointer as an argument, but, according to the test, expects to be used with the AS-specific pointer.
It probably does not catch a wrong-AS pointer passed as an argument, either.
It does happen to work, but I think it's mostly due to the fact that LLVM intrinsics are overloaded and we happen to end up addrspacecast'ing things correctly if the builtin gets the right kind of pointer.

The other approach is to declare the pointer with the expected AS. E.g:

TARGET_BUILTIN(__nvvm_mbarrier_init_shared, "vWi*3i", "", AND(SM_80,PTX70))

IMO, this is the correct way to do it, but it is also rather inconvenient to use from CUDA code as it operates on generic pointers.

@jdoerfert - WDYT?

clang/test/CodeGen/builtins-nvptx.c
557

What happens if I pass a wrong pointer kind? E.g. a generic or shared pointer?

t4c1 added inline comments.Jan 6 2022, 11:38 PM
clang/include/clang/Basic/BuiltinsNVPTX.def
1057

TARGET_BUILTIN(__nvvm_mbarrier_init_shared, "vWi*3i", "", AND(SM_80,PTX70))
IMO, this is the correct way to do it, but it is also rather inconvenient to use from CUDA code as it operates on generic pointers.

I tried doing this, however it is also completely unusable from OpenCL code (which is our use case). Trying to use it gives you errors about how casting pointers to different address spaces - for example from local to AS3 is not allowed.

clang/test/CodeGen/builtins-nvptx.c
557

It will silently accept it. I can look into how to output appropriate error message.

tra added inline comments.Jan 7 2022, 10:36 AM
clang/include/clang/Basic/BuiltinsNVPTX.def
1057

Hmm. It should've worked. It would need the same explicit cast to a pointer in AS(3) as in your tests, but it would safeguard against attempts to pass it a generic pointer. E.g. https://godbolt.org/z/qE6oxzheM

Explicit casting to AS(X) for AS-specific variants is annoying, but it's probably unavoidable at the moment regardless of whether we declare the pointer argument to be AS-specific or not. LLVM will not always be able to infer that a pointer is in particular AS.
Using specific AS in the declaration has a minor benefit of safeguarding at compile time against unintentional use of generic pointers.

Ideally we may want to convert generic variant of the builtin to AS-specific one, if LLVM does know the AS. We currently do this for loads/stores, but not for other instructions.

clang/test/CodeGen/builtins-nvptx.c
557

I guest the bast we can do here is to safeguard against unintentional use of generic pointer.
There's not much we can do if someone explicitly casts a pointer to a wrong AS.

I think declaring pointer arg to be in specific AS would be sufficient.

t4c1 added inline comments.Jan 9 2022, 11:58 PM
clang/include/clang/Basic/BuiltinsNVPTX.def
1057

Well, it does not work. See: https://godbolt.org/z/vM6bKncc4. Declaring the pointer to be in generic AS is the only way I got it to work. If you know a way to call a builtin declared with AS numbers in OpenCL, I am happy to do that instead.

tra added inline comments.Jan 10 2022, 11:47 AM
clang/include/clang/Basic/BuiltinsNVPTX.def
1057

Could you help me understand how different address spaces are handled by OpenCL in clang and LLVM?
What exactly are __local or __private in OpenCL? Do they map to LLVM's address spaces?
Clang does complain that we're trying to change AS, but I do not see AS used in the IR: https://godbolt.org/z/soajoE991

AFAICT what happens is:

  • OpenCL does use explicit AS for pointers (__local, __private seem to pop up in the error messages)
  • __attribute__((address_space(3))) does not match the AS of __local and that leads to an error.
  • generic pointer argument works because we are allowed to cast any specific AS to generic one.

In other words, having specific-AS arguemt works as intended, we just have a mismatch between AS number used by OpenCL and AS number used by NVPTX and we're not allowed to cast between two specific AS.

If that's indeed the case, then what we appear to be missing is the mapping from OpenCL AS to the target-specific AS, which should, ideally, be done by clang itself. So, if NVPTX-specific builtin operating on shared pointer is called with a pointer in OpenCL's equivalent of CUDA's __shared__, it should be mapped to AS(3).

Naghasan added inline comments.
clang/include/clang/Basic/BuiltinsNVPTX.def
1057

Could you help me understand how different address spaces are handled by OpenCL in clang and LLVM?

clang makes a strong distinction between language and target address spaces since ~3.6 (was more loose before). Each target in clang defines a map between language and target address space (e.g. https://github.com/llvm/llvm-project/blob/main/clang/lib/Basic/Targets/SPIR.h#L25). The map is used in clang codegen to lower __local to the right target address space.

Clang does complain that we're trying to change AS, but I do not see AS used in the IR: https://godbolt.org/z/soajoE991

Same code but targeting SPIR: https://godbolt.org/z/4E3brzodq (ARM maps all languages address spaces to 0)

having specific-AS arguemt works as intended, we just have a mismatch between AS number used by OpenCL and AS number used by NVPTX and we're not allowed to cast between two specific AS

OpenCL languages do map to nvptx target address space, it is just form clang's semantic a language asp is strongly distinct from target ones (lang asp simply isn't understood as a number). So you can't mix and match without an explicit cast, even if the lang asp eventually lowers to that target.

What is missing is the ability for the target builtin to accept a language asp if it lowers to the target asp.

tra added inline comments.Jan 11 2022, 10:59 AM
clang/include/clang/Basic/BuiltinsNVPTX.def
1057

What is missing is the ability for the target builtin to accept a language asp if it lowers to the target asp.

Agreed. I think this should be a sensible thing to support.
Once it's fixed you should be able to both use AS-specific builtins w/o explicit pointer casts and have error checks for the wrong pointer types.

t4c1 added inline comments.Jan 12 2022, 7:38 AM
clang/include/clang/Basic/BuiltinsNVPTX.def
1057

Ok, I will look into that, but it might take some time.

tra added inline comments.Jan 12 2022, 9:56 AM
clang/include/clang/Basic/BuiltinsNVPTX.def
1057

Thank you for looking into this. These patches will help a lot to fill in the gap we currently have in support for atomics on GPU.