As long as they don't have an address space explicitly defined.
This allows builtins with pointer arguments to be used with OpenCL.
Should a TODO/comment be added somewhere saying that handling of the address space number of pointers in builtins is still to be implemented?
Also, what should the number be for OpenCL? Should it be taken from the LangAS::ID enum?
I'm not sure whether this cast might create a problem on some OpenCL GPU architectures, because the spec generally disallows conversion between constant and any other address space (see OpenCL C v2.0 s6.5.5).
I feel like a better way to handle this would be to create separate builtin overloads for the constant and generic address spaces in OpenCL v2.0, and for all address spaces in OpenCL <v2.0. But this seems like more work to me.
This appears to be implemented already, but there are no builtins using it. What still needs to be done is to add a way to specify a named address space, like constant or local, because right now the builtins only accept integer address spaces.
I'm not sure about the cast issue. I do see that addrspacecast IR instructions are emitted for memcpy.
I'm open to adding separate overloads, I just wanted to try a generic solution first.
Could you please look into this? The solution doesn't seem generic to me, because in OpenCL conversions from constant to other address spaces are not allowed. This means it would most likely be a problem to handle correctly in the later compiler steps. In Clang, however, we can avoid this issue by creating different overloads for such builtins and calling the right overload directly, instead of having one overload and adding conversions to match it.
I'm not sure how to align this with C, but for OpenCL code we would need something like:
if the address space of a pointer is not given, create multiple signatures of the corresponding builtin.
For example, for this builtin:
BUILTIN(builtin_test, "vcC*", "nc")
Clang would have to add the following to the list of known declarations in CL2.0:
void builtin_test(generic const char* ptr);
and for all earlier OpenCL versions:
void builtin_test(local const char* ptr);
However, your fix might be acceptable for C.
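To make the overload idea above concrete, here is a rough standalone sketch in plain C++ (not Clang code). The `AddrSpace` enum, `overloadAddrSpaces`, and `expandDecls` are hypothetical names invented for illustration, and the OpenCL version is assumed to be encoded as an integer such as 120 or 200. It follows the rule suggested earlier in this thread: generic and constant for CL2.0, every named address space for earlier versions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the OpenCL address-space qualifiers.
// This is NOT Clang's LangAS enum, just a stand-in for the sketch.
enum class AddrSpace { Private, Global, Local, Constant, Generic };

// Source spelling of each qualifier.
std::string spelling(AddrSpace AS) {
  switch (AS) {
  case AddrSpace::Private:  return "private";
  case AddrSpace::Global:   return "global";
  case AddrSpace::Local:    return "local";
  case AddrSpace::Constant: return "constant";
  case AddrSpace::Generic:  return "generic";
  }
  return "";
}

// Address spaces that need their own overload when the builtin's pointer
// parameter has no explicit address space.
std::vector<AddrSpace> overloadAddrSpaces(unsigned OpenCLVersion) {
  assert(OpenCLVersion >= 100 && "expected an encoding such as 120 or 200");
  if (OpenCLVersion >= 200)
    return {AddrSpace::Generic, AddrSpace::Constant};
  return {AddrSpace::Private, AddrSpace::Global, AddrSpace::Local,
          AddrSpace::Constant};
}

// Builds one declaration string per required overload of a builtin that
// returns void and takes a single pointer parameter.
std::vector<std::string> expandDecls(const std::string &Name,
                                     const std::string &Pointee,
                                     unsigned OpenCLVersion) {
  std::vector<std::string> Decls;
  for (AddrSpace AS : overloadAddrSpaces(OpenCLVersion))
    Decls.push_back("void " + Name + "(" + spelling(AS) + " " + Pointee +
                    "* ptr);");
  return Decls;
}
```

With these assumptions, `expandDecls("builtin_test", "const char", 200)` yields the generic declaration shown above plus a constant one, while a version of 120 yields one declaration per named address space.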
Here is an updated patch with an entirely different approach. Instead of inserting addressspace casts, this patch rewrites the function declaration to match the address space of the call arguments.
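As a rough illustration of the rewriting idea (a standalone model, not the code in the patch), the sketch below copies a builtin's parameter list and gives each pointer parameter the address space of the matching call argument, unless the declaration already names one. `ParamType` and `rewriteParams` are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a parameter or argument type.
struct ParamType {
  bool IsPointer;     // true for pointer types
  bool HasExplicitAS; // true if the declaration names an address space
  unsigned AS;        // address space number (0 = default)
};

// Returns a copy of DeclParams in which every pointer parameter without an
// explicit address space takes the address space of the corresponding
// argument. Non-pointer parameters and explicitly qualified pointers are
// left untouched.
std::vector<ParamType> rewriteParams(const std::vector<ParamType> &DeclParams,
                                     const std::vector<ParamType> &ArgTypes) {
  assert(DeclParams.size() == ArgTypes.size() && "arity mismatch");
  std::vector<ParamType> Result = DeclParams;
  for (std::size_t I = 0; I < Result.size(); ++I) {
    ParamType &P = Result[I];
    const ParamType &A = ArgTypes[I];
    if (!P.IsPointer || !A.IsPointer || P.HasExplicitAS)
      continue;
    P.AS = A.AS;
  }
  return Result;
}
```

For a memcpy-like signature whose two pointer parameters have no address space, a call with arguments in address spaces 1 and 3 would produce a declaration with pointers in address spaces 1 and 3, so no cast is needed at the call site.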
Much better approach, a few comments inline.
We need something similar for C. We're already using (in an out-of-tree target) the specified-address-space notation for some things that only apply in one address space, but this causes some issues. For example, we want all of the atomics builtins to support pointers in multiple address spaces, but will need to emit very different machine code (and therefore, different IR) for atomics in different address spaces.
Address space casts have well-defined semantics on our architecture and are not always permitted for arbitrary pointers, so just casting to a different AS would not work.
Why not a SmallVector? I don't think this will ever have more than 4 entries (or is it 6 for the atomic compare and exchange?). A SmallVector<QualType, 8> should avoid any heap allocation.
Why not use FT->param_types()?
This check looks expensive. Have you profiled this on large codebases with a lot of builtin calls? I'd imagine that adding a DenseMap cache would be faster.
Is it worth having a fast path for builtins that don't have any pointer params?
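One possible shape for the two suggestions above, sketched as standalone C++ rather than against the patch: a fast path that returns early when the builtin has no pointer parameters, and a cache keyed on the builtin ID plus the argument address spaces. `std::map` is used here only as a stand-in for a DenseMap, and `RewriteCache`, `Param`, and the `Hits`/`Misses` counters are hypothetical names for illustration. Whether this would be measurably faster is untested.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, simplified parameter model: pointer-ness and address space.
struct Param {
  bool IsPointer;
  unsigned AS;
};
using Signature = std::vector<Param>;

class RewriteCache {
  // Cache key: builtin ID plus the address space of each call argument.
  using Key = std::pair<unsigned, std::vector<unsigned>>;
  std::map<Key, Signature> Cache;

public:
  unsigned Hits = 0;
  unsigned Misses = 0;

  // Returns the signature to use for a call to builtin BuiltinID whose
  // arguments live in the address spaces listed in ArgAS.
  Signature get(unsigned BuiltinID, const Signature &Decl,
                const std::vector<unsigned> &ArgAS) {
    assert(Decl.size() == ArgAS.size() && "arity mismatch");

    // Fast path: with no pointer parameters there is nothing to rewrite,
    // so skip both the cache lookup and the rewrite.
    bool HasPointer = false;
    for (const Param &P : Decl)
      HasPointer = HasPointer || P.IsPointer;
    if (!HasPointer)
      return Decl;

    Key K(BuiltinID, ArgAS);
    std::map<Key, Signature>::iterator It = Cache.find(K);
    if (It != Cache.end()) {
      ++Hits;
      return It->second;
    }

    // Slow path: rewrite pointer parameters to the argument address spaces
    // and remember the result for later calls with the same key.
    Signature Rewritten = Decl;
    for (std::size_t I = 0; I < Rewritten.size(); ++I)
      if (Rewritten[I].IsPointer)
        Rewritten[I].AS = ArgAS[I];
    ++Misses;
    Cache[K] = Rewritten;
    return Rewritten;
  }
};
```

In this sketch, repeated calls to the same builtin with the same argument address spaces do the rewrite once, and builtins without pointer parameters never touch the cache.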
Hi, here is an updated patch. I've addressed all comments, except that I did not add a DenseMap to rewriteBuiltinFunctionDecl(); I wasn't sure exactly how to use it to speed up the function. I can still add this if people think it will help. I did some profiling of a function with 100,000 __builtin_memcpy() calls, and this patch doesn't seem to increase compile time.
I also did not add handling for return types. I wasn't able to figure out how to get the actual call expression which had the return type used in the code. I added a TODO for this.