Masked load/store are overloaded intrinsics; the only generic type is the type of the value being loaded or stored. The signature of the intrinsic is generated from this type, and the pointer argument is generated as a pointer to that value type in the default addrspace. E.g.:
declare <8 x i32> @llvm.masked.load.v8i32(<8 x i32>*, i32, <8 x i1>, <8 x i32>)
The problem occurs when loop-vectorize tries to use the @llvm.masked.load/store intrinsics for a pointer in a non-default addrspace. It fails with the "Calling a function with a bad signature!" assertion in the CallInst constructor, because it passes a non-default addrspace pointer to a pointer parameter that has the default addrspace.
The proposed fix is to add the pointer type as a second overloaded type to the @llvm.masked.load/store intrinsics. For an addrspace(1) pointer, the declaration then looks like:
declare <8 x i32> @llvm.masked.load.v8i32.p1v8i32(<8 x i32> addrspace(1)*, i32, <8 x i1>, <8 x i32>)
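As a rough illustration of how the two overloaded types show up in the intrinsic name, here is a minimal standalone sketch that rebuilds the suffix for the integer-vector case. It does not use LLVM's own mangling code; the helper names (mangleIntVector, manglePointerTo, maskedLoadName) are hypothetical and only model the pattern visible in the declaration above.

```cpp
#include <string>

// Hypothetical helper: suffix for a vector of integers,
// e.g. <8 x i32> becomes "v8i32".
std::string mangleIntVector(unsigned numElts, unsigned bitWidth) {
  return "v" + std::to_string(numElts) + "i" + std::to_string(bitWidth);
}

// Hypothetical helper: suffix for a pointer to an already-mangled pointee
// in the given address space, e.g. <8 x i32> addrspace(1)* becomes "p1v8i32".
std::string manglePointerTo(const std::string &pointee, unsigned addrSpace) {
  return "p" + std::to_string(addrSpace) + pointee;
}

// Assumed shape of the name after the proposed change:
// llvm.masked.load.<value type>.<pointer type>
std::string maskedLoadName(unsigned numElts, unsigned bitWidth,
                           unsigned addrSpace) {
  const std::string vec = mangleIntVector(numElts, bitWidth);
  return "llvm.masked.load." + vec + "." + manglePointerTo(vec, addrSpace);
}
```

With these assumptions, maskedLoadName(8, 32, 1) yields llvm.masked.load.v8i32.p1v8i32, matching the declaration above, and the default addrspace would produce a p0v8i32 suffix instead.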
This patch updates:
- Intrinsics description in Intrinsics.td
- LangRef.rst
- IRBuilder
- Existing testcases
It also adds a test case to masked_load_store.ll which checks that a loop with non-default addrspace pointers is correctly vectorized.
The problem is relevant for gather/scatter intrinsics as well. This patch doesn't address them because loop-vectorize doesn't emit them yet.
Hm, looking at this mangling, the latter bit implies the former. Is there a way to rewrite the definition such that the vector type is inferred from the pointer type?
(The code is correct, just slightly verbose.)