Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Does gcc use the same builtin name? Our general policy is to have the same interface as gcc if we have a builtin. So if gcc has these builtins the should work the same way.
For IFMA I think I made them commutable by swapping the operands between the builtin and the internal intrinsic using the handling for the X86IntrinsicTable
No. We don't sync with GCC on the builtin name during the development. We had a disscussion and decided to not keep them aligned due to 1) target specific builtins are compiler private names that no need to keep it compatible with other compilers; and 2) we already differentiate the target builtins with GCC long ago on the naming, masking etc. Currently, regardless the name, GCC uses the same C, A, B order with our existing implementation. https://gitlab.com/x86-gcc/gcc/-/blob/users/intel/liuhongt/independentfp16_wip/gcc/config/i386/avx512fp16intrin.h#L6672
For IFMA I think I made them commutable by swapping the operands between the builtin and the internal intrinsic using the handling for the X86IntrinsicTable
Is this one https://github.com/llvm/llvm-project/commit/80c8b80919e0049da32f018d98e4d75ff562cfa8? Do you mean I should add Commutative in IntrinsicsX86.td too? What's this flag used for? I saw we only add them in a few intrinsics.
I thought we were pretty consistent on names with gcc for most of sse and avx and most of avx512. The names aren't completely private occasionally users due try to use them. If we happen to have the same name we should have the same behavior to avoid confusion.
For IFMA I think I made them commutable by swapping the operands between the builtin and the internal intrinsic using the handling for the X86IntrinsicTable
Is this one https://github.com/llvm/llvm-project/commit/80c8b80919e0049da32f018d98e4d75ff562cfa8? Do you mean I should add Commutative in IntrinsicsX86.td too? What's this flag used for? I saw we only add them in a few intrinsics.
I was referring to 47e14ead54d229aa8ac5436594e91bfb8943230e for AVX512IFMA
The flag in IntrinsicsX86.td is used if the int_x86_* is used in the isel pattern, but I don't know if we have any of those any more.
I'm not so optimistic. I had a coarse-grained statistics on the use of x86 builtins in Clang and GCC. It shows Clang only defines 2/3 of GCC's builtins and 1/4 of Clang builtins have different names with GCC's. Command below:
cat gcc/config/i386/*.h | grep -o "\b__builtin_ia32_\w\+" |sort|uniq|tee gcc.txt|wc -l 2788 ls clang/lib/Headers/*.h |grep -v fp16 |xargs cat |grep -o "\b__builtin_ia32_\w\+" |sort|uniq|tee clang.txt|wc -l 1808 comm -12 gcc.txt clang.txt |wc -l 1347
Regarding this case, we already have a different name with GCC, I think it worthwhile to use a different order for the swapping optimization.
With a bit research on AVX512IFMA, I found:
- The use of C, A, B order in GCC is not consistent on its AVX512IFMA builtins. It supposes GCC should change to A, B, C order if considering consistency;
- We aren't consistent on AVX512IFMA builtins with GCC either due to the use of select.
By the way, GCC folks told me GCC has ability to specify arbitrary operands that can be commutative. But I found both SDNode and MI only have ability on the first 2 operands, which is insufficient for instruction like CFMA. Do you know if we have other mechanism for commutable operands?
Regarding this case, we already have a different name with GCC, I think it worthwhile to use a different order for the swapping optimization.
With a bit research on AVX512IFMA, I found:
- The use of C, A, B order in GCC is not consistent on its AVX512IFMA builtins. It supposes GCC should change to A, B, C order if considering consistency;
- We aren't consistent on AVX512IFMA builtins with GCC either due to the use of select.
Do we have any builtins with the same name as gcc but different operands/behaviours? Those are the only ones that I'd be worried about.
By the way, GCC folks told me GCC has ability to specify arbitrary operands that can be commutative. But I found both SDNode and MI only have ability on the first 2 operands, which is insufficient for instruction like CFMA. Do you know if we have other mechanism for commutable operands?
Doesn't X86InstrInfo::findCommutedOpIndices handle these cases?
Not implementing something that gcc does at least gives a compile error if some tries to use the gcc name so I’m fine with that.
Using a different name in clang for something gcc also has is kinda silly. But I guess we’re worse at this then I thought.
Using the same name and having different behavior should be avoided if the it won’t give a compile error.
Regarding this case, we already have a different name with GCC, I think it worthwhile to use a different order for the swapping optimization.
With a bit research on AVX512IFMA, I found:
- The use of C, A, B order in GCC is not consistent on its AVX512IFMA builtins. It supposes GCC should change to A, B, C order if considering consistency;
- We aren't consistent on AVX512IFMA builtins with GCC either due to the use of select.
By the way, GCC folks told me GCC has ability to specify arbitrary operands that can be commutative. But I found both SDNode and MI only have ability on the first 2 operands, which is insufficient for instruction like CFMA. Do you know if we have other mechanism for commutable operands?
That’s true for SDNode, but you can always manually add more isel patterns. We do this for FMA3.
MI uses two target specific hooks in X86InstrInfo.cpp. findCommutedOpIndices and commuteInstructionImpl are the names if I remember right.
It seems in this patch the builtins interface is aligned to intrinsics interface. Since AVX512FP16 is pretty new, I assume nobody is using the GCC builtin. Can we ask GCC guys change their builtin interface?
Do we have any builtins with the same name as gcc but different operands/behaviours? Those are the only ones that I'd be worried about.
I think it's rare in existing intrinsics. 1) The builtins are always straightforward passed the arguments in intrinsics in the same order; 2) We do have a few that the the order between builtins and arguments is different, I checked several manually and didn't find any problem.
I think it's a good question for new enabled ISAs, I will check FP16 when GCC patches landed.
Using a different name in clang for something gcc also has is kinda silly. But I guess we’re worse at this then I thought.
Using the same name and having different behavior should be avoided if the it won’t give a compile error.
We may diversify a few on the names with GCC in FP16, I think we can fix them once GCC landed as a nice to have.
For IFMA I think I made them commutable by swapping the operands between the builtin and the internal intrinsic using the handling for the X86IntrinsicTable
I think this is a simply approach if we want to keep the builtin in a special C, A, B order. But I don't prefer this way because we need to swapped the order 3 times, intrinsic to builtin, builtin to SDNode, SDNode to MI. Which may result confusion for future developer.
It seems in this patch the builtins interface is aligned to intrinsics interface. Since AVX512FP16 is pretty new, I assume nobody is using the GCC builtin. Can we ask GCC guys change their builtin interface?
I think we should make an agreement with them. Just found we are using different order on cfmul builtins already.
clang/lib/Headers/avx512fp16intrin.h | ||
---|---|---|
2972 | Why is this intrinsic written like this? We can't evaluate macro arguments like A and U twice. It can cause surprising results. |
clang/lib/Headers/avx512fp16intrin.h | ||
---|---|---|
2972 | Thanks Craig! You are right, we can't evaluate them twice. But we have to use them twice, because we want to provide user the same functionality as former FMA intrinsics. Maybe we need to define and use temp variables. |
clang/lib/Headers/avx512fp16intrin.h | ||
---|---|---|
2972 | Is it possible to just have single builtin call and create all the IR from CGBuiltin? |
clang/lib/Headers/avx512fp16intrin.h | ||
---|---|---|
2972 | Yeah, that's a good idea. Thanks! |
By the way, we synced with GCC and we are using the same order in the builtins now. https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=db3b96df03fdbe2fb770729501e2e9b65e66c2da;hp=ed643e9f171e99b0aa1453b3f29ed1103e9b5c80
We still have some different builtin names due to historical reasons, e.g. __builtin_ia32_vfmulcsh_mask_round (gcc), __builtin_ia32_vfcmulcsh_mask (llvm). I think it's OK for now.
Can we let this patch in? I will solve the multi evaluation problem in another patch.
Why is this intrinsic written like this? We can't evaluate macro arguments like A and U twice. It can cause surprising results.