This patch attempts to fix a regression caused by https://github.com/llvm/llvm-project/commit/d6f994acb3d545b80161e24ab742c9c69d4bbf33. In particular, always_inline should work on callees without VLX from VLX functions. I found this testing clang-17-rc2. If accepted, please also apply this to the LLVM 17.0.0 branch.
Diff Detail
Event Timeline
llvm/lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
6082 | I assume that this is the actually failing check? In that case, should the adjustment be in that function? |
llvm/lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
6082 | I am not sure. That function is checking that the used types are ABI compatible, but what about available target instructions? Is the idea to first only check the types, then later during CodeGen the target does a more exhaustive check? Fixing areTypesABICompatible() would be more general I guess. Do you have an idea how much work is involved properly fixing that function? I am not familiar with this code base at all. Do you have anything against applying this patch as an incremental improvement? |
llvm/lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
6082 | I guess the feature subset check covers all cases of available target instructions. So just making sure all the types are compatible, like you suggested, would be the proper fix. |
There is also another related issue: When areInlineCompatible() returns false here, a callee with attribute always_inline is silently not inlined. It seems like clang should emit and error or at least a warning in this case. I only noticed this because my code was running 10x slower compiling with clang 17.
llvm/lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
6082 | The function is designed to check if both caller and callee allow ZMM codegen or not when passing a 512-bit vector type. The logic to check ZMM codegen is:
So has AVX512VL or not does affect the ABI when prefer-vector-width and min-legal-vector-width are set 256 so less, though it shouldn't happen since we have guarantee they are set properly in the FE. |
llvm/lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
6082 | I am a bit confused. A function with both AVX512F and AVX512VL features obviously allows ZMM codegen in the function body. Why should enabling AVX512VL change how 512-bit vector types are passed? Even if that were the case for an opaque function, why should that disabling inlining? Also I don't see how (1)-(3) should ever affect`always_inline`. |
llvm/lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
6082 | We have an option -mprefer-vector-width=128/256, which prefers to 128/256 vector instructions under AVX512VL. Under such situation, 512-bit vector may be split into 256-bit when passing. No sure if it is the reason we disable inlining. |
llvm/lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
6058 | Thinking about this more. Why shouldn't this function just end at this line? ISTM that only features should control if inlining is _possible_ or not. I don't see how checking for ABI compatibility between types is helpful. Once the callee gets inlined, it can take on the ABI of the caller anyway. |
llvm/lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
6058 | It's good for caller has FeatureVLX but callee doesn't, but maybe we need to check ABI compatibility for the both direction. The reason we need to check type is only vector type affected by feature difference. |
Thinking about this more. Why shouldn't this function just end at this line? ISTM that only features should control if inlining is _possible_ or not. I don't see how checking for ABI compatibility between types is helpful. Once the callee gets inlined, it can take on the ABI of the caller anyway.