This is WIP.
G_SHL, G_ASHR, and G_LSHR opcodes have a single type index at the moment, forcing all the operands to have the same LLT.
This is consistent with the corresponding LLVM IR opcodes, but not so much with the apparent target requirements and pre-existing *.td-defined selection rules / instruction patterns.
For instance, most (if not all) AArch64 patterns fall into one of the following categories:
- scalar shifts, having the second source operand ("number of bits to shift") of type i64 (regardless of the type of the main operands, being i32 or i64 most of the time)
- vector shifts, having the second source operand of type i32 (whether immediate or variable), regardless of the type of the main operands (both the number of vector lanes and the size of the vector elements)
In other words, the type of the second source operand is not fixed (it could be i64 or i32), it could be larger or smaller than the type of the main operands, and it could be scalar even if the rest of the instruction operates on vectors.
The existing instruction selection artifacts include (but are not limited to):
a) patterns, like the following:
multiclass SIMDVectorLShiftLongBySizeBHSPats<SDPatternOperator ext> {
  def : Pat<(AArch64vshl (v8i16 (ext (v8i8 V64:$Rn))), (i32 8)),
            (SHLLv8i8 V64:$Rn)>;
b) immediate predicates, like GIPFP_I64_Predicate_imm0_31
X86 patterns, on the other hand, fall into one of the following categories:
1. scalar shifts with the second source typed as i8 (as before, regardless of the types of the rest of the operands)
2. vector shifts following LLVM IR scheme with all the operands having the same type
A typical pattern from (1) looks like this:
// x << (32 - y) >> (32 - y)
def : Pat<(srl (shl GR32:$src, (i8 (trunc (sub 32, GR32:$lz)))),
               (i8 (trunc (sub 32, GR32:$lz)))),
          (BZHI32rr GR32:$src, GR32:$lz)>;
def : Pat<(srl (shl (loadi32 addr:$src), (i8 (trunc (sub 32, GR32:$lz)))),
               (i8 (trunc (sub 32, GR32:$lz)))),
          (BZHI32rm addr:$src, GR32:$lz)>;
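To make the comment in the pattern concrete: for a bit count n between 1 and 32, shifting left and then logically right by 32 - n keeps only the low n bits of x, which is why the shift pair can be folded into a single instruction. The following is a minimal standalone C++ sketch of that identity, not code from the X86 backend. The function names are hypothetical, and the uint8_t cast only mimics the i8 (trunc ...) shift amount that sits next to the 32-bit value in the pattern.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the shift pair from the pattern, with an 8-bit shift
// amount next to a 32-bit value. Assumes 1 <= N <= 32, so both shifts are by
// fewer than 32 bits.
inline uint32_t shiftPair(uint32_t X, uint32_t N) {
  uint8_t Amt = static_cast<uint8_t>(32 - N); // mimics i8 (trunc (sub 32, N))
  return (X << Amt) >> Amt;
}

// Hypothetical reference: keep the low N bits of X and clear the rest.
inline uint32_t keepLowBits(uint32_t X, uint32_t N) {
  return N >= 32 ? X : X & ((uint32_t{1} << N) - 1);
}

// Checks the identity for every bit count in the assumed range.
inline void checkShiftPair(uint32_t X) {
  for (uint32_t N = 1; N <= 32; ++N)
    assert(shiftPair(X, N) == keepLowBits(X, N));
}
```

The case N = 0 is deliberately left out: it would require a shift by 32, which is not a defined operation on a 32-bit value.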
Mips appears to have the second source operand typed as i32 or i64, independent of the rest of the types, as usual.
This discrepancy creates the following issues:
1. Non-optimized and mildly-optimized (pre- https://reviews.llvm.org/D44700 patch) Tablegen'erated InstructionSelect's MatchTable contains rules that cannot possibly match, forcing targets to implement the selection of shifts by hand in C++
2. Aggressively optimized MatchTable (post- https://reviews.llvm.org/D44700 patch) contains rules that can and will actually match, but then executes renderers that expect the values to have different types (those from the original SelectionDAG ISel patterns), resulting in miscompiles.
3. Testgen (https://reviews.llvm.org/D43962) generates test cases that don't represent the actual contents of the MatchTable
(2) and (3) arise because the aggressive optimizations and the test generation both exploit the type constraints defined by the Tablegen'erated MCInstrDesc: the former to reduce the number of type checks performed during selection, the latter to properly handle partially optimized match tables.
Issue (1) for x86 is mentioned in the following commit message: https://github.com/llvm-mirror/llvm/commit/5b113a2c3b054e1d894ab9e44a6a08e1d0cd7ff3 (git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327499 91177308-0d34-0410-b5e6-96231b3b80d8, https://reviews.llvm.org/D44395). That commit adds the hand-written C++ for selecting shifts that can be seen here: https://github.com/llvm-mirror/llvm/blob/5b113a2c3b054e1d894ab9e44a6a08e1d0cd7ff3/lib/Target/X86/X86InstructionSelector.cpp#L1405-L1482
AArch64 ended up having manually written C++ for selecting all the shifts and G_GEP for GPR RegBank, and the majority of binary ops for FPR RegBank.
I see 2 ways of solving the problem:
1. Change the GlobalISel Emitter (the Tablegen backend) so that it intelligently adapts the patterns being imported and rewrites them to work with the existing G_* shifts
2. Relax the type constraints for G_* shifts and allow the second source operand to have an independent type
Given the diversity between targets, (1) would have to be target-specific, and in any case it would most likely end up being quite complicated and fragile. Also, vector shifts with a scalar shift amount (AArch64) could not be represented, so selecting an efficient opcode for a vector shift would only be possible when the vector operand holding the shift amounts has the same vreg in every vector element, which is again fragile and would probably require an additional combine to make that happen more often. It makes more sense, IMO, to allow such mixed shifts at the MIR level explicitly.
Also, shifts aren't regular arithmetic/logical binary ops anyway: they are neither commutative nor associative, their second source operand is always an unsigned integer type regardless of whether the rest of the operands are signed or unsigned, and the corresponding LLVM IR opcodes have special rules regarding poison values WRT that operand. Therefore they require special handling across the selector anyway.
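A tiny worked example of the first two properties, as a standalone C++ sketch with a hypothetical helper name; it uses plain 32-bit unsigned arithmetic and is not tied to any LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: logical shift left on 32-bit values; assumes Amt < 32.
inline uint32_t shl(uint32_t X, uint32_t Amt) { return X << Amt; }

inline void checkShiftProperties() {
  // Not commutative: swapping the value and the amount changes the result.
  assert(shl(1u, 3u) == 8u);
  assert(shl(3u, 1u) == 6u);
  // Not associative: regrouping the operands changes the result.
  assert(shl(shl(1u, 2u), 3u) == 32u);    // (1 << 2) << 3
  assert(shl(1u, shl(2u, 3u)) == 65536u); // 1 << (2 << 3) = 1 << 16
}
```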
So this patch is to track progress on implementing solution (2) for now, and to get it reviewed as soon as it's done.
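The difference between the current constraint and the relaxed one in (2) can be sketched as a standalone C++ model. This is only an illustration under stated assumptions: the Ty struct and both check functions are hypothetical stand-ins, not llvm::LLT or the MachineVerifier, and the relaxed rule here places no restriction at all on the shift amount, which the actual patch may well do differently.

```cpp
#include <cassert>

// Hypothetical stand-in for a low-level type; NOT llvm::LLT.
struct Ty {
  unsigned Lanes; // 1 for scalars, >1 for vectors
  unsigned Bits;  // width of the scalar or of each vector element
};

inline bool sameTy(Ty A, Ty B) {
  return A.Lanes == B.Lanes && A.Bits == B.Bits;
}

// Current rule: a single type index, so dst, src and amount share one type.
inline bool validShiftSingleIndex(Ty Dst, Ty Src, Ty Amt) {
  return sameTy(Dst, Src) && sameTy(Src, Amt);
}

// Relaxed rule (assumed shape of solution 2): dst and src share one type
// index; the shift amount gets its own, independent type index.
inline bool validShiftTwoIndices(Ty Dst, Ty Src, Ty /*Amt*/) {
  return sameTy(Dst, Src);
}

inline void checkExamples() {
  const Ty S8{1, 8}, S32{1, 32}, S64{1, 64}, V8S16{8, 16};
  // AArch64-style scalar shift: s32 value, s64 amount.
  assert(!validShiftSingleIndex(S32, S32, S64));
  assert(validShiftTwoIndices(S32, S32, S64));
  // AArch64-style vector shift: v8s16 value, scalar s32 amount.
  assert(!validShiftSingleIndex(V8S16, V8S16, S32));
  assert(validShiftTwoIndices(V8S16, V8S16, S32));
  // X86-style scalar shift: s32 value, s8 amount.
  assert(!validShiftSingleIndex(S32, S32, S8));
  assert(validShiftTwoIndices(S32, S32, S8));
  // LLVM-IR-style shift with one type everywhere passes both rules.
  assert(validShiftSingleIndex(S32, S32, S32));
  assert(validShiftTwoIndices(S32, S32, S32));
}
```

Every target-specific combination listed in the summary is rejected by the single-index rule and accepted by the two-index one, while shifts that already follow the LLVM IR scheme remain valid under both.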
Even though this may be true, I'd rather we not use unreachable here. If for some reason Tablegen fails to select the user we want to have SDAG try it.