This commit adds opcodes for the ADD, MUL, AND, ORR, and EOR Base/SIMD/SVE instructions, and the missing opcodes for the FADD and FMUL FP/SIMD/SVE instructions, to the isAssociativeAndCommutative function. It also removes the opcodes for the FMULX instruction, which is not associative (bug fix).
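To illustrate why FMULX must not be reassociated, here is a minimal sketch of its special case: FMULX behaves like an ordinary multiply except that (+/-0) x (+/-inf) yields +/-2.0 instead of NaN. The names fmulx_model, regrouping_changes_result, and kInf are hypothetical helpers written for this note, not LLVM or ACLE APIs, and the model ignores NaN propagation and rounding modes.

```cpp
#include <cmath>
#include <limits>

constexpr double kInf = std::numeric_limits<double>::infinity();

// Simplified scalar model of FMULX (assumption: NaN inputs and rounding
// modes are out of scope). Zero times infinity returns 2.0 whose sign is
// the XOR of the operand signs; everything else is a plain multiply.
double fmulx_model(double a, double b) {
    const bool zero_times_inf =
        (a == 0.0 && std::isinf(b)) || (std::isinf(a) && b == 0.0);
    if (zero_times_inf) {
        const bool negative = std::signbit(a) != std::signbit(b);
        return negative ? -2.0 : 2.0;
    }
    return a * b;
}

// Returns true when (a x b) x c and a x (b x c) disagree under the model.
bool regrouping_changes_result(double a, double b, double c) {
    const double left = fmulx_model(fmulx_model(a, b), c);   // (a x b) x c
    const double right = fmulx_model(a, fmulx_model(b, c));  // a x (b x c)
    return left != right;
}
```

With a = 0.0, b = kInf, c = 0.5, the left grouping computes 2.0 x 0.5 = 1.0, while the right grouping computes 0.0 x inf = 2.0, so reassociating the operation changes the result.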
This helps the existing Machine InstCombiner pass increase instruction-level parallelism (ILP). This supersedes D132828, which implements tree height reduction in a new LLVM IR pass. Advantages of using the existing Machine InstCombiner pass are (1) more precise cost estimation, (2) no redundant processing, and (3) less compile-time impact. Disadvantages are (4) the need for a per-target isAssociativeAndCommutative implementation and (5) constraints imposed by the instruction set (see the comment for MULWrr in AArch64InstrInfo::isAssociativeAndCommutative). In addition, (6) the resulting instruction sequence may not be optimal in terms of ILP in some cases, because the algorithm in TargetInstrInfo::getMachineCombinerPatterns in the Machine InstCombiner pass is simpler than that of D132828. Nonetheless, it generates a fairly good sequence of instructions.
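As a rough illustration of why reassociation increases ILP, the sketch below compares the dependency depth of a linear chain of additions with that of a pairwise-balanced grouping of the same operands. This is not the algorithm in TargetInstrInfo::getMachineCombinerPatterns, which rewrites short dependent sequences locally; Result, sum_chain, and sum_balanced are hypothetical names used only here, and unsigned integer addition is chosen because it is exactly associative and commutative.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Sum of the operands plus the length of the longest dependency chain
// (number of adds that must execute one after another).
struct Result {
    std::uint64_t value;
    unsigned depth;
};

// ((x0 + x1) + x2) + ... : every add depends on the previous one.
Result sum_chain(const std::vector<std::uint64_t>& xs) {
    Result r{0, 0};
    if (xs.empty()) return r;
    r.value = xs[0];
    for (std::size_t i = 1; i < xs.size(); ++i) {
        r.value += xs[i];
        ++r.depth;
    }
    return r;
}

// (x0 + x1) + (x2 + x3) ... : adds within one level are independent,
// so a core with several pipelines can issue them in parallel.
Result sum_balanced(std::vector<std::uint64_t> xs) {
    if (xs.empty()) return {0, 0};
    unsigned depth = 0;
    while (xs.size() > 1) {
        std::vector<std::uint64_t> next;
        for (std::size_t i = 0; i + 1 < xs.size(); i += 2)
            next.push_back(xs[i] + xs[i + 1]);
        if (xs.size() % 2 == 1) next.push_back(xs.back());  // odd one carries over
        xs = std::move(next);
        ++depth;
    }
    return {xs[0], depth};
}
```

For eight operands, both groupings produce the same sum, but the chain has a dependency depth of 7 while the balanced form has a depth of 3.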
I ran the C/C++ benchmarks in SPECrate 2017 on a Fujitsu A64FX processor, which has two pipelines each for integer operations and for SIMD/FP operations. 511.povray_r improved by 4%. The other benchmarks (int: 500, 502, 505, 520, 523, 525, 531, 541, 557; fp: 508, 510, 519, 538, 544) were within 1% in either direction. For a synthetic benchmark, it doubled the performance.