This patch improves add/sub instructions that have 24-bit immediates by
turning the MOV-MOV-ADD/SUB sequence into ADDI/SUBI-ADDI/SUBI, using the
high and low 12-bit portions of the immediate.
For example, the following code:
int addi(int A) { return A + 0x111333; }
results in the assembly:
addi: // Without combine
  mov w8, #4915
  movk w8, #17, lsl #16
  add w0, w0, w8
  ret

addi: // With combine
  add w8, w0, #273, lsl #12
  add w0, w8, #819
  ret
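The arithmetic behind the two ADD immediates above can be sketched as follows. This is only an illustration of the high/low 12-bit split, not the code in the patch: `SplitImm24` and `splitImm24` are hypothetical names, and the rule that both halves must be non-zero is an assumption (a single ADD/SUB already covers the other cases). The sketch does not model whether the original MOV was already a single instruction.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration; not part of the patch.
struct SplitImm24 {
  bool Valid;   // true if the immediate is worth splitting
  uint32_t Hi;  // bits [23:12], encoded with "lsl #12"
  uint32_t Lo;  // bits [11:0], encoded unshifted
};

// Splits a 24-bit immediate into the two 12-bit fields used by the
// ADD/SUB (immediate) encodings.
SplitImm24 splitImm24(uint64_t Imm) {
  SplitImm24 R = {false, 0, 0};
  if (Imm >> 24)
    return R; // does not fit in 24 bits
  uint32_t Hi = static_cast<uint32_t>(Imm >> 12) & 0xFFF;
  uint32_t Lo = static_cast<uint32_t>(Imm) & 0xFFF;
  // Assumption: if either half is zero, one ADD/SUB is already enough.
  if (Hi == 0 || Lo == 0)
    return R;
  R.Valid = true;
  R.Hi = Hi;
  R.Lo = Lo;
  // The two halves recombine to the original immediate.
  assert(((static_cast<uint64_t>(Hi) << 12) | Lo) == Imm);
  return R;
}
```

For 0x111333 this yields Hi = 0x111 (273) and Lo = 0x333 (819), matching the `#273, lsl #12` and `#819` operands in the combined assembly.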
This was implemented by adding patterns to MachineCombinerPattern and
handling the patterns in AArch64InstrInfo::genAlternativeCodeSequence and
AArch64InstrInfo::getMachineCombinerPatterns. The patterns cover the cases
where the moved immediate is operand 1 or 2 of the ADD/SUB, where the
immediate can be negated to produce a 24-bit immediate (which changes ADD to
SUB and SUB to ADD), and where a SUBREG_TO_REG is used to promote the i32
register to an i64 register.
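The negation case can be illustrated with a small sketch. It is not the patch's implementation: `AddSubOp`, `AddSubChoice`, and `chooseAddSub` are hypothetical names, and the sketch only checks whether the immediate or its negation fits in 24 bits for the given register width.

```cpp
#include <cstdint>

// Hypothetical types for illustration; not part of the patch.
enum class AddSubOp { Add, Sub };

struct AddSubChoice {
  bool Valid;      // true if the immediate or its negation fits in 24 bits
  AddSubOp Opcode; // opcode to emit for both new instructions
  uint32_t Hi;     // bits [23:12] of the chosen immediate
  uint32_t Lo;     // bits [11:0] of the chosen immediate
};

// Picks the opcode and 12-bit halves for "Orig reg, Imm" at the given width.
AddSubChoice chooseAddSub(AddSubOp Orig, uint64_t Imm, unsigned BitSize) {
  const uint64_t Mask = BitSize == 64 ? ~0ULL : ((1ULL << BitSize) - 1);
  Imm &= Mask;
  const uint64_t Neg = (~Imm + 1) & Mask; // two's-complement negation
  AddSubOp Opcode = Orig;
  uint64_t Use = Imm;
  if (Imm >> 24) {
    if (Neg >> 24)
      return {false, Orig, 0, 0}; // neither form fits in 24 bits
    // Adding X is the same as subtracting -X, so flip the opcode.
    Use = Neg;
    Opcode = (Orig == AddSubOp::Add) ? AddSubOp::Sub : AddSubOp::Add;
  }
  return {true, Opcode, static_cast<uint32_t>(Use >> 12) & 0xFFF,
          static_cast<uint32_t>(Use) & 0xFFF};
}
```

For example, a 32-bit `A + 0xFFEEECCD` is `A - 0x111333`, so the sketch selects SUB with halves 273 and 819.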
I originally implemented this combine through a TableGen Pat; however, this
caused some of the MADD combines to fail. With the ADD/SUB combine residing
in the MachineCombiner, MADD combines can be prioritized when both patterns
exist.
If this design is accepted, ADDS/SUBS patterns could be added in another patch.
Testing:
Each MachineCombinerPattern is tested in aarch64-combine-addsub-24bit-imm.mir,
a new file.
The addsub.ll test file has the typical scenarios in LLVM IR.
Other AArch64 test files had to be updated because the new combine fires
in those tests.
I ran ninja check-all on the code.
Can we generalize this to "Any MOV that will be > 1 instruction"? It may be possible to use expandMOVImm for that, and check the number of instructions.
The rules for what makes a single instruction are difficult to represent simply, and a single MOV can cover more than a 16-bit immediate, as in https://godbolt.org/z/KjKhMjb7v.