This is an archive of the discontinued LLVM Phabricator instance.

Differential D116468

[AArch64] Combine ADD/SUB instructions when they contain a 24-bit immediate.
Abandoned · Public

Authored by red1bluelost on Dec 31 2021, 5:58 PM.

Details

Reviewers

dmgreen
asavonic
SjoerdMeijer
paquette
benshi001

Summary

This patch improves ADD/SUB instructions that have 24-bit
immediates by turning the MOV-MOV-ADD/SUB sequence into ADDI/SUBI-ADDI/SUBI,
using the high and low 12-bit portions of the immediate.

For example, the following code:

int addi(int A) { return A + 0x111333; }

results in the assembly:

addi:                        // Without combine
        mov w8, #4915
        movk w8, #17, lsl #16
        add w0, w0, w8
        ret
addi:                        // With combine
        add w8, w0, #273, lsl #12
        add w0, w8, #819
        ret
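
The arithmetic behind the two-instruction form can be checked with a small standalone sketch. The names `Imm24Parts`, `splitImm24` and `addInTwoSteps` are hypothetical helpers for illustration and are not part of the patch; only the hi/lo 12-bit split comes from the summary above.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only: the two 12-bit fields of a
// 24-bit immediate, as they would appear in "add ..., #Hi, lsl #12" and
// "add ..., #Lo".
struct Imm24Parts {
  uint32_t Hi; // bits 23:12
  uint32_t Lo; // bits 11:0
};

constexpr Imm24Parts splitImm24(uint32_t Imm) {
  return Imm24Parts{(Imm >> 12) & 0xfffu, Imm & 0xfffu};
}

// Recombine the way the two ADDs would: (A + (Hi << 12)) + Lo.
constexpr uint32_t addInTwoSteps(uint32_t A, uint32_t Imm) {
  return (A + (splitImm24(Imm).Hi << 12)) + splitImm24(Imm).Lo;
}

// 0x111333 splits into 0x111 (273) and 0x333 (819), matching the assembly.
static_assert(splitImm24(0x111333).Hi == 273, "high 12 bits");
static_assert(splitImm24(0x111333).Lo == 819, "low 12 bits");
static_assert(addInTwoSteps(5, 0x111333) == 5u + 0x111333u, "same sum");
```

Because the two fields do not overlap, adding them one after the other gives the same result as adding the full immediate, including on 32-bit wrap-around.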

This was implemented by adding patterns to MachineCombinerPattern and
handling the patterns in AArch64InstrInfo::genAlternativeCodeSequence and
AArch64InstrInfo::getMachineCombinerPatterns. The patterns match scenarios
where the moved immediate is in operand 1 or 2 of the ADD/SUB, where the
immediate can be negated to produce a 24-bit immediate (which changes the ADD
to a SUB and the SUB to an ADD), and where a SUBREG_TO_REG is used to promote
the i32 register to an i64 register.
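
The negation case can be sketched in isolation. This is a simplified model that assumes the immediate is the right-hand operand and checks only the 24-bit width; `Op`, `Rewrite`, `classify` and the other names are hypothetical and do not appear in the patch.

```cpp
#include <cstdint>

enum class Op { Add, Sub };

// Hypothetical result type: whether the rewrite applies, which opcode the two
// immediate instructions would use, and the value to split across them.
struct Rewrite {
  bool Applies;
  Op NewOp;
  uint32_t Imm;
};

constexpr bool fitsIn24Bits(uint32_t V) { return (V & ~0x00ffffffu) == 0; }

constexpr Op flip(Op O) { return O == Op::Add ? Op::Sub : Op::Add; }

// Two's-complement negation of an unsigned 32-bit value.
constexpr uint32_t negate(uint32_t V) { return 0u - V; }

constexpr Rewrite classify(Op O, uint32_t Imm) {
  if (fitsIn24Bits(Imm))
    return Rewrite{true, O, Imm}; // use the immediate as-is
  if (fitsIn24Bits(negate(Imm)))
    return Rewrite{true, flip(O), negate(Imm)}; // x + (-c) becomes x - c
  return Rewrite{false, O, Imm};
}

// 0xffeeeccd is -0x111333 as a 32-bit value, so the ADD turns into a SUB.
static_assert(classify(Op::Add, 0xffeeeccdu).NewOp == Op::Sub, "opcode flips");
static_assert(classify(Op::Add, 0xffeeeccdu).Imm == 0x111333u, "negated value");
```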

I originally implemented this combine through a TableGen Pat; however, this
caused some of the MADD combines to fail. With the ADD/SUB combine residing
in the MachineCombiner, MADD combines can be prioritized when both patterns
exist.

If this design is accepted, ADDS/SUBS patterns could be added in another patch.

Testing:

Each MachineCombinerPattern is tested in aarch64-combine-addsub-24bit-imm.mir,
a new file.

The addsub.ll test file has the typical scenarios in LLVM IR.

Other AArch64 test files had to be updated since this new combine was encountered
in those tests.

I ran ninja check-all on the code.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	90 ms	x64 debian > LLVM.Bindings/Go::go.test

Event Timeline

red1bluelost created this revision. Dec 31 2021, 5:58 PM

Herald added subscribers: hiraditya, kristof.beyls. Dec 31 2021, 5:58 PM

red1bluelost requested review of this revision. Dec 31 2021, 5:58 PM

Herald added a project: Restricted Project. Dec 31 2021, 5:58 PM

red1bluelost edited the summary of this revision. Dec 31 2021, 6:01 PM

Harbormaster completed remote builds in B141158: Diff 396842. Dec 31 2021, 6:41 PM

Oh interesting. This is similar to D111034, but that was reverted again. I'm not sure why, apparently MIPeephole optimizations are all too easy to get wrong.

The problem with doing this is when it is done in a loop. Something like this example: https://godbolt.org/z/e5f4hWGcq, where preferably the loop invariant mov can be hoisted out of the loop, leaving a single add. Can we make sure there is a test case for that example, and try and guard against it?

Also, I think an ADD+MOVi16 might be slightly better than an ADD+ADD. (As in - the MOV is a 16-bit imm that can be materialized with a single instruction). The two ADDs might make a longer critical path. It's if the MOV pseudo would need multiple instructions that the add becomes beneficial.
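
As a rough illustration of when the MOV needs more than one instruction, the sketch below counts non-zero 16-bit chunks. It is a deliberately simplified model under stated assumptions, not how LLVM materializes immediates: it ignores MOVN and ORR-immediate encodings, which can shorten real sequences. The function names are hypothetical.

```cpp
#include <cstdint>

// Simplified model: one MOVZ for the first non-zero 16-bit chunk and one MOVK
// for each further non-zero chunk. Assumption: MOVN and ORR-immediate
// encodings are ignored, so this is only an upper bound.
constexpr unsigned nonZeroChunks(uint64_t Imm) {
  unsigned Count = 0;
  for (unsigned Shift = 0; Shift < 64; Shift += 16)
    if ((Imm >> Shift) & 0xffffu)
      ++Count;
  return Count;
}

// Zero still needs one instruction to materialize.
constexpr unsigned naiveMovCount(uint64_t Imm) {
  return nonZeroChunks(Imm) == 0 ? 1 : nonZeroChunks(Imm);
}

static_assert(naiveMovCount(0x1333) == 1, "one MOV: MOV + ADD is already short");
static_assert(naiveMovCount(0x111333) == 2, "two MOVs: two ADDs save an instruction");
```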

Thanks Dave! I didn't know about all that.

I'll give those a look, add some test cases, and update this patch if it can work out.

Addresses feedback for patch.

A test file, aarch64-combine-addsub-imm-reject-loop.mir, was added to check
whether the ADD/SUB combine affects loop-invariant hoisting; it showed a regression.
To fix the regression, a CombinerObjective was added that requires these
MachineCombinerPatterns not to occur in a loop. During machine
combining, these patterns check whether the Root is inside a loop and skip the
combine if it is.
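
The reason for rejecting the combine inside loops can be seen from a back-of-the-envelope count. This is a simplified model rather than a measurement: it assumes the constant needs MOV + MOVK, that the MOV pair is hoisted out of the loop, and that nothing else changes. Both function names are hypothetical.

```cpp
// Simplified dynamic instruction counts for a loop body that runs N times.
constexpr unsigned long hoistedMovCount(unsigned long N) {
  return 2 + N; // MOV + MOVK once before the loop, one ADD per iteration
}

constexpr unsigned long twoAddCount(unsigned long N) {
  return 2 * N; // two immediate ADDs in every iteration
}

static_assert(twoAddCount(1) < hoistedMovCount(1), "straight-line code: combine wins");
static_assert(twoAddCount(100) > hoistedMovCount(100), "hot loop: hoisting wins");
```

The two forms break even at two iterations under this model, and the combined form also puts two dependent ADDs on each iteration's critical path.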

To keep 16-bit immediates as MOV-ADD, a few tests were
added to aarch64-combine-addsub-24bit-imm.mir and the code was modified. Now, 24-bit
immediates must use bits 23-16, which guarantees that the combine only reduces
MOV-MOV-ADD to ADDI-ADDI.
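
The resulting condition can be written as a standalone predicate. The masks mirror the check in the patch's MatchImm lambda; the function name `isTwoAddCandidate` is hypothetical and the sketch covers only the non-negated case.

```cpp
#include <cstdint>

constexpr bool isTwoAddCandidate(uint32_t Imm) {
  return (Imm & ~0x00ffffffu) == 0     // nothing above bit 23
         && (Imm & 0x00ff0000u) != 0   // some bit in 23:16 is set
         && (Imm & 0x00000fffu) != 0;  // low 12 bits are non-zero
}

static_assert(isTwoAddCandidate(0x111333), "needs both 12-bit halves");
static_assert(!isTwoAddCandidate(0x00f333), "16-bit value: stays MOV + ADD");
static_assert(!isTwoAddCandidate(0x111000), "low half zero: one shifted ADD suffices");
static_assert(!isTwoAddCandidate(0x1111333), "wider than 24 bits");
```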

Forgot one of the files in the update.

Sorry, I'm getting used to patches and arcanist.

Harbormaster completed remote builds in B141428: Diff 397195. Jan 3 2022, 9:52 PM

dmgreen added a reviewer: benshi001. Jan 5 2022, 8:39 AM

dmgreen added inline comments.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
4795	Can we generalize this to "Any MOV that will be > 1 instruction"? It may be possible to use expandMOVImm for that, and check the number of instructions. The rules for what makes a single instruction are difficult to represent simply, and can be more than a 16bit imm. As in https://godbolt.org/z/KjKhMjb7v.
4838	Should this be a uint64_t?
5560	LLVM tends not to go too heavy on the doxygen comments, especially for obvious parameters like MF/MRI/TII.
llvm/test/CodeGen/AArch64/aarch64-combine-addsub-24bit-imm.mir
2	I don't think this needs -O0
343	I think this might be negative of the correct result. (But equally I don't think that SUBXrr will be present at this point in the pipeline. It will be speculatively emitted as a SUBSXrr as that may be used as a compare).
llvm/test/CodeGen/AArch64/aarch64-combine-addsub-imm-reject-loop.mir
2	This one might be better as a .ll file. That way we test the entire backend, and can see obvious regressions when the instruction count increases.
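
The uint64_t question on line 4838 can be illustrated with a standalone sketch. MachineOperand::getImm() returns an int64_t, so storing a MOVi64imm operand in a 32-bit `unsigned` drops bits 63:32 before the width check runs. The helper names below are hypothetical, and the sketch assumes `unsigned` is 32 bits wide.

```cpp
#include <cstdint>

constexpr bool fits24(uint64_t V) { return (V & ~uint64_t{0x00ffffff}) == 0; }

// Models "unsigned Imm = ...getImm();": the value is truncated to 32 bits.
constexpr bool checkViaUnsigned(int64_t RawImm) {
  return fits24(static_cast<uint32_t>(RawImm));
}

// Models "uint64_t Imm = ...getImm();": all 64 bits are kept.
constexpr bool checkViaUint64(int64_t RawImm) {
  return fits24(static_cast<uint64_t>(RawImm));
}

// 0x100111333 has bit 32 set, yet its low 32 bits look like a 24-bit value.
static_assert(checkViaUnsigned(0x100111333), "truncated value passes the check");
static_assert(!checkViaUint64(0x100111333), "full value is rejected");
```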

In D116468#3217217, @dmgreen wrote:

Oh interesting. This is similar to D111034, but that was reverted again. I'm not sure why, apparently MIPeephole optimizations are all too easy to get wrong.

The problem with doing this is when it is done in a loop. Something like this example: https://godbolt.org/z/e5f4hWGcq, where preferably the loop invariant mov can be hoisted out of the loop, leaving a single add. Can we make sure there is a test case for that example, and try and guard against it?

Also, I think an ADD+MOVi16 might be slightly better than an ADD+ADD. (As in - the MOV is a 16-bit imm that can be materialized with a single instruction). The two ADDs might make a longer critical path. It's if the MOV pseudo would need multiple instructions that the add becomes beneficial.

I am quite sorry for that. Though make check-all passed and everything was good on amd64, the clang built into aarch64 ELF and running on aarch64-linux crashed. And I have no aarch64 host machine to debug that (although I guess it should be a minor bug).

In D116468#3224507, @benshi001 wrote:

I am quite sorry for that. Though make check-all passed and everything was good on amd64, the clang built into aarch64 ELF and running on aarch64-linux crashed. And I have no aarch64 host machine to debug that (although I guess it should be a minor bug).

Oh I see. Sorry for not seeing that message. I hadn't realized it was reverted.

I have no strong opinion whether this is best done as an AArch64MIPeephole or in MachineCombine. They are probably both fine for this kind of thing. Whichever @red1bluelost thinks will be easiest to extend to SUBS instructions that write nzcv sounds OK to me, if the goal here is to address https://github.com/llvm/llvm-project/issues/51482.

In D116468#3224907, @dmgreen wrote:

Oh I see. Sorry for not seeing that message. I hadn't realized it was reverted.

I have no strong opinion whether this is best done as an AArch64MIPeephole or in MachineCombine. They are probably both fine for this kind of thing. Whichever @red1bluelost thinks will be easiest to extend to SUBS instructions that write nzcv sounds OK to me, if the goal here is to address https://github.com/llvm/llvm-project/issues/51482.

Either method (Peephole or MachineCombine) would be easy to extend for the SUBS instructions. I think the Peephole approach would be better, it seems more encapsulated than my approach.

What is the status of D111034 @benshi001, is it likely to be committed or is that bug still holding it up? The buildbot logs don't appear anymore so I can't see where that bug happened. Could that bug also be present with my MachineCombine patch?

In D116468#3228702, @red1bluelost wrote:

Either method (Peephole or MachineCombine) would be easy to extend for the SUBS instructions. I think the Peephole approach would be better, it seems more encapsulated than my approach.

What is the status of D111034 @benshi001, is it likely to be committed or is that bug still holding it up? The buildbot logs don't appear anymore so I can't see where that bug happened. Could that bug also be present with my MachineCombine patch?

@red1bluelost, you are welcome to continue this optimization with the peephole approach, if you are sure it no longer crashes on an aarch64-linux host. As I have explained, I have no aarch64-linux host and cannot guarantee that. Thanks! I would be glad if anybody could add this optimization to the main branch ASAP.

In D116468#3230617, @benshi001 wrote:

@red1bluelost, you are welcome to continue this optimization with the peephole approach, if you are sure it no longer crashes on an aarch64-linux host. As I have explained, I have no aarch64-linux host and cannot guarantee that. Thanks! I would be glad if anybody could add this optimization to the main branch ASAP.

Ok. I will try to see if I can find the bug with the Peephole if possible. I have been testing it this weekend through the Linaro Docker container mentioned in D111034 but using QEMU emulation of AArch64. It seems like it can replicate the aarch64-linux host environment but is just really slow. (I don't have an aarch64 host either).

I'll also see about finishing up the MachineCombine approach and testing it too, in case it would be easier to just finish this approach.

If I manage to fix the Peephole approach, should I edit D111034? Or should I start a new patch review for the Peephole?

In D116468#3233108, @red1bluelost wrote:

Ok. I will try to see if I can find the bug with the Peephole if possible. I have been testing it this weekend through the Linaro Docker container mentioned in D111034 but using QEMU emulation of AArch64. It seems like it can replicate the aarch64-linux host environment but is just really slow. (I don't have an aarch64 host either).

I'll also see about finishing up the MachineCombine approach and testing it too, in case it would be easier to just finish this approach.

If I manage to fix the Peephole approach, should I edit D111034? Or should I start a new patch review for the Peephole?

I suggest you start a new patch review request and mention D111034 :)

Since we already have https://reviews.llvm.org/D117429, can we abandon this?

Abandoning revision in favor of D117429.

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/MachineCombinerPattern.h	25 lines
llvm/lib/CodeGen/MachineCombiner.cpp	32 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp	339 lines
llvm/test/CodeGen/AArch64/aarch64-combine-addsub-24bit-imm.mir	351 lines
llvm/test/CodeGen/AArch64/aarch64-combine-addsub-imm-reject-loop.mir	123 lines
llvm/test/CodeGen/AArch64/addsub.ll	40 lines
llvm/test/Transforms/CodeGenPrepare/AArch64/large-offset-gep.ll	5 lines

Diff 397195

llvm/include/llvm/CodeGen/MachineCombinerPattern.h

@@ enum class MachineCombinerPattern {
   MULADDWI_OP1,
   MULSUBWI_OP1,
   MULADDX_OP1,
   MULADDX_OP2,
   MULSUBX_OP1,
   MULSUBX_OP2,
   MULADDXI_OP1,
   MULSUBXI_OP1,
+  // 24-bit imm add/sub patterns matched by the AArch64 machine combiner.
+  ADDW_MOVi32imm_OP1,
+  ADDW_MOVi32imm_OP2,
+  ADDW_negMOVi32imm_OP1,
+  ADDW_negMOVi32imm_OP2,
+  ADDX_StR_MOVi32imm_OP1,
+  ADDX_StR_MOVi32imm_OP2,
+  ADDX_StR_negMOVi32imm_OP1,
+  ADDX_StR_negMOVi32imm_OP2,
+  ADDX_MOVi64imm_OP1,
+  ADDX_MOVi64imm_OP2,
+  ADDX_negMOVi64imm_OP1,
+  ADDX_negMOVi64imm_OP2,
+  SUBW_MOVi32imm_OP1,
+  SUBW_MOVi32imm_OP2,
+  SUBW_negMOVi32imm_OP1,
+  SUBW_negMOVi32imm_OP2,
+  SUBX_StR_MOVi32imm_OP1,
+  SUBX_StR_MOVi32imm_OP2,
+  SUBX_StR_negMOVi32imm_OP1,
+  SUBX_StR_negMOVi32imm_OP2,
+  SUBX_MOVi64imm_OP1,
+  SUBX_MOVi64imm_OP2,
+  SUBX_negMOVi64imm_OP1,
+  SUBX_negMOVi64imm_OP2,
   // NEON integers vectors
   MULADDv8i8_OP1,
   MULADDv8i8_OP2,
   MULADDv16i8_OP1,
   MULADDv16i8_OP2,
   MULADDv4i16_OP1,
   MULADDv4i16_OP2,
   MULADDv8i16_OP1,

llvm/lib/CodeGen/MachineCombiner.cpp

@@ unsigned MachineCombiner::getLatency(MachineInstr *Root, MachineInstr *NewRoot,
   return NewRootLatency;
 }

 /// The combiner's goal may differ based on which pattern it is attempting
 /// to optimize.
 enum class CombinerObjective {
   MustReduceDepth, // The data dependency chain must be improved.
   MustReduceRegisterPressure, // The register pressure must be reduced.
+  MustNotExistInLoop, // The pattern must exist outside a loop
   Default // The critical path must not be lengthened.
 };

 static CombinerObjective getCombinerObjective(MachineCombinerPattern P) {
   // TODO: If C++ ever gets a real enum class, make this part of the
   // MachineCombinerPattern class.
   switch (P) {
   case MachineCombinerPattern::REASSOC_AX_BY:
   case MachineCombinerPattern::REASSOC_AX_YB:
   case MachineCombinerPattern::REASSOC_XA_BY:
   case MachineCombinerPattern::REASSOC_XA_YB:
   case MachineCombinerPattern::REASSOC_XY_AMM_BMM:
   case MachineCombinerPattern::REASSOC_XMM_AMM_BMM:
     return CombinerObjective::MustReduceDepth;
   case MachineCombinerPattern::REASSOC_XY_BCA:
   case MachineCombinerPattern::REASSOC_XY_BAC:
     return CombinerObjective::MustReduceRegisterPressure;
+  case MachineCombinerPattern::ADDW_MOVi32imm_OP1:
+  case MachineCombinerPattern::ADDW_MOVi32imm_OP2:
+  case MachineCombinerPattern::ADDW_negMOVi32imm_OP1:
+  case MachineCombinerPattern::ADDW_negMOVi32imm_OP2:
+  case MachineCombinerPattern::ADDX_StR_MOVi32imm_OP1:
+  case MachineCombinerPattern::ADDX_StR_MOVi32imm_OP2:
+  case MachineCombinerPattern::ADDX_StR_negMOVi32imm_OP1:
+  case MachineCombinerPattern::ADDX_StR_negMOVi32imm_OP2:
+  case MachineCombinerPattern::ADDX_MOVi64imm_OP1:
+  case MachineCombinerPattern::ADDX_MOVi64imm_OP2:
+  case MachineCombinerPattern::ADDX_negMOVi64imm_OP1:
+  case MachineCombinerPattern::ADDX_negMOVi64imm_OP2:
+  case MachineCombinerPattern::SUBW_MOVi32imm_OP1:
+  case MachineCombinerPattern::SUBW_MOVi32imm_OP2:
+  case MachineCombinerPattern::SUBW_negMOVi32imm_OP1:
+  case MachineCombinerPattern::SUBW_negMOVi32imm_OP2:
+  case MachineCombinerPattern::SUBX_StR_MOVi32imm_OP1:
+  case MachineCombinerPattern::SUBX_StR_MOVi32imm_OP2:
+  case MachineCombinerPattern::SUBX_StR_negMOVi32imm_OP1:
+  case MachineCombinerPattern::SUBX_StR_negMOVi32imm_OP2:
+  case MachineCombinerPattern::SUBX_MOVi64imm_OP1:
+  case MachineCombinerPattern::SUBX_MOVi64imm_OP2:
+  case MachineCombinerPattern::SUBX_negMOVi64imm_OP1:
+  case MachineCombinerPattern::SUBX_negMOVi64imm_OP2:
+    return CombinerObjective::MustNotExistInLoop;
   default:
     return CombinerObjective::Default;
   }
 }

 /// Estimate the latency of the new and original instruction sequence by summing
 /// up the latencies of the inserted and deleted instructions. This assumes
 /// that the inserted and deleted instructions are dependent instruction chains,
@@ while (BlockIter != MBB->end()) {

     if (!TII->getMachineCombinerPatterns(MI, Patterns, DoRegPressureReduce))
       continue;

     if (VerifyPatternOrder)
       verifyPatternOrder(MBB, MI, Patterns);

     for (auto P : Patterns) {
+      // Skip this pattern when inside a loop since it might detriment lifting
+      // of loop invariants
+      if (getCombinerObjective(P) == CombinerObjective::MustNotExistInLoop &&
+          ML && ML->contains(&MI))
+        continue;
+
       SmallVector<MachineInstr *, 16> InsInstrs;
       SmallVector<MachineInstr *, 16> DelInstrs;
       DenseMap<unsigned, unsigned> InstrIdxForVirtReg;
       TII->genAlternativeCodeSequence(MI, P, InsInstrs, DelInstrs,
                                       InstrIdxForVirtReg);
       unsigned NewInstCount = InsInstrs.size();
       unsigned OldInstCount = DelInstrs.size();
       // Found pattern, but did not generate alternative sequence.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

@@ case AArch64::SUBv4i32:
     setVFound(AArch64::MULv4i32, 1, MCP::MULSUBv4i32_OP1);
     setVFound(AArch64::MULv4i32, 2, MCP::MULSUBv4i32_OP2);
     setVFound(AArch64::MULv4i32_indexed, 1, MCP::MULSUBv4i32_indexed_OP1);
     setVFound(AArch64::MULv4i32_indexed, 2, MCP::MULSUBv4i32_indexed_OP2);
     break;
   }
   return Found;
 }

+/// getAddSub24Patterns - Find instructions ADD/SUB instructions that have a
+/// 24-bit immediate moved into its operand and change those to make two ADD/SUB
+/// instructions with 12-bit immediates encoded.
+/// \param Root the current instruction to check if it is an ADD/SUB that can be
+/// combined
+/// \param [out] Patterns the list of patterns for the pattern evaluator
+/// \return true iff there is an ADD/SUB that can be combined
+static bool
+getAddSub24Patterns(MachineInstr &Root,
+                    SmallVectorImpl<MachineCombinerPattern> &Patterns) {
+  unsigned Opc = Root.getOpcode();
+  MachineBasicBlock &MBB = *Root.getParent();
+  MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
+  bool Found = false;
+
+  using MCP = MachineCombinerPattern;
+
+  auto MatchImm = [&](unsigned Imm, MCP Pat, MCP NPat) {
+    // Only check bits 23:16 (not 23:12) so that a single MOV16-ADD is preferred
+    // over ADDI-ADDI
+    if (!(Imm & ~0x00ffffff) && (Imm & 0x00ff0000) && (Imm & 0x00000fff)) {
+      Patterns.push_back(Pat);
+      return true;
+    }
+    if (!(-Imm & ~0x00ffffff) && (-Imm & 0x00ff0000) && (-Imm & 0x00000fff)) {
+      Patterns.push_back(NPat);
+      return true;
+    }
+    return false;
+  };
+
+  // Match (ADD/SUBW WN (MOVi32imm <24-bit>)) ->
+  // (ADD/SUBW (ADD/SUBW WN <12-bit> shift.12) <12-bit> shift.0)
+  auto MatchW = [&](unsigned Oprd, MCP Pat, MCP NPat) {
+    MachineOperand &AddSubOprd = Root.getOperand(Oprd);
+    if (!canCombine(MBB, AddSubOprd, AArch64::MOVi32imm))
+      return false;
+    unsigned Imm =
+        MRI.getUniqueVRegDef(AddSubOprd.getReg())->getOperand(1).getImm();
+    return MatchImm(Imm, Pat, NPat);
+  };
+
+  // Match (ADD/SUBX XN (SUBREG_TO_REG (MOVi32imm <24-bit>))) ->
+  // (ADD/SUBX (ADD/SUBX XN <12-bit> shift.12) <12-bit> shift.0)
+  auto MatchXStR = [&](unsigned Oprd, MCP Pat, MCP NPat) {
+    MachineOperand &AddSubOprd = Root.getOperand(Oprd);
+    if (!canCombine(MBB, AddSubOprd, AArch64::SUBREG_TO_REG))
+      return false;
+    MachineInstr &SubToReg = *MRI.getUniqueVRegDef(AddSubOprd.getReg());
+    MachineOperand &SubToRegOprd = SubToReg.getOperand(2);
+    if (!canCombine(MBB, SubToRegOprd, AArch64::MOVi32imm))
+      return false;
+    unsigned Imm =
+        MRI.getUniqueVRegDef(SubToRegOprd.getReg())->getOperand(1).getImm();
+    return MatchImm(Imm, Pat, NPat);
+  };
+
+  // Match (ADD/SUBX XN (MOVi64imm <24-bit>)) ->
+  // (ADD/SUBX (ADD/SUBX XN <12-bit> shift.12) <12-bit> shift.0)
+  auto MatchXM64 = [&](unsigned Oprd, MCP Pat, MCP NPat) {
+    MachineOperand &AddSubOprd = Root.getOperand(Oprd);
+    if (!canCombine(MBB, AddSubOprd, AArch64::MOVi64imm))
+      return false;
+    unsigned Imm =
+        MRI.getUniqueVRegDef(AddSubOprd.getReg())->getOperand(1).getImm();
+    return MatchImm(Imm, Pat, NPat);
+  };
+
+  switch (Opc) {
+  default:
+    break;
+  case AArch64::ADDWrr:
+    Found |= MatchW(1, MCP::ADDW_MOVi32imm_OP1, MCP::ADDW_negMOVi32imm_OP1);
+    Found |= MatchW(2, MCP::ADDW_MOVi32imm_OP2, MCP::ADDW_negMOVi32imm_OP2);
+    break;
+  case AArch64::ADDXrr:
+    Found |= MatchXM64(1, MCP::ADDX_MOVi64imm_OP1, MCP::ADDX_negMOVi64imm_OP1);
+    Found |= MatchXM64(2, MCP::ADDX_MOVi64imm_OP2, MCP::ADDX_negMOVi64imm_OP2);
+    Found |= MatchXStR(1, MCP::ADDX_StR_MOVi32imm_OP1,
+                       MCP::ADDX_StR_negMOVi32imm_OP1);
+    Found |= MatchXStR(2, MCP::ADDX_StR_MOVi32imm_OP2,
+                       MCP::ADDX_StR_negMOVi32imm_OP2);
+    break;
+  case AArch64::SUBWrr:
+    Found |= MatchW(1, MCP::SUBW_MOVi32imm_OP1, MCP::SUBW_negMOVi32imm_OP1);
+    Found |= MatchW(2, MCP::SUBW_MOVi32imm_OP2, MCP::SUBW_negMOVi32imm_OP2);
+    break;
+  case AArch64::SUBXrr:
+    Found |= MatchXM64(1, MCP::SUBX_MOVi64imm_OP1, MCP::SUBX_negMOVi64imm_OP1);
+    Found |= MatchXM64(2, MCP::SUBX_MOVi64imm_OP2, MCP::SUBX_negMOVi64imm_OP2);
+    Found |= MatchXStR(1, MCP::SUBX_StR_MOVi32imm_OP1,
+                       MCP::SUBX_StR_negMOVi32imm_OP1);
+    Found |= MatchXStR(2, MCP::SUBX_StR_MOVi32imm_OP2,
+                       MCP::SUBX_StR_negMOVi32imm_OP2);
+    break;
+  }
+  return Found;
+}

 /// Floating-Point Support

 /// Find instructions that can be turned into madd.
 static bool getFMAPatterns(MachineInstr &Root,
                            SmallVectorImpl<MachineCombinerPattern> &Patterns) {

   if (!isCombineInstrCandidateFP(Root))
     return false;
@@
 /// pattern evaluator stops checking as soon as it finds a faster sequence.

 bool AArch64InstrInfo::getMachineCombinerPatterns(
     MachineInstr &Root, SmallVectorImpl<MachineCombinerPattern> &Patterns,
     bool DoRegPressureReduce) const {
   // Integer patterns
   if (getMaddPatterns(Root, Patterns))
     return true;
+  if (getAddSub24Patterns(Root, Patterns))
+    return true;
   // Floating point patterns
   if (getFMULPatterns(Root, Patterns))
     return true;
   if (getFMAPatterns(Root, Patterns))
     return true;

   return TargetInstrInfo::getMachineCombinerPatterns(Root, Patterns,
                                                      DoRegPressureReduce);
@@ MachineInstrBuilder MIB =
           .addReg(SrcReg0, getKillRegState(Src0IsKill))
           .addReg(SrcReg1, getKillRegState(Src1IsKill))
           .addReg(VR);
   // Insert the MADD
   InsInstrs.push_back(MIB);
   return MUL;
 }

+/// genAddSub24BitImm - Creates two (ADD|SUB)(W|X)ri instructions that take the
+/// high and low bits respectively of a 24-bit immediate. Constrains the
+/// register class as needed. Adds the new instructions to the insert list and
+/// returns the move immediate instruction pointer so that the caller add it to
+/// the delete list.
+/// \param MF Containing MachineFunction
+/// \param MRI Register information
+/// \param TII Target information
+/// \param Root is the (ADD|SUB)(W|X)rr instruction
+/// \param ImmInst is the MOVi(32|64)imm instruction
+/// \param IdxRootOpd is the index of the operand that has the SUBREG_TO_REG
+/// result
+/// \param Imm is the immediate value which uses at least 13-bits and at most
+/// 24-bits
+/// \param NewOpc The opcode for the two (ADD|SUB)(W|X)ri instructions
+/// \param RC Register class of operands (ADD|SUB)(W|X)ri instructions
+/// \param [out] InsInstrs is a vector of machine instructions and will
+/// contain the generated (ADD|SUB)(W|X)ri instructions
+/// \return the address of the MOVi(32|64)imm instruction that could be removed
+static MachineInstr *
+genAddSub24BitImm(MachineFunction &MF, MachineRegisterInfo &MRI,
+                  const TargetInstrInfo *TII, MachineInstr &Root,
+                  MachineInstr &ImmInst, unsigned IdxRootOpd, unsigned Imm,
+                  unsigned NewOpc, const TargetRegisterClass *RC,
+                  SmallVectorImpl<MachineInstr *> &InsInstrs) {
+  unsigned ImmHi = (Imm >> 12) & 0x0fff, ImmLo = Imm & 0x0fff;
+  unsigned IdxOtherOpd = IdxRootOpd == 1 ? 2 : 1;
+  Register ResultReg = Root.getOperand(0).getReg();
+  Register ImmReg = Root.getOperand(IdxRootOpd).getReg();
+  bool ImmIsKill = Root.getOperand(IdxRootOpd).isKill();
+  Register SrcReg = Root.getOperand(IdxOtherOpd).getReg();
+  bool SrcIsKill = Root.getOperand(IdxOtherOpd).isKill();
+
+  if (Register::isVirtualRegister(ResultReg))
+    MRI.constrainRegClass(ResultReg, RC);
+  if (Register::isVirtualRegister(ImmReg))
+    MRI.constrainRegClass(ImmReg, RC);
+  if (Register::isVirtualRegister(SrcReg))
+    MRI.constrainRegClass(SrcReg, RC);
+
+  MachineInstrBuilder MIB1 =
+      BuildMI(MF, Root.getDebugLoc(), TII->get(NewOpc), ImmReg)
+          .addReg(SrcReg, getKillRegState(SrcIsKill))
+          .addImm(ImmHi)
+          .addImm(12);
+  MachineInstrBuilder MIB2 =
+      BuildMI(MF, Root.getDebugLoc(), TII->get(NewOpc), ResultReg)
+          .addReg(ImmReg, getKillRegState(ImmIsKill))
+          .addImm(ImmLo)
+          .addImm(0);
+  InsInstrs.push_back(MIB1);
+  InsInstrs.push_back(MIB2);
+  return &ImmInst;
+}

		/// genAddSubMovImm - Generate two ADD/SUB immediate instructions from an
		/// ADD/SUB instruction that has a 24-bit value moved into one of its operands.
		/// This shortens the final assembly when the 24-bit immediate would have
		/// required two MOV immediate instructions.
		/// This function extracts the move immediate instruction, then delegates the
		/// work to genAddSub24BitImm.
		/// \example
		/// \code
		/// I = MOVi(32|64)imm N:<24-bit imm>
		/// V = (ADD|SUB)(W|X)rr Rn I
		/// ==> Tmp = (ADD|SUB)(W|X)ri Rn N:<23:12> lsl.12
		/// ==> V = (ADD|SUB)(W|X)ri Tmp N:<11:0> lsl.0
		/// \endcode
		/// \param MF Containing MachineFunction
		/// \param MRI Register information
		/// \param TII Target information
		/// \param Root is the (ADD|SUB)(W|X)rr instruction
		/// \param IdxRootOpd is the index of the operand that has the MOVi(32|64)imm
		/// result
		/// \param NewOpc The opcode for the two (ADD|SUB)(W|X)ri instructions
		/// \param RC Register class of the operands of the (ADD|SUB)(W|X)ri
		/// instructions
		/// \param Negate is true if the immediate must be negated to fit in 24 bits
		/// \param [out] InsInstrs is a vector of machine instructions and will
		/// contain the generated (ADD|SUB)(W|X)ri instructions
		/// \return the address of the MOVi(32|64)imm instruction that could be removed
		static MachineInstr *
		genAddSubMovImm(MachineFunction &MF, MachineRegisterInfo &MRI,
		const TargetInstrInfo *TII, MachineInstr &Root,
		unsigned IdxRootOpd, unsigned NewOpc,
		const TargetRegisterClass *RC, bool Negate,
		SmallVectorImpl<MachineInstr *> &InsInstrs) {
		MachineInstr &ImmInst = *MRI.getVRegDef(Root.getOperand(IdxRootOpd).getReg());
		unsigned Imm = ImmInst.getOperand(1).getImm();
		if (Negate)
		Imm = -Imm;
		return genAddSub24BitImm(MF, MRI, TII, Root, ImmInst, IdxRootOpd, Imm, NewOpc,
		RC, InsInstrs);
		}

		/// genAddSubStR - Generate two ADD/SUB immediate instructions from an ADD/SUB
		/// instruction that has a 24-bit value moved into one of its operands through
		/// an intermediate SUBREG_TO_REG step. This shortens the final assembly when
		/// the 24-bit immediate would have required two MOV immediate instructions.
		/// This function extracts the SUBREG_TO_REG and move immediate instructions,
		/// adds the SUBREG_TO_REG to the delete list, then delegates the work to
		/// genAddSub24BitImm.
		/// \example
		/// \code
		/// I = MOVi32imm N:<24-bit imm>
		/// S = SUBREG_TO_REG I
		/// V = (ADD|SUB)Xrr Rn S
		/// ==> Tmp = (ADD|SUB)Xri Rn N:<23:12> lsl.12
		/// ==> V = (ADD|SUB)Xri Tmp N:<11:0> lsl.0
		/// \endcode
		/// \param MF Containing MachineFunction
		dmgreen (inline comment): LLVM tends not to go too heavy on the doxygen comments, especially for obvious parameters like MF/MRI/TII.
		/// \param MRI Register information
		/// \param TII Target information
		/// \param Root is the (ADD|SUB)(W|X)rr instruction
		/// \param IdxRootOpd is the index of the operand that has the SUBREG_TO_REG
		/// result
		/// \param NewOpc The opcode for the two (ADD|SUB)(W|X)ri instructions
		/// \param RC Register class of operands (ADD|SUB)(W|X)ri instructions
		/// \param Negate is true if the immediate must be negated to fit in 24 bits
		/// \param [out] InsInstrs is a vector of machine instructions and will
		/// contain the generated (ADD|SUB)(W|X)ri instructions
		/// \param [out] DelInstrs is a vector that will contain the SUBREG_TO_REG
		/// instruction that could be removed
		/// \return the address of the MOVi(32|64)imm instruction that could be removed
		static MachineInstr *genAddSubStR(MachineFunction &MF, MachineRegisterInfo &MRI,
		const TargetInstrInfo *TII,
		MachineInstr &Root, unsigned IdxRootOpd,
		unsigned NewOpc,
		const TargetRegisterClass *RC, bool Negate,
		SmallVectorImpl<MachineInstr *> &InsInstrs,
		SmallVectorImpl<MachineInstr *> &DelInstrs) {
		MachineInstr &SubToReg =
		*MRI.getVRegDef(Root.getOperand(IdxRootOpd).getReg());
		MachineInstr &ImmInst = *MRI.getVRegDef(SubToReg.getOperand(2).getReg());
		DelInstrs.push_back(&SubToReg);
		unsigned Imm = ImmInst.getOperand(1).getImm();
		if (Negate)
		Imm = -Imm;
		return genAddSub24BitImm(MF, MRI, TII, Root, ImmInst, IdxRootOpd, Imm, NewOpc,
		RC, InsInstrs);
		}

/// When getMachineCombinerPatterns() finds potential patterns,		/// When getMachineCombinerPatterns() finds potential patterns,
/// this function generates the instructions that could replace the		/// this function generates the instructions that could replace the
/// original code sequence		/// original code sequence
void AArch64InstrInfo::genAlternativeCodeSequence(		void AArch64InstrInfo::genAlternativeCodeSequence(
MachineInstr &Root, MachineCombinerPattern Pattern,		MachineInstr &Root, MachineCombinerPattern Pattern,
SmallVectorImpl<MachineInstr *> &InsInstrs,		SmallVectorImpl<MachineInstr *> &InsInstrs,
SmallVectorImpl<MachineInstr *> &DelInstrs,		SmallVectorImpl<MachineInstr *> &DelInstrs,
DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const {		DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const {
[...]	MachineInstrBuilder MIB1 =
.addReg(ZeroReg)		.addReg(ZeroReg)
.addImm(Encoding);		.addImm(Encoding);
InsInstrs.push_back(MIB1);		InsInstrs.push_back(MIB1);
InstrIdxForVirtReg.insert(std::make_pair(NewVR, 0));		InstrIdxForVirtReg.insert(std::make_pair(NewVR, 0));
MUL = genMaddR(MF, MRI, TII, Root, InsInstrs, 1, Opc, NewVR, RC);		MUL = genMaddR(MF, MRI, TII, Root, InsInstrs, 1, Opc, NewVR, RC);
break;		break;
}		}

		case MachineCombinerPattern::ADDW_MOVi32imm_OP1:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 1, AArch64::ADDWri,
		&AArch64::GPR32spRegClass, false, InsInstrs);
		break;
		case MachineCombinerPattern::ADDW_MOVi32imm_OP2:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 2, AArch64::ADDWri,
		&AArch64::GPR32spRegClass, false, InsInstrs);
		break;
		case MachineCombinerPattern::ADDW_negMOVi32imm_OP1:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 1, AArch64::SUBWri,
		&AArch64::GPR32spRegClass, true, InsInstrs);
		break;
		case MachineCombinerPattern::ADDW_negMOVi32imm_OP2:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 2, AArch64::SUBWri,
		&AArch64::GPR32spRegClass, true, InsInstrs);
		break;
		case MachineCombinerPattern::ADDX_MOVi64imm_OP1:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 1, AArch64::ADDXri,
		&AArch64::GPR64spRegClass, false, InsInstrs);
		break;
		case MachineCombinerPattern::ADDX_MOVi64imm_OP2:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 2, AArch64::ADDXri,
		&AArch64::GPR64spRegClass, false, InsInstrs);
		break;
		case MachineCombinerPattern::ADDX_negMOVi64imm_OP1:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 1, AArch64::SUBXri,
		&AArch64::GPR64spRegClass, true, InsInstrs);
		break;
		case MachineCombinerPattern::ADDX_negMOVi64imm_OP2:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 2, AArch64::SUBXri,
		&AArch64::GPR64spRegClass, true, InsInstrs);
		break;
		case MachineCombinerPattern::SUBW_MOVi32imm_OP1:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 1, AArch64::SUBWri,
		&AArch64::GPR32spRegClass, false, InsInstrs);
		break;
		case MachineCombinerPattern::SUBW_MOVi32imm_OP2:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 2, AArch64::SUBWri,
		&AArch64::GPR32spRegClass, false, InsInstrs);
		break;
		case MachineCombinerPattern::SUBW_negMOVi32imm_OP1:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 1, AArch64::ADDWri,
		&AArch64::GPR32spRegClass, true, InsInstrs);
		break;
		case MachineCombinerPattern::SUBW_negMOVi32imm_OP2:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 2, AArch64::ADDWri,
		&AArch64::GPR32spRegClass, true, InsInstrs);
		break;
		case MachineCombinerPattern::SUBX_MOVi64imm_OP1:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 1, AArch64::SUBXri,
		&AArch64::GPR64spRegClass, false, InsInstrs);
		break;
		case MachineCombinerPattern::SUBX_MOVi64imm_OP2:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 2, AArch64::SUBXri,
		&AArch64::GPR64spRegClass, false, InsInstrs);
		break;
		case MachineCombinerPattern::SUBX_negMOVi64imm_OP1:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 1, AArch64::ADDXri,
		&AArch64::GPR64spRegClass, true, InsInstrs);
		break;
		case MachineCombinerPattern::SUBX_negMOVi64imm_OP2:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 2, AArch64::ADDXri,
		&AArch64::GPR64spRegClass, true, InsInstrs);
		break;
		case MachineCombinerPattern::ADDX_StR_MOVi32imm_OP1:
		MUL = genAddSubStR(MF, MRI, TII, Root, 1, AArch64::ADDXri,
		&AArch64::GPR64spRegClass, false, InsInstrs, DelInstrs);
		break;
		case MachineCombinerPattern::ADDX_StR_MOVi32imm_OP2:
		MUL = genAddSubStR(MF, MRI, TII, Root, 2, AArch64::ADDXri,
		&AArch64::GPR64spRegClass, false, InsInstrs, DelInstrs);
		break;
		case MachineCombinerPattern::ADDX_StR_negMOVi32imm_OP1:
		MUL = genAddSubStR(MF, MRI, TII, Root, 1, AArch64::SUBXri,
		&AArch64::GPR64spRegClass, true, InsInstrs, DelInstrs);
		break;
		case MachineCombinerPattern::ADDX_StR_negMOVi32imm_OP2:
		MUL = genAddSubStR(MF, MRI, TII, Root, 2, AArch64::SUBXri,
		&AArch64::GPR64spRegClass, true, InsInstrs, DelInstrs);
		break;
		case MachineCombinerPattern::SUBX_StR_MOVi32imm_OP1:
		MUL = genAddSubStR(MF, MRI, TII, Root, 1, AArch64::SUBXri,
		&AArch64::GPR64spRegClass, false, InsInstrs, DelInstrs);
		break;
		case MachineCombinerPattern::SUBX_StR_MOVi32imm_OP2:
		MUL = genAddSubStR(MF, MRI, TII, Root, 2, AArch64::SUBXri,
		&AArch64::GPR64spRegClass, false, InsInstrs, DelInstrs);
		break;
		case MachineCombinerPattern::SUBX_StR_negMOVi32imm_OP1:
		MUL = genAddSubStR(MF, MRI, TII, Root, 1, AArch64::ADDXri,
		&AArch64::GPR64spRegClass, true, InsInstrs, DelInstrs);
		break;
		case MachineCombinerPattern::SUBX_StR_negMOVi32imm_OP2:
		MUL = genAddSubStR(MF, MRI, TII, Root, 2, AArch64::ADDXri,
		&AArch64::GPR64spRegClass, true, InsInstrs, DelInstrs);
		break;

case MachineCombinerPattern::MULADDv8i8_OP1:		case MachineCombinerPattern::MULADDv8i8_OP1:
Opc = AArch64::MLAv8i8;		Opc = AArch64::MLAv8i8;
RC = &AArch64::FPR64RegClass;		RC = &AArch64::FPR64RegClass;
MUL = genFusedMultiplyAcc(MF, MRI, TII, Root, InsInstrs, 1, Opc, RC);		MUL = genFusedMultiplyAcc(MF, MRI, TII, Root, InsInstrs, 1, Opc, RC);
break;		break;
case MachineCombinerPattern::MULADDv8i8_OP2:		case MachineCombinerPattern::MULADDv8i8_OP2:
Opc = AArch64::MLAv8i8;		Opc = AArch64::MLAv8i8;
RC = &AArch64::FPR64RegClass;		RC = &AArch64::FPR64RegClass;
[...]
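
The immediate split performed by genAddSub24BitImm above can be checked in isolation. What follows is a minimal standalone sketch, not code from the patch: `Imm24Parts`, `splitImm24` and `addTwoParts` are hypothetical names, and the sketch only mirrors the `ImmHi`/`ImmLo` arithmetic, using the 0x111333 example from the summary. It assumes the immediate already fits in 24 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the two 12-bit fields used by the ADD/SUB
// immediate forms (one shifted left by 12, one unshifted).
struct Imm24Parts {
  uint32_t Hi; // bits 23:12, emitted with "lsl #12"
  uint32_t Lo; // bits 11:0, emitted with no shift
};

// Mirrors the ImmHi/ImmLo computation in genAddSub24BitImm.
constexpr Imm24Parts splitImm24(uint32_t Imm) {
  return {(Imm >> 12) & 0x0fffu, Imm & 0x0fffu};
}

// Models the two generated instructions:
//   Tmp = Src + (Hi << 12);  Dst = Tmp + Lo.
constexpr uint64_t addTwoParts(uint64_t Src, Imm24Parts P) {
  return (Src + (static_cast<uint64_t>(P.Hi) << 12)) + P.Lo;
}

// 0x111333 splits into #273, lsl #12 and #819, as in the summary's assembly.
static_assert(splitImm24(0x111333u).Hi == 273u, "high 12 bits");
static_assert(splitImm24(0x111333u).Lo == 819u, "low 12 bits");
// Adding the two parts in sequence is the same as adding the whole immediate.
static_assert(addTwoParts(5u, splitImm24(0x111333u)) == 5u + 0x111333u,
              "recombination");
```

Any bits above bit 23 are dropped by the masks, so the recombination only holds for immediates that fit in 24 bits.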

llvm/test/CodeGen/AArch64/aarch64-combine-addsub-24bit-imm.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -O0 -run-pass=machine-combiner -o - -mtriple=aarch64-unknown-linux -verify-machineinstrs %s | FileCheck %s
				dmgreen (inline comment): I don't think this needs -O0

				---
				name: reject_16bit
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: reject_16bit
				; CHECK: [[COPY:%[0-9]+]]:gpr32 = COPY $w0
				; CHECK-NEXT: [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 4369
				; CHECK-NEXT: [[ADDWrr:%[0-9]+]]:gpr32 = ADDWrr [[COPY]], killed [[MOVi32imm]]
				; CHECK-NEXT: $w0 = COPY [[ADDWrr]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm 4369
				%2:gpr32 = ADDWrr %0, killed %1
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: reject_16bit_neg
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: reject_16bit_neg
				; CHECK: [[COPY:%[0-9]+]]:gpr32 = COPY $w0
				; CHECK-NEXT: [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm -4369
				; CHECK-NEXT: [[ADDWrr:%[0-9]+]]:gpr32 = ADDWrr [[COPY]], killed [[MOVi32imm]]
				; CHECK-NEXT: $w0 = COPY [[ADDWrr]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm -4369
				%2:gpr32 = ADDWrr %0, killed %1
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: reject_16bit_X
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: reject_16bit_X
				; CHECK: [[COPY:%[0-9]+]]:gpr64 = COPY $x0
				; CHECK-NEXT: [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 4369
				; CHECK-NEXT: [[SUBREG_TO_REG:%[0-9]+]]:gpr64 = SUBREG_TO_REG 0, killed [[MOVi32imm]], %subreg.sub_32
				; CHECK-NEXT: [[ADDXrr:%[0-9]+]]:gpr64 = ADDXrr [[COPY]], killed [[SUBREG_TO_REG]]
				; CHECK-NEXT: $x0 = COPY [[ADDXrr]]
				; CHECK-NEXT: RET_ReallyLR implicit $x0
				%0:gpr64 = COPY $x0
				%1:gpr32 = MOVi32imm 4369
				%2:gpr64 = SUBREG_TO_REG 0, killed %1, %subreg.sub_32
				%3:gpr64 = ADDXrr %0, killed %2
				$x0 = COPY %3
				RET_ReallyLR implicit $x0
				...
				---
				name: reject_25bit
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: reject_25bit
				; CHECK: [[COPY:%[0-9]+]]:gpr32 = COPY $w0
				; CHECK-NEXT: [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 17895697
				; CHECK-NEXT: [[ADDWrr:%[0-9]+]]:gpr32 = ADDWrr [[COPY]], killed [[MOVi32imm]]
				; CHECK-NEXT: $w0 = COPY [[ADDWrr]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm 17895697
				%2:gpr32 = ADDWrr %0, killed %1
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: addi
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: addi
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK-NEXT: [[ADDWri:%[0-9]+]]:gpr32common = ADDWri [[COPY]], 273, 12
				; CHECK-NEXT: [[ADDWri1:%[0-9]+]]:gpr32common = ADDWri killed [[ADDWri]], 3549, 0
				; CHECK-NEXT: $w0 = COPY [[ADDWri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm 1121757
				%2:gpr32 = ADDWrr %0, killed %1
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: addi_flip
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: addi_flip
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK-NEXT: [[ADDWri:%[0-9]+]]:gpr32common = ADDWri [[COPY]], 273, 12
				; CHECK-NEXT: [[ADDWri1:%[0-9]+]]:gpr32common = ADDWri killed [[ADDWri]], 3549, 0
				; CHECK-NEXT: $w0 = COPY [[ADDWri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm 1121757
				%2:gpr32 = ADDWrr killed %1, %0
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: addi_negate
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: addi_negate
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK-NEXT: [[SUBWri:%[0-9]+]]:gpr32common = SUBWri [[COPY]], 273, 12
				; CHECK-NEXT: [[SUBWri1:%[0-9]+]]:gpr32common = SUBWri killed [[SUBWri]], 3549, 0
				; CHECK-NEXT: $w0 = COPY [[SUBWri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm -1121757
				%2:gpr32 = ADDWrr %0, killed %1
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: addi_flip_negate
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: addi_flip_negate
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK-NEXT: [[SUBWri:%[0-9]+]]:gpr32common = SUBWri [[COPY]], 273, 12
				; CHECK-NEXT: [[SUBWri1:%[0-9]+]]:gpr32common = SUBWri killed [[SUBWri]], 3549, 0
				; CHECK-NEXT: $w0 = COPY [[SUBWri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm -1121757
				%2:gpr32 = ADDWrr killed %1, %0
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: addl
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: addl
				; CHECK: [[COPY:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK-NEXT: [[ADDXri:%[0-9]+]]:gpr64common = ADDXri [[COPY]], 273, 12
				; CHECK-NEXT: [[ADDXri1:%[0-9]+]]:gpr64common = ADDXri killed [[ADDXri]], 3549, 0
				; CHECK-NEXT: $x0 = COPY [[ADDXri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $x0
				%0:gpr64 = COPY $x0
				%1:gpr32 = MOVi32imm 1121757
				%2:gpr64 = SUBREG_TO_REG 0, killed %1, %subreg.sub_32
				%3:gpr64 = ADDXrr %0, killed %2
				$x0 = COPY %3
				RET_ReallyLR implicit $x0
				...
				---
				name: addl_flip
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: addl_flip
				; CHECK: [[COPY:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK-NEXT: [[ADDXri:%[0-9]+]]:gpr64common = ADDXri [[COPY]], 273, 12
				; CHECK-NEXT: [[ADDXri1:%[0-9]+]]:gpr64common = ADDXri killed [[ADDXri]], 3549, 0
				; CHECK-NEXT: $x0 = COPY [[ADDXri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $x0
				%0:gpr64 = COPY $x0
				%1:gpr32 = MOVi32imm 1121757
				%2:gpr64 = SUBREG_TO_REG 0, killed %1, %subreg.sub_32
				%3:gpr64 = ADDXrr killed %2, %0
				$x0 = COPY %3
				RET_ReallyLR implicit $x0
				...
				---
				name: addl_negate
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: addl_negate
				; CHECK: [[COPY:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK-NEXT: [[SUBXri:%[0-9]+]]:gpr64common = SUBXri [[COPY]], 273, 12
				; CHECK-NEXT: [[SUBXri1:%[0-9]+]]:gpr64common = SUBXri killed [[SUBXri]], 3549, 0
				; CHECK-NEXT: $x0 = COPY [[SUBXri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $x0
				%0:gpr64 = COPY $x0
				%1:gpr64 = MOVi64imm -1121757
				%2:gpr64 = ADDXrr %0, killed %1
				$x0 = COPY %2
				RET_ReallyLR implicit $x0
				...
				---
				name: addl_flip_negate
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: addl_flip_negate
				; CHECK: [[COPY:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK-NEXT: [[SUBXri:%[0-9]+]]:gpr64common = SUBXri [[COPY]], 273, 12
				; CHECK-NEXT: [[SUBXri1:%[0-9]+]]:gpr64common = SUBXri killed [[SUBXri]], 3549, 0
				; CHECK-NEXT: $x0 = COPY [[SUBXri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $x0
				%0:gpr64 = COPY $x0
				%1:gpr64 = MOVi64imm -1121757
				%2:gpr64 = ADDXrr killed %1, %0
				$x0 = COPY %2
				RET_ReallyLR implicit $x0
				...


				---
				name: subi
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: subi
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK-NEXT: [[SUBWri:%[0-9]+]]:gpr32common = SUBWri [[COPY]], 273, 12
				; CHECK-NEXT: [[SUBWri1:%[0-9]+]]:gpr32common = SUBWri killed [[SUBWri]], 3549, 0
				; CHECK-NEXT: $w0 = COPY [[SUBWri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm 1121757
				%2:gpr32 = SUBWrr %0, killed %1
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: subi_flip
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: subi_flip
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK-NEXT: [[SUBWri:%[0-9]+]]:gpr32common = SUBWri [[COPY]], 273, 12
				; CHECK-NEXT: [[SUBWri1:%[0-9]+]]:gpr32common = SUBWri killed [[SUBWri]], 3549, 0
				; CHECK-NEXT: $w0 = COPY [[SUBWri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm 1121757
				%2:gpr32 = SUBWrr killed %1, %0
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: subi_negate
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: subi_negate
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK-NEXT: [[ADDWri:%[0-9]+]]:gpr32common = ADDWri [[COPY]], 273, 12
				; CHECK-NEXT: [[ADDWri1:%[0-9]+]]:gpr32common = ADDWri killed [[ADDWri]], 3549, 0
				; CHECK-NEXT: $w0 = COPY [[ADDWri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm -1121757
				%2:gpr32 = SUBWrr %0, killed %1
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: subi_flip_negate
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: subi_flip_negate
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK-NEXT: [[ADDWri:%[0-9]+]]:gpr32common = ADDWri [[COPY]], 273, 12
				; CHECK-NEXT: [[ADDWri1:%[0-9]+]]:gpr32common = ADDWri killed [[ADDWri]], 3549, 0
				; CHECK-NEXT: $w0 = COPY [[ADDWri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm -1121757
				%2:gpr32 = SUBWrr killed %1, %0
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: subl
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: subl
				; CHECK: [[COPY:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK-NEXT: [[SUBXri:%[0-9]+]]:gpr64common = SUBXri [[COPY]], 273, 12
				; CHECK-NEXT: [[SUBXri1:%[0-9]+]]:gpr64common = SUBXri killed [[SUBXri]], 3549, 0
				; CHECK-NEXT: $x0 = COPY [[SUBXri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $x0
				%0:gpr64 = COPY $x0
				%1:gpr32 = MOVi32imm 1121757
				%2:gpr64 = SUBREG_TO_REG 0, killed %1, %subreg.sub_32
				%3:gpr64 = SUBXrr %0, killed %2
				$x0 = COPY %3
				RET_ReallyLR implicit $x0
				...
				---
				name: subl_flip
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: subl_flip
				; CHECK: [[COPY:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK-NEXT: [[SUBXri:%[0-9]+]]:gpr64common = SUBXri [[COPY]], 273, 12
				; CHECK-NEXT: [[SUBXri1:%[0-9]+]]:gpr64common = SUBXri killed [[SUBXri]], 3549, 0
				; CHECK-NEXT: $x0 = COPY [[SUBXri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $x0
				%0:gpr64 = COPY $x0
				%1:gpr32 = MOVi32imm 1121757
				%2:gpr64 = SUBREG_TO_REG 0, killed %1, %subreg.sub_32
				%3:gpr64 = SUBXrr killed %2, %0
				$x0 = COPY %3
				RET_ReallyLR implicit $x0
				...
				---
				name: subl_negate
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: subl_negate
				; CHECK: [[COPY:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK-NEXT: [[ADDXri:%[0-9]+]]:gpr64common = ADDXri [[COPY]], 273, 12
				; CHECK-NEXT: [[ADDXri1:%[0-9]+]]:gpr64common = ADDXri killed [[ADDXri]], 3549, 0
				; CHECK-NEXT: $x0 = COPY [[ADDXri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $x0
				%0:gpr64 = COPY $x0
				%1:gpr64 = MOVi64imm -1121757
				%2:gpr64 = SUBXrr %0, killed %1
				$x0 = COPY %2
				RET_ReallyLR implicit $x0
				...
				---
				name: subl_flip_negate
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: subl_flip_negate
				; CHECK: [[COPY:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK-NEXT: [[ADDXri:%[0-9]+]]:gpr64common = ADDXri [[COPY]], 273, 12
				; CHECK-NEXT: [[ADDXri1:%[0-9]+]]:gpr64common = ADDXri killed [[ADDXri]], 3549, 0
				dmgreen (inline comment): I think this might be negative of the correct result. (But equally I don't think that SUBXrr will be present at this point in the pipeline. It will be speculatively emitted as a SUBSXrr as that may be used as a compare).
				; CHECK-NEXT: $x0 = COPY [[ADDXri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $x0
				%0:gpr64 = COPY $x0
				%1:gpr64 = MOVi64imm -1121757
				%2:gpr64 = SUBXrr killed %1, %0
				$x0 = COPY %2
				RET_ReallyLR implicit $x0
				...
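
dmgreen's concern about subl_flip_negate can be checked with plain integer arithmetic. The sketch below is not part of the patch: the function names are hypothetical, and it assumes SUBXrr and ADDXri behave as 64-bit wrapping subtraction and addition. It compares the input MIR (`SUBXrr killed %1, %0`, with %1 holding -1121757) against the two ADDXri instructions in the CHECK lines.

```cpp
#include <cassert>
#include <cstdint>

// Input MIR: %1 = MOVi64imm -1121757; %2 = SUBXrr killed %1, %0
// The immediate is the minuend, so the result is Imm - X.
constexpr uint64_t originalSubFlip(uint64_t X) {
  return static_cast<uint64_t>(-1121757LL) - X;
}

// CHECK lines: ADDXri [[COPY]], 273, 12 followed by ADDXri ..., 3549, 0
// The result is X + (273 << 12) + 3549 = X + 1121757.
constexpr uint64_t combinedAddAdd(uint64_t X) {
  return X + (uint64_t{273} << 12) + 3549;
}

// For X = 10 the two sequences disagree...
static_assert(originalSubFlip(10) != combinedAddAdd(10), "results differ");
// ...and the combined result is the two's-complement negation of the original.
static_assert(originalSubFlip(10) + combinedAddAdd(10) == 0, "negated result");
```

Under this model, `Imm - X` and `X + (-Imm)` are negations of each other, which matches the reviewer's reading. A SUB can only be rewritten this way when the immediate is the subtrahend (operand 2).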

llvm/test/CodeGen/AArch64/aarch64-combine-addsub-imm-reject-loop.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -O0 -run-pass=machine-combiner -o - -mtriple=aarch64-unknown-linux -verify-machineinstrs %s | FileCheck %s
				dmgreen (inline comment): This one might be better as a .ll file. That way we test the entire backend, and can see obvious regressions when the instruction count increases.

				# Check to ensure that an add/sub with a 24-bit immediate won't be turned into
				# two addi/subi instructions when inside the loop.
				# The important check is that the following MIR will still generate the following:
				# [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 9961536
				# [[ADDWrr:%[0-9]+]]:gpr32 = nsw ADDWrr killed %13, killed [[MOVi32imm]]


				--- |
				define dso_local void @reject_inside_loop(i32 %n, i32* nocapture readonly %x, i32* noalias nocapture writeonly %y) local_unnamed_addr #0 {
				entry:
				%cmp7 = icmp sgt i32 %n, 0
				br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader: ; preds = %entry
				%wide.trip.count = zext i32 %n to i64
				br label %for.body

				for.cond.cleanup: ; preds = %for.body, %entry
				ret void

				for.body: ; preds = %for.body, %for.body.preheader
				%lsr.iv2 = phi i32* [ %scevgep3, %for.body ], [ %x, %for.body.preheader ]
				%lsr.iv1 = phi i32* [ %scevgep, %for.body ], [ %y, %for.body.preheader ]
				%lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ %wide.trip.count, %for.body.preheader ]
				%0 = load i32, i32* %lsr.iv2, align 4, !tbaa !8
				%add = add nsw i32 %0, 9961536
				store i32 %add, i32* %lsr.iv1, align 4, !tbaa !8
				%lsr.iv.next = add nsw i64 %lsr.iv, -1
				%scevgep = getelementptr i32, i32* %lsr.iv1, i64 1
				%scevgep3 = getelementptr i32, i32* %lsr.iv2, i64 1
				%exitcond.not = icmp eq i64 %lsr.iv.next, 0
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body, !llvm.loop !12
				}

				!8 = !{!9, !9, i64 0}
				!9 = !{!"int", !10, i64 0}
				!10 = !{!"omnipotent char", !11, i64 0}
				!11 = !{!"Simple C/C++ TBAA"}
				!12 = distinct !{!12, !13, !14}
				!13 = !{!"llvm.loop.mustprogress"}
				!14 = !{!"llvm.loop.unroll.disable"}

				...
				---
				name: reject_inside_loop
				tracksRegLiveness: true
				body: |
				; CHECK-LABEL: name: reject_inside_loop
				; CHECK: bb.0.entry:
				; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
				; CHECK-NEXT: liveins: $w0, $x1, $x2
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr64 = COPY $x2
				; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr64 = COPY $x1
				; CHECK-NEXT: [[COPY2:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK-NEXT: [[SUBSWri:%[0-9]+]]:gpr32 = SUBSWri [[COPY2]], 1, 0, implicit-def $nzcv
				; CHECK-NEXT: Bcc 11, %bb.2, implicit $nzcv
				; CHECK-NEXT: B %bb.1
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: bb.1.for.body.preheader:
				; CHECK-NEXT: successors: %bb.3(0x80000000)
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[ORRWrs:%[0-9]+]]:gpr32 = ORRWrs $wzr, [[COPY2]], 0
				; CHECK-NEXT: [[SUBREG_TO_REG:%[0-9]+]]:gpr64all = SUBREG_TO_REG 0, killed [[ORRWrs]], %subreg.sub_32
				; CHECK-NEXT: B %bb.3
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: bb.2.for.cond.cleanup:
				; CHECK-NEXT: RET_ReallyLR
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: bb.3.for.body:
				; CHECK-NEXT: successors: %bb.2(0x40000000), %bb.3(0x40000000)
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[PHI:%[0-9]+]]:gpr64sp = PHI [[COPY1]], %bb.1, %7, %bb.3
				; CHECK-NEXT: [[PHI1:%[0-9]+]]:gpr64sp = PHI [[COPY]], %bb.1, %9, %bb.3
				; CHECK-NEXT: [[PHI2:%[0-9]+]]:gpr64sp = PHI [[SUBREG_TO_REG]], %bb.1, %11, %bb.3
				; CHECK-NEXT: early-clobber %12:gpr64sp, %13:gpr32 = LDRWpost [[PHI]], 4 :: (load (s32) from %ir.lsr.iv2, !tbaa !0)
				; CHECK-NEXT: [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 9961536
				; CHECK-NEXT: [[ADDWrr:%[0-9]+]]:gpr32 = nsw ADDWrr killed %13, killed [[MOVi32imm]]
				; CHECK-NEXT: early-clobber %16:gpr64sp = STRWpost killed [[ADDWrr]], [[PHI1]], 4 :: (store (s32) into %ir.lsr.iv1, !tbaa !0)
				; CHECK-NEXT: [[SUBSXri:%[0-9]+]]:gpr64 = nsw SUBSXri [[PHI2]], 1, 0, implicit-def dead $nzcv
				; CHECK-NEXT: [[COPY3:%[0-9]+]]:gpr64all = COPY [[SUBSXri]]
				; CHECK-NEXT: [[COPY4:%[0-9]+]]:gpr64all = COPY %16
				; CHECK-NEXT: [[COPY5:%[0-9]+]]:gpr64all = COPY %12
				; CHECK-NEXT: CBZX [[SUBSXri]], %bb.2
				; CHECK-NEXT: B %bb.3
				bb.0.entry:
				successors: %bb.1, %bb.2
				liveins: $w0, $x1, $x2
				%9:gpr64 = COPY $x2
				%8:gpr64 = COPY $x1
				%7:gpr32common = COPY $w0
				%10:gpr32 = SUBSWri %7, 1, 0, implicit-def $nzcv
				Bcc 11, %bb.2, implicit $nzcv
				B %bb.1

				bb.1.for.body.preheader:
				successors: %bb.3
				%11:gpr32 = ORRWrs $wzr, %7, 0
				%0:gpr64all = SUBREG_TO_REG 0, killed %11, %subreg.sub_32
				B %bb.3

				bb.2.for.cond.cleanup:
				RET_ReallyLR

				bb.3.for.body:
				successors: %bb.2, %bb.3
				%1:gpr64sp = PHI %8, %bb.1, %6, %bb.3
				%2:gpr64sp = PHI %9, %bb.1, %5, %bb.3
				%3:gpr64sp = PHI %0, %bb.1, %4, %bb.3
				early-clobber %12:gpr64sp, %13:gpr32 = LDRWpost %1, 4 :: (load (s32) from %ir.lsr.iv2, !tbaa !8)
				%14:gpr32 = MOVi32imm 9961536
				%15:gpr32 = nsw ADDWrr killed %13, killed %14
				early-clobber %16:gpr64sp = STRWpost killed %15, %2, 4 :: (store (s32) into %ir.lsr.iv1, !tbaa !8)
				%17:gpr64 = nsw SUBSXri %3, 1, 0, implicit-def dead $nzcv
				%4:gpr64all = COPY %17
				%5:gpr64all = COPY %16
				%6:gpr64all = COPY %12
				CBZX %17, %bb.2
				B %bb.3
				...

llvm/test/CodeGen/AArch64/addsub.ll

Show First 20 Lines • Show All 146 Lines • ▼ Show 20 Lines	; CHECK-NEXT: ret
store i64 %newval64, i64* @var_i64		store i64 %newval64, i64* @var_i64

ret void		ret void
}		}

define i64 @add_two_parts_imm_i64(i64 %a) {		define i64 @add_two_parts_imm_i64(i64 %a) {
; CHECK-LABEL: add_two_parts_imm_i64:		; CHECK-LABEL: add_two_parts_imm_i64:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #42325		; CHECK-NEXT: add x8, x0, #2730, lsl #12 // =11182080
; CHECK-NEXT: movk w8, #170, lsl #16		; CHECK-NEXT: add x0, x8, #1365
; CHECK-NEXT: add x0, x0, x8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = add i64 %a, 11183445		%b = add i64 %a, 11183445
ret i64 %b		ret i64 %b
}		}

define i32 @add_two_parts_imm_i32(i32 %a) {		define i32 @add_two_parts_imm_i32(i32 %a) {
; CHECK-LABEL: add_two_parts_imm_i32:		; CHECK-LABEL: add_two_parts_imm_i32:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #42325		; CHECK-NEXT: add w8, w0, #2730, lsl #12 // =11182080
; CHECK-NEXT: movk w8, #170, lsl #16		; CHECK-NEXT: add w0, w8, #1365
; CHECK-NEXT: add w0, w0, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = add i32 %a, 11183445		%b = add i32 %a, 11183445
ret i32 %b		ret i32 %b
}		}

define i64 @add_two_parts_imm_i64_neg(i64 %a) {		define i64 @add_two_parts_imm_i64_neg(i64 %a) {
; CHECK-LABEL: add_two_parts_imm_i64_neg:		; CHECK-LABEL: add_two_parts_imm_i64_neg:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov x8, #-42325		; CHECK-NEXT: sub x8, x0, #2730, lsl #12 // =11182080
; CHECK-NEXT: movk x8, #65365, lsl #16		; CHECK-NEXT: sub x0, x8, #1365
; CHECK-NEXT: add x0, x0, x8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = add i64 %a, -11183445		%b = add i64 %a, -11183445
ret i64 %b		ret i64 %b
}		}

define i32 @add_two_parts_imm_i32_neg(i32 %a) {		define i32 @add_two_parts_imm_i32_neg(i32 %a) {
; CHECK-LABEL: add_two_parts_imm_i32_neg:		; CHECK-LABEL: add_two_parts_imm_i32_neg:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #23211		; CHECK-NEXT: sub w8, w0, #2730, lsl #12 // =11182080
; CHECK-NEXT: movk w8, #65365, lsl #16		; CHECK-NEXT: sub w0, w8, #1365
; CHECK-NEXT: add w0, w0, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = add i32 %a, -11183445		%b = add i32 %a, -11183445
ret i32 %b		ret i32 %b
}		}

define i64 @sub_two_parts_imm_i64(i64 %a) {		define i64 @sub_two_parts_imm_i64(i64 %a) {
; CHECK-LABEL: sub_two_parts_imm_i64:		; CHECK-LABEL: sub_two_parts_imm_i64:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov x8, #-42325		; CHECK-NEXT: sub x8, x0, #2730, lsl #12 // =11182080
; CHECK-NEXT: movk x8, #65365, lsl #16		; CHECK-NEXT: sub x0, x8, #1365
; CHECK-NEXT: add x0, x0, x8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = sub i64 %a, 11183445		%b = sub i64 %a, 11183445
ret i64 %b		ret i64 %b
}		}

define i32 @sub_two_parts_imm_i32(i32 %a) {		define i32 @sub_two_parts_imm_i32(i32 %a) {
; CHECK-LABEL: sub_two_parts_imm_i32:		; CHECK-LABEL: sub_two_parts_imm_i32:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #23211		; CHECK-NEXT: sub w8, w0, #2730, lsl #12 // =11182080
; CHECK-NEXT: movk w8, #65365, lsl #16		; CHECK-NEXT: sub w0, w8, #1365
; CHECK-NEXT: add w0, w0, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = sub i32 %a, 11183445		%b = sub i32 %a, 11183445
ret i32 %b		ret i32 %b
}		}

define i64 @sub_two_parts_imm_i64_neg(i64 %a) {		define i64 @sub_two_parts_imm_i64_neg(i64 %a) {
; CHECK-LABEL: sub_two_parts_imm_i64_neg:		; CHECK-LABEL: sub_two_parts_imm_i64_neg:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #42325		; CHECK-NEXT: add x8, x0, #2730, lsl #12 // =11182080
; CHECK-NEXT: movk w8, #170, lsl #16		; CHECK-NEXT: add x0, x8, #1365
; CHECK-NEXT: add x0, x0, x8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = sub i64 %a, -11183445		%b = sub i64 %a, -11183445
ret i64 %b		ret i64 %b
}		}

define i32 @sub_two_parts_imm_i32_neg(i32 %a) {		define i32 @sub_two_parts_imm_i32_neg(i32 %a) {
; CHECK-LABEL: sub_two_parts_imm_i32_neg:		; CHECK-LABEL: sub_two_parts_imm_i32_neg:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #42325		; CHECK-NEXT: add w8, w0, #2730, lsl #12 // =11182080
; CHECK-NEXT: movk w8, #170, lsl #16		; CHECK-NEXT: add w0, w8, #1365
; CHECK-NEXT: add w0, w0, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = sub i32 %a, -11183445		%b = sub i32 %a, -11183445
ret i32 %b		ret i32 %b
}		}

define void @testing() {		define void @testing() {
; CHECK-LABEL: testing:		; CHECK-LABEL: testing:
; CHECK: // %bb.0:		; CHECK: // %bb.0:

llvm/test/Transforms/CodeGenPrepare/AArch64/large-offset-gep.ll

	}			}

	declare i32 @__FrameHandler(...)			declare i32 @__FrameHandler(...)

	define void @test5([65536 x i32]** %s, i32 %n) {			define void @test5([65536 x i32]** %s, i32 %n) {
	; CHECK-LABEL: test5:			; CHECK-LABEL: test5:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ldr x9, [x0]			; CHECK-NEXT: ldr x9, [x0]
	; CHECK-NEXT: mov w10, #14464
	; CHECK-NEXT: movk w10, #1, lsl #16
	; CHECK-NEXT: mov w8, wzr			; CHECK-NEXT: mov w8, wzr
	; CHECK-NEXT: add x9, x9, x10			; CHECK-NEXT: add x9, x9, #19, lsl #12 // =77824
				; CHECK-NEXT: add x9, x9, #2176
	; CHECK-NEXT: cmp w8, w1			; CHECK-NEXT: cmp w8, w1
	; CHECK-NEXT: b.ge .LBB4_2			; CHECK-NEXT: b.ge .LBB4_2
	; CHECK-NEXT: .LBB4_1: // %while_body			; CHECK-NEXT: .LBB4_1: // %while_body
	; CHECK-NEXT: // =>This Inner Loop Header: Depth=1			; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: add w10, w8, #1			; CHECK-NEXT: add w10, w8, #1
	; CHECK-NEXT: stp w10, w8, [x9]			; CHECK-NEXT: stp w10, w8, [x9]
	; CHECK-NEXT: mov w8, w10			; CHECK-NEXT: mov w8, w10
	; CHECK-NEXT: cmp w8, w1			; CHECK-NEXT: cmp w8, w1
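The updated CHECK lines above all follow from splitting the immediate into its high and low 12-bit halves: 11183445 = (2730 << 12) + 1365, and in test5 the offset 80000 = (19 << 12) + 2176. A minimal sketch of that arithmetic follows; `splitImm24` is a hypothetical helper written for illustration and is not taken from the patch, which performs the split inside the MachineCombiner patterns.

```cpp
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical helper (not from the patch): split an immediate into the
// two 12-bit halves used by "add/sub Rd, Rn, #hi, lsl #12" followed by
// "add/sub Rd, Rd, #lo".
// Returns std::nullopt when the value does not fit in 24 bits, or when one
// half is zero and a single ADD/SUB immediate already suffices.
std::optional<std::pair<uint32_t, uint32_t>> splitImm24(uint64_t Imm) {
  if (Imm >> 24)
    return std::nullopt; // wider than 24 bits
  uint32_t Hi = static_cast<uint32_t>((Imm >> 12) & 0xFFF);
  uint32_t Lo = static_cast<uint32_t>(Imm & 0xFFF);
  if (Hi == 0 || Lo == 0)
    return std::nullopt; // one instruction is enough
  return std::make_pair(Hi, Lo);
}
```

For the negated cases (for example `add i64 %a, -11183445`), the same split would be applied to the absolute value and the opcode flipped from ADD to SUB, which matches the `sub ..., #2730, lsl #12` and `sub ..., #1365` pair in the expected output.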

Unit Tests: Failed

Diff 397195

llvm/include/llvm/CodeGen/MachineCombinerPattern.h

llvm/lib/CodeGen/MachineCombiner.cpp

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

llvm/test/CodeGen/AArch64/aarch64-combine-addsub-24bit-imm.mir

llvm/test/CodeGen/AArch64/aarch64-combine-addsub-imm-reject-loop.mir

llvm/test/CodeGen/AArch64/addsub.ll

llvm/test/Transforms/CodeGenPrepare/AArch64/large-offset-gep.ll
