This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/include/llvm/CodeGen/MachineCombinerPattern.h
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
llvm/test/CodeGen/AArch64/aarch64-combine-addsub-24bit-imm.mir
llvm/test/CodeGen/AArch64/addsub.ll
llvm/test/CodeGen/AArch64/and-mask-removal.ll
llvm/test/CodeGen/AArch64/arm64-srl-and.ll
llvm/test/CodeGen/AArch64/fast-isel-gep.ll
llvm/test/CodeGen/AArch64/nontemporal.ll
llvm/test/CodeGen/AArch64/srem-vector-lkk.ll
llvm/test/Transforms/CodeGenPrepare/AArch64/large-offset-gep.ll

Differential D116468

[AArch64] Combine ADD/SUB instructions when they contain a 24-bit immediate.
Abandoned · Public

Authored by red1bluelost on Dec 31 2021, 5:58 PM.

Details

Reviewers

dmgreen
asavonic
SjoerdMeijer
paquette
benshi001

Summary

This patch improves ADD/SUB instructions that have 24-bit immediates by
turning the MOV-MOV-ADD/SUB sequence into ADDI/SUBI-ADDI/SUBI, using the
high and low 12-bit portions of the immediate.

For example, the following code:

int addi(int A) { return A + 0x111333; }

results in the assembly:

addi:                        // Without combine
        mov w8, #4915
        movk w8, #17, lsl #16
        add w0, w0, w8
        ret
addi:                        // With combine
        add w8, w0, #273, lsl #12
        add w0, w8, #819
        ret
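
The split used in the example above can be checked with a small standalone sketch. It is not code from the patch: `Imm24Parts` and `splitImm24` are hypothetical names, and the sketch only models the arithmetic of taking bits 23:12 and bits 11:0 of the constant.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the patch): the two 12-bit fields of a
// 24-bit immediate, as used by "add ..., #hi, lsl #12" and "add ..., #lo".
struct Imm24Parts {
  uint32_t hi; // bits 23:12
  uint32_t lo; // bits 11:0
};

constexpr Imm24Parts splitImm24(uint32_t imm) {
  return {(imm >> 12) & 0xfffu, imm & 0xfffu};
}

// 0x111333 from the example splits into #273 and #819.
static_assert(splitImm24(0x111333).hi == 273, "high 12 bits");
static_assert(splitImm24(0x111333).lo == 819, "low 12 bits");
// Adding the shifted high part and the low part gives back the constant.
static_assert((273u << 12) + 819u == 0x111333u, "recombination");
```

The `static_assert`s are evaluated at compile time, so compiling the snippet is enough to confirm the numbers in the assembly listing.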

This was implemented by adding patterns to MachineCombinerPattern and
handling them in AArch64InstrInfo::genAlternativeCodeSequence and
AArch64InstrInfo::getMachineCombinerPatterns. The patterns cover three
scenarios: the moved immediate is in operand 1 or 2 of the ADD/SUB; the
immediate can be negated to produce a 24-bit immediate, which changes the ADD
to a SUB and the SUB to an ADD; and a SUBREG_TO_REG is used to promote the i32
register to an i64 register.
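
As a rough illustration of the negation case, the sketch below classifies a 32-bit immediate the way the summary describes. The names `Rewrite`, `isTwoPart24` and `classify` are hypothetical; the bit test mirrors the check in the patch's `MatchImm` lambda, but nothing here touches LLVM data structures.

```cpp
#include <cstdint>

// Hypothetical classification of an immediate for the ADD/SUB combine.
enum class Rewrite { None, SameOp, FlippedOp };

// True when imm fits in 24 bits and both 12-bit halves are nonzero.
constexpr bool isTwoPart24(uint32_t imm) {
  return (imm & ~0x00ffffffu) == 0 && (imm & 0x00fff000u) != 0 &&
         (imm & 0x00000fffu) != 0;
}

// SameOp:    keep ADD as ADD (or SUB as SUB) and split imm.
// FlippedOp: the negated value is the 24-bit one, so ADD becomes SUB
//            (or SUB becomes ADD) and the negated value is split.
constexpr Rewrite classify(uint32_t imm) {
  return isTwoPart24(imm)        ? Rewrite::SameOp
         : isTwoPart24(0u - imm) ? Rewrite::FlippedOp
                                 : Rewrite::None;
}

static_assert(classify(0x111333) == Rewrite::SameOp, "A + 0x111333");
static_assert(classify(static_cast<uint32_t>(-0x111333)) ==
                  Rewrite::FlippedOp,
              "A + (-0x111333) can become A - 0x111333");
static_assert(classify(0x1000) == Rewrite::None, "low 12 bits are zero");
```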

I originally implemented this combine through a TableGen Pat; however, this
caused some of the MADD combines to fail. With the ADD/SUB combine residing
in the MachineCombiner, MADD combines can be prioritized when both patterns
exist.

If this design is accepted, ADDS/SUBS patterns could be added in another patch.

Testing:

Each MachineCombinerPattern is tested in aarch64-combine-addsub-24bit-imm.mir,
a new file.

The addsub.ll test file has the typical scenarios in LLVM IR.

Other AArch64 test files had to be updated since this new combine was encountered
in those tests.

I ran ninja check-all on the code.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60 ms	x64 debian > LLVM.Bindings/Go::go.test

Event Timeline

red1bluelost created this revision. Dec 31 2021, 5:58 PM

Herald added subscribers: hiraditya, kristof.beyls. Dec 31 2021, 5:58 PM

red1bluelost requested review of this revision. Dec 31 2021, 5:58 PM

Herald added a project: Restricted Project. Dec 31 2021, 5:58 PM

red1bluelost edited the summary of this revision. Dec 31 2021, 6:01 PM

Harbormaster completed remote builds in B141158: Diff 396842. Dec 31 2021, 6:41 PM

Oh interesting. This is similar to D111034, but that was reverted again. I'm not sure why; apparently MIPeephole optimizations are all too easy to get wrong.

The problem with doing this is when it is done in a loop. Something like this example: https://godbolt.org/z/e5f4hWGcq, where preferably the loop invariant mov can be hoisted out of the loop, leaving a single add. Can we make sure there is a test case for that example, and try and guard against it?

Also, I think an ADD+MOVi16 might be slightly better than an ADD+ADD (that is, when the MOV is a 16-bit immediate that can be materialized with a single instruction). The two ADDs might make a longer critical path. The ADD pair only becomes beneficial when the MOV pseudo would need multiple instructions.
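
The single-instruction case mentioned above can be approximated with a standalone check. This is only a sketch under a stated simplification: it considers a lone MOVZ or MOVN for a 32-bit register and ignores immediates that a single ORR with a logical (bitmask) immediate could materialize, so it under-reports what one instruction can do. The function names are hypothetical.

```cpp
#include <cstdint>

// True when at most one of the two 16-bit chunks of v is nonzero.
constexpr bool atMostOneChunk(uint32_t v) {
  return (v & 0xffff0000u) == 0 || (v & 0x0000ffffu) == 0;
}

// Simplified assumption: one MOVZ covers values with a single nonzero
// 16-bit chunk, and one MOVN covers values whose bitwise inverse has a
// single nonzero chunk. Logical-immediate (ORR) encodings are ignored.
constexpr bool isSingleMovzOrMovn32(uint32_t imm) {
  return atMostOneChunk(imm) || atMostOneChunk(~imm);
}

static_assert(isSingleMovzOrMovn32(0x1333), "mov w8, #4915");
static_assert(isSingleMovzOrMovn32(0x110000), "mov w8, #17, lsl #16");
static_assert(!isSingleMovzOrMovn32(0x111333), "needs mov + movk");
```

Under this approximation, the ADD+ADD rewrite would only be considered for constants where `isSingleMovzOrMovn32` returns false.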

Thanks Dave! I didn't know about all that.

I'll give those a look, add some test cases, and update this patch if it can work out.

Addresses feedback for patch.

A test file, aarch64-combine-addsub-imm-reject-loop.mir, was added to check
whether the ADD/SUB combine interferes with loop-invariant hoisting; it showed
a regression. To fix the regression, a CombinerObjective was added for these
MachineCombinerPatterns that requires them not to be inside a loop. During
machine combining, these patterns check whether the Root is inside a loop and
skip the combine if it is.
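
The regression can be pictured with a rough count of executed instructions. The model below is an assumption made for illustration only: it counts just the instructions involved in adding the constant, assumes the MOV/MOVK pair is hoisted out of the loop, and ignores register pressure and scheduling. `hoistedCost` and `combinedCost` are hypothetical names.

```cpp
#include <cstdint>

// MOV + MOVK executed once before the loop, one ADD per iteration.
constexpr uint64_t hoistedCost(uint64_t iterations) {
  return 2 + iterations;
}

// Two ADD-immediate instructions executed on every iteration.
constexpr uint64_t combinedCost(uint64_t iterations) {
  return 2 * iterations;
}

// Straight-line code (one execution): the combine saves an instruction.
static_assert(combinedCost(1) < hoistedCost(1), "2 vs 3");
// Inside a loop the hoisted form wins once there are a few iterations.
static_assert(hoistedCost(100) == 102 && combinedCost(100) == 200,
              "102 vs 200");
```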

To keep 16-bit immediates as MOV-ADD, a few tests were added to
aarch64-combine-addsub-24bit-imm.mir along with code modifications. Now, 24-bit
immediates must use bits 23-16, which guarantees that the combine only reduces
MOV-MOV-ADD to ADDI-ADDI.
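
A minimal sketch of the tightened rule, assuming it keeps the earlier requirement that the low 12 bits are nonzero and adds a test on bits 23:16. The updated diff is not shown in this section, so `matchesUpdatedRule` is a hypothetical reconstruction, not the author's code.

```cpp
#include <cstdint>

// Assumed shape of the updated check: the value fits in 24 bits, uses at
// least one of bits 23:16 (so it does not fit a single 16-bit MOV), and
// has a nonzero low 12-bit part (so one ADD with lsl #12 is not enough).
constexpr bool matchesUpdatedRule(uint32_t imm) {
  return (imm & ~0x00ffffffu) == 0 && (imm & 0x00ff0000u) != 0 &&
         (imm & 0x00000fffu) != 0;
}

static_assert(matchesUpdatedRule(0x111333), "uses bits 23:16");
// 0xf333 passed the original check (bits 15:12 are set) but fits in a
// single 16-bit MOV, so the tightened rule rejects it.
static_assert(!matchesUpdatedRule(0xf333), "16-bit immediate");
static_assert(!matchesUpdatedRule(0x110000), "low 12 bits are zero");
```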

Forgot one of the files in the update.

Sorry, I'm getting used to patches and arcanist.

Harbormaster completed remote builds in B141428: Diff 397195. Jan 3 2022, 9:52 PM

dmgreen added a reviewer: benshi001. Jan 5 2022, 8:39 AM

dmgreen added inline comments.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
4795	Can we generalize this to "Any MOV that will be > 1 instruction"? It may be possible to use expandMOVImm for that, and check the number of instructions. The rules for what makes a single instruction are difficult to represent simply, and can be more than a 16bit imm. As in https://godbolt.org/z/KjKhMjb7v.
4838	Should this be a uint64_t?
5559	LLVM tends not to go too heavy on the doxygen comments, especially for obvious parameters like MF/MRI/TII.
llvm/test/CodeGen/AArch64/aarch64-combine-addsub-24bit-imm.mir
3	I don't think this needs -O0
344	I think this might be negative of the correct result. (But equally I don't think that SUBXrr will be present at this point in the pipeline. It will be speculatively emitted as a SUBSXrr as that may be used as a compare).
llvm/test/CodeGen/AArch64/aarch64-combine-addsub-imm-reject-loop.mir
2 ↗	(On Diff #397195)	This one might be better as a .ll file. That way we test the entire backend, and can see obvious regressions when the instruction count increases.
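
The uint64_t question raised for line 4838 can be illustrated in isolation. In the sketch below a 64-bit constant that needs more than 24 bits is narrowed to 32 bits before the range test, which makes it look like a 24-bit immediate. `fitsIn24Bits` is a hypothetical stand-in for the first condition of the patch's `MatchImm` lambda, and `uint32_t` stands in for `unsigned` on the assumption that `unsigned` is 32 bits wide.

```cpp
#include <cstdint>

// Stand-in for the "(Imm & ~0x00ffffff) == 0" part of the check.
constexpr bool fitsIn24Bits(uint64_t imm) {
  return (imm & ~0x00ffffffull) == 0;
}

// A MOVi64imm operand with bit 32 set.
constexpr uint64_t Wide = 0x100111333ull;

// Checked at full width, the value is correctly rejected.
static_assert(!fitsIn24Bits(Wide), "needs more than 24 bits");
// Narrowed to 32 bits first, it becomes 0x00111333 and is wrongly accepted.
static_assert(fitsIn24Bits(static_cast<uint32_t>(Wide)),
              "truncation hides bit 32");
```

Keeping the immediate in a 64-bit variable for the MOVi64imm case would avoid the false match in this model.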

In reply to D116468#3217217 by @dmgreen:

I am quite sorry for that. Though make check-all passed and everything was good on amd64, the clang built into aarch64 ELF and running on aarch64-linux crashed. And I have no aarch64 host machine to debug that (although I guess it should be a minor bug).

In reply to D116468#3224507 by @benshi001:

Oh I see. Sorry for not seeing that message. I hadn't realized it was reverted.

I have no strong opinion whether this is best done as an AArch64MIPeephole or in MachineCombine. They are probably both fine for this kind of thing. Whichever @red1bluelost thinks will be easiest to extend to SUBS instructions that write nzcv sounds OK to me, if the goal here is to address https://github.com/llvm/llvm-project/issues/51482.

In reply to D116468#3224907 by @dmgreen:

Either method (Peephole or MachineCombine) would be easy to extend for the SUBS instructions. I think the Peephole approach would be better, it seems more encapsulated than my approach.

What is the status of D111034 @benshi001, is it likely to be committed or is that bug still holding it up? The buildbot logs don't appear anymore so I can't see where that bug happened. Could that bug also be present with my MachineCombine patch?

In reply to D116468#3228702 by @red1bluelost:

@red1bluelost, you are welcome to continue this optimization with the peephole approach, if you are sure it no longer crashes on an aarch64-linux host. As I have explained, I have no aarch64-linux host and cannot guarantee that. Thanks! I would be glad if anybody could add this optimization to the main branch ASAP.

In reply to D116468#3230617 by @benshi001:

Ok. I will try to see if I can find the bug with the Peephole if possible. I have been testing it this weekend through the Linaro Docker container mentioned in D111034 but using QEMU emulation of AArch64. It seems like it can replicate the aarch64-linux host environment but is just really slow. (I don't have an aarch64 host either).

I'll also see about finishing up the MachineCombine approach and testing it too, in case it would be easier to just finish this approach.

If I manage to fix the Peephole approach, should I edit D111034? Or should I start a new patch review for the Peephole?

In reply to D116468#3233108 by @red1bluelost:

I suggest you start a new patch review request and mention D111034 :)

Since we already have https://reviews.llvm.org/D117429, can we abandon this?

Abandoning revision in favor of D117429.

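Revision Contents

Path	Size
llvm/include/llvm/CodeGen/MachineCombinerPattern.h	25 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp	338 lines
llvm/test/CodeGen/AArch64/aarch64-combine-addsub-24bit-imm.mir	281 lines
llvm/test/CodeGen/AArch64/addsub.ll	40 lines
llvm/test/CodeGen/AArch64/and-mask-removal.ll	20 lines
llvm/test/CodeGen/AArch64/arm64-srl-and.ll	4 lines
llvm/test/CodeGen/AArch64/fast-isel-gep.ll	4 lines
llvm/test/CodeGen/AArch64/nontemporal.ll	11 lines
llvm/test/CodeGen/AArch64/srem-vector-lkk.ll	38 lines
llvm/test/Transforms/CodeGenPrepare/AArch64/large-offset-gep.ll	54 lines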

Diff 396842

llvm/include/llvm/CodeGen/MachineCombinerPattern.h

@@ ... @@ enum class MachineCombinerPattern {
   MULADDWI_OP1,
   MULSUBWI_OP1,
   MULADDX_OP1,
   MULADDX_OP2,
   MULSUBX_OP1,
   MULSUBX_OP2,
   MULADDXI_OP1,
   MULSUBXI_OP1,
+  // 24-bit imm add/sub patterns matched by the AArch64 machine combiner.
+  ADDW_MOVi32imm_OP1,
+  ADDW_MOVi32imm_OP2,
+  ADDW_negMOVi32imm_OP1,
+  ADDW_negMOVi32imm_OP2,
+  ADDX_StR_MOVi32imm_OP1,
+  ADDX_StR_MOVi32imm_OP2,
+  ADDX_StR_negMOVi32imm_OP1,
+  ADDX_StR_negMOVi32imm_OP2,
+  ADDX_MOVi64imm_OP1,
+  ADDX_MOVi64imm_OP2,
+  ADDX_negMOVi64imm_OP1,
+  ADDX_negMOVi64imm_OP2,
+  SUBW_MOVi32imm_OP1,
+  SUBW_MOVi32imm_OP2,
+  SUBW_negMOVi32imm_OP1,
+  SUBW_negMOVi32imm_OP2,
+  SUBX_StR_MOVi32imm_OP1,
+  SUBX_StR_MOVi32imm_OP2,
+  SUBX_StR_negMOVi32imm_OP1,
+  SUBX_StR_negMOVi32imm_OP2,
+  SUBX_MOVi64imm_OP1,
+  SUBX_MOVi64imm_OP2,
+  SUBX_negMOVi64imm_OP1,
+  SUBX_negMOVi64imm_OP2,
   // NEON integers vectors
   MULADDv8i8_OP1,
   MULADDv8i8_OP2,
   MULADDv16i8_OP1,
   MULADDv16i8_OP2,
   MULADDv4i16_OP1,
   MULADDv4i16_OP2,
   MULADDv8i16_OP1,

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

@@ ... @@ case AArch64::SUBv4i32:
     setVFound(AArch64::MULv4i32, 1, MCP::MULSUBv4i32_OP1);
     setVFound(AArch64::MULv4i32, 2, MCP::MULSUBv4i32_OP2);
     setVFound(AArch64::MULv4i32_indexed, 1, MCP::MULSUBv4i32_indexed_OP1);
     setVFound(AArch64::MULv4i32_indexed, 2, MCP::MULSUBv4i32_indexed_OP2);
     break;
   }
   return Found;
 }

+/// getAddSub24Patterns - Find instructions ADD/SUB instructions that have a
+/// 24-bit immediate moved into its operand and change those to make two ADD/SUB
+/// instructions with 12-bit immediates encoded.
+/// \param Root the current instruction to check if it is an ADD/SUB that can be
+/// combined
+/// \param [out] Patterns the list of patterns for the pattern evaluator
+/// \return true iff there is an ADD/SUB that can be combined
+static bool
+getAddSub24Patterns(MachineInstr &Root,
+                    SmallVectorImpl<MachineCombinerPattern> &Patterns) {
+  unsigned Opc = Root.getOpcode();
+  MachineBasicBlock &MBB = *Root.getParent();
+  bool Found = false;
+
+  MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
+
+  using MCP = MachineCombinerPattern;
+
+  auto MatchImm = [&](unsigned Imm, MCP Pat, MCP NPat) {
+    if (!(Imm & ~0x00ffffff) && (Imm & 0x00fff000) && (Imm & 0x00000fff)) {
+      Patterns.push_back(Pat);
+      return true;
+    }
+    if (!(-Imm & ~0x00ffffff) && (-Imm & 0x00fff000) && (-Imm & 0x00000fff)) {
+      Patterns.push_back(NPat);
+      return true;
+    }
+    return false;
+  };

+  // Match (ADD/SUBW WN (MOVi32imm <24-bit>)) ->
+  // (ADD/SUBW (ADD/SUBW WN <12-bit> shift.12) <12-bit> shift.0)
+  auto MatchW = [&](unsigned Oprd, MCP Pat, MCP NPat) {
+    MachineOperand &AddSubOprd = Root.getOperand(Oprd);
+    if (!canCombine(MBB, AddSubOprd, AArch64::MOVi32imm))
+      return false;
+    unsigned Imm =
+        MRI.getUniqueVRegDef(AddSubOprd.getReg())->getOperand(1).getImm();
+    return MatchImm(Imm, Pat, NPat);
+  };
+
+  // Match (ADD/SUBX XN (SUBREG_TO_REG (MOVi32imm <24-bit>))) ->
+  // (ADD/SUBX (ADD/SUBX XN <12-bit> shift.12) <12-bit> shift.0)
+  auto MatchXStR = [&](unsigned Oprd, MCP Pat, MCP NPat) {
+    MachineOperand &AddSubOprd = Root.getOperand(Oprd);
+    if (!canCombine(MBB, AddSubOprd, AArch64::SUBREG_TO_REG))
+      return false;
+    MachineInstr &SubToReg = *MRI.getUniqueVRegDef(AddSubOprd.getReg());
+    MachineOperand &SubToRegOprd = SubToReg.getOperand(2);
+    if (!canCombine(MBB, SubToRegOprd, AArch64::MOVi32imm))
+      return false;
+    unsigned Imm =
+        MRI.getUniqueVRegDef(SubToRegOprd.getReg())->getOperand(1).getImm();
+    return MatchImm(Imm, Pat, NPat);
+  };
+
+  // Match (ADD/SUBX XN (MOVi64imm <24-bit>)) ->
+  // (ADD/SUBX (ADD/SUBX XN <12-bit> shift.12) <12-bit> shift.0)
+  auto MatchXM64 = [&](unsigned Oprd, MCP Pat, MCP NPat) {
+    MachineOperand &AddSubOprd = Root.getOperand(Oprd);
+    if (!canCombine(MBB, AddSubOprd, AArch64::MOVi64imm))
+      return false;
+    unsigned Imm =
+        MRI.getUniqueVRegDef(AddSubOprd.getReg())->getOperand(1).getImm();
+    return MatchImm(Imm, Pat, NPat);
+  };

+  switch (Opc) {
+  default:
+    break;
+  case AArch64::ADDWrr:
+    Found |= MatchW(1, MCP::ADDW_MOVi32imm_OP1, MCP::ADDW_negMOVi32imm_OP1);
+    Found |= MatchW(2, MCP::ADDW_MOVi32imm_OP2, MCP::ADDW_negMOVi32imm_OP2);
+    break;
+  case AArch64::ADDXrr:
+    Found |= MatchXM64(1, MCP::ADDX_MOVi64imm_OP1, MCP::ADDX_negMOVi64imm_OP1);
+    Found |= MatchXM64(2, MCP::ADDX_MOVi64imm_OP2, MCP::ADDX_negMOVi64imm_OP2);
+    Found |= MatchXStR(1, MCP::ADDX_StR_MOVi32imm_OP1,
+                       MCP::ADDX_StR_negMOVi32imm_OP1);
+    Found |= MatchXStR(2, MCP::ADDX_StR_MOVi32imm_OP2,
+                       MCP::ADDX_StR_negMOVi32imm_OP2);
+    break;
+  case AArch64::SUBWrr:
+    Found |= MatchW(1, MCP::SUBW_MOVi32imm_OP1, MCP::SUBW_negMOVi32imm_OP1);
+    Found |= MatchW(2, MCP::SUBW_MOVi32imm_OP2, MCP::SUBW_negMOVi32imm_OP2);
+    break;
+  case AArch64::SUBXrr:
+    Found |= MatchXM64(1, MCP::SUBX_MOVi64imm_OP1, MCP::SUBX_negMOVi64imm_OP1);
+    Found |= MatchXM64(2, MCP::SUBX_MOVi64imm_OP2, MCP::SUBX_negMOVi64imm_OP2);
+    Found |= MatchXStR(1, MCP::SUBX_StR_MOVi32imm_OP1,
+                       MCP::SUBX_StR_negMOVi32imm_OP1);
+    Found |= MatchXStR(2, MCP::SUBX_StR_MOVi32imm_OP2,
+                       MCP::SUBX_StR_negMOVi32imm_OP2);
+    break;
+  }
+  return Found;
+}

 /// Floating-Point Support

 /// Find instructions that can be turned into madd.
 static bool getFMAPatterns(MachineInstr &Root,
                            SmallVectorImpl<MachineCombinerPattern> &Patterns) {

   if (!isCombineInstrCandidateFP(Root))
     return false;
@@ ... @@
 /// pattern evaluator stops checking as soon as it finds a faster sequence.

 bool AArch64InstrInfo::getMachineCombinerPatterns(
     MachineInstr &Root, SmallVectorImpl<MachineCombinerPattern> &Patterns,
     bool DoRegPressureReduce) const {
   // Integer patterns
   if (getMaddPatterns(Root, Patterns))
     return true;
+  if (getAddSub24Patterns(Root, Patterns))
+    return true;
   // Floating point patterns
   if (getFMULPatterns(Root, Patterns))
     return true;
   if (getFMAPatterns(Root, Patterns))
     return true;

   return TargetInstrInfo::getMachineCombinerPatterns(Root, Patterns,
                                                      DoRegPressureReduce);
@@ ... @@ MachineInstrBuilder MIB =
           .addReg(SrcReg0, getKillRegState(Src0IsKill))
           .addReg(SrcReg1, getKillRegState(Src1IsKill))
           .addReg(VR);
   // Insert the MADD
   InsInstrs.push_back(MIB);
   return MUL;
 }

+/// genAddSub24BitImm - Creates two (ADD|SUB)(W|X)ri instructions that take the
+/// high and low bits respectively of a 24-bit immediate. Constrains the
+/// register class as needed. Adds the new instructions to the insert list and
+/// returns the move immediate instruction pointer so that the caller add it to
+/// the delete list.
+/// \param MF Containing MachineFunction
+/// \param MRI Register information
+/// \param TII Target information
+/// \param Root is the (ADD|SUB)(W|X)rr instruction
+/// \param ImmInst is the MOVi(32|64)imm instruction
+/// \param IdxRootOpd is the index of the operand that has the SUBREG_TO_REG
+/// result
+/// \param Imm is the immediate value which uses at least 13-bits and at most
+/// 24-bits
+/// \param NewOpc The opcode for the two (ADD|SUB)(W|X)ri instructions
+/// \param RC Register class of operands (ADD|SUB)(W|X)ri instructions
+/// \param [out] InsInstrs is a vector of machine instructions and will
+/// contain the generated (ADD|SUB)(W|X)ri instructions
+/// \return the address of the MOVi(32|64)imm instruction that could be removed
+static MachineInstr *
+genAddSub24BitImm(MachineFunction &MF, MachineRegisterInfo &MRI,
+                  const TargetInstrInfo *TII, MachineInstr &Root,
+                  MachineInstr &ImmInst, unsigned IdxRootOpd, unsigned Imm,
+                  unsigned NewOpc, const TargetRegisterClass *RC,
+                  SmallVectorImpl<MachineInstr *> &InsInstrs) {
+  unsigned ImmHi = (Imm >> 12) & 0x0fff, ImmLo = Imm & 0x0fff;
+  unsigned IdxOtherOpd = IdxRootOpd == 1 ? 2 : 1;
+  Register ResultReg = Root.getOperand(0).getReg();
+  Register ImmReg = Root.getOperand(IdxRootOpd).getReg();
+  bool ImmIsKill = Root.getOperand(IdxRootOpd).isKill();
+  Register SrcReg = Root.getOperand(IdxOtherOpd).getReg();
+  bool SrcIsKill = Root.getOperand(IdxOtherOpd).isKill();
+
+  if (Register::isVirtualRegister(ResultReg))
+    MRI.constrainRegClass(ResultReg, RC);
+  if (Register::isVirtualRegister(ImmReg))
+    MRI.constrainRegClass(ImmReg, RC);
+  if (Register::isVirtualRegister(SrcReg))
+    MRI.constrainRegClass(SrcReg, RC);
+
+  MachineInstrBuilder MIB1 =
+      BuildMI(MF, Root.getDebugLoc(), TII->get(NewOpc), ImmReg)
+          .addReg(SrcReg, getKillRegState(SrcIsKill))
+          .addImm(ImmHi)
+          .addImm(12);
+  MachineInstrBuilder MIB2 =
+      BuildMI(MF, Root.getDebugLoc(), TII->get(NewOpc), ResultReg)
+          .addReg(ImmReg, getKillRegState(ImmIsKill))
+          .addImm(ImmLo)
+          .addImm(0);
+  InsInstrs.push_back(MIB1);
+  InsInstrs.push_back(MIB2);
+  return &ImmInst;
+}

+/// genAddSubMovImm - Generate two ADD/SUB immediate instructions from an
+/// ADD/SUB instruction has a 24-bit value moved into one of the operands. This
+/// reduces the final assembly when the 24-bit immediate would have required two
+/// MOV immediate instructions.
+/// This function extracts the move immediate instruction then delegates work to
+/// genAddSub24BitImm.
+/// \example
+/// \code
+/// I = MOVi(32|64)imm N:<24-bit imm>
+/// V = (ADD|SUB)(W|X)rr Rn I
+/// ==> Tmp = (ADD|SUB)(W|X)rr Rn N:<23:12> lsl.12
+/// ==> V = (ADD|SUB)(W|X)rr Rn N:<11:0> lsl.0
+/// \endcode
+/// \param MF Containing MachineFunction
+/// \param MRI Register information
+/// \param TII Target information
+/// \param Root is the (ADD|SUB)(W|X)rr instruction
+/// \param IdxRootOpd is the index of the operand that has the SUBREG_TO_REG
+/// result
+/// \param NewOpc The opcode for the two (ADD|SUB)(W|X)ri instructions
+/// \param RC Register class of operands (ADD|SUB)(W|X)ri instructions
+/// \param Negate is true if the immediate must be negated to become 24-bits
+/// \param [out] InsInstrs is a vector of machine instructions and will
+/// contain the generated (ADD|SUB)(W|X)ri instructions
+/// \return the address of the MOVi(32|64)imm instruction that could be removed
+static MachineInstr *
+genAddSubMovImm(MachineFunction &MF, MachineRegisterInfo &MRI,
+                const TargetInstrInfo *TII, MachineInstr &Root,
+                unsigned IdxRootOpd, unsigned NewOpc,
+                const TargetRegisterClass *RC, bool Negate,
+                SmallVectorImpl<MachineInstr *> &InsInstrs) {
+  MachineInstr &ImmInst = *MRI.getVRegDef(Root.getOperand(IdxRootOpd).getReg());
+  unsigned Imm = ImmInst.getOperand(1).getImm();
+  if (Negate)
+    Imm = -Imm;
+  return genAddSub24BitImm(MF, MRI, TII, Root, ImmInst, IdxRootOpd, Imm, NewOpc,
+                           RC, InsInstrs);
+}

+/// genAddSubStR - Generate two ADD/SUB immediate instructions from an ADD/SUB
+/// instruction has a 24-bit value moved into one of the operands with an
+/// intermediate SUBREG_TO_REG step. This reduces the final assembly when the
+/// 24-bit immediate would have required two MOV immediate instructions.
+/// This function extracts the SUBREG_TO_REG and move immediate instructions,
+/// deletes the SUBREG_TO_REG, then delegates work to genAddSub24BitImm.
+/// \example
+/// \code
+/// I = MOVi32imm N:<24-bit imm>
+/// S = SUBREG_TO_REG I
+/// V = (ADD|SUB)Xrr Rn S
+/// ==> Tmp = (ADD|SUB)Xrr Rn N:<23:12> lsl.12
+/// ==> V = (ADD|SUB)Xrr Rn N:<11:0> lsl.0
+/// \endcode
+/// \param MF Containing MachineFunction
+/// \param MRI Register information
+/// \param TII Target information
+/// \param Root is the (ADD|SUB)(W|X)rr instruction
+/// \param IdxRootOpd is the index of the operand that has the SUBREG_TO_REG
+/// result
+/// \param NewOpc The opcode for the two (ADD|SUB)(W|X)ri instructions
+/// \param RC Register class of operands (ADD|SUB)(W|X)ri instructions
+/// \param Negate is true if the immediate must be negated to become 24-bits
+/// \param [out] InsInstrs is a vector of machine instructions and will
+/// contain the generated (ADD|SUB)(W|X)ri instructions
+/// \param [out] DelInstrs is a vector that will contain the SUBREG_TO_REG
+/// instruction that could be removed
+/// \return the address of the MOVi(32|64)imm instruction that could be removed
+static MachineInstr *genAddSubStR(MachineFunction &MF, MachineRegisterInfo &MRI,
+                                  const TargetInstrInfo *TII,
+                                  MachineInstr &Root, unsigned IdxRootOpd,
+                                  unsigned NewOpc,
+                                  const TargetRegisterClass *RC, bool Negate,
+                                  SmallVectorImpl<MachineInstr *> &InsInstrs,
+                                  SmallVectorImpl<MachineInstr *> &DelInstrs) {
+  MachineInstr &SubToReg =
+      *MRI.getVRegDef(Root.getOperand(IdxRootOpd).getReg());
+  MachineInstr &ImmInst = *MRI.getVRegDef(SubToReg.getOperand(2).getReg());
+  DelInstrs.push_back(&SubToReg);
+  unsigned Imm = ImmInst.getOperand(1).getImm();
+  if (Negate)
+    Imm = -Imm;
+  return genAddSub24BitImm(MF, MRI, TII, Root, ImmInst, IdxRootOpd, Imm, NewOpc,
+                           RC, InsInstrs);
+}

 /// When getMachineCombinerPatterns() finds potential patterns,
 /// this function generates the instructions that could replace the
 /// original code sequence
 void AArch64InstrInfo::genAlternativeCodeSequence(
     MachineInstr &Root, MachineCombinerPattern Pattern,
     SmallVectorImpl<MachineInstr *> &InsInstrs,
     SmallVectorImpl<MachineInstr *> &DelInstrs,
     DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const {
@@ ... @@ MachineInstrBuilder MIB1 =
             .addReg(ZeroReg)
             .addImm(Encoding);
     InsInstrs.push_back(MIB1);
     InstrIdxForVirtReg.insert(std::make_pair(NewVR, 0));
     MUL = genMaddR(MF, MRI, TII, Root, InsInstrs, 1, Opc, NewVR, RC);
     break;
   }

		case MachineCombinerPattern::ADDW_MOVi32imm_OP1:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 1, AArch64::ADDWri,
		&AArch64::GPR32spRegClass, false, InsInstrs);
		break;
		case MachineCombinerPattern::ADDW_MOVi32imm_OP2:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 2, AArch64::ADDWri,
		&AArch64::GPR32spRegClass, false, InsInstrs);
		break;
		case MachineCombinerPattern::ADDW_negMOVi32imm_OP1:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 1, AArch64::SUBWri,
		&AArch64::GPR32spRegClass, true, InsInstrs);
		break;
		case MachineCombinerPattern::ADDW_negMOVi32imm_OP2:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 2, AArch64::SUBWri,
		&AArch64::GPR32spRegClass, true, InsInstrs);
		break;
		case MachineCombinerPattern::ADDX_MOVi64imm_OP1:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 1, AArch64::ADDXri,
		&AArch64::GPR64spRegClass, false, InsInstrs);
		break;
		case MachineCombinerPattern::ADDX_MOVi64imm_OP2:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 2, AArch64::ADDXri,
		&AArch64::GPR64spRegClass, false, InsInstrs);
		break;
		case MachineCombinerPattern::ADDX_negMOVi64imm_OP1:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 1, AArch64::SUBXri,
		&AArch64::GPR64spRegClass, true, InsInstrs);
		break;
		case MachineCombinerPattern::ADDX_negMOVi64imm_OP2:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 2, AArch64::SUBXri,
		&AArch64::GPR64spRegClass, true, InsInstrs);
		break;
		case MachineCombinerPattern::SUBW_MOVi32imm_OP1:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 1, AArch64::SUBWri,
		&AArch64::GPR32spRegClass, false, InsInstrs);
		break;
		case MachineCombinerPattern::SUBW_MOVi32imm_OP2:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 2, AArch64::SUBWri,
		&AArch64::GPR32spRegClass, false, InsInstrs);
		break;
		case MachineCombinerPattern::SUBW_negMOVi32imm_OP1:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 1, AArch64::ADDWri,
		&AArch64::GPR32spRegClass, true, InsInstrs);
		break;
		case MachineCombinerPattern::SUBW_negMOVi32imm_OP2:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 2, AArch64::ADDWri,
		&AArch64::GPR32spRegClass, true, InsInstrs);
		break;
		case MachineCombinerPattern::SUBX_MOVi64imm_OP1:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 1, AArch64::SUBXri,
		&AArch64::GPR64spRegClass, false, InsInstrs);
		break;
		case MachineCombinerPattern::SUBX_MOVi64imm_OP2:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 2, AArch64::SUBXri,
		&AArch64::GPR64spRegClass, false, InsInstrs);
		break;
		case MachineCombinerPattern::SUBX_negMOVi64imm_OP1:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 1, AArch64::ADDXri,
		&AArch64::GPR64spRegClass, true, InsInstrs);
		break;
		case MachineCombinerPattern::SUBX_negMOVi64imm_OP2:
		MUL = genAddSubMovImm(MF, MRI, TII, Root, 2, AArch64::ADDXri,
		&AArch64::GPR64spRegClass, true, InsInstrs);
		break;
		case MachineCombinerPattern::ADDX_StR_MOVi32imm_OP1:
		MUL = genAddSubStR(MF, MRI, TII, Root, 1, AArch64::ADDXri,
		&AArch64::GPR64spRegClass, false, InsInstrs, DelInstrs);
		break;
		case MachineCombinerPattern::ADDX_StR_MOVi32imm_OP2:
		MUL = genAddSubStR(MF, MRI, TII, Root, 2, AArch64::ADDXri,
		&AArch64::GPR64spRegClass, false, InsInstrs, DelInstrs);
		break;
		case MachineCombinerPattern::ADDX_StR_negMOVi32imm_OP1:
		MUL = genAddSubStR(MF, MRI, TII, Root, 1, AArch64::SUBXri,
		&AArch64::GPR64spRegClass, true, InsInstrs, DelInstrs);
		break;
		case MachineCombinerPattern::ADDX_StR_negMOVi32imm_OP2:
		MUL = genAddSubStR(MF, MRI, TII, Root, 2, AArch64::SUBXri,
		&AArch64::GPR64spRegClass, true, InsInstrs, DelInstrs);
		break;
		case MachineCombinerPattern::SUBX_StR_MOVi32imm_OP1:
		MUL = genAddSubStR(MF, MRI, TII, Root, 1, AArch64::SUBXri,
		&AArch64::GPR64spRegClass, false, InsInstrs, DelInstrs);
		break;
		case MachineCombinerPattern::SUBX_StR_MOVi32imm_OP2:
		MUL = genAddSubStR(MF, MRI, TII, Root, 2, AArch64::SUBXri,
		&AArch64::GPR64spRegClass, false, InsInstrs, DelInstrs);
		break;
		case MachineCombinerPattern::SUBX_StR_negMOVi32imm_OP1:
		MUL = genAddSubStR(MF, MRI, TII, Root, 1, AArch64::ADDXri,
		&AArch64::GPR64spRegClass, true, InsInstrs, DelInstrs);
		break;
		case MachineCombinerPattern::SUBX_StR_negMOVi32imm_OP2:
		MUL = genAddSubStR(MF, MRI, TII, Root, 2, AArch64::ADDXri,
		&AArch64::GPR64spRegClass, true, InsInstrs, DelInstrs);
		break;

case MachineCombinerPattern::MULADDv8i8_OP1:		case MachineCombinerPattern::MULADDv8i8_OP1:
Opc = AArch64::MLAv8i8;		Opc = AArch64::MLAv8i8;
RC = &AArch64::FPR64RegClass;		RC = &AArch64::FPR64RegClass;
MUL = genFusedMultiplyAcc(MF, MRI, TII, Root, InsInstrs, 1, Opc, RC);		MUL = genFusedMultiplyAcc(MF, MRI, TII, Root, InsInstrs, 1, Opc, RC);
break;		break;
case MachineCombinerPattern::MULADDv8i8_OP2:		case MachineCombinerPattern::MULADDv8i8_OP2:
Opc = AArch64::MLAv8i8;		Opc = AArch64::MLAv8i8;
RC = &AArch64::FPR64RegClass;		RC = &AArch64::FPR64RegClass;
[2,216 lines not shown]

llvm/test/CodeGen/AArch64/aarch64-combine-addsub-24bit-imm.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -O0 -run-pass=machine-combiner -o - -mtriple=aarch64-unknown-linux -verify-machineinstrs %s | FileCheck %s

				dmgreen: I don't think this needs -O0
				---
				name: addi
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: addi
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK-NEXT: [[ADDWri:%[0-9]+]]:gpr32common = ADDWri [[COPY]], 273, 12
				; CHECK-NEXT: [[ADDWri1:%[0-9]+]]:gpr32common = ADDWri killed [[ADDWri]], 3549, 0
				; CHECK-NEXT: $w0 = COPY [[ADDWri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm 1121757
				%2:gpr32 = ADDWrr %0, killed %1
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: addi_flip
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: addi_flip
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK-NEXT: [[ADDWri:%[0-9]+]]:gpr32common = ADDWri [[COPY]], 273, 12
				; CHECK-NEXT: [[ADDWri1:%[0-9]+]]:gpr32common = ADDWri killed [[ADDWri]], 3549, 0
				; CHECK-NEXT: $w0 = COPY [[ADDWri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm 1121757
				%2:gpr32 = ADDWrr killed %1, %0
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: addi_negate
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: addi_negate
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK-NEXT: [[SUBWri:%[0-9]+]]:gpr32common = SUBWri [[COPY]], 273, 12
				; CHECK-NEXT: [[SUBWri1:%[0-9]+]]:gpr32common = SUBWri killed [[SUBWri]], 3549, 0
				; CHECK-NEXT: $w0 = COPY [[SUBWri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm -1121757
				%2:gpr32 = ADDWrr %0, killed %1
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: addi_flip_negate
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: addi_flip_negate
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK-NEXT: [[SUBWri:%[0-9]+]]:gpr32common = SUBWri [[COPY]], 273, 12
				; CHECK-NEXT: [[SUBWri1:%[0-9]+]]:gpr32common = SUBWri killed [[SUBWri]], 3549, 0
				; CHECK-NEXT: $w0 = COPY [[SUBWri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm -1121757
				%2:gpr32 = ADDWrr killed %1, %0
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: addl
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: addl
				; CHECK: [[COPY:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK-NEXT: [[ADDXri:%[0-9]+]]:gpr64common = ADDXri [[COPY]], 273, 12
				; CHECK-NEXT: [[ADDXri1:%[0-9]+]]:gpr64common = ADDXri killed [[ADDXri]], 3549, 0
				; CHECK-NEXT: $x0 = COPY [[ADDXri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $x0
				%0:gpr64 = COPY $x0
				%1:gpr32 = MOVi32imm 1121757
				%2:gpr64 = SUBREG_TO_REG 0, killed %1, %subreg.sub_32
				%3:gpr64 = ADDXrr %0, killed %2
				$x0 = COPY %3
				RET_ReallyLR implicit $x0
				...
				---
				name: addl_flip
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: addl_flip
				; CHECK: [[COPY:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK-NEXT: [[ADDXri:%[0-9]+]]:gpr64common = ADDXri [[COPY]], 273, 12
				; CHECK-NEXT: [[ADDXri1:%[0-9]+]]:gpr64common = ADDXri killed [[ADDXri]], 3549, 0
				; CHECK-NEXT: $x0 = COPY [[ADDXri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $x0
				%0:gpr64 = COPY $x0
				%1:gpr32 = MOVi32imm 1121757
				%2:gpr64 = SUBREG_TO_REG 0, killed %1, %subreg.sub_32
				%3:gpr64 = ADDXrr killed %2, %0
				$x0 = COPY %3
				RET_ReallyLR implicit $x0
				...
				---
				name: addl_negate
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: addl_negate
				; CHECK: [[COPY:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK-NEXT: [[SUBXri:%[0-9]+]]:gpr64common = SUBXri [[COPY]], 273, 12
				; CHECK-NEXT: [[SUBXri1:%[0-9]+]]:gpr64common = SUBXri killed [[SUBXri]], 3549, 0
				; CHECK-NEXT: $x0 = COPY [[SUBXri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $x0
				%0:gpr64 = COPY $x0
				%1:gpr64 = MOVi64imm -1121757
				%2:gpr64 = ADDXrr %0, killed %1
				$x0 = COPY %2
				RET_ReallyLR implicit $x0
				...
				---
				name: addl_flip_negate
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: addl_flip_negate
				; CHECK: [[COPY:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK-NEXT: [[SUBXri:%[0-9]+]]:gpr64common = SUBXri [[COPY]], 273, 12
				; CHECK-NEXT: [[SUBXri1:%[0-9]+]]:gpr64common = SUBXri killed [[SUBXri]], 3549, 0
				; CHECK-NEXT: $x0 = COPY [[SUBXri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $x0
				%0:gpr64 = COPY $x0
				%1:gpr64 = MOVi64imm -1121757
				%2:gpr64 = ADDXrr killed %1, %0
				$x0 = COPY %2
				RET_ReallyLR implicit $x0
				...


				---
				name: subi
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: subi
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK-NEXT: [[SUBWri:%[0-9]+]]:gpr32common = SUBWri [[COPY]], 273, 12
				; CHECK-NEXT: [[SUBWri1:%[0-9]+]]:gpr32common = SUBWri killed [[SUBWri]], 3549, 0
				; CHECK-NEXT: $w0 = COPY [[SUBWri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm 1121757
				%2:gpr32 = SUBWrr %0, killed %1
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: subi_flip
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: subi_flip
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK-NEXT: [[SUBWri:%[0-9]+]]:gpr32common = SUBWri [[COPY]], 273, 12
				; CHECK-NEXT: [[SUBWri1:%[0-9]+]]:gpr32common = SUBWri killed [[SUBWri]], 3549, 0
				; CHECK-NEXT: $w0 = COPY [[SUBWri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm 1121757
				%2:gpr32 = SUBWrr killed %1, %0
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: subi_negate
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: subi_negate
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK-NEXT: [[ADDWri:%[0-9]+]]:gpr32common = ADDWri [[COPY]], 273, 12
				; CHECK-NEXT: [[ADDWri1:%[0-9]+]]:gpr32common = ADDWri killed [[ADDWri]], 3549, 0
				; CHECK-NEXT: $w0 = COPY [[ADDWri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm -1121757
				%2:gpr32 = SUBWrr %0, killed %1
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: subi_flip_negate
				body: |
				bb.0.entry:
				liveins: $w0
				; CHECK-LABEL: name: subi_flip_negate
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK-NEXT: [[ADDWri:%[0-9]+]]:gpr32common = ADDWri [[COPY]], 273, 12
				; CHECK-NEXT: [[ADDWri1:%[0-9]+]]:gpr32common = ADDWri killed [[ADDWri]], 3549, 0
				; CHECK-NEXT: $w0 = COPY [[ADDWri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $w0
				%0:gpr32 = COPY $w0
				%1:gpr32 = MOVi32imm -1121757
				%2:gpr32 = SUBWrr killed %1, %0
				$w0 = COPY %2
				RET_ReallyLR implicit $w0
				...
				---
				name: subl
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: subl
				; CHECK: [[COPY:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK-NEXT: [[SUBXri:%[0-9]+]]:gpr64common = SUBXri [[COPY]], 273, 12
				; CHECK-NEXT: [[SUBXri1:%[0-9]+]]:gpr64common = SUBXri killed [[SUBXri]], 3549, 0
				; CHECK-NEXT: $x0 = COPY [[SUBXri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $x0
				%0:gpr64 = COPY $x0
				%1:gpr32 = MOVi32imm 1121757
				%2:gpr64 = SUBREG_TO_REG 0, killed %1, %subreg.sub_32
				%3:gpr64 = SUBXrr %0, killed %2
				$x0 = COPY %3
				RET_ReallyLR implicit $x0
				...
				---
				name: subl_flip
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: subl_flip
				; CHECK: [[COPY:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK-NEXT: [[SUBXri:%[0-9]+]]:gpr64common = SUBXri [[COPY]], 273, 12
				; CHECK-NEXT: [[SUBXri1:%[0-9]+]]:gpr64common = SUBXri killed [[SUBXri]], 3549, 0
				; CHECK-NEXT: $x0 = COPY [[SUBXri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $x0
				%0:gpr64 = COPY $x0
				%1:gpr32 = MOVi32imm 1121757
				%2:gpr64 = SUBREG_TO_REG 0, killed %1, %subreg.sub_32
				%3:gpr64 = SUBXrr killed %2, %0
				$x0 = COPY %3
				RET_ReallyLR implicit $x0
				...
				---
				name: subl_negate
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: subl_negate
				; CHECK: [[COPY:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK-NEXT: [[ADDXri:%[0-9]+]]:gpr64common = ADDXri [[COPY]], 273, 12
				; CHECK-NEXT: [[ADDXri1:%[0-9]+]]:gpr64common = ADDXri killed [[ADDXri]], 3549, 0
				; CHECK-NEXT: $x0 = COPY [[ADDXri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $x0
				%0:gpr64 = COPY $x0
				%1:gpr64 = MOVi64imm -1121757
				%2:gpr64 = SUBXrr %0, killed %1
				$x0 = COPY %2
				RET_ReallyLR implicit $x0
				...
				---
				name: subl_flip_negate
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: subl_flip_negate
				; CHECK: [[COPY:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK-NEXT: [[ADDXri:%[0-9]+]]:gpr64common = ADDXri [[COPY]], 273, 12
				; CHECK-NEXT: [[ADDXri1:%[0-9]+]]:gpr64common = ADDXri killed [[ADDXri]], 3549, 0
				; CHECK-NEXT: $x0 = COPY [[ADDXri1]]
				; CHECK-NEXT: RET_ReallyLR implicit $x0
				%0:gpr64 = COPY $x0
				%1:gpr64 = MOVi64imm -1121757
				%2:gpr64 = SUBXrr killed %1, %0
				$x0 = COPY %2
				RET_ReallyLR implicit $x0
				...
				dmgreen: I think this might be negative of the correct result. (But equally I don't think that SUBXrr will be present at this point in the pipeline. It will be speculatively emitted as a SUBSXrr as that may be used as a compare).
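
The reviewer's concern can be checked with plain integer arithmetic. The sketch below assumes the comment refers to the last test, subl_flip_negate, where the immediate is the first operand of SUBXrr (so the MIR computes imm - x) while the CHECK lines expect two ADDXri instructions (which compute x + 1121757). The function names are hypothetical and only model the two sequences on 64-bit values.

```cpp
#include <cassert>
#include <cstdint>

// Models the input MIR: %2 = SUBXrr killed %imm, %x, i.e. imm - x.
uint64_t originalFlippedSub(uint64_t X, uint64_t Imm) { return Imm - X; }

// Models the CHECK lines for subl_flip_negate:
//   ADDXri x, Hi, 12 followed by ADDXri ., Lo, 0
// where Hi and Lo are the 12-bit fields of the negated immediate.
uint64_t combinedFlippedSub(uint64_t X, uint64_t Imm) {
  uint64_t Neg = 0 - Imm; // 1121757 for Imm == -1121757
  uint64_t Hi = (Neg >> 12) & 0xFFF;
  uint64_t Lo = Neg & 0xFFF;
  return X + (Hi << 12) + Lo;
}

// Returns true when the combined sequence is the two's-complement
// negation of the original result.
bool combinedIsNegated(uint64_t X, uint64_t Imm) {
  return combinedFlippedSub(X, Imm) == 0 - originalFlippedSub(X, Imm);
}
```

Under these assumptions the combined sequence yields x - imm rather than imm - x, so the two results differ in sign whenever they are non-zero. Subtraction is not commutative, so the operand-1 SUB patterns cannot simply reuse the operand-2 lowering.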

llvm/test/CodeGen/AArch64/addsub.ll

[146 lines not shown]
store i64 %newval64, i64* @var_i64		store i64 %newval64, i64* @var_i64

ret void		ret void
}		}

define i64 @add_two_parts_imm_i64(i64 %a) {		define i64 @add_two_parts_imm_i64(i64 %a) {
; CHECK-LABEL: add_two_parts_imm_i64:		; CHECK-LABEL: add_two_parts_imm_i64:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #42325		; CHECK-NEXT: add x8, x0, #2730, lsl #12 // =11182080
; CHECK-NEXT: movk w8, #170, lsl #16		; CHECK-NEXT: add x0, x8, #1365
; CHECK-NEXT: add x0, x0, x8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = add i64 %a, 11183445		%b = add i64 %a, 11183445
ret i64 %b		ret i64 %b
}		}

define i32 @add_two_parts_imm_i32(i32 %a) {		define i32 @add_two_parts_imm_i32(i32 %a) {
; CHECK-LABEL: add_two_parts_imm_i32:		; CHECK-LABEL: add_two_parts_imm_i32:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #42325		; CHECK-NEXT: add w8, w0, #2730, lsl #12 // =11182080
; CHECK-NEXT: movk w8, #170, lsl #16		; CHECK-NEXT: add w0, w8, #1365
; CHECK-NEXT: add w0, w0, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = add i32 %a, 11183445		%b = add i32 %a, 11183445
ret i32 %b		ret i32 %b
}		}

define i64 @add_two_parts_imm_i64_neg(i64 %a) {		define i64 @add_two_parts_imm_i64_neg(i64 %a) {
; CHECK-LABEL: add_two_parts_imm_i64_neg:		; CHECK-LABEL: add_two_parts_imm_i64_neg:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov x8, #-42325		; CHECK-NEXT: sub x8, x0, #2730, lsl #12 // =11182080
; CHECK-NEXT: movk x8, #65365, lsl #16		; CHECK-NEXT: sub x0, x8, #1365
; CHECK-NEXT: add x0, x0, x8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = add i64 %a, -11183445		%b = add i64 %a, -11183445
ret i64 %b		ret i64 %b
}		}

define i32 @add_two_parts_imm_i32_neg(i32 %a) {		define i32 @add_two_parts_imm_i32_neg(i32 %a) {
; CHECK-LABEL: add_two_parts_imm_i32_neg:		; CHECK-LABEL: add_two_parts_imm_i32_neg:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #23211		; CHECK-NEXT: sub w8, w0, #2730, lsl #12 // =11182080
; CHECK-NEXT: movk w8, #65365, lsl #16		; CHECK-NEXT: sub w0, w8, #1365
; CHECK-NEXT: add w0, w0, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = add i32 %a, -11183445		%b = add i32 %a, -11183445
ret i32 %b		ret i32 %b
}		}

define i64 @sub_two_parts_imm_i64(i64 %a) {		define i64 @sub_two_parts_imm_i64(i64 %a) {
; CHECK-LABEL: sub_two_parts_imm_i64:		; CHECK-LABEL: sub_two_parts_imm_i64:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov x8, #-42325		; CHECK-NEXT: sub x8, x0, #2730, lsl #12 // =11182080
; CHECK-NEXT: movk x8, #65365, lsl #16		; CHECK-NEXT: sub x0, x8, #1365
; CHECK-NEXT: add x0, x0, x8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = sub i64 %a, 11183445		%b = sub i64 %a, 11183445
ret i64 %b		ret i64 %b
}		}

define i32 @sub_two_parts_imm_i32(i32 %a) {		define i32 @sub_two_parts_imm_i32(i32 %a) {
; CHECK-LABEL: sub_two_parts_imm_i32:		; CHECK-LABEL: sub_two_parts_imm_i32:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #23211		; CHECK-NEXT: sub w8, w0, #2730, lsl #12 // =11182080
; CHECK-NEXT: movk w8, #65365, lsl #16		; CHECK-NEXT: sub w0, w8, #1365
; CHECK-NEXT: add w0, w0, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = sub i32 %a, 11183445		%b = sub i32 %a, 11183445
ret i32 %b		ret i32 %b
}		}

define i64 @sub_two_parts_imm_i64_neg(i64 %a) {		define i64 @sub_two_parts_imm_i64_neg(i64 %a) {
; CHECK-LABEL: sub_two_parts_imm_i64_neg:		; CHECK-LABEL: sub_two_parts_imm_i64_neg:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #42325		; CHECK-NEXT: add x8, x0, #2730, lsl #12 // =11182080
; CHECK-NEXT: movk w8, #170, lsl #16		; CHECK-NEXT: add x0, x8, #1365
; CHECK-NEXT: add x0, x0, x8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = sub i64 %a, -11183445		%b = sub i64 %a, -11183445
ret i64 %b		ret i64 %b
}		}

define i32 @sub_two_parts_imm_i32_neg(i32 %a) {		define i32 @sub_two_parts_imm_i32_neg(i32 %a) {
; CHECK-LABEL: sub_two_parts_imm_i32_neg:		; CHECK-LABEL: sub_two_parts_imm_i32_neg:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #42325		; CHECK-NEXT: add w8, w0, #2730, lsl #12 // =11182080
; CHECK-NEXT: movk w8, #170, lsl #16		; CHECK-NEXT: add w0, w8, #1365
; CHECK-NEXT: add w0, w0, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = sub i32 %a, -11183445		%b = sub i32 %a, -11183445
ret i32 %b		ret i32 %b
}		}
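
The updated CHECK lines above can be sanity-checked against the IR they come from. The sketch below models the two-instruction immediate sequence and the original IR `add` on 32-bit values with wrap-around; `evalTwoPart` and `evalIRAdd` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Models "add/sub wD, wN, #Hi, lsl #12" followed by "add/sub wD, wD, #Lo".
uint32_t evalTwoPart(bool IsSub, uint32_t A, uint32_t Hi, uint32_t Lo) {
  assert(Hi < 4096 && Lo < 4096 && "each field is a 12-bit immediate");
  uint32_t Step = IsSub ? A - (Hi << 12) : A + (Hi << 12);
  return IsSub ? Step - Lo : Step + Lo;
}

// Models the IR "add i32 %a, Imm" with two's-complement wrap-around.
uint32_t evalIRAdd(uint32_t A, int32_t Imm) {
  return A + static_cast<uint32_t>(Imm);
}

// True when the immediate sequence matches the IR add for this input.
bool matchesAdd(bool IsSub, uint32_t A, uint32_t Hi, uint32_t Lo, int32_t Imm) {
  return evalTwoPart(IsSub, A, Hi, Lo) == evalIRAdd(A, Imm);
}
```

For instance, `matchesAdd(false, a, 2730, 1365, 11183445)` models add_two_parts_imm_i32, and `matchesAdd(true, a, 2730, 1365, -11183445)` models add_two_parts_imm_i32_neg, where the ADD of a negative constant becomes a pair of SUBs.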

define void @testing() {		define void @testing() {
; CHECK-LABEL: testing:		; CHECK-LABEL: testing:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
[114 lines not shown]

llvm/test/CodeGen/AArch64/and-mask-removal.ll

[210 lines not shown]
ret i1 false		ret i1 false
ret_true:		ret_true:
ret i1 true		ret i1 true
}		}

define zeroext i1 @test16_2(i16 zeroext %x) align 2 {		define zeroext i1 @test16_2(i16 zeroext %x) align 2 {
; CHECK-LABEL: test16_2:		; CHECK-LABEL: test16_2:
; CHECK: ; %bb.0: ; %entry		; CHECK: ; %bb.0: ; %entry
; CHECK-NEXT: mov w8, #16882		; CHECK-NEXT: add w8, w0, #4, lsl #12 ; =16384
; CHECK-NEXT: mov w9, #40700		; CHECK-NEXT: mov w9, #40700
; CHECK-NEXT: add w8, w0, w8		; CHECK-NEXT: add w8, w8, #498
; CHECK-NEXT: cmp w9, w8, uxth		; CHECK-NEXT: cmp w9, w8, uxth
; CHECK-NEXT: cset w0, hi		; CHECK-NEXT: cset w0, hi
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%0 = add i16 %x, 16882		%0 = add i16 %x, 16882
%1 = icmp ule i16 %0, -24837		%1 = icmp ule i16 %0, -24837
br i1 %1, label %ret_true, label %ret_false		br i1 %1, label %ret_true, label %ret_false
ret_false:		ret_false:
[17 lines not shown]
ret i1 false		ret i1 false
ret_true:		ret_true:
ret i1 true		ret i1 true
}		}

define zeroext i1 @test16_4(i16 zeroext %x) align 2 {		define zeroext i1 @test16_4(i16 zeroext %x) align 2 {
; CHECK-LABEL: test16_4:		; CHECK-LABEL: test16_4:
; CHECK: ; %bb.0: ; %entry		; CHECK: ; %bb.0: ; %entry
; CHECK-NEXT: mov w8, #29985		; CHECK-NEXT: add w8, w0, #7, lsl #12 ; =28672
; CHECK-NEXT: mov w9, #15676		; CHECK-NEXT: mov w9, #15676
; CHECK-NEXT: add w8, w0, w8		; CHECK-NEXT: add w8, w8, #1313
; CHECK-NEXT: cmp w9, w8, uxth		; CHECK-NEXT: cmp w9, w8, uxth
; CHECK-NEXT: cset w0, lo		; CHECK-NEXT: cset w0, lo
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%0 = add i16 %x, -35551		%0 = add i16 %x, -35551
%1 = icmp uge i16 %0, 15677		%1 = icmp uge i16 %0, 15677
br i1 %1, label %ret_true, label %ret_false		br i1 %1, label %ret_true, label %ret_false
ret_false:		ret_false:
[17 lines not shown]
ret i1 false		ret i1 false
ret_true:		ret_true:
ret i1 true		ret i1 true
}		}

define zeroext i1 @test16_6(i16 zeroext %x) align 2 {		define zeroext i1 @test16_6(i16 zeroext %x) align 2 {
; CHECK-LABEL: test16_6:		; CHECK-LABEL: test16_6:
; CHECK: ; %bb.0: ; %entry		; CHECK: ; %bb.0: ; %entry
; CHECK-NEXT: mov w8, #-32194		; CHECK-NEXT: sub w9, w0, #7, lsl #12 ; =28672
; CHECK-NEXT: mov w9, #24320		; CHECK-NEXT: mov w8, #24320
; CHECK-NEXT: add w8, w0, w8		; CHECK-NEXT: sub w9, w9, #3522
; CHECK-NEXT: cmp w8, w9		; CHECK-NEXT: cmp w9, w8
; CHECK-NEXT: cset w0, hi		; CHECK-NEXT: cset w0, hi
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%0 = add i16 %x, -32194		%0 = add i16 %x, -32194
%1 = icmp uge i16 %0, -41215		%1 = icmp uge i16 %0, -41215
br i1 %1, label %ret_true, label %ret_false		br i1 %1, label %ret_true, label %ret_false
ret_false:		ret_false:
ret i1 false		ret i1 false
ret_true:		ret_true:
ret i1 true		ret i1 true
}		}

define zeroext i1 @test16_7(i16 zeroext %x) align 2 {		define zeroext i1 @test16_7(i16 zeroext %x) align 2 {
; CHECK-LABEL: test16_7:		; CHECK-LABEL: test16_7:
; CHECK: ; %bb.0: ; %entry		; CHECK: ; %bb.0: ; %entry
; CHECK-NEXT: mov w8, #9272		; CHECK-NEXT: add w8, w0, #2, lsl #12 ; =8192
; CHECK-NEXT: mov w9, #22619		; CHECK-NEXT: mov w9, #22619
; CHECK-NEXT: add w8, w0, w8		; CHECK-NEXT: add w8, w8, #1080
; CHECK-NEXT: cmp w9, w8, uxth		; CHECK-NEXT: cmp w9, w8, uxth
; CHECK-NEXT: cset w0, lo		; CHECK-NEXT: cset w0, lo
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%0 = add i16 %x, 9272		%0 = add i16 %x, 9272
%1 = icmp uge i16 %0, -42916		%1 = icmp uge i16 %0, -42916
br i1 %1, label %ret_true, label %ret_false		br i1 %1, label %ret_true, label %ret_false
ret_false:		ret_false:
[22 lines not shown]

llvm/test/CodeGen/AArch64/arm64-srl-and.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=aarch64-linux-gnu -O3 < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-linux-gnu -O3 < %s | FileCheck %s

	; This used to miscompile:			; This used to miscompile:
	; The 16-bit -1 should not become 32-bit -1 (sub w8, w8, #1).			; The 16-bit -1 should not become 32-bit -1 (sub w8, w8, #1).

	@g = global i16 0, align 4			@g = global i16 0, align 4
	define i32 @srl_and() {			define i32 @srl_and() {
	; CHECK-LABEL: srl_and:			; CHECK-LABEL: srl_and:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: adrp x8, :got:g			; CHECK-NEXT: adrp x8, :got:g
	; CHECK-NEXT: mov w9, #50			; CHECK-NEXT: mov w9, #50
	; CHECK-NEXT: ldr x8, [x8, :got_lo12:g]			; CHECK-NEXT: ldr x8, [x8, :got_lo12:g]
	; CHECK-NEXT: ldrh w8, [x8]			; CHECK-NEXT: ldrh w8, [x8]
	; CHECK-NEXT: eor w8, w8, w9			; CHECK-NEXT: eor w8, w8, w9
	; CHECK-NEXT: mov w9, #65535			; CHECK-NEXT: add w8, w8, #15, lsl #12 // =61440
	; CHECK-NEXT: add w8, w8, w9			; CHECK-NEXT: add w8, w8, #4095
	; CHECK-NEXT: and w0, w8, w8, lsr #16			; CHECK-NEXT: and w0, w8, w8, lsr #16
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%0 = load i16, i16* @g, align 4			%0 = load i16, i16* @g, align 4
	%1 = xor i16 %0, 50			%1 = xor i16 %0, 50
	%tobool = icmp ne i16 %1, 0			%tobool = icmp ne i16 %1, 0
	%lor.ext = zext i1 %tobool to i32			%lor.ext = zext i1 %tobool to i32
	%sub = add i16 %1, -1			%sub = add i16 %1, -1

	%srl = zext i16 %sub to i32			%srl = zext i16 %sub to i32
	%and = and i32 %srl, %lor.ext			%and = and i32 %srl, %lor.ext

	ret i32 %and			ret i32 %and
	}			}

llvm/test/CodeGen/AArch64/fast-isel-gep.ll

[37 lines not shown]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%1 = getelementptr inbounds i32, i32* %a, i64 1024			%1 = getelementptr inbounds i32, i32* %a, i64 1024
	ret i32* %1			ret i32* %1
	}			}

	define i32* @test_array4(i32* %a) {			define i32* @test_array4(i32* %a) {
	; CHECK-LABEL: test_array4:			; CHECK-LABEL: test_array4:
	; CHECK: ; %bb.0:			; CHECK: ; %bb.0:
	; CHECK-NEXT: mov x8, #4104			; CHECK-NEXT: add x8, x0, #1, lsl #12 ; =4096
	; CHECK-NEXT: add x0, x0, x8			; CHECK-NEXT: add x0, x8, #8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%1 = getelementptr inbounds i32, i32* %a, i64 1026			%1 = getelementptr inbounds i32, i32* %a, i64 1026
	ret i32* %1			ret i32* %1
	}			}

	define i32* @test_array5(i32* %a, i32 %i) {			define i32* @test_array5(i32* %a, i32 %i) {
	; CHECK-LABEL: test_array5:			; CHECK-LABEL: test_array5:
	; CHECK: ; %bb.0:			; CHECK: ; %bb.0:
	; CHECK-NEXT: ; kill: def $w1 killed $w1 def $x1			; CHECK-NEXT: ; kill: def $w1 killed $w1 def $x1
	; CHECK-NEXT: mov x8, #4			; CHECK-NEXT: mov x8, #4
	; CHECK-NEXT: sxtw x9, w1			; CHECK-NEXT: sxtw x9, w1
	; CHECK-NEXT: madd x0, x9, x8, x0			; CHECK-NEXT: madd x0, x9, x8, x0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%1 = getelementptr inbounds i32, i32* %a, i32 %i			%1 = getelementptr inbounds i32, i32* %a, i32 %i
	ret i32* %1			ret i32* %1
	}			}

llvm/test/CodeGen/AArch64/nontemporal.ll

[491 lines not shown]

	entry:			entry:
	store <17 x float> %v, <17 x float>* %ptr, align 4, !nontemporal !0			store <17 x float> %v, <17 x float>* %ptr, align 4, !nontemporal !0
	ret void			ret void
	}			}
	define void @test_stnp_v16i32_invalid_offset(<16 x i32> %v, <16 x i32>* %ptr) {			define void @test_stnp_v16i32_invalid_offset(<16 x i32> %v, <16 x i32>* %ptr) {
	; CHECK-LABEL: test_stnp_v16i32_invalid_offset:			; CHECK-LABEL: test_stnp_v16i32_invalid_offset:
	; CHECK: ; %bb.0: ; %entry			; CHECK: ; %bb.0: ; %entry
	; CHECK-NEXT: mov w8, #32032			; CHECK-NEXT: add x8, x0, #7, lsl #12 ; =28672
	; CHECK-NEXT: mov w9, #32000			; CHECK-NEXT: add x9, x8, #3360
	; CHECK-NEXT: add x8, x0, x8			; CHECK-NEXT: add x8, x8, #3328
	; CHECK-NEXT: add x9, x0, x9			; CHECK-NEXT: stnp q2, q3, [x9]
	; CHECK-NEXT: stnp q2, q3, [x8]			; CHECK-NEXT: stnp q0, q1, [x8]
	; CHECK-NEXT: stnp q0, q1, [x9]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret

	entry:			entry:
	%gep = getelementptr <16 x i32>, <16 x i32>* %ptr, i32 500			%gep = getelementptr <16 x i32>, <16 x i32>* %ptr, i32 500
	store <16 x i32> %v, <16 x i32>* %gep, align 4, !nontemporal !0			store <16 x i32> %v, <16 x i32>* %gep, align 4, !nontemporal !0
	ret void			ret void
	}			}

[31 lines not shown]

llvm/test/CodeGen/AArch64/srem-vector-lkk.ll

[237 lines not shown]
ret <4 x i16> %1		ret <4 x i16> %1
}		}

; Don't fold if the divisor is 2^15.		; Don't fold if the divisor is 2^15.
define <4 x i16> @dont_fold_srem_i16_smax(<4 x i16> %x) {		define <4 x i16> @dont_fold_srem_i16_smax(<4 x i16> %x) {
; CHECK-LABEL: dont_fold_srem_i16_smax:		; CHECK-LABEL: dont_fold_srem_i16_smax:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0		; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
; CHECK-NEXT: smov w8, v0.h[2]		; CHECK-NEXT: smov w9, v0.h[2]
; CHECK-NEXT: mov w9, #17097
; CHECK-NEXT: smov w10, v0.h[1]		; CHECK-NEXT: smov w10, v0.h[1]
; CHECK-NEXT: movk w9, #45590, lsl #16		; CHECK-NEXT: mov w8, #17097
; CHECK-NEXT: mov w11, #32767		; CHECK-NEXT: smov w11, v0.h[3]
; CHECK-NEXT: smov w12, v0.h[3]		; CHECK-NEXT: movk w8, #45590, lsl #16
; CHECK-NEXT: movi d1, #0000000000000000		; CHECK-NEXT: movi d1, #0000000000000000
; CHECK-NEXT: smull x9, w8, w9		; CHECK-NEXT: smull x8, w9, w8
; CHECK-NEXT: add w11, w10, w11		; CHECK-NEXT: add w12, w10, #7, lsl #12 // =28672
; CHECK-NEXT: cmp w10, #0		; CHECK-NEXT: cmp w10, #0
; CHECK-NEXT: lsr x9, x9, #32		; CHECK-NEXT: add w12, w12, #4095
; CHECK-NEXT: csel w11, w11, w10, lt		; CHECK-NEXT: lsr x8, x8, #32
; CHECK-NEXT: add w9, w9, w8		; CHECK-NEXT: csel w12, w12, w10, lt
; CHECK-NEXT: and w11, w11, #0xffff8000		; CHECK-NEXT: add w8, w8, w9
; CHECK-NEXT: asr w13, w9, #4		; CHECK-NEXT: and w12, w12, #0xffff8000
; CHECK-NEXT: sub w10, w10, w11		; CHECK-NEXT: asr w13, w8, #4
; CHECK-NEXT: mov w11, #47143		; CHECK-NEXT: sub w10, w10, w12
; CHECK-NEXT: add w9, w13, w9, lsr #31		; CHECK-NEXT: mov w12, #47143
		; CHECK-NEXT: add w8, w13, w8, lsr #31
; CHECK-NEXT: mov w13, #23		; CHECK-NEXT: mov w13, #23
; CHECK-NEXT: movk w11, #24749, lsl #16		; CHECK-NEXT: movk w12, #24749, lsl #16
; CHECK-NEXT: mov v1.h[1], w10		; CHECK-NEXT: mov v1.h[1], w10
; CHECK-NEXT: msub w8, w9, w13, w8		; CHECK-NEXT: msub w8, w8, w13, w9
; CHECK-NEXT: smull x9, w12, w11		; CHECK-NEXT: smull x9, w11, w12
; CHECK-NEXT: lsr x10, x9, #63		; CHECK-NEXT: lsr x10, x9, #63
; CHECK-NEXT: asr x9, x9, #43		; CHECK-NEXT: asr x9, x9, #43
; CHECK-NEXT: add w9, w9, w10		; CHECK-NEXT: add w9, w9, w10
; CHECK-NEXT: mov w10, #5423		; CHECK-NEXT: mov w10, #5423
; CHECK-NEXT: mov v1.h[2], w8		; CHECK-NEXT: mov v1.h[2], w8
; CHECK-NEXT: msub w8, w9, w10, w12		; CHECK-NEXT: msub w8, w9, w10, w11
; CHECK-NEXT: mov v1.h[3], w8		; CHECK-NEXT: mov v1.h[3], w8
; CHECK-NEXT: fmov d0, d1		; CHECK-NEXT: fmov d0, d1
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%1 = srem <4 x i16> %x, <i16 1, i16 32768, i16 23, i16 5423>		%1 = srem <4 x i16> %x, <i16 1, i16 32768, i16 23, i16 5423>
ret <4 x i16> %1		ret <4 x i16> %1
}		}

; Don't fold i64 srem.		; Don't fold i64 srem.
[42 lines not shown]

llvm/test/Transforms/CodeGenPrepare/AArch64/large-offset-gep.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=aarch64-linux-gnu -verify-machineinstrs -o - %s | FileCheck %s		; RUN: llc -mtriple=aarch64-linux-gnu -verify-machineinstrs -o - %s | FileCheck %s

%struct_type = type { [10000 x i32], i32, i32 }		%struct_type = type { [10000 x i32], i32, i32 }

define void @test1(%struct_type** %s, i32 %n) {		define void @test1(%struct_type** %s, i32 %n) {
; CHECK-LABEL: test1:		; CHECK-LABEL: test1:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ldr x9, [x0]		; CHECK-NEXT: ldr x9, [x0]
; CHECK-NEXT: mov w10, #40000
; CHECK-NEXT: mov w8, wzr		; CHECK-NEXT: mov w8, wzr
; CHECK-NEXT: add x9, x9, x10		; CHECK-NEXT: add x9, x9, #9, lsl #12 // =36864
		; CHECK-NEXT: add x9, x9, #3136
; CHECK-NEXT: cmp w8, w1		; CHECK-NEXT: cmp w8, w1
; CHECK-NEXT: b.ge .LBB0_2		; CHECK-NEXT: b.ge .LBB0_2
; CHECK-NEXT: .LBB0_1: // %while_body		; CHECK-NEXT: .LBB0_1: // %while_body
; CHECK-NEXT: // =>This Inner Loop Header: Depth=1		; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
; CHECK-NEXT: add w10, w8, #1		; CHECK-NEXT: add w10, w8, #1
; CHECK-NEXT: stp w10, w8, [x9]		; CHECK-NEXT: stp w10, w8, [x9]
; CHECK-NEXT: mov w8, w10		; CHECK-NEXT: mov w8, w10
; CHECK-NEXT: cmp w8, w1		; CHECK-NEXT: cmp w8, w1
[... 21 lines not shown ...]
while_end:		while_end:
ret void		ret void
}		}

define void @test2(%struct_type* %struct, i32 %n) {		define void @test2(%struct_type* %struct, i32 %n) {
; CHECK-LABEL: test2:		; CHECK-LABEL: test2:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: cbz x0, .LBB1_3		; CHECK-NEXT: cbz x0, .LBB1_3
; CHECK-NEXT: // %bb.1: // %while_cond.preheader		; CHECK-NEXT: // %bb.1: // %while_cond.preheader
; CHECK-NEXT: mov w9, #40000		; CHECK-NEXT: add x8, x0, #9, lsl #12 // =36864
; CHECK-NEXT: mov w8, wzr		; CHECK-NEXT: mov w9, wzr
; CHECK-NEXT: add x9, x0, x9		; CHECK-NEXT: add x8, x8, #3136
; CHECK-NEXT: cmp w8, w1		; CHECK-NEXT: cmp w9, w1
; CHECK-NEXT: b.ge .LBB1_3		; CHECK-NEXT: b.ge .LBB1_3
; CHECK-NEXT: .LBB1_2: // %while_body		; CHECK-NEXT: .LBB1_2: // %while_body
; CHECK-NEXT: // =>This Inner Loop Header: Depth=1		; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
; CHECK-NEXT: add w10, w8, #1		; CHECK-NEXT: add w10, w9, #1
; CHECK-NEXT: stp w10, w8, [x9]		; CHECK-NEXT: stp w10, w9, [x8]
; CHECK-NEXT: mov w8, w10		; CHECK-NEXT: mov w9, w10
; CHECK-NEXT: cmp w8, w1		; CHECK-NEXT: cmp w9, w1
; CHECK-NEXT: b.lt .LBB1_2		; CHECK-NEXT: b.lt .LBB1_2
; CHECK-NEXT: .LBB1_3: // %while_end		; CHECK-NEXT: .LBB1_3: // %while_end
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%cmp = icmp eq %struct_type* %struct, null		%cmp = icmp eq %struct_type* %struct, null
br i1 %cmp, label %while_end, label %while_cond		br i1 %cmp, label %while_end, label %while_cond

while_cond:		while_cond:
[... 12 lines not shown ...]
while_end:		while_end:
ret void		ret void
}		}

define void @test3(%struct_type* %s1, %struct_type* %s2, i1 %cond, i32 %n) {		define void @test3(%struct_type* %s1, %struct_type* %s2, i1 %cond, i32 %n) {
; CHECK-LABEL: test3:		; CHECK-LABEL: test3:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: tst w2, #0x1		; CHECK-NEXT: tst w2, #0x1
; CHECK-NEXT: csel x9, x1, x0, ne		; CHECK-NEXT: csel x8, x1, x0, ne
; CHECK-NEXT: cbz x9, .LBB2_3		; CHECK-NEXT: cbz x8, .LBB2_3
; CHECK-NEXT: // %bb.1: // %while_cond.preheader		; CHECK-NEXT: // %bb.1: // %while_cond.preheader
; CHECK-NEXT: mov w10, #40000		; CHECK-NEXT: add x8, x8, #9, lsl #12 // =36864
; CHECK-NEXT: mov w8, wzr		; CHECK-NEXT: mov w9, wzr
; CHECK-NEXT: add x9, x9, x10		; CHECK-NEXT: add x8, x8, #3136
; CHECK-NEXT: cmp w8, w3		; CHECK-NEXT: cmp w9, w3
; CHECK-NEXT: b.ge .LBB2_3		; CHECK-NEXT: b.ge .LBB2_3
; CHECK-NEXT: .LBB2_2: // %while_body		; CHECK-NEXT: .LBB2_2: // %while_body
; CHECK-NEXT: // =>This Inner Loop Header: Depth=1		; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
; CHECK-NEXT: add w10, w8, #1		; CHECK-NEXT: add w10, w9, #1
; CHECK-NEXT: stp w10, w8, [x9]		; CHECK-NEXT: stp w10, w9, [x8]
; CHECK-NEXT: mov w8, w10		; CHECK-NEXT: mov w9, w10
; CHECK-NEXT: cmp w8, w3		; CHECK-NEXT: cmp w9, w3
; CHECK-NEXT: b.lt .LBB2_2		; CHECK-NEXT: b.lt .LBB2_2
; CHECK-NEXT: .LBB2_3: // %while_end		; CHECK-NEXT: .LBB2_3: // %while_end
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
br i1 %cond, label %if_true, label %if_end		br i1 %cond, label %if_true, label %if_end

if_true:		if_true:
br label %if_end		br label %if_end
[... 25 lines not shown ...]

define void @test4(i32 %n) personality i32 (...)* @__FrameHandler {		define void @test4(i32 %n) personality i32 (...)* @__FrameHandler {
; CHECK-LABEL: test4:		; CHECK-LABEL: test4:
; CHECK: .Lfunc_begin0:		; CHECK: .Lfunc_begin0:
; CHECK-NEXT: .cfi_startproc		; CHECK-NEXT: .cfi_startproc
; CHECK-NEXT: .cfi_personality 0, __FrameHandler		; CHECK-NEXT: .cfi_personality 0, __FrameHandler
; CHECK-NEXT: .cfi_lsda 0, .Lexception0		; CHECK-NEXT: .cfi_lsda 0, .Lexception0
; CHECK-NEXT: // %bb.0: // %entry		; CHECK-NEXT: // %bb.0: // %entry
; CHECK-NEXT: stp x30, x21, [sp, #-32]! // 16-byte Folded Spill		; CHECK-NEXT: str x30, [sp, #-32]! // 8-byte Folded Spill
; CHECK-NEXT: stp x20, x19, [sp, #16] // 16-byte Folded Spill		; CHECK-NEXT: stp x20, x19, [sp, #16] // 16-byte Folded Spill
; CHECK-NEXT: .cfi_def_cfa_offset 32		; CHECK-NEXT: .cfi_def_cfa_offset 32
; CHECK-NEXT: .cfi_offset w19, -8		; CHECK-NEXT: .cfi_offset w19, -8
; CHECK-NEXT: .cfi_offset w20, -16		; CHECK-NEXT: .cfi_offset w20, -16
; CHECK-NEXT: .cfi_offset w21, -24
; CHECK-NEXT: .cfi_offset w30, -32		; CHECK-NEXT: .cfi_offset w30, -32
; CHECK-NEXT: mov w19, w0		; CHECK-NEXT: mov w19, w0
; CHECK-NEXT: mov w20, wzr		; CHECK-NEXT: mov w20, wzr
; CHECK-NEXT: mov w21, #40000
; CHECK-NEXT: .LBB3_1: // %while_cond		; CHECK-NEXT: .LBB3_1: // %while_cond
; CHECK-NEXT: // =>This Inner Loop Header: Depth=1		; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
; CHECK-NEXT: .Ltmp0:		; CHECK-NEXT: .Ltmp0:
; CHECK-NEXT: bl foo		; CHECK-NEXT: bl foo
; CHECK-NEXT: .Ltmp1:		; CHECK-NEXT: .Ltmp1:
; CHECK-NEXT: // %bb.2: // %while_cond_x.split		; CHECK-NEXT: // %bb.2: // %while_cond_x.split
; CHECK-NEXT: // in Loop: Header=BB3_1 Depth=1		; CHECK-NEXT: // in Loop: Header=BB3_1 Depth=1
; CHECK-NEXT: add x8, x0, x21		; CHECK-NEXT: add x8, x0, #9, lsl #12 // =36864
; CHECK-NEXT: cmp w20, w19		; CHECK-NEXT: cmp w20, w19
		; CHECK-NEXT: add x8, x8, #3136
; CHECK-NEXT: str wzr, [x8]		; CHECK-NEXT: str wzr, [x8]
; CHECK-NEXT: b.ge .LBB3_4		; CHECK-NEXT: b.ge .LBB3_4
; CHECK-NEXT: // %bb.3: // %while_body		; CHECK-NEXT: // %bb.3: // %while_body
; CHECK-NEXT: // in Loop: Header=BB3_1 Depth=1		; CHECK-NEXT: // in Loop: Header=BB3_1 Depth=1
; CHECK-NEXT: add w9, w20, #1		; CHECK-NEXT: add w9, w20, #1
; CHECK-NEXT: stp w9, w20, [x8]		; CHECK-NEXT: stp w9, w20, [x8]
; CHECK-NEXT: mov w20, w9		; CHECK-NEXT: mov w20, w9
; CHECK-NEXT: b .LBB3_1		; CHECK-NEXT: b .LBB3_1
; CHECK-NEXT: .LBB3_4: // %while_end		; CHECK-NEXT: .LBB3_4: // %while_end
; CHECK-NEXT: ldp x20, x19, [sp, #16] // 16-byte Folded Reload		; CHECK-NEXT: ldp x20, x19, [sp, #16] // 16-byte Folded Reload
; CHECK-NEXT: ldp x30, x21, [sp], #32 // 16-byte Folded Reload		; CHECK-NEXT: ldr x30, [sp], #32 // 8-byte Folded Reload
; CHECK-NEXT: ret		; CHECK-NEXT: ret
; CHECK-NEXT: .LBB3_5: // %cleanup		; CHECK-NEXT: .LBB3_5: // %cleanup
; CHECK-NEXT: .Ltmp2:		; CHECK-NEXT: .Ltmp2:
; CHECK-NEXT: mov x19, x0		; CHECK-NEXT: mov x19, x0
; CHECK-NEXT: bl foo2		; CHECK-NEXT: bl foo2
; CHECK-NEXT: mov x0, x19		; CHECK-NEXT: mov x0, x19
; CHECK-NEXT: bl _Unwind_Resume		; CHECK-NEXT: bl _Unwind_Resume
entry:		entry:
[... 27 lines not shown ...]
}		}

declare i32 @__FrameHandler(...)		declare i32 @__FrameHandler(...)

define void @test5([65536 x i32]** %s, i32 %n) {		define void @test5([65536 x i32]** %s, i32 %n) {
; CHECK-LABEL: test5:		; CHECK-LABEL: test5:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ldr x9, [x0]		; CHECK-NEXT: ldr x9, [x0]
; CHECK-NEXT: mov w10, #14464
; CHECK-NEXT: movk w10, #1, lsl #16
; CHECK-NEXT: mov w8, wzr		; CHECK-NEXT: mov w8, wzr
; CHECK-NEXT: add x9, x9, x10		; CHECK-NEXT: add x9, x9, #19, lsl #12 // =77824
		; CHECK-NEXT: add x9, x9, #2176
; CHECK-NEXT: cmp w8, w1		; CHECK-NEXT: cmp w8, w1
; CHECK-NEXT: b.ge .LBB4_2		; CHECK-NEXT: b.ge .LBB4_2
; CHECK-NEXT: .LBB4_1: // %while_body		; CHECK-NEXT: .LBB4_1: // %while_body
; CHECK-NEXT: // =>This Inner Loop Header: Depth=1		; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
; CHECK-NEXT: add w10, w8, #1		; CHECK-NEXT: add w10, w8, #1
; CHECK-NEXT: stp w10, w8, [x9]		; CHECK-NEXT: stp w10, w8, [x9]
; CHECK-NEXT: mov w8, w10		; CHECK-NEXT: mov w8, w10
; CHECK-NEXT: cmp w8, w1		; CHECK-NEXT: cmp w8, w1
[... 63 lines not shown ...]
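
The updated CHECK lines above replace a materialized constant (for example `mov w10, #40000` followed by a register `add`) with two immediate adds: one carrying the upper 12 bits shifted left by 12, and one carrying the lower 12 bits. The arithmetic behind that split can be sketched as follows. This is only an illustration of the decomposition, not code from the patch; the helper name `splitImm24` is hypothetical, and the real combine may apply further conditions (for instance, the negated-immediate case described in the summary) that are not modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical helper, not part of the patch: split an unsigned immediate
// that fits in 24 bits into {high 12 bits, low 12 bits}. The high part is
// what would be encoded as "#hi, lsl #12", the low part as "#lo".
// Returns std::nullopt when the value needs more than 24 bits.
std::optional<std::pair<uint32_t, uint32_t>> splitImm24(uint64_t Imm) {
  if (Imm >> 24)
    return std::nullopt; // does not fit in 24 bits, no two-add form
  uint32_t Hi = static_cast<uint32_t>(Imm >> 12);   // upper 12 bits
  uint32_t Lo = static_cast<uint32_t>(Imm & 0xFFF); // lower 12 bits
  return std::make_pair(Hi, Lo);
}
```

Under these assumptions, 40000 splits into 9 and 3136 (9 << 12 is 36864, and 36864 + 3136 is 40000), which corresponds to the `add x9, x9, #9, lsl #12` / `add x9, x9, #3136` pair in test1. Likewise 80000 (the `mov w10, #14464` plus `movk w10, #1, lsl #16` constant in test5) splits into 19 and 2176.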


Unit Tests: Failed


Diff 396842
