This is an archive of the discontinued LLVM Phabricator instance.

Differential D110841

[AArch64] Remove redundant ORRWrs which is generated by zero-extend
ClosedPublic

Authored by jaykang10 on Sep 30 2021, 8:18 AM.

Details

Reviewers

t.p.northover
dmgreen
efriedma

Commits

rGa50243625930: [AArch64] Remove redundant ORRWrs which is generated by zero-extend

Summary

There are 2 patterns for zero-extend as below.

//In the case of a 32-bit def that is known to implicitly zero-extend,
//we can use a SUBREG_TO_REG.
def : Pat<(i64 (zext def32:$src)),
          (SUBREG_TO_REG (i64 0), GPR32:$src, sub_32)>;

//When we need to explicitly zero-extend, we use a 32-bit MOV instruction
//and then assert the extension has happened.
def : Pat<(i64 (zext GPR32:$src)),
          (SUBREG_TO_REG (i32 0), (ORRWrs WZR, GPR32:$src, 0), sub_32)>;

The def32 predicate checks whether $src needs an explicit zero-extend. However, it cannot inspect a $src that is defined in another basic block, so instruction selection conservatively adds an ORRWrs in that case. This peephole optimization checks whether the ORRWrs performs a redundant zero-extend and, if so, removes it.
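The redundancy rests on an architectural rule: writing a W register zeroes the upper 32 bits of the corresponding X register. The sketch below is a toy model, not LLVM code; `XReg`, `writeW`, `addW`, `movW`, and `movIsRedundant` are hypothetical names introduced only for illustration. It shows that a 32-bit MOV (ORR with WZR) applied to the result of a genuine 32-bit instruction leaves the 64-bit value unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of one AArch64 general-purpose register (hypothetical type).
struct XReg {
  uint64_t bits;
};

// Models any 32-bit-form instruction writing its result to Wd:
// the low half receives the value and the high half is cleared.
inline void writeW(XReg &dst, uint32_t value) {
  dst.bits = static_cast<uint64_t>(value);
}

// Models "ADD Wd, Wn, Wm": a real 32-bit instruction.
inline void addW(XReg &dst, const XReg &lhs, const XReg &rhs) {
  writeW(dst,
         static_cast<uint32_t>(lhs.bits) + static_cast<uint32_t>(rhs.bits));
}

// Models "ORR Wd, WZR, Wm", the 32-bit MOV used for an explicit zero-extend.
inline void movW(XReg &dst, const XReg &src) {
  writeW(dst, 0u | static_cast<uint32_t>(src.bits));
}

// Returns true when the explicit MOV would not change the 64-bit value,
// i.e. when the zero-extend is redundant for this source register.
inline bool movIsRedundant(const XReg &src) {
  XReg tmp{0};
  movW(tmp, src);
  return tmp.bits == src.bits;
}
```

In this model a register written by `addW` always passes `movIsRedundant`, while a register that still carries stale upper bits does not.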

Diff Detail

Event Timeline

jaykang10 created this revision. Sep 30 2021, 8:18 AM

Herald added subscribers: hiraditya, kristof.beyls. Sep 30 2021, 8:18 AM

jaykang10 requested review of this revision. Sep 30 2021, 8:18 AM

Herald added a project: Restricted Project. Sep 30 2021, 8:18 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B126581: Diff 376230. Sep 30 2021, 11:06 AM

Fixed a bug

replaceRegWith changes MI's definition register. Keep MI in place to preserve SSA form until MI is deleted.

Can you explain in more detail what makes this valid? Does it depend on the top bits already being zero? What verifies that?

It might be useful to add mir tests too, to test specific cases that should/shouldn't be removed.

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
225	Do we need to check for WZR?
231	Do we need to avoid when they are in the same block? It would be easier for testing if we didn't.
244	Why do we rule these out? Why don't we rule out anything else?
250	-> definition

In D110841#3039405, @dmgreen wrote:

Can you explain in more detail what makes this valid? Does it depend on the top bits already being zero? What verifies that?

It might be useful to add mir tests too, to test specific cases that should/shouldn't be removed.

At the instruction selection level, the isDef32 function below identifies the operations that do not zero out the high half of the 64-bit register.

// Any instruction that defines a 32-bit result zeros out the high half of the
// register. Truncate can be lowered to EXTRACT_SUBREG. CopyFromReg may
// be copying from a truncate. But any other 32-bit operation will zero-extend
// up to 64 bits. AssertSext/AssertZext aren't saying anything about the upper
// 32 bits, they're probably just qualifying a CopyFromReg.
static inline bool isDef32(const SDNode &N) {
  unsigned Opc = N.getOpcode();
  return Opc != ISD::TRUNCATE && Opc != TargetOpcode::EXTRACT_SUBREG &&
         Opc != ISD::CopyFromReg && Opc != ISD::AssertSext &&
         Opc != ISD::AssertZext && Opc != ISD::AssertAlign &&
         Opc != ISD::FREEZE;
}

As you can see, isDef32 fundamentally checks for ISD::TRUNCATE within a basic block, because ISD::TRUNCATE is matched by the pattern below and the resulting EXTRACT_SUBREG does not guarantee that the high 32 bits are zero.

// To truncate, we can simply extract from a subregister.
def : Pat<(i32 (trunc GPR64sp:$src)),
          (i32 (EXTRACT_SUBREG GPR64sp:$src, sub_32))>;

This patch checks whether the operand of the ORRWrs is an EXTRACT_SUBREG in a different basic block, because the existing pattern with isDef32 already handles the case in which the operand is defined in the same basic block.
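To see why an EXTRACT_SUBREG source cannot be trusted, consider a toy model (not LLVM code; every name below is hypothetical) in which the 32-bit value is only a view onto the low half of a 64-bit register. Reading through the view truncates, but the register itself still holds the old upper bits, so a later 64-bit use needs a real zero-extend.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a 64-bit register whose low half can be viewed as sub_32.
struct Reg64 {
  uint64_t bits;
};

// EXTRACT_SUBREG-style truncation: no instruction runs, the consumer simply
// reads the low 32 bits. The upper 32 bits of the register are untouched.
inline uint32_t extractSub32(const Reg64 &r) {
  return static_cast<uint32_t>(r.bits);
}

// SUBREG_TO_REG-style reinterpretation: asserts, without checking, that the
// upper bits are already zero and reuses the same register as a 64-bit value.
inline uint64_t subregToRegUnchecked(const Reg64 &r) {
  return r.bits;
}

// Explicit zero-extend through a 32-bit MOV (ORR Wd, WZR, Wm).
inline uint64_t zextViaMov(const Reg64 &r) {
  return static_cast<uint64_t>(extractSub32(r));
}
```

The two 64-bit results agree only when the upper half of the source register was already zero, which is the property the peephole has to establish before dropping the MOV.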

Let me try to add MIR tests.

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
225	You are right! We need to check it because of the assembly alias below. Let me update the code.
def : InstAlias<"mov $dst, $src", (ORRWrs GPR32:$dst, WZR, GPR32:$src, 0), 2>;
231	If the pattern is in the same block, I thought it had already been handled by the existing pattern with `isDef32`, so we do not need to check it.
def def32 : PatLeaf<(i32 GPR32:$src), [{
  return isDef32(*N);
}]>;
// In the case of a 32-bit def that is known to implicitly zero-extend,
// we can use a SUBREG_TO_REG.
def : Pat<(i64 (zext def32:$src)),
          (SUBREG_TO_REG (i64 0), GPR32:$src, sub_32)>;
244	I was not sure it is a good idea to track the operands of PHI and COPY in this patch, because it could make the code complicated (for example, checking for PHI cycles). If possible, I would like to address it in a separate patch.
250	Sorry... let me update it.

Following comments of @dmgreen, updated patch.

Checked WZR
Added MIR tests
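The WZR check, together with the shift operand of 0 in the zero-extend pattern, follows from the semantics of the shifted-register ORR. As a minimal sketch (a standalone model, not the pass's code; `OrrWrs`, `evalOrr`, `isPlainMov`, and `kWZR` are hypothetical names), `ORR Wd, Wn, Wm, LSL #imm` computes `Wn | (Wm << imm)`, which is a plain move only when `Wn` is the zero register and the shift amount is 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding of the zero register for this model.
constexpr int kWZR = -1;

// Operands of a modelled "ORR Wd, Wn, Wm, LSL #shift".
struct OrrWrs {
  int lhsReg;      // Wn, or kWZR
  int rhsReg;      // Wm
  unsigned shift;  // LSL amount, 0..31
};

// Evaluates the instruction given the current values of Wn and Wm.
inline uint32_t evalOrr(const OrrWrs &mi, uint32_t lhsVal, uint32_t rhsVal) {
  assert(mi.shift < 32 && "shift amount out of range for a W register");
  uint32_t lhs = (mi.lhsReg == kWZR) ? 0u : lhsVal;
  return lhs | (rhsVal << mi.shift);
}

// Mirrors the two conditions: first source is WZR and the shift is 0.
inline bool isPlainMov(const OrrWrs &mi) {
  return mi.lhsReg == kWZR && mi.shift == 0;
}
```

Only an instruction for which `isPlainMov` holds is a candidate for removal; any other ORRWrs changes the value and must stay.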

That explains things in terms of DAG-ISel, but there are other instruction selectors and different optimizations between then and here. (Plus, isDef32 has had so many bugs that it's difficult to trust!)

We know that all (?) instructions that generate a W register under AArch64 will zero the upper bits of the X register. We seem to say in this patch that certain EXTRACT_SUBREG are not valid, and COPY and PHI are currently excluded. Is that really all we have to worry about? Do we know that the top bits are always 0 for all other GPR32 sources?

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
231	But, is that needed for this pass? I can see it is how DAG ISel works, but if there are this pattern of code in the same block, it should still work fine, shouldn't it?

In D110841#3042058, @dmgreen wrote:
That explains things in terms of DAG-ISel, but there are other instruction selectors and different optimizations between then and here. (Plus, isDef32 has had so many bugs that it's difficult to trust!)

We know that all (?) instructions that generate a W register under AArch64 will zero the upper bits of the X register. We seem to say in this patch that certain EXTRACT_SUBREG are not valid, and COPY and PHI are currently excluded. Is that really all we have to worry about? Do we know that the top bits are always 0 for all other GPR32 sources?

I agree with you. I am also not sure about it...

As you mentioned, if the source operand of the zero-extend is defined by the 32-bit form of an AArch64 instruction, we do not need the zero-extend.

From https://developer.arm.com/documentation/dui0801/b/BABBGCAC
When you use the 32-bit form of an instruction, the upper 32 bits of the source registers are ignored and the upper 32 bits of the destination register are set to zero.

We need to check for zero-extend source operands that do not come from the destination register of a 32-bit form AArch64 instruction. I think we cannot be sure that the upper 32 bits are zero in the two cases below.

Function argument - There would be copy instructions from physical register to virtual register.
EXTRACT_SUBREG

If you feel there are more cases, please let me know. Let me update the code with these cases.

As for tracking the operands of PHI and COPY, if possible, I would like to handle them after we clearly understand the cases which do not guarantee that the upper 32 bits are zero. I am sorry for that...

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
231	Yep, I agree with you. Let me remove the check for different block.

Rebased

Harbormaster completed remote builds in B128074: Diff 378618. Oct 11 2021, 4:34 AM

Sorry for Ping.

I'm generally worried about allowing unknown instructions here.

Any AArch64 instruction that produces a 32-bit result zeros the high bits, yes. But some MachineInstr opcodes aren't really instructions in this sense. You've noted EXTRACT_SUBREG, PHI, and COPY specifically. Probably missing IMPLICIT_DEF. Not sure if there are other relevant instructions; any pseudo-instruction is potentially an issue, but auditing them for AArch64, the target-specific ones mostly don't produce GPR32.

In any case, I'd be happier if we had a bit to check for a "real" instruction, in TSFlags or something like that. I don't want to worry about modifying this in the future if, for example, we end up with a FREEZE MachineInstr.

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
231	Missed update?

jaykang10 mentioned this in D109963: [AArch64] Split bitmask immediate of bitwise AND operation. Oct 15 2021, 2:39 PM

In D110841#3067558, @efriedma wrote:

I'm generally worried about allowing unknown instructions here.

Any AArch64 instruction that produces a 32-bit result zeros the high bits, yes. But some MachineInstr opcodes aren't really instructions in this sense. You've noted EXTRACT_SUBREG, PHI, and COPY specifically. Probably missing IMPLICIT_DEF. Not sure if there are other relevant instructions; any pseudo-instruction is potentially an issue, but auditing them for AArch64, the target-specific ones mostly don't produce GPR32.

In any case, I'd be happier if we had a bit to check for a "real" instruction, in TSFlags or something like that. I don't want to worry about modifying this in the future if, for example, we end up with a FREEZE MachineInstr.

I agree with you. Let me try to check the real instructions.

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
231	Sorry... let me remove it.

In D110841#3064537, @jaykang10 wrote:

Sorry for Ping.

I was looking a bit last week, but didn't get very far into proving this is correct. Eli's suggestions sound good to me, and having a nice big comment explaining why it is valid would be good too.

In D110841#3068234, @dmgreen wrote:

In D110841#3064537, @jaykang10 wrote:

Sorry for Ping.

I was looking a bit last week, but didn't get very far into proving this is correct. Eli's suggestions sound good to me, and having a nice big comment explaining why it is valid would be good too.

Yep, once I finish checking the real instructions, let me try to add a big comment.

Following the comment of @efriedma, updated patch

Added a bit to TSFlags for checking real AArch64 instruction
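As a rough illustration of what such a TSFlags bit looks like, here is a standalone sketch that mirrors the bit layout used in this revision of the diff (instruction flags starting at bit 9, with the new flag at bit 11). It is not the LLVM implementation: `InstrDesc`, `instrFlags`, `isRealInst`, and the `kFlag...` constants are hypothetical stand-ins for the per-opcode descriptor, the `TSFLAG_INSTR_FLAGS` macro, and the proposed `isRealAArch64Inst` query.

```cpp
#include <cassert>
#include <cstdint>

// Flag bits live at bit 9 and above, after element size (3 bits),
// destructive type (4 bits) and false-lane type (2 bits).
constexpr uint64_t instrFlags(uint64_t x) { return x << 9; }

constexpr uint64_t kFlagIsWhile = instrFlags(0x1);      // TSFlags{9}
constexpr uint64_t kFlagIsPTestLike = instrFlags(0x2);  // TSFlags{10}
constexpr uint64_t kFlagIsRealInst = instrFlags(0x4);   // TSFlags{11}

// Hypothetical stand-in for a per-opcode descriptor.
struct InstrDesc {
  uint64_t tsFlags;
};

// Target-independent opcodes (COPY, PHI, ...) never set the bit; opcodes
// derived from the target's base instruction class get it by default.
inline bool isRealInst(const InstrDesc &desc) {
  return (desc.tsFlags & kFlagIsRealInst) != 0;
}
```

Because the flag defaults to 1 in the target's base class, every target opcode reports true unless it explicitly opts out, while opcodes defined outside the target report false.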

Harbormaster completed remote builds in B129377: Diff 380439. Oct 18 2021, 10:34 AM

Do we need to add an extra TS bit, or can we just use GENERIC_OP_END?

As far as I understand the opcodes are always in the order: [TargetOpcodes, G_ opcodes, A64 Pseudos, A64 instructions]. Do we need to rule out A64 pseudos? If so can we check isPseudo().

In D110841#3071072, @dmgreen wrote:

Do we need to add an extra TS bit, or can we just use GENERIC_OP_END?

As far as I understand the opcodes are always in the order: [TargetOpcodes, G_ opcodes, A64 Pseudos, A64 instructions]. Do we need to rule out A64 pseudos? If so can we check isPseudo().

Ah, you are right!!! Let me update code. Thanks @dmgreen

Following comment of @dmgreen, updated patch.

Check TargetOpcode::GENERIC_OP_END to distinguish real AArch64 instructions instead of adding an additional bit to TSFlags. It has the same effect.
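The opcode-ordering argument can be sketched with a mock enum. The enumerators and values below are illustrative placeholders laid out in the order described in the review (target-independent opcodes, then `G_` opcodes, then target pseudos and instructions); the real values are generated by TableGen, and `MockOpcode`, `SomeTargetPseudo`, and `isTargetOpcode` are hypothetical names, so treat this only as a model of the comparison.

```cpp
#include <cassert>

// Mock opcode numbering (placeholder values, not LLVM's generated ones).
enum MockOpcode : unsigned {
  PHI,               // target-independent opcodes come first
  COPY,
  EXTRACT_SUBREG,
  IMPLICIT_DEF,
  G_ADD,             // then the GlobalISel generic opcodes
  G_TRUNC,
  GENERIC_OP_END = G_TRUNC,  // marks the last generic opcode
  SomeTargetPseudo,  // target pseudos follow (hypothetical name)
  ADDWrr,            // and finally encodable target instructions
  ORRWrs,
};

// True for anything the target defines, pseudo or real.
constexpr bool isTargetOpcode(unsigned opc) {
  return opc > GENERIC_OP_END;
}
```

Note that a comparison of this shape also accepts target pseudos, which is why the review goes on to ask whether those pseudos follow the same upper-bit zeroing rule.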

Harbormaster completed remote builds in B129500: Diff 380617. Oct 19 2021, 3:04 AM

I have checked bootstrap build and run check-all with the build. It looks OK.

Thanks.

I assume you have checked through the AArch64 pseudo instructions (the ones before ABS_ZPmZ_B in build/lib/Target/AArch64/AArch64GenInstrInfo.inc) and they look OK? They will follow the same rules of producing zeroed upper bits for W register definitions.

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
21	I don't think that representing this in terms of ISel patterns is useful. It should preferably be described in terms of the Machine Instructions that will be present, no matter where they come from.
llvm/test/CodeGen/AArch64/arm64-assert-zext-sext.ll
40	This looks like the kind of test we could use update_llc_test_checks on.

In D110841#3074725, @dmgreen wrote:

Thanks.

I assume you have checked through the AArch64 pseudo instructions (the ones before ABS_ZPmZ_B in build/lib/Target/AArch64/AArch64GenInstrInfo.inc) and they look OK? They will follow the same rules of producing zeroed upper bits for W register definitions.

Um... I have checked the pseudo MIs which inherit from the Pseudo class in the AArch64InstrInfo.td and AArch64InstrFormats.td files, and the expandMI function in AArch64ExpandPseudoInsts.cpp. I could see that the pseudo MIs are expanded to AArch64 MIs, so I thought they follow the rule.

If you feel some pseudo MIs do not follow the rule, please let me know. I may have missed some MIs...

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
21	Yep, let me update the comment.
llvm/test/CodeGen/AArch64/arm64-assert-zext-sext.ll
40	It looks like it was a pre-commit test with an NFC tag. Let me update the expected output with update_llc_test_checks.

Following comment of @dmgreen, updated patch.

Do we need to remove Kill flags from the uses of the register we are replacing? From something like this:

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-none-none-eabi"

@e = dso_local local_unnamed_addr global i16 0, align 4
@b = dso_local local_unnamed_addr global i32 0, align 4
@d = dso_local local_unnamed_addr global i32 0, align 4
@c = dso_local local_unnamed_addr global i32* null, align 8
@a = dso_local local_unnamed_addr global i32 0, align 4

define i32 @i() {
entry:
  %0 = load i32, i32* @b, align 4
  %1 = trunc i32 %0 to i16
  %conv1 = and i16 %1, 255
  %2 = load i32, i32* @d, align 4
  %tobool.not = icmp eq i32 %2, 0
  br i1 %tobool.not, label %if.end, label %if.then

if.then:                                          ; preds = %entry
  %conv2 = zext i16 %conv1 to i64
  %3 = inttoptr i64 %conv2 to i32*
  store i32* %3, i32** @c, align 8
  br label %if.end

if.end:                                           ; preds = %if.then, %entry
  %4 = load i32, i32* @a, align 4
  %5 = trunc i32 %4 to i16
  %conv4 = or i16 %conv1, %5
  store i16 %conv4, i16* @e, align 4
  ret i32 0
}

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
249	Can we add a dbgs() output explaining what was removed?

Harbormaster completed remote builds in B129702: Diff 380908. Oct 20 2021, 6:07 AM

In D110841#3074973, @dmgreen wrote:

Do we need to remove Kill flags from the uses of the register we are replacing?

You are right! We need to clear the kill flag of the source register of the ORRWrs. I was confused with the bitmask peephole opt case.

Let me clear the kill flag. Thanks for catching that. @dmgreen
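The kill-flag problem can be reproduced in a small straight-line model. This is a simplified sketch under stated assumptions, not the pass's code: `Operand`, `Instr`, `replaceUses`, `clearKillFlags`, and `hasUseAfterKill` are hypothetical helpers written for this example, and control flow is ignored. Once uses of the MOV's result are rewritten to the MOV's source, the source register lives longer than before, so a kill marker on its earlier use becomes wrong.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified machine operand and instruction.
struct Operand {
  int reg;
  bool isDef;
  bool isKill;  // marks the last use of reg
};
struct Instr {
  std::vector<Operand> ops;
};

// Rewrites every use of `from` to `to`, like replacing the MOV's result
// with the MOV's source register.
inline void replaceUses(std::vector<Instr> &code, int from, int to) {
  for (Instr &mi : code)
    for (Operand &op : mi.ops)
      if (!op.isDef && op.reg == from)
        op.reg = to;
}

// Drops all kill markers on uses of `reg`; conservative but always safe.
inline void clearKillFlags(std::vector<Instr> &code, int reg) {
  for (Instr &mi : code)
    for (Operand &op : mi.ops)
      if (!op.isDef && op.reg == reg)
        op.isKill = false;
}

// Returns true if some use of `reg` appears in an instruction that follows
// one whose use of `reg` is marked as kill (straight-line code only).
inline bool hasUseAfterKill(const std::vector<Instr> &code, int reg) {
  bool killed = false;
  for (const Instr &mi : code) {
    bool killedHere = false;
    for (const Operand &op : mi.ops) {
      if (op.isDef || op.reg != reg)
        continue;
      if (killed)
        return true;
      killedHere = killedHere || op.isKill;
    }
    killed = killed || killedHere;
  }
  return false;
}
```

In the model, the MOV stays in the instruction list while its uses are rewritten, so the stale kill on its source operand is exactly what `hasUseAfterKill` detects until the flags are cleared.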

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
249	Yep, let me add debug output.

Following comment of @dmgreen, updated patch.

Cleared kill flag of source operand of ORRWrs.

Harbormaster completed remote builds in B129722: Diff 380940. Oct 20 2021, 7:44 AM

In D110841#3075110, @jaykang10 wrote:

Following comment of @dmgreen, updated patch.

Cleared kill flag of source operand of ORRWrs.

No error from bootstrap build and check-all on AArch64 machine.

No error from bootstrap build and check-all on AArch64 machine.

Thanks. Knowing you were away for a few days, I took the time to include this in some csmith testing which I happened to already be running for another reason. It didn't come up with any other problems related to this, which is a good sign (though it doesn't mean nothing else is wrong, just that csmith-like code does OK). So LGTM.

This revision is now accepted and ready to land. Oct 25 2021, 12:13 AM

In D110841#3083244, @dmgreen wrote:

No error from bootstrap build and check-all on AArch64 machine.

Thanks. Knowing you were away for a few days, I took the time to include this in some csmith testing which I happened to already be running for another reason. It didn't come up with any other problems related to this, which is a good sign (though it doesn't mean nothing else is wrong, just that csmith-like code does OK). So LGTM.

Thanks for checking csmith @dmgreen! Let me push this patch after rebase.

Closed by commit rGa50243625930: [AArch64] Remove redundant ORRWrs which is generated by zero-extend (authored by jaykang10). Oct 25 2021, 1:49 AM

This revision was automatically updated to reflect the committed changes.

jaykang10 added a commit: rGa50243625930: [AArch64] Remove redundant ORRWrs which is generated by zero-extend.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64InstrFormats.td	2 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.h	5 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp	4 lines
llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp	62 lines
llvm/test/CodeGen/AArch64/arm64-assert-zext-sext.ll	3 lines
llvm/test/CodeGen/AArch64/redundant-mov-from-zero-extend.ll	79 lines
llvm/test/CodeGen/AArch64/redundant-orrwrs-from-zero-extend.mir	69 lines

Diff 380439

llvm/lib/Target/AArch64/AArch64InstrFormats.td

[57 lines not shown; context: class AArch64Inst<Format f, string cstr> : Instruction {]
   field bits<32> SoftFail = Unpredictable;
   let Namespace = "AArch64";
   Format F = f;
   bits<2> Form = F.Value;

   // Defaults
   bit isWhile = 0;
   bit isPTestLike = 0;
+  bit isRealAArch64Inst = 1;
   FalseLanesEnum FalseLanes = FalseLanesNone;
   DestructiveInstTypeEnum DestructiveInstType = NotDestructive;
   ElementSizeEnum ElementSize = ElementSizeNone;

+  let TSFlags{11} = isRealAArch64Inst;
   let TSFlags{10} = isPTestLike;
   let TSFlags{9} = isWhile;
   let TSFlags{8-7} = FalseLanes.Value;
   let TSFlags{6-3} = DestructiveInstType.Value;
   let TSFlags{2-0} = ElementSize.Value;

   let Pattern = [];
   let Constraints = cstr;
[11,371 lines not shown]

llvm/lib/Target/AArch64/AArch64InstrInfo.h

[288 lines not shown; context: public:]
   /// Returns the vector element size (B, H, S or D) of an SVE opcode.
   uint64_t getElementSizeForOpcode(unsigned Opc) const;
   /// Returns true if the opcode is for an SVE instruction that sets the
   /// condition codes as if it's results had been fed to a PTEST instruction
   /// along with the same general predicate.
   bool isPTestLikeOpcode(unsigned Opc) const;
   /// Returns true if the opcode is for an SVE WHILE## instruction.
   bool isWhileOpcode(unsigned Opc) const;
+  /// Returns true if the opcode is for an AArch64 instruction.
+  bool isRealAArch64Inst(unsigned Opc) const;
   /// Returns true if the instruction has a shift by immediate that can be
   /// executed in one cycle less.
   static bool isFalkorShiftExtFast(const MachineInstr &MI);
   /// Return true if the instructions is a SEH instruciton used for unwinding
   /// on Windows.
   static bool isSEHInstruction(const MachineInstr &MI);

   Optional<RegImmPair> isAddImmediate(const MachineInstr &MI,
[147 lines not shown]

 /// Return opcode to be used for indirect calls.
 unsigned getBLRCallOpcode(const MachineFunction &MF);

 // struct TSFlags {
 #define TSFLAG_ELEMENT_SIZE_TYPE(X) (X) // 3-bits
 #define TSFLAG_DESTRUCTIVE_INST_TYPE(X) ((X) << 3) // 4-bits
 #define TSFLAG_FALSE_LANE_TYPE(X) ((X) << 7) // 2-bits
-#define TSFLAG_INSTR_FLAGS(X) ((X) << 9) // 2-bits
+#define TSFLAG_INSTR_FLAGS(X) ((X) << 9) // 3-bits
   [Lint: Pre-merge checks] clang-format: please reformat the code
 // }

 namespace AArch64 {

 enum ElementSizeType {
   ElementSizeMask = TSFLAG_ELEMENT_SIZE_TYPE(0x7),
   ElementSizeNone = TSFLAG_ELEMENT_SIZE_TYPE(0x0),
   ElementSizeB = TSFLAG_ELEMENT_SIZE_TYPE(0x1),
[20 lines not shown; context: enum FalseLaneType {]
   FalseLanesMask = TSFLAG_FALSE_LANE_TYPE(0x3),
   FalseLanesZero = TSFLAG_FALSE_LANE_TYPE(0x1),
   FalseLanesUndef = TSFLAG_FALSE_LANE_TYPE(0x2),
 };

 // NOTE: This is a bit field.
 static const uint64_t InstrFlagIsWhile = TSFLAG_INSTR_FLAGS(0x1);
 static const uint64_t InstrFlagIsPTestLike = TSFLAG_INSTR_FLAGS(0x2);
+static const uint64_t InstrFlagIsRealAArch64Inst = TSFLAG_INSTR_FLAGS(0x4);

 #undef TSFLAG_ELEMENT_SIZE_TYPE
 #undef TSFLAG_DESTRUCTIVE_INST_TYPE
 #undef TSFLAG_FALSE_LANE_TYPE
 #undef TSFLAG_INSTR_FLAGS

 int getSVEPseudoMap(uint16_t Opcode);
 int getSVERevInstr(uint16_t Opcode);
 int getSVENonRevInstr(uint16_t Opcode);
 }

 } // end namespace llvm

 #endif

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

[7,543 lines not shown]
 bool AArch64InstrInfo::isPTestLikeOpcode(unsigned Opc) const {
   return get(Opc).TSFlags & AArch64::InstrFlagIsPTestLike;
 }

 bool AArch64InstrInfo::isWhileOpcode(unsigned Opc) const {
   return get(Opc).TSFlags & AArch64::InstrFlagIsWhile;
 }

+bool AArch64InstrInfo::isRealAArch64Inst(unsigned Opc) const {
+  return get(Opc).TSFlags & AArch64::InstrFlagIsRealAArch64Inst;
+}
+
 unsigned int
 AArch64InstrInfo::getTailDuplicateSize(CodeGenOpt::Level OptLevel) const {
   return OptLevel >= CodeGenOpt::Aggressive ? 6 : 2;
 }

 unsigned llvm::getBLRCallOpcode(const MachineFunction &MF) {
   if (MF.getSubtarget<AArch64Subtarget>().hardenSlsBlr())
     return AArch64::BLRNoIP;
   else
     return AArch64::BLR;
 }

 #define GET_INSTRINFO_HELPERS
 #define GET_INSTRMAP_INFO
 #include "AArch64GenInstrInfo.inc"

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp

Show All 9 Lines
//		//
// 1. MOVi32imm + ANDWrr ==> ANDWri + ANDWri		// 1. MOVi32imm + ANDWrr ==> ANDWri + ANDWri
// MOVi64imm + ANDXrr ==> ANDXri + ANDXri		// MOVi64imm + ANDXrr ==> ANDXri + ANDXri
//		//
// The mov pseudo instruction could be expanded to multiple mov instructions		// The mov pseudo instruction could be expanded to multiple mov instructions
// later. In this case, we could try to split the constant operand of mov		// later. In this case, we could try to split the constant operand of mov
// instruction into two bitmask immediates. It makes two AND instructions		// instruction into two bitmask immediates. It makes two AND instructions
// intead of multiple `mov` + `and` instructions.		// intead of multiple `mov` + `and` instructions.
		//
		// 2. Remove redundant ORRWrs which is generated by zero-extend.
		//
		// There are 2 patterns for zero-extend as below.
		dmgreenUnsubmitted Not Done Reply Inline Actions I don't think that representing this in terms of ISel patterns is useful. It should preferably be described in terms of the Machine Instructions that will be present, no matter where they come from. dmgreen: I don't think that representing this in terms of ISel patterns is useful. It should preferably…
		jaykang10AuthorUnsubmitted Done Reply Inline Actions Yep, let me update the comment. jaykang10: Yep, let me update the comment.
		//
		// //In the case of a 32-bit def that is known to implicitly zero-extend,
		// //we can use a SUBREG_TO_REG.
		// def : Pat<(i64 (zext def32:$src)),
		// (SUBREG_TO_REG (i64 0), GPR32:$src, sub_32)>;
		//
		// //When we need to explicitly zero-extend, we use a 32-bit MOV instruction
		// //and then assert the extension has happened.
		// def : Pat<(i64 (zext GPR32:$src)),
		// (SUBREG_TO_REG (i32 0), (ORRWrs WZR, GPR32:$src, 0), sub_32)>;
		//
		// The def32 checks the $src needs explicitly zero-extend. However, it can
		// not check the $src in other block and it adds ORRWrs conservatively in
		// this case. This peephole optimization checks ORRWrs is for redundant
		// zero-extend and try to remove it.
//===----------------------------------------------------------------------===//

#include "AArch64ExpandImm.h"
#include "AArch64InstrInfo.h"
#include "MCTargetDesc/AArch64AddressingModes.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineLoopInfo.h"
[13 lines not shown]	struct AArch64MIPeepholeOpt : public MachineFunctionPass {

const AArch64InstrInfo *TII;
MachineLoopInfo *MLI;
MachineRegisterInfo *MRI;

template <typename T>
bool visitAND(MachineInstr &MI,
SmallSetVector<MachineInstr *, 8> &ToBeRemoved);
		bool visitORR(MachineInstr &MI,
		SmallSetVector<MachineInstr *, 8> &ToBeRemoved);
bool runOnMachineFunction(MachineFunction &MF) override;

StringRef getPassName() const override {
return "AArch64 MI Peephole Optimization pass";
}

void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();
[136 lines not shown]	bool AArch64MIPeepholeOpt::visitAND(
ToBeRemoved.insert(&MI);
if (SubregToRegMI)
ToBeRemoved.insert(SubregToRegMI);
ToBeRemoved.insert(MovMI);

return true;
}

		bool AArch64MIPeepholeOpt::visitORR(
		MachineInstr &MI, SmallSetVector<MachineInstr *, 8> &ToBeRemoved) {
		// Check that this ORR comes from the zero-extend pattern below.
		//
		// def : Pat<(i64 (zext GPR32:$src)),
		// (SUBREG_TO_REG (i32 0), (ORRWrs WZR, GPR32:$src, 0), sub_32)>;
		dmgreen: Do we need to check for WZR?
		jaykang10: You are right! We need to check it because of the assembly alias pattern below. Let me update the code.
		    def : InstAlias<"mov $dst, $src", (ORRWrs GPR32:$dst, WZR, GPR32:$src, 0), 2>;
		if (MI.getOperand(3).getImm() != 0)
		return false;

		if (MI.getOperand(1).getReg() != AArch64::WZR)
		return false;

		dmgreen: Do we need to avoid when they are in the same block? It would be easier for testing if we didn't.
		jaykang10: If the pattern is in the same block, I thought it had already been handled by the existing pattern with `isDef32`, so we would not need to check it.
		    def def32 : PatLeaf<(i32 GPR32:$src), [{ return isDef32(N); }]>;
		    // In the case of a 32-bit def that is known to implicitly zero-extend,
		    // we can use a SUBREG_TO_REG.
		    def : Pat<(i64 (zext def32:$src)),
		              (SUBREG_TO_REG (i64 0), GPR32:$src, sub_32)>;
		dmgreen: But, is that needed for this pass? I can see it is how DAG ISel works, but if this pattern of code appears in the same block, it should still work fine, shouldn't it?
		jaykang10: Yep, I agree with you. Let me remove the check for different blocks.
		efriedma: Missed update?
		jaykang10: Sorry... let me remove it.
		MachineInstr *SrcMI = MRI->getUniqueVRegDef(MI.getOperand(2).getReg());
		if (!SrcMI)
		return false;

		// From https://developer.arm.com/documentation/dui0801/b/BABBGCAC
		//
		// When you use the 32-bit form of an instruction, the upper 32 bits of the
		// source registers are ignored and the upper 32 bits of the destination
		// register are set to zero.
		//
		// If the 32-bit form of an AArch64 instruction defines the source operand of
		// the zero-extend, we do not need the zero-extend. Check that the defining
		// opcode is a real AArch64 instruction; if it is not, conservatively do not
		dmgreen: Why do we rule these out? Why don't we rule out anything else?
		jaykang10: I was not sure it is good to keep track of the operands of PHI and COPY in this patch, because it could make the code complicated (for example, checking cyclic PHIs). If possible, I would like to solve it in a separate patch.
		// process it.
		if (!TII->isRealAArch64Inst(SrcMI->getOpcode()))
		return false;

		Register DefReg = MI.getOperand(0).getReg();
		dmgreen: Can we add a dbgs() output explaining what was removed?
		jaykang10: Yep, let me add debug output.
		MRI->replaceRegWith(DefReg, MI.getOperand(2).getReg());
		dmgreen: -> definition
		jaykang10: Sorry... let me update it.
		// replaceRegWith changes MI's definition register. Keep it for SSA form until
		// deleting MI.
		MI.getOperand(0).setReg(DefReg);
		ToBeRemoved.insert(&MI);

		return true;
		}

bool AArch64MIPeepholeOpt::runOnMachineFunction(MachineFunction &MF) {
if (skipFunction(MF.getFunction()))
return false;

TII = static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
MLI = &getAnalysis<MachineLoopInfo>();
MRI = &MF.getRegInfo();

[9 lines not shown]	for (MachineInstr &MI : MBB) {
default:
break;
case AArch64::ANDWrr:
Changed = visitAND<uint32_t>(MI, ToBeRemoved);
break;
case AArch64::ANDXrr:
Changed = visitAND<uint64_t>(MI, ToBeRemoved);
break;
		case AArch64::ORRWrs:
		Changed = visitORR(MI, ToBeRemoved);
}
}
}

for (MachineInstr *MI : ToBeRemoved)
MI->eraseFromParent();

return Changed;
}

FunctionPass *llvm::createAArch64MIPeepholeOptPass() {
return new AArch64MIPeepholeOpt();
}
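
The sequence of checks in visitORR can be tried out without an LLVM build by modelling it on a toy instruction type. The following is only a sketch under stated assumptions, not the code from this patch: `ToyOpcode`, `ToyInst`, `ToyWZR`, `isRealToyInst` and `isRedundantZeroExtendMov` are hypothetical names, and the "real instruction" test is reduced to an opcode list, assuming that `isRealAArch64Inst` rejects generic opcodes such as COPY and EXTRACT_SUBREG (as the review discussion and the MIR test suggest).

```cpp
#include <cassert>

// Hypothetical stand-ins for MachineInstr data; these are not LLVM types.
enum class ToyOpcode { ORRWrs, ADDWrr, COPY, EXTRACT_SUBREG };

constexpr int ToyWZR = -1; // stand-in for AArch64::WZR

struct ToyInst {
  ToyOpcode Opcode;
  int Dst;      // operand 0: destination register
  int Src1;     // operand 1: first source register
  int Src2;     // operand 2: second source register
  int ShiftImm; // operand 3: shift amount (only meaningful for ORRWrs)
};

// Assumption: only target instructions count as "real"; generic opcodes
// such as COPY and EXTRACT_SUBREG do not.
inline bool isRealToyInst(ToyOpcode Op) {
  return Op == ToyOpcode::ORRWrs || Op == ToyOpcode::ADDWrr;
}

// Follows the order of the checks in visitORR: shift amount is zero, the
// first source is WZR, the second source has a known unique definition, and
// that definition is a real 32-bit instruction.
inline bool isRedundantZeroExtendMov(const ToyInst &MI, const ToyInst *SrcDef) {
  if (MI.Opcode != ToyOpcode::ORRWrs)
    return false;
  if (MI.ShiftImm != 0)
    return false;
  if (MI.Src1 != ToyWZR)
    return false;
  if (!SrcDef) // no unique definition found
    return false;
  return isRealToyInst(SrcDef->Opcode);
}
```

With this model, an ORRWrs fed by an ADDWrr is reported as removable, while one fed by an EXTRACT_SUBREG, one with a non-zero shift, or one whose first source is not the zero register is left alone.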

llvm/test/CodeGen/AArch64/arm64-assert-zext-sext.ll

Show All 31 Lines	entry:
%conv.i = sext i8 %a to i32		%conv.i = sext i8 %a to i32
%cmp = icmp eq i32 %n, 0		%cmp = icmp eq i32 %n, 0
br i1 %cmp, label %if.then, label %if.end		br i1 %cmp, label %if.then, label %if.end

if.then: ; preds = %entry		if.then: ; preds = %entry
%conv1 = zext i32 %conv.i to i64		%conv1 = zext i32 %conv.i to i64
%div = udiv i64 2036854775807, %conv1		%div = udiv i64 2036854775807, %conv1
br label %if.end		br label %if.end
; CHECK: // %if.then
dmgreen: This looks like the kind of test we could use update_llc_test_checks on.
jaykang10: It looks like it is a pre-commit test with the NFC tag. Let me update the expected output with update_llc_test_checks.
; CHECK: mov w{{[0-9]+}}, w{{[0-9]+}}
; CHECK: udiv x{{[0-9]+}}, x{{[0-9]+}}, x{{[0-9]+}}

if.end: ; preds = %if.then, %entry		if.end: ; preds = %if.then, %entry
%i1 = phi i64 [ %div, %if.then ], [ 0, %entry ]		%i1 = phi i64 [ %div, %if.then ], [ 0, %entry ]
%call1.i = tail call i32 @test1(i64 %i1)		%call1.i = tail call i32 @test1(i64 %i1)
ret i32 0		ret i32 0
}		}

llvm/test/CodeGen/AArch64/redundant-mov-from-zero-extend.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -O3 -mtriple=aarch64-linux-gnu < %s | FileCheck %s

				define i32 @test(i32 %input, i32 %n, i32 %a) {
				; CHECK-LABEL: test:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: cbz w1, .LBB0_2
				; CHECK-NEXT: // %bb.1:
				; CHECK-NEXT: mov w0, wzr
				; CHECK-NEXT: ret
				; CHECK-NEXT: .LBB0_2: // %bb.0
				; CHECK-NEXT: add w8, w0, w1
				; CHECK-NEXT: mov w0, #100
				; CHECK-NEXT: cmp w8, #4
				; CHECK-NEXT: b.hi .LBB0_5
				; CHECK-NEXT: // %bb.3: // %bb.0
				; CHECK-NEXT: adrp x9, .LJTI0_0
				; CHECK-NEXT: add x9, x9, :lo12:.LJTI0_0
				; CHECK-NEXT: adr x10, .LBB0_4
				; CHECK-NEXT: ldrb w11, [x9, x8]
				; CHECK-NEXT: add x10, x10, x11, lsl #2
				; CHECK-NEXT: br x10
				; CHECK-NEXT: .LBB0_4: // %sw.bb
				; CHECK-NEXT: add w0, w2, #1
				; CHECK-NEXT: ret
				; CHECK-NEXT: .LBB0_5: // %bb.0
				; CHECK-NEXT: cmp w8, #200
				; CHECK-NEXT: b.ne .LBB0_10
				; CHECK-NEXT: // %bb.6: // %sw.bb7
				; CHECK-NEXT: add w0, w2, #7
				; CHECK-NEXT: ret
				; CHECK-NEXT: .LBB0_7: // %sw.bb1
				; CHECK-NEXT: add w0, w2, #3
				; CHECK-NEXT: ret
				; CHECK-NEXT: .LBB0_8: // %sw.bb3
				; CHECK-NEXT: add w0, w2, #4
				; CHECK-NEXT: ret
				; CHECK-NEXT: .LBB0_9: // %sw.bb5
				; CHECK-NEXT: add w0, w2, #5
				; CHECK-NEXT: .LBB0_10: // %return
				; CHECK-NEXT: ret
				entry:
				%b = add nsw i32 %input, %n
				%cmp = icmp eq i32 %n, 0
				br i1 %cmp, label %bb.0, label %return

				bb.0:
				switch i32 %b, label %return [
				i32 0, label %sw.bb
				i32 1, label %sw.bb1
				i32 2, label %sw.bb3
				i32 4, label %sw.bb5
				i32 200, label %sw.bb7
				]

				sw.bb:
				%add = add nsw i32 %a, 1
				br label %return

				sw.bb1:
				%add2 = add nsw i32 %a, 3
				br label %return

				sw.bb3:
				%add4 = add nsw i32 %a, 4
				br label %return

				sw.bb5:
				%add6 = add nsw i32 %a, 5
				br label %return

				sw.bb7:
				%add8 = add nsw i32 %a, 7
				br label %return

				return:
				%retval.0 = phi i32 [ %add8, %sw.bb7 ], [ %add6, %sw.bb5 ], [ %add4, %sw.bb3 ], [ %add2, %sw.bb1 ], [ %add, %sw.bb ], [ 100, %bb.0 ], [ 0, %entry ]
				ret i32 %retval.0
				}

llvm/test/CodeGen/AArch64/redundant-orrwrs-from-zero-extend.mir

This file was added.

				# RUN: llc -mtriple=aarch64 -run-pass aarch64-mi-peephole-opt -verify-machineinstrs -o - %s | FileCheck %s
				---
				name: test1
				# CHECK-LABEL: name: test1
				alignment: 4
				tracksRegLiveness: true
				registers:
				- { id: 0, class: gpr32 }
				- { id: 1, class: gpr32 }
				- { id: 2, class: gpr32 }
				- { id: 3, class: gpr32 }
				- { id: 4, class: gpr64 }
				body: |
				bb.0:
				liveins: $w0, $w1

				%0:gpr32 = COPY $w0
				%1:gpr32 = COPY $w1
				B %bb.1

				bb.1:
				%2:gpr32 = nsw ADDWrr %0, %1
				B %bb.2

				bb.2:
				; CHECK-LABEL: bb.2:
				; CHECK-NOT: %3:gpr32 = ORRWrs $wzr, %2, 0
				; The ORRWrs should be removed.
				%3:gpr32 = ORRWrs $wzr, %2, 0
				%4:gpr64 = SUBREG_TO_REG 0, %3, %subreg.sub_32
				B %bb.3

				bb.3:
				$x0 = COPY %4
				RET_ReallyLR implicit $x0
				...
				---
				name: test2
				# CHECK-LABEL: name: test2
				alignment: 4
				tracksRegLiveness: true
				registers:
				- { id: 0, class: gpr64 }
				- { id: 1, class: gpr32 }
				- { id: 2, class: gpr32 }
				- { id: 3, class: gpr64 }
				body: |
				bb.0:
				liveins: $x0

				%0:gpr64 = COPY $x0
				B %bb.1

				bb.1:
				%1:gpr32 = EXTRACT_SUBREG %0, %subreg.sub_32
				B %bb.2

				bb.2:
				; CHECK-LABEL: bb.2:
				; CHECK: %2:gpr32 = ORRWrs $wzr, %1, 0
				; The ORRWrs should not be removed.
				%2:gpr32 = ORRWrs $wzr, %1, 0
				%3:gpr64 = SUBREG_TO_REG 0, %2, %subreg.sub_32
				B %bb.3

				bb.3:
				$x0 = COPY %3
				RET_ReallyLR implicit $x0
				...
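
The difference between test1 and test2 follows from the architectural rule quoted in the pass: a 32-bit write clears the upper 32 bits of the destination register. The sketch below models that rule on a toy register. It is an illustration only; `ToyReg`, `addW` and `movW` are hypothetical names and are not part of LLVM or of this patch.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of one AArch64 general-purpose register (X view is 64 bits wide).
struct ToyReg {
  uint64_t X = 0;
  // Writing through the W view sets bits 63:32 to zero.
  void writeW(uint32_t V) { X = V; }
  // Reading through the W view ignores bits 63:32.
  uint32_t readW() const { return static_cast<uint32_t>(X); }
};

// 32-bit add (like ADDWrr): reads the low halves, writes through the W view.
inline void addW(ToyReg &Dst, const ToyReg &A, const ToyReg &B) {
  Dst.writeW(A.readW() + B.readW());
}

// `mov Wd, Wn`, i.e. ORR Wd, WZR, Wn with shift 0.
inline void movW(ToyReg &Dst, const ToyReg &Src) {
  Dst.writeW(0u | Src.readW());
}
```

If the source was produced by a 32-bit instruction such as `addW` (the test1 case), `movW` leaves the 64-bit value unchanged, so the move is redundant. If the source is merely the low half of a 64-bit value with arbitrary upper bits (the test2 case, via EXTRACT_SUBREG), `movW` does change the 64-bit value and must be kept.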

Revision Contents

Diff 380439
