This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Coalesce Copy Zero during instruction selection
ClosedPublic

Authored by haicheng on Jul 31 2017, 12:31 PM.


Details

Reviewers

t.p.northover
gberry
MatzeB
qcolombet
rengolin
javed.absar
sebpop
kristof.beyls

Commits

rGaed6e52b3c3f: [AArch64] Coalesce Copy Zero during instruction selection
rG50692a203c98: [AArch64] Fix a typo in isExtFreeImpl()
rL325459: [AArch64] Coalesce Copy Zero during instruction selection

Summary

Currently, a move of constant zero is lowered into two MIR instructions after instruction selection:

v1 = copy wzr/xzr
v2 = copy v1

These two copies are coalesced in a later pass.

One problem with this shows up in the Machine-Sink pass, which runs before the copy propagation pass. Machine-Sink can break a critical edge if at least two cheap MIR instructions can be sunk onto that path. Thus, we may end up with an MBB that contains only a single mov wzr/xzr instruction, which makes it hard for block placement to lay out the code. For example, the test case below, copy-zero-reg.ll, has a loop unrolled by two. Sinking the mov wzr/xzr makes it impossible to find a fallthrough for every MBB, and the currently generated code has a block that looks like this:

// BB#1:
	mov	 w9, wzr
	cbnz	w8, .LBB0_5
	b	.LBB0_6

This patch coalesces the two COPYs during instruction selection. The performance impact of this patch is shown below:

spec2000/vpr	+1.4%
spec2006/libquantum	+4.9%
spec2006/perlbench	+1.2%
spec2017/blender	+3.5%
spec2017/deepsjeng	-1.1%

Diff Detail

Repository: rL LLVM

Event Timeline

haicheng created this revision. Jul 31 2017, 12:31 PM

Herald added subscribers: kristof.beyls, javed.absar, mcrosier and 2 others. Jul 31 2017, 12:31 PM

haicheng edited the summary of this revision. Jul 31 2017, 12:31 PM

haicheng edited the summary of this revision.

haicheng edited the summary of this revision. Jul 31 2017, 4:12 PM

haicheng added a subscriber: llvm-commits. Aug 1 2017, 1:41 PM

Closed by commit rL309748: [AArch64] Fix a typo in isExtFreeImpl() (authored by haicheng). Aug 1 2017, 2:27 PM

This revision was automatically updated to reflect the committed changes.

Sorry. This patch is not committed. I made a mistake when committing another patch D36174.

Upload the correct patch. Please take a look. Sorry for the confusion.

haicheng removed a commit: rL309748: [AArch64] Fix a typo in isExtFreeImpl(). Aug 1 2017, 5:51 PM

gberry added inline comments. Aug 1 2017, 6:18 PM

lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2769	Typo: 'conatants' -> 'constants'
test/CodeGen/AArch64/arm64-addr-type-promotion.ll
31	Is this a regression?

haicheng added inline comments. Aug 1 2017, 7:31 PM

test/CodeGen/AArch64/arm64-addr-type-promotion.ll
31	This corresponds to block if.end25. Do you think it is better to create a new block to sink mov wzr here? I think neither sinking nor not sinking is a clear win, since it depends on which path the control flow takes. Also, the cost of mov wzr can be as cheap as zero. So I am inclined to keep the current CFG.

gberry added inline comments. Aug 2 2017, 1:15 PM

lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2768	Would it not make sense to replace any use of constant 0 with wzr/xzr when it is legal?
test/CodeGen/AArch64/arm64-addr-type-promotion.ll
31	That seems fine, it just wasn't clear from the CHECK diffs if this was a new 'mov'

haicheng added inline comments. Aug 2 2017, 1:43 PM

lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2768	I agree with you. I started with replacing any constant 0 with wzr/xzr, but that triggered a lot of assertions. For example, wzr/xzr is not expected to appear as the condition of a conditional branch. Then I narrowed it down to CopyToReg only, but that still triggered some assertions when copying wzr/xzr to another physical register. Now I have narrowed it down to the most common situation, copying constant zero to a virtual reg; no assertion is triggered and performance looks good.
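The narrowing described in this comment amounts to a small legality check. The following is a standalone sketch of that check, not code from the patch: `CopyOfConst`, `ZeroReg`, and `pickZeroReg` are hypothetical names that model the three conditions (the constant is zero, the destination is a virtual register, the type is i32 or i64) without depending on LLVM.

```cpp
#include <cstdint>

// Hypothetical model of a CopyToReg whose source is a constant.
struct CopyOfConst {
  std::int64_t value;  // the constant being copied
  unsigned bits;       // width of the constant's type
  bool destIsVirtual;  // true if the destination is a virtual register
};

// Hypothetical result type: which zero register to read, if any.
enum class ZeroReg { None, WZR, XZR };

// Returns the zero register the copy could read from directly, or
// ZeroReg::None when the copy should be left untouched.
ZeroReg pickZeroReg(const CopyOfConst &copy) {
  if (copy.value != 0)
    return ZeroReg::None;  // only constant zero is handled
  if (!copy.destIsVirtual)
    return ZeroReg::None;  // copies to physical registers are left alone
  if (copy.bits == 32)
    return ZeroReg::WZR;
  if (copy.bits == 64)
    return ZeroReg::XZR;
  return ZeroReg::None;    // other widths are not handled
}
```

Every case that falls outside the check is simply left for the later coalescing pass, which is what keeps the approach conservative.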

haicheng added inline comments. Aug 9 2017, 12:18 PM

lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2768	And I check whether all the uses are copies to virtual regs and call RAUW to do the replacement, because SDUse::set() is a private method. Do you think it is worthwhile to make that API public? It does not make a big difference to performance, since most of the time constant 0 has only one use.

Fix a typo. Please take a look. Thank you.

gberry added inline comments. Aug 9 2017, 2:10 PM

lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2768	Yeah, that's why I said "when it is legal" :) It may be too much of a pain to determine legality here, but what about handling cases where only some of the uses are virt reg copies (and just using xzr/wzr directly in those cases)?
2768	No, I'm not suggesting you change SDUse (that is private for a reason). If you want to do this experiment you would need to replace the users themselves with new nodes that read XZR/WZR directly. I think either approach is okay here, but you should probably wait for someone else to approve it.

haicheng added inline comments. Aug 11 2017, 7:21 PM

lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2768	I tried, and it seems I cannot replace the users themselves with new nodes that read XZR/WZR directly. The new node I can create here is CopyFromReg, and its type is MVT::i32/i64. The user is CopyToReg, and its type is MVT::Other. The types do not match, so I cannot do the replacement. Another potential issue is that the users can be used in another MBB because they are lowered from a PHINode, but use_iterator does not seem to include this usage. I think I will stick to the current approach, which just replaces #0 with XZR/WZR, because it is simple and conservative but covers the most common situations. If I miss any coalescing opportunities, they will be coalesced anyway in the later pass.

gberry added inline comments. Aug 14 2017, 11:38 AM

lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2768	I'm not sure I fully understand the problem you're running into trying to do this forward substitution on only some uses. If you wanted to try that you would leave this node alone and instead replace the use nodes that are vreg COPYs that are reading the COPY of XZR/WZR.

Something like the below is what I meant by transforming the uses instead. The FIXME comment needs to be addressed, but I think it should do what you want and catch the cases where not all uses are vreg COPYs:

In AArch64DAGToDAGISel::Select, instead of changing the Constant case, add the following:

case ISD::CopyToReg: {
  // Special case for copy of zero to avoid a double copy.
  // FIXME: check dst is virt reg and regclass is okay
  SDNode *CopyVal = Node->getOperand(2).getNode();
  if (ConstantSDNode *CopyValConst = dyn_cast<ConstantSDNode>(CopyVal))
    if (CopyValConst->isNullValue()) {
      unsigned ZeroReg;
      EVT ZeroVT = CopyValConst->getValueType(0);
      if (ZeroVT == MVT::i32)
        ZeroReg = AArch64::WZR;
      else if (ZeroVT == MVT::i64)
        ZeroReg = AArch64::XZR;
      else
        break;

      SDValue ZeroRegVal = CurDAG->getRegister(ZeroReg, ZeroVT);
      SDValue New = CurDAG->getNode(ISD::CopyToReg, SDLoc(Node), MVT::Other,
                                    Node->getOperand(0), Node->getOperand(1),
                                    ZeroRegVal);
      ReplaceNode(Node, New.getNode());
      return;
    }
  break;
}

Use Geoff's code. Please take a look.

Bail out if the copy is used with glue. Please take a look.

Kindly ping

evandro added a subscriber: evandro. Sep 27 2017, 12:52 PM

Kindly Ping (#2)

Someone else should probably approve this since I wrote some of the code.

lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2788	Style nit: you can reduce the level of indentation below by inverting these if tests and doing an early break after each one.
2794	This could use a comment explaining why it is being rejected.
test/CodeGen/AArch64/copy-zero-reg.ll
4	This description seems a little vague. Can you spell out which block you don't want the mov to appear in? Also, it seems like you might want a negative CHECK-NOT below to make sure there isn't another mov?

Add the support of "glue".

haicheng marked an inline comment as not done. Oct 9 2017, 11:17 AM

haicheng marked an inline comment as done. Oct 20 2017, 5:49 AM

Kindly Ping

This looks pretty good to me (with one minor comment), but someone else should probably approve it.

lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2809	I think you can get rid of this if/else by re-writing as: SDValue New = CurDAG->getNode(ISD::CopyToReg, SDLoc(Node), Node->getVTList(), makeArrayRef(Ops, NumOperands));
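The simplification suggested in this comment has two parts: reuse the old node's result-type list unchanged, and pass only the first NumOperands entries of the fixed-size operand array, so the optional glue operand is included exactly when it was present. The following standalone sketch models that idea without LLVM; `CopyNode` and `rewriteCopyOfZero` are hypothetical names, and strings stand in for the real operand and type objects.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a CopyToReg node: result types plus operands.
// Operands are {chain, destination, source} with an optional trailing glue.
struct CopyNode {
  std::vector<std::string> resultTypes;
  std::vector<std::string> operands;
};

// Builds the replacement node: same result types as the old node, same
// operands except that the source (index 2) becomes the zero register.
CopyNode rewriteCopyOfZero(const CopyNode &old, const std::string &zeroReg) {
  const std::size_t n = old.operands.size();
  assert(n == 3 || n == 4);

  // Always fill four slots; the fourth is a placeholder when there is no glue.
  std::array<std::string, 4> ops = {old.operands[0], old.operands[1], zeroReg,
                                    n == 4 ? old.operands[3] : std::string()};

  CopyNode fresh;
  fresh.resultTypes = old.resultTypes;                // no if/else on glue
  fresh.operands.assign(ops.begin(), ops.begin() + n); // prefix of length n
  return fresh;
}
```

Because the result types are copied rather than rebuilt, the glue and no-glue cases go through the same path.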

Address Geoff's comments. Thank you.

Kindly Ping

Kindly Ping (#2)

Piggyback on @gberry's review, I can't see anything wrong with it, LGTM. Thanks!

This revision is now accepted and ready to land. Nov 10 2017, 10:31 PM

Closed by commit rL325459: [AArch64] Coalesce Copy Zero during instruction selection (authored by haicheng). Feb 18 2018, 5:54 AM

This revision was automatically updated to reflect the committed changes.

I am planning to revert this change. It works for tests like test/CodeGen/AArch64/copy-zero-reg.ll. However, when there are multiple branches, this patch degrades performance because of the mov instructions placed in each fall-through. Here is an example test case where performance degrades.

target triple = "aarch64-none-linux-gnu"

%struct.foo = type {i8* }

; Function Attrs: nounwind
define void @func(i32* nocapture readonly %initval, i32* nocapture readonly %compptr, i16* nocapture readonly %coef_block) local_unnamed_addr {
entry:
  br label %for.body

for.body:                                         ; preds = %for.inc, %entry
  %inptr.0106 = phi i16* [ %coef_block, %entry ], [ %inptr.1, %for.inc ]
  %ctr.0105 = phi i32 [ 8, %entry ], [ %dec, %for.inc ]
  %qptr.0104 = phi i32* [ %compptr, %entry ], [ %qptr.1, %for.inc ]
  %wsptr.0103 = phi i32* [ %initval, %entry ], [ %wsptr.1, %for.inc ]
  %arrayidx = getelementptr inbounds i16, i16* %inptr.0106, i64 8
  %0 = load i16, i16* %arrayidx, align 2
  %arrayidx3 = getelementptr inbounds i16, i16* %inptr.0106, i64 16
  %1 = load i16, i16* %arrayidx3, align 2
  %2 = or i16 %1, %0
  %3 = icmp eq i16 %2, 0
  br i1 %3, label %land.lhs.true7, label %if.end

land.lhs.true7:                                   ; preds = %for.body
  %arrayidx8 = getelementptr inbounds i16, i16* %inptr.0106, i64 24
  %true7 = load i16, i16* %arrayidx8, align 2
  %cmp10 = icmp eq i16 %true7, 0
  br i1 %cmp10, label %land.lhs.true12, label %if.end

land.lhs.true12:                                  ; preds = %land.lhs.true7
  %arrayidx13 = getelementptr inbounds i16, i16* %inptr.0106, i64 32
  %true12 = load i16, i16* %arrayidx13, align 2
  %cmp15 = icmp eq i16 %true12, 0
  br i1 %cmp15, label %land.lhs.true17, label %if.end

land.lhs.true17:                                  ; preds = %land.lhs.true12
  %arrayidx18 = getelementptr inbounds i16, i16* %inptr.0106, i64 40
  %true17 = load i16, i16* %arrayidx18, align 2
  %cmp20 = icmp eq i16 %true17, 0
  br i1 %cmp20, label %land.lhs.true22, label %if.end

land.lhs.true22:                                  ; preds = %land.lhs.true17
  %arrayidx23 = getelementptr inbounds i16, i16* %inptr.0106, i64 48
  %true22 = load i16, i16* %arrayidx23, align 2
  %cmp25 = icmp eq i16 %true22, 0
  br i1 %cmp25, label %land.lhs.true27, label %if.end

land.lhs.true27:                                  ; preds = %land.lhs.true22
  %arrayidx28 = getelementptr inbounds i16, i16* %inptr.0106, i64 56
  %true27 = load i16, i16* %arrayidx28, align 2
  %cmp30 = icmp eq i16 %true27, 0
  br i1 %cmp30, label %if.then, label %if.end

if.then:                                          ; preds = %land.lhs.true27
  %4 = load i16, i16* %inptr.0106, align 2
  %conv33 = sext i16 %4 to i32
  %5 = load i32, i32* %qptr.0104, align 4
  %mul = shl nsw i32 %conv33, 2
  %shl = mul i32 %mul, %5
  store i32 %shl, i32* %wsptr.0103, align 4
  br label %for.inc

if.end:                                           ; preds = %land.lhs.true27, %land.lhs.true22, %land.lhs.true17, %land.lhs.true12, %land.lhs.true7, %for.body
  %6 = phi i16 [ 0, %land.lhs.true27 ], [ 0, %land.lhs.true22 ], [ 0, %land.lhs.true17 ], [ 0, %land.lhs.true12 ], [ 0, %land.lhs.true7 ], [ %1, %for.body ]
  %conv40 = sext i16 %6 to i32
  %arrayidx41 = getelementptr inbounds i32, i32* %qptr.0104, i64 16
  %7 = load i32, i32* %arrayidx41, align 4
  %mul42 = mul nsw i32 %7, %conv40
  store i32 %mul42, i32* %wsptr.0103, align 4
  br label %for.inc

for.inc:                                          ; preds = %if.end, %if.then
  %conv54.sink = phi i32 [ %mul42, %if.end ], [ %shl, %if.then ]
  %arrayidx55 = getelementptr inbounds i32, i32* %wsptr.0103, i64 56
  store i32 %conv54.sink, i32* %arrayidx55, align 4
  %wsptr.1 = getelementptr inbounds i32, i32* %wsptr.0103, i64 1
  %qptr.1 = getelementptr inbounds i32, i32* %qptr.0104, i64 1
  %inptr.1 = getelementptr inbounds i16, i16* %inptr.0106, i64 1
  %dec = add nsw i32 %ctr.0105, -1
  %cmp = icmp ugt i32 %ctr.0105, 1
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.inc
  ret void
}

This is a slightly smaller version of the libjpeg code.

The intention of this patch is to remove a BB that has only one mov instruction. To do that, this patch pessimizes MachineSinking by introducing a copy, so that the mov instruction is NOT moved into the BB. Downstream optimization then gets rid of the BB with only the mov instruction. This works well if we have only one fall-through branch, as there is only one "extra" mov instruction, as in copy-zero-reg.ll.

If we have multiple fall-throughs, as in the test case above, we will have a lot of redundant movs. In such a case, it is better to keep the BB that has one mov instruction.
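To make the trade-off concrete, here is a rough static-count model that takes the description above at face value. It is only a sketch: `Incoming`, `movsPerPredecessor`, and `movsWithSharedBlock` are hypothetical names, and the assumption of one mov per zero-valued phi edge versus a single mov in one shared block is a simplification; real code generation depends on later passes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one incoming edge of a phi node.
struct Incoming {
  std::string pred;  // predecessor block name
  bool isZero;       // true if the incoming value is constant zero
};

// Static mov count if every zero-valued edge gets its own "mov wN, wzr"
// in its predecessor block.
std::size_t movsPerPredecessor(const std::vector<Incoming> &phi) {
  std::size_t count = 0;
  for (const Incoming &in : phi)
    if (in.isZero)
      ++count;
  return count;
}

// Static mov count if all zero-valued edges instead go through one shared
// block that holds a single mov.
std::size_t movsWithSharedBlock(const std::vector<Incoming> &phi) {
  return movsPerPredecessor(phi) > 0 ? 1 : 0;
}
```

For the phi in if.end of the test case above, five incoming values are constant 0 and one comes from %for.body, so under these assumptions the model counts five movs in the first case and one in the second.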

This is causing degradation in jpeg, fft and other codebases. I believe that if we want to remove a BB with only one mov instruction, we should not pessimize MachineSinking at all, and should find some other solution.

This revision is now accepted and ready to land. Jun 20 2018, 1:43 PM

Herald added a reviewer: javed.absar. Jun 20 2018, 1:43 PM

Herald added a subscriber: dmgreen.

SirishP added reviewers: sebpop, kristof.beyls. Jun 20 2018, 1:44 PM

Closed by commit rGaed6e52b3c3f: [AArch64] Coalesce Copy Zero during instruction selection (authored by haicheng). Oct 7 2019, 4:46 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Oct 7 2019, 4:46 AM

Herald added a subscriber: hiraditya.

Revision Contents

Path	Size
lib/Target/AArch64/AArch64ISelDAGToDAG.cpp	35 lines
test/CodeGen/AArch64/arm64-addr-type-promotion.ll	1 line
test/CodeGen/AArch64/arm64-cse.ll	2 lines
test/CodeGen/AArch64/copy-zero-reg.ll	47 lines
test/CodeGen/AArch64/i128-fast-isel-fallback.ll	2 lines

Diff 118145

lib/Target/AArch64/AArch64ISelDAGToDAG.cpp

(first 2,759 lines not shown; enclosing context: case ISD::EXTRACT_VECTOR_ELT: {)
     DEBUG(dbgs() << "ISEL: Custom selection!\n=> ");
     DEBUG(Extract->dumpr(CurDAG));
     DEBUG(dbgs() << "\n");
     ReplaceNode(Node, Extract.getNode());
     return;
   }
   case ISD::Constant: {
     // Materialize zero constants as copies from WZR/XZR. This allows
     // the coalescer to propagate these into other instructions.
     ConstantSDNode *ConstNode = cast<ConstantSDNode>(Node);
     if (ConstNode->isNullValue()) {
       if (VT == MVT::i32) {
         SDValue New = CurDAG->getCopyFromReg(
             CurDAG->getEntryNode(), SDLoc(Node), AArch64::WZR, MVT::i32);
         ReplaceNode(Node, New.getNode());
         return;
       } else if (VT == MVT::i64) {
         SDValue New = CurDAG->getCopyFromReg(
             CurDAG->getEntryNode(), SDLoc(Node), AArch64::XZR, MVT::i64);
         ReplaceNode(Node, New.getNode());
         return;
       }
     }
     break;
   }
+  case ISD::CopyToReg: {
+    // Special case for copy of zero to avoid a double copy.
+    SDNode *CopyVal = Node->getOperand(2).getNode();
+    ConstantSDNode *CopyValConst = dyn_cast<ConstantSDNode>(CopyVal);
+    if (!CopyValConst || !CopyValConst->isNullValue())
+      break;
+    const SDValue &Dest = Node->getOperand(1);
+    if (!TargetRegisterInfo::isVirtualRegister(
+            cast<RegisterSDNode>(Dest)->getReg()))
+      break;
+    unsigned ZeroReg;
+    EVT ZeroVT = CopyValConst->getValueType(0);
+    if (ZeroVT == MVT::i32)
+      ZeroReg = AArch64::WZR;
+    else if (ZeroVT == MVT::i64)
+      ZeroReg = AArch64::XZR;
+    else
+      break;
+    unsigned NumOperands = Node->getNumOperands();
+    SDValue ZeroRegVal = CurDAG->getRegister(ZeroReg, ZeroVT);
+    // Replace the source operand (#0) with ZeroRegVal.
+    SDValue Ops[] = {Node->getOperand(0), Node->getOperand(1), ZeroRegVal,
+                     (NumOperands == 4) ? Node->getOperand(3) : SDValue()};
+    SDValue New;
+    if (Node->getValueType(Node->getNumValues() - 1) == MVT::Glue)
+      New =
+          CurDAG->getNode(ISD::CopyToReg, SDLoc(Node), {MVT::Other, MVT::Glue},
+                          makeArrayRef(Ops, NumOperands));
+    else
+      New = CurDAG->getNode(ISD::CopyToReg, SDLoc(Node), MVT::Other,
+                            makeArrayRef(Ops, NumOperands));
+    ReplaceNode(Node, New.getNode());
+    return;
+  }
   case ISD::FrameIndex: {
     // Selects to ADDXri FI, 0 which in turn will become ADDXri SP, imm.
     int FI = cast<FrameIndexSDNode>(Node)->getIndex();
     unsigned Shifter = AArch64_AM::getShifterImm(AArch64_AM::LSL, 0);
     const TargetLowering *TLI = getTargetLowering();
     SDValue TFI = CurDAG->getTargetFrameIndex(
         FI, TLI->getPointerTy(CurDAG->getDataLayout()));
     SDLoc DL(Node);
(remaining 1,229 lines not shown)

test/CodeGen/AArch64/arm64-addr-type-promotion.ll

(first 22 lines not shown)
 ; CHECK-NEXT: add [[BLOCKBASE1:x[0-9]+]], [[BLOCKBASE]], [[I1]]
 ; CHECK-NEXT: ldrb [[LOADEDVAL1:w[0-9]+]], {{\[}}[[BLOCKBASE1]], #1]
 ; CHECK-NEXT: ldrb [[LOADEDVAL2:w[0-9]+]], {{\[}}[[BLOCKBASE2]], #1]
 ; CHECK-NEXT: cmp [[LOADEDVAL1]], [[LOADEDVAL2]]
 ; CHECK-NEXT: b.ne
 ; Next BB
 ; CHECK: ldrb [[LOADEDVAL3:w[0-9]+]], {{\[}}[[BLOCKBASE1]], #2]
 ; CHECK-NEXT: ldrb [[LOADEDVAL4:w[0-9]+]], {{\[}}[[BLOCKBASE2]], #2]
+; CHECK-NEXT: mov w0, wzr
 ; CHECK-NEXT: cmp [[LOADEDVAL3]], [[LOADEDVAL4]]
 entry:
   %idxprom = sext i32 %i1 to i64
   %tmp = load i8, i8* @block, align 8
   %arrayidx = getelementptr inbounds i8, i8* %tmp, i64 %idxprom
   %tmp1 = load i8, i8* %arrayidx, align 1
   %idxprom1 = sext i32 %i2 to i64
   %arrayidx2 = getelementptr inbounds i8, i8* %tmp, i64 %idxprom1
(remaining 47 lines not shown)

test/CodeGen/AArch64/arm64-cse.ll

 ; RUN: llc -O3 < %s -aarch64-enable-atomic-cfg-tidy=0 -aarch64-enable-gep-opt=false -verify-machineinstrs | FileCheck %s
 target triple = "arm64-apple-ios"

 ; rdar://12462006
 ; CSE between "icmp reg reg" and "sub reg reg".
 ; Both can be in the same basic block or in different basic blocks.
 define i8* @t1(i8* %base, i32* nocapture %offset, i32 %size) nounwind {
 entry:
 ; CHECK-LABEL: t1:
 ; CHECK: subs
 ; CHECK-NOT: cmp
 ; CHECK-NOT: sub
-; CHECK: b.ge
+; CHECK: b.lt
 ; CHECK: sub
 ; CHECK: sub
 ; CHECK-NOT: sub
 ; CHECK: ret
   %0 = load i32, i32* %offset, align 4
   %cmp = icmp slt i32 %0, %size
   %s = sub nsw i32 %0, %size
   br i1 %cmp, label %return, label %if.end
(remaining 38 lines not shown)

test/CodeGen/AArch64/copy-zero-reg.ll

This file was added.

; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu | FileCheck %s

; Verify there is no tiny block having only one mov wzr instruction between for.body.lr.ph and sw.epilog.loopexit
define void @unroll_by_2(i32 %trip_count, i32* %p) {
; CHECK-LABEL: unroll_by_2
; CHECK: // %for.body.lr.ph
; CHECK: mov w{{[0-9]+}}, wzr
; CHECK: b.eq
; CHECK-NOT: mov w{{[0-9]+}}, wzr
; CHECK: // %for.body.lr.ph.new
; CHECK: // %for.body
; CHECK: // %sw.epilog.loopexit
; CHECK: // %for.body.epil
; CHECK: // %exit
; CHECK-NEXT: ret
for.body.lr.ph:
  %xtraiter = and i32 %trip_count, 1
  %cmp = icmp eq i32 %trip_count, 1
  br i1 %cmp, label %sw.epilog.loopexit, label %for.body.lr.ph.new

for.body.lr.ph.new:
  %unroll_iter = sub nsw i32 %trip_count, %xtraiter
  br label %for.body

for.body:
  %indvars = phi i32 [ 0, %for.body.lr.ph.new ], [ %indvars.next, %for.body ]
  %niter = phi i32 [ %unroll_iter, %for.body.lr.ph.new ], [ %niter.nsub, %for.body ]
  %array = getelementptr inbounds i32, i32 * %p, i32 %indvars
  store i32 %niter, i32* %array
  %indvars.next = add i32 %indvars, 2
  %niter.nsub = add i32 %niter, -2
  %niter.ncmp = icmp eq i32 %niter.nsub, 0
  br i1 %niter.ncmp, label %sw.epilog.loopexit, label %for.body

sw.epilog.loopexit:
  %indvars.unr = phi i32 [ 0, %for.body.lr.ph ], [ %indvars.next, %for.body ]
  %lcmp.mod = icmp eq i32 %xtraiter, 0
  br i1 %lcmp.mod, label %exit, label %for.body.epil

for.body.epil:
  %array.epil = getelementptr inbounds i32, i32* %p, i32 %indvars.unr
  store i32 %indvars.unr, i32* %array.epil
  br label %exit

exit:
  ret void
}

test/CodeGen/AArch64/i128-fast-isel-fallback.ll

 ; RUN: llc -O0 -mtriple=arm64-apple-ios7.0 -mcpu=generic < %s | FileCheck %s

 ; Function Attrs: nounwind ssp
 define void @test1() {
   %1 = sext i32 0 to i128
   call void @test2(i128 %1)
   ret void

 ; The i128 is 0 so the we can test to make sure it is propogated into the x
 ; registers that make up the i128 pair

 ; CHECK: mov x0, xzr
-; CHECK: mov x1, x0
+; CHECK: mov x1, xzr
 ; CHECK: bl _test2

 }

 declare void @test2(i128)
